Creating and configuring controller services

Learn how to set up and configure controller services specifically tailored for your CDW Iceberg ingest data flow.

Controller services provide shared services that the processors in your data flow can use. You will reference these controller services later when you configure your processors.

  1. To add a controller service to your flow, right-click the NiFi canvas and select Configure from the pop-up menu.
    This displays the Controller Services Configuration window.
  2. Select the Controller Services tab.
  3. Click the + button to display the Add Controller Service dialog.
  4. Select the required controller service and click Add.
  5. Click the Configure icon in the right-hand column and configure the options that you need.
  6. When you have finished the configuration, click Apply to save the changes.
  7. Enable the controller service by clicking the Enable icon (lightning bolt) in the far-right column of the Controller Services tab.
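
If you prefer to script these UI steps, they map onto NiFi REST API calls. The following is a minimal sketch, assuming the `/nifi-api` endpoints `POST /process-groups/{id}/controller-services` (add a service), `PUT /controller-services/{id}` (set its properties) and `PUT /controller-services/{id}/run-status` (enable it). It only builds the JSON request bodies and does not send them. The helper names and the service class name are assumptions, so check the payloads against the REST API documentation for your NiFi version.

```python
import json

# Assumed fully qualified class name; confirm it in the Add Controller Service dialog.
HIVE_CATALOG_TYPE = "org.apache.nifi.services.iceberg.HiveCatalogService"


def add_service_body(service_type: str, name: str) -> dict:
    """Body for POST /nifi-api/process-groups/{pg_id}/controller-services (steps 3-4).

    A component that does not exist yet is created at revision version 0.
    """
    return {"revision": {"version": 0}, "component": {"type": service_type, "name": name}}


def configure_service_body(service_id: str, version: int, properties: dict) -> dict:
    """Body for PUT /nifi-api/controller-services/{service_id} (steps 5-6)."""
    return {
        "revision": {"version": version},
        "component": {"id": service_id, "properties": properties},
    }


def enable_service_body(version: int) -> dict:
    """Body for PUT /nifi-api/controller-services/{service_id}/run-status (step 7)."""
    return {"revision": {"version": version}, "state": "ENABLED"}


if __name__ == "__main__":
    # NiFi uses optimistic locking: each update must send the revision version
    # returned by the previous response, so the versions here are placeholders.
    print(json.dumps(add_service_body(HIVE_CATALOG_TYPE, "Hive Catalog Controller Service"), indent=2))
    print(json.dumps(enable_service_body(version=2), indent=2))
```
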
The following controller services are used in this CDW Iceberg ingest data flow:
  • Hive Catalog Controller Service
  • Kerberos Password User Service
See the tables below for property details.

Configuring the Hive Catalog Controller Service

In CDW, the Hive Metastore URI is the same as the one used by the Hive service of the Base cluster. Rather than specifying the thrift URI directly, it is simpler to provide the necessary configuration files.

To set up the configuration correctly, you need to supply the following files from your CDP Private Cloud Base environment:
  • core-site.xml
  • hdfs-site.xml
  • hive-site.xml

In Flow Management clusters, these files are automatically available on every node, provided that the Hive gateway role has been added to the NiFi nodes in the Base cluster.
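
Supplying hive-site.xml can replace the thrift URI because Hive normally records the metastore URI in that file under the `hive.metastore.uris` property, and the warehouse directory under `hive.metastore.warehouse.dir`. The following is a minimal Python sketch, assuming the standard Hadoop `<configuration>`/`<property>` XML layout, that reads these values so you can check what a NiFi node will pick up. The function names are hypothetical.

```python
import os
import sys
import xml.etree.ElementTree as ET


def parse_hadoop_conf(xml_data) -> dict:
    """Return {name: value} for every <property> in a Hadoop-style *-site.xml document."""
    root = ET.fromstring(xml_data)
    conf = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name:
            # A missing or empty <value> element is recorded as an empty string.
            conf[name.strip()] = (prop.findtext("value") or "").strip()
    return conf


def read_hadoop_conf(path: str) -> dict:
    # Read as bytes so an XML encoding declaration in the file is honored.
    with open(path, "rb") as f:
        return parse_hadoop_conf(f.read())


if __name__ == "__main__":
    # Default path taken from the example value for Hadoop Configuration Resources.
    path = sys.argv[1] if len(sys.argv) > 1 else "/etc/hive/conf/hive-site.xml"
    if os.path.isfile(path):
        conf = read_hadoop_conf(path)
        for key in ("hive.metastore.uris", "hive.metastore.warehouse.dir"):
            print(f"{key} = {conf.get(key, '<not set>')}")
    else:
        print(f"{path} not found; check that the Hive gateway role is deployed on this node.")
```
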

Table 1. Hive Catalog Controller Service properties
| Property | Description | Example value for ingest data flow |
|---|---|---|
| Hive Metastore URI | Provide the URI of the metastore location. | |
| Default Warehouse Location | Provide the default warehouse location in the HDFS file system. | |
| Hadoop Configuration Resources | Add a comma-separated list of Hadoop configuration files, such as hive-site.xml, and core-site.xml for Kerberos. Include the full path to each file so that NiFi can locate these configuration resources. | /etc/hive/conf/hive-site.xml,/etc/hadoop/conf/core-site.xml,/etc/hive/conf/hdfs-site.xml |
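
The following is a minimal Python sketch for assembling the Hadoop Configuration Resources value. It joins full paths with commas, as the property expects, and reports any file that is missing on the current node. The paths are the example values from Table 1 and should be adjusted to your environment; the function names are hypothetical.

```python
import os
import sys

# Example value from Table 1; replace with the paths used on your NiFi nodes.
EXAMPLE_RESOURCES = [
    "/etc/hive/conf/hive-site.xml",
    "/etc/hadoop/conf/core-site.xml",
    "/etc/hive/conf/hdfs-site.xml",
]


def hadoop_config_resources(paths) -> str:
    """Join configuration file paths into one comma-separated property value."""
    cleaned = [p.strip() for p in paths if p.strip()]
    for p in cleaned:
        # The property description asks for full paths, so reject relative ones.
        if not os.path.isabs(p):
            raise ValueError(f"not a full path: {p}")
    return ",".join(cleaned)


def missing_files(value: str) -> list:
    """Return the paths in a comma-separated value that do not exist on this node."""
    return [p for p in value.split(",") if p and not os.path.isfile(p)]


if __name__ == "__main__":
    value = hadoop_config_resources(EXAMPLE_RESOURCES)
    print(value)
    for path in missing_files(value):
        print(f"missing on this node: {path}", file=sys.stderr)
```

Run it on each NiFi node, because the files must exist at the same paths everywhere the flow can execute.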

Configuring the Kerberos Password User Service

Use the Kerberos Password User Service so that you do not need to distribute a keytab file across the NiFi nodes of the cluster.

As a best practice, create a dedicated service account for your specific use case. This lets you configure policies specific to that account in Ranger and gives you better control in multi-tenant environments that host many use cases.

Table 2. Kerberos User Service properties
| Property | Description | Example value for ingest data flow |
|---|---|---|
| Kerberos Principal | Specify the user name to use for authenticating with Kerberos. Use your CDP workload username for this property. | srv_nifi_to_iceberg |
| Kerberos Password | Provide the password to use for authenticating with Kerberos. Use your CDP workload password for this property. | password (sensitive value) |
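
Because Kerberos Password is a sensitive value, avoid hard-coding it if you script this configuration. The following is a minimal sketch, assuming the password is kept in an environment variable named `NIFI_ICEBERG_KRB_PASSWORD` (a hypothetical name), that assembles the two properties from Table 2. The keys are the display names shown in the table; the internal property names that the NiFi REST API expects may differ.

```python
import os

# Hypothetical variable name; use whatever secret store your automation relies on.
PASSWORD_ENV = "NIFI_ICEBERG_KRB_PASSWORD"


def kerberos_user_service_properties(principal: str, password_env: str = PASSWORD_ENV) -> dict:
    """Build the Kerberos Password User Service properties from Table 2.

    Keys are the display names from the table (an assumption for scripting);
    the password is read from the environment instead of being written in code.
    """
    password = os.environ.get(password_env)
    if not password:
        raise RuntimeError(f"set {password_env} to the workload password of {principal}")
    return {"Kerberos Principal": principal, "Kerberos Password": password}


if __name__ == "__main__":
    try:
        props = kerberos_user_service_properties("srv_nifi_to_iceberg")
    except RuntimeError as err:
        print(err)
    else:
        # Mask the sensitive value before printing anything.
        print({**props, "Kerberos Password": "********"})
```
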

Next, configure the processor for your data source.