Create and configure controller services for your data flow

Learn how to create and configure controller services for the CDW Iceberg ingest data flow.

You can add controller services to provide shared services for the processors in your data flow. You will reference these controller services later, when you configure your processors.

  1. To add a controller service to your flow, right-click on the canvas and select Configure from the pop-up menu.
    This displays the Controller Services Configuration window.
  2. Select the Controller Services tab.
  3. Click the + button to display the Add Controller Service dialog.
  4. Select the required controller service and click Add.
  5. Click the Configure icon in the right-hand column and configure the options that you need.
  6. When you have finished the configuration, click Apply to save the changes.
  7. Enable the controller service by clicking the Enable button (lightning bolt icon) in the far-right column of the Controller Services tab.
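
If you prefer to script this procedure, the same operations are available through the NiFi REST API. The following Python sketch mirrors steps 3 through 7, assuming a placeholder NiFi endpoint, process group ID, and bearer token that you substitute for your own; the property key shown is illustrative and not taken from this flow, so verify names against your NiFi version.

```python
import requests

# Placeholder values -- adjust for your environment. None of these come from this guide.
NIFI_API = "https://nifi-host:8443/nifi-api"
PROCESS_GROUP_ID = "<process-group-id>"
HEADERS = {"Authorization": "Bearer <access-token>"}

# Steps 3-4: create the controller service in the process group.
# Verify the fully qualified type name against the Add Controller Service dialog.
service = requests.post(
    f"{NIFI_API}/process-groups/{PROCESS_GROUP_ID}/controller-services",
    json={
        "revision": {"version": 0},
        "component": {"type": "org.apache.nifi.services.iceberg.HiveCatalogService"},
    },
    headers=HEADERS,
).json()

# Steps 5-6: configure the service properties and apply the changes.
configured = requests.put(
    f"{NIFI_API}/controller-services/{service['id']}",
    json={
        "revision": service["revision"],
        "component": {
            "id": service["id"],
            # Property key is illustrative; check the exact name in your NiFi version.
            "properties": {"hive-metastore-uri": "thrift://metastore-host:9083"},
        },
    },
    headers=HEADERS,
).json()

# Step 7: enable the service.
requests.put(
    f"{NIFI_API}/controller-services/{service['id']}/run-status",
    json={"revision": configured["revision"], "state": "ENABLED"},
    headers=HEADERS,
)
```

Each update returns a new revision that the next call must carry; the sketch threads the returned revision through rather than hardcoding version numbers.
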
The following controller services are used in this CDW Iceberg ingest data flow:
  • Hive Catalog Controller Service
  • Kerberos User Service

Configure the Hive Catalog Controller Service

In the context of CDW, the Hive Metastore URI is not exposed publicly. Instead, you need to provide the core-site.xml and hive-site.xml files of the CDP environment. In Flow Management DataHub clusters, these files are made available automatically on every node, but you must retrieve their paths. If you use the Cloudera DataFlow data service and its Flow Designer, you can use the #{CDPEnvironment} parameter instead.
Table 1. Hive Catalog Controller Service properties

  Property: Hive Metastore URI
  Description: Provide the URI of the metastore location.

  Property: Default Warehouse Location
  Description: Provide the default warehouse location in the HDFS file system.

  Property: Hadoop Configuration Resources
  Description: Add a comma-separated list of Hadoop configuration files, such as hive-site.xml and core-site.xml for Kerberos. Include the full path to each file so that NiFi can refer to those configuration resources from your specified path.
  Example value for ingest data flow: /etc/hive/conf.cloudera.data_context_connector-975b/hive-site.xml,/etc/hadoop/conf.cloudera.stub_dfs/core-site.xml
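
Because these paths vary by cluster, you can look them up on a DataHub node before filling in Hadoop Configuration Resources. A minimal Python sketch, assuming the files sit in conf* directories under /etc/hive and /etc/hadoop, as in the example value above; the glob patterns are assumptions about your node layout.

```python
from pathlib import Path

def find_config(root: str, name: str) -> str:
    """Return the first match for a Hadoop/Hive config file under root."""
    matches = sorted(Path(root).glob(f"conf*/{name}"))
    if not matches:
        raise FileNotFoundError(f"{name} not found under {root}")
    return str(matches[0])

# Build the comma-separated value for Hadoop Configuration Resources.
hive_site = find_config("/etc/hive", "hive-site.xml")
core_site = find_config("/etc/hadoop", "core-site.xml")
print(",".join([hive_site, core_site]))
```
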

Configure the Kerberos User Service

Use the Kerberos Password User Service so that you do not need to distribute a keytab file across the NiFi nodes of the cluster.

As a best practice, create a dedicated Machine User in the control plane for your specific use case. This allows you to configure targeted policies in Ranger and gives you better control in multi-tenant environments with many use cases.

Table 2. Kerberos User Service properties

  Property: Kerberos Principal
  Description: Specify the user name that should be used for authenticating with Kerberos. Use your CDP workload username to set this Authentication property.
  Example value for ingest data flow: srv_nifi_to_iceberg

  Property: Kerberos Password
  Description: Provide the password that should be used for authenticating with Kerberos. Use your CDP workload password to set this Authentication property.
  Example value for ingest data flow: password (sensitive value)
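
If you are scripting the setup, the same REST pattern shown earlier applies here. The following is a minimal sketch that updates an existing Kerberos Password User Service, assuming a placeholder endpoint, token, and service ID; the property keys are the display names from Table 2 and should be verified against your NiFi version.

```python
import requests

NIFI_API = "https://nifi-host:8443/nifi-api"   # placeholder, as in the earlier sketch
HEADERS = {"Authorization": "Bearer <access-token>"}
SERVICE_ID = "<kerberos-user-service-id>"      # ID of the existing service

# Fetch the current entity so the update carries the correct revision.
entity = requests.get(
    f"{NIFI_API}/controller-services/{SERVICE_ID}", headers=HEADERS
).json()

requests.put(
    f"{NIFI_API}/controller-services/{SERVICE_ID}",
    json={
        "revision": entity["revision"],
        "component": {
            "id": SERVICE_ID,
            "properties": {
                # Display names from Table 2; verify exact keys in your NiFi version.
                "Kerberos Principal": "srv_nifi_to_iceberg",
                "Kerberos Password": "<workload-password>",  # sensitive value
            },
        },
    },
    headers=HEADERS,
)
```
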

Configure the processor for your data source.