Create and configure controller services for your data flow

Learn how to create and configure controller services for the CDW Iceberg ingest data flow.

You can add controller services to provide shared services for the processors in your data flow. You will use these controller services later when you configure your processors.

  1. To add a controller service to your flow, right-click on the canvas and select Configure from the pop-up menu.
    This displays the Controller Services Configuration window.
  2. Select the Controller Services tab.
  3. Click the + button to display the Add Controller Service dialog.
  4. Select the required controller service and click Add.
  5. Click the Configure icon in the right-hand column and configure the options that you need.
  6. When you have finished the configuration, click Apply to save the changes.
  7. Enable the controller service by clicking the Enable button (lightning bolt icon) in the far-right column of the Controller Services tab.
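
If you prefer to script these steps, the following minimal sketch uses the community nipyapi Python client to create, configure, and enable a controller service through the NiFi REST API. The host URL, service name, and property keys are assumptions for illustration; use the property names shown in the service's configuration dialog in your NiFi version.

```python
# Minimal sketch, assuming the nipyapi Python client
# (https://github.com/Chaffelson/nipyapi) and network access to NiFi.
import nipyapi

# Assumption: adjust the URL to your NiFi instance.
nipyapi.config.nifi_config.host = 'https://nifi.example.com:8443/nifi-api'

root_pg = nipyapi.canvas.get_process_group(nipyapi.canvas.get_root_pg_id(), 'id')

# Steps 3-4: find the controller service type and add it to the flow.
hive_catalog_type = next(
    t for t in nipyapi.canvas.list_all_controller_types()
    if t.type == 'org.apache.nifi.services.iceberg.HiveCatalogService'
)
service = nipyapi.canvas.create_controller(root_pg, hive_catalog_type,
                                           name='HiveCatalogService')

# Steps 5-6: set properties. The property key below is a placeholder;
# check the key your NiFi version expects for the service.
service = nipyapi.canvas.update_controller(
    service,
    nipyapi.nifi.ControllerServiceDTO(properties={
        'hadoop-config-resources': '/path/to/hive-site.xml,/path/to/core-site.xml',
    }),
)

# Step 7: enable the service.
nipyapi.canvas.schedule_controller(service, scheduled=True)
```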
The following controller services are used in this CDW Iceberg ingest example:
  • Hive Catalog Controller Service
  • Kerberos User Service

See below for property details.

Configure the Hive Catalog Controller Service

In the context of CDW, the Hive Metastore URI is not exposed publicly. Instead, you need to provide the core-site.xml and hive-site.xml files of the CDP environment. In Flow Management DataHub clusters, these files are made available automatically on every node, but you must retrieve their paths. If you use the Cloudera DataFlow data service and its Flow Designer, you can use the #{CDPEnvironment} parameter instead.
Table 1. Hive Catalog Controller Service properties

  • Hive Metastore URI
    Description: Provide the URI of the metastore location.
  • Default Warehouse Location
    Description: Provide the default warehouse location in the HDFS file system.
  • Hadoop Configuration Resources
    Description: Provide a comma-separated list of Hadoop configuration files, such as hive-site.xml and core-site.xml (required for Kerberos). Include the full paths to the files so that NiFi can read the configuration resources from the specified locations.
    Example value for ingest data flow: /etc/hive/conf.cloudera.data_context_connector-975b/hive-site.xml,/etc/hadoop/conf.cloudera.stub_dfs/core-site.xml
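
The exact paths vary with the data context connector name, so you typically have to look them up on a cluster node. The following is a minimal sketch that builds the comma-separated property value, assuming the usual /etc/hive and /etc/hadoop configuration directories shown in the example above.

```python
# Sketch: locate hive-site.xml and core-site.xml on a Flow Management
# DataHub node and print a value for Hadoop Configuration Resources.
# The glob patterns mirror the example paths above and are assumptions;
# adjust them to your environment.
import glob

patterns = [
    '/etc/hive/conf*/hive-site.xml',
    '/etc/hadoop/conf*/core-site.xml',
]
paths = [p for pattern in patterns for p in sorted(glob.glob(pattern))]
print(','.join(paths))
```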

Configure the Kerberos User Service

Use the Kerberos Password User Service so that you do not need to distribute a keytab file across the NiFi nodes of the cluster.

It is a best practice to create a dedicated Machine User in the control plane for your specific use case. A dedicated user lets you configure targeted policies in Ranger and gives you better control in multi-tenant environments with many use cases.

Table 2. Kerberos User Service properties

  • Kerberos Principal
    Description: Specify the user name to use for authenticating with Kerberos. Set this property to your CDP workload user name.
    Example value for ingest data flow: srv_nifi_to_iceberg
  • Kerberos Password
    Description: Provide the password to use for authenticating with Kerberos. Set this property to your CDP workload password.
    Example value for ingest data flow: password (sensitive value)
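
Before wiring the credentials into the service, you may want to confirm that the principal and password pair is valid. The following is a minimal sketch, assuming the MIT Kerberos client (kinit) is installed on the node and that the workload password is exported in a CDP_WORKLOAD_PASSWORD environment variable (both assumptions).

```python
# Sketch: verify a CDP workload principal/password pair with kinit before
# configuring the Kerberos Password User Service. Assumes MIT Kerberos,
# which reads the password from stdin when piped.
import os
import subprocess

principal = 'srv_nifi_to_iceberg'               # example value from Table 2
password = os.environ['CDP_WORKLOAD_PASSWORD']  # assumed environment variable

result = subprocess.run(
    ['kinit', principal],
    input=password + '\n',
    capture_output=True,
    text=True,
)
if result.returncode != 0:
    raise SystemExit(f'kinit failed: {result.stderr.strip()}')
print(f'Obtained a Kerberos ticket for {principal}')
```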

Next step: Configure the processor for your data source.