Create controller services for your data flow

Learn how you can create and configure controller services for an S3 ingest data flow in CDP Public Cloud. Controller services provide shared services that can be used by the processors in your data flow. You will use these Controller Services later when you configure your processors.

To add a Controller Service to your flow, right-click on the canvas and select Configure from the pop-up menu.
This displays the Controller Services Configuration window.
Select the Controller Services tab.
Click the + button to display the Add Controller Service dialog.
Select the required Controller Service and click Add.
Perform any necessary Controller Service configuration tasks by clicking the Configure icon in the right-hand column.
When you have finished configuring the options you need, save the changes by clicking the Apply button.
Enable the Controller Service by clicking the Enable button (flash) in the far-right column of the Controller Services tab.

In this example the following controller services are used:

AvroReader Controller Service
CSVReader Controller Service
CSVRecordSetWriter Controller Service
AWSIDBrokerCloudCredentialsProvider Controller Service

You will use the record reader and record writer controller services when configuring the data source and merge record processors in your data flow.

You will use the AWS controller service when configuring the PutS3Object data target processor.

AvroReader Controller Service

This controller service parses Avro data and returns each Avro record as a separate record object.

Table 1. AvroReader Controller Service properties
Property	Description	Example value for ingest data flow
Schema Access Strategy	Specify how to obtain the schema to be used for interpreting the data.	HWX Content-Encoded Schema Reference
Schema Registry	Specify the Controller Service to use for the Schema Registry.	CDPSchemaRegistry
Schema Name	Specify the name of the schema to look up in the Schema Registry property.	customer

CSVReader Controller Service

This controller service parses your CSV-formatted data, returning each row in the CSV file as a separate record.

Table 2. CSVReader Controller Service Properties
Property	Description	Example value for ingest data flow
Schema Access Strategy	Specify how to obtain the schema to be used for interpreting the data.	Use String Fields From Header
Treat First Line as Header	Specify whether the first line of CSV should be considered a header or a record.	true

CSVRecordSetWriter Controller Service

This controller service writes the contents of a record set as CSV data.

Table 3. CSVRecordSetWriter Controller Service Properties
Property	Description	Example value for ingest data flow
Schema Write Strategy	Specify how the schema for a record should be added to the data.	Do Not Write Schema
Schema Access Strategy	Specify how to obtain the schema to be used for interpreting the data.	Use 'Schema Name' Property
Schema Name	Specify the name of the schema to look up in the Schema Registry property.	customer

AWSIDBrokerCloudCredentialsProvider Controller Service

This controller service defines credentials for the PutS3Object processor.

Table 4. CSVRecordSetWriter Controller Service Properties
Property	Description	Example value for ingest data flow
Configuration Resources	Provide the path to the core site configuration file that contains IDBroker-related configuration to. The path is set as default. The only thing you have to configure is username and password. note The core-site.xml file is present on every Flow Management cluster. Cloudera Manager stores the right core-site.xml file in the same /etc directory for every cluster.	/etc/hadoop/conf.cloudera.core_settings/core-site.xml
Username	Specify your username.	csso_jsmith
Password	Specify your password. If you use your own username, provide the CDP workload password associated with your username.	CDP workload password

Configure the processors in your data flow.