Create controller services for your data flow

Learn how to create and configure controller services for an S3 ingest data flow in CDP Public Cloud. Controller services provide shared services that the processors in your data flow can use. You will use these controller services later when you configure your processors.

  1. To add a Controller Service to your flow, right-click on the canvas and select Configure from the pop-up menu.
    This displays the Controller Services Configuration window.
  2. Select the Controller Services tab.
  3. Click the + button to display the Add Controller Service dialog.
  4. Select the required Controller Service and click Add.
  5. Perform any necessary Controller Service configuration tasks by clicking the Configure icon in the right-hand column.
  6. When you have finished configuring the options you need, save the changes by clicking the Apply button.
  7. Enable the Controller Service by clicking the Enable button (lightning bolt icon) in the far-right column of the Controller Services tab.
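The create, configure, and enable sequence in the steps above can also be scripted. The following Python sketch only builds the request descriptions and does not contact a server. It assumes the Apache NiFi REST API paths for controller services and that `root` is accepted as an alias for the root process group. The function name `controller_service_requests`, the `{service-id}` placeholder, and the hard-coded revision numbers are illustrative. The property key shown is the display name from the UI, which may differ from the key the API expects.

```python
import json


def controller_service_requests(group_id, service_type, name, properties):
    """Return (method, path, body) tuples mirroring the add, configure, and enable steps.

    Nothing is sent over the network; this only describes the requests.
    """
    # Hypothetical placeholder: at runtime the create call returns the real ID.
    service_id = "{service-id}"

    # Add the Controller Service to the process group.
    create = (
        "POST",
        f"/nifi-api/process-groups/{group_id}/controller-services",
        {"revision": {"version": 0},
         "component": {"type": service_type, "name": name}},
    )
    # Configure its properties. A real client should echo the revision
    # returned by the server instead of the fixed numbers used here.
    configure = (
        "PUT",
        f"/nifi-api/controller-services/{service_id}",
        {"revision": {"version": 1},
         "component": {"id": service_id, "properties": properties}},
    )
    # Enable the Controller Service.
    enable = (
        "PUT",
        f"/nifi-api/controller-services/{service_id}/run-status",
        {"revision": {"version": 2}, "state": "ENABLED"},
    )
    return [create, configure, enable]


if __name__ == "__main__":
    plan = controller_service_requests(
        "root",
        "org.apache.nifi.csv.CSVReader",
        "CSVReader",
        {"Treat First Line as Header": "true"},  # display name, assumed key
    )
    for method, path, body in plan:
        print(method, path, json.dumps(body))
```

Check the paths and property keys against the REST API documentation for your NiFi version before sending any of these requests.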
In this example the following controller services are used:
  • AvroReader Controller Service
  • CSVReader Controller Service
  • CSVRecordSetWriter Controller Service
  • AWSIDBrokerCloudCredentialsProvider Controller Service

You will use the record reader and record writer controller services when configuring the data source and merge record processors in your data flow.

You will use the AWS controller service when configuring the PutS3Object data target processor.

AvroReader Controller Service

This controller service parses Avro data and returns each Avro record as a separate record object.

Table 1. AvroReader Controller Service properties

| Property | Description | Example value for ingest data flow |
|---|---|---|
| Schema Access Strategy | Specify how to obtain the schema to be used for interpreting the data. | HWX Content-Encoded Schema Reference |
| Schema Registry | Specify the Controller Service to use for the Schema Registry. | CDPSchemaRegistry |
| Schema Name | Specify the name of the schema to look up in the Schema Registry property. | customer |

CSVReader Controller Service

This controller service parses your CSV-formatted data, returning each row in the CSV file as a separate record.

Table 2. CSVReader Controller Service properties

| Property | Description | Example value for ingest data flow |
|---|---|---|
| Schema Access Strategy | Specify how to obtain the schema to be used for interpreting the data. | Use String Fields From Header |
| Treat First Line as Header | Specify whether the first line of CSV should be considered a header or a record. | true |
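To illustrate what these two settings mean in practice, here is a minimal Python sketch that uses the standard `csv` module as an analogy. It is not how NiFi implements CSVReader. The function name `read_csv_records` and the sample data are hypothetical. The first line supplies the field names, and every value is kept as a string.

```python
import csv
import io


def read_csv_records(text):
    """Parse CSV text, treating the first line as the header row.

    Each remaining row becomes one record whose field names come from
    the header and whose values are all strings.
    """
    reader = csv.DictReader(io.StringIO(text))
    return [dict(row) for row in reader]


# Hypothetical sample data for illustration only.
sample = "id,name\n1,Alice\n2,Bob\n"
records = read_csv_records(sample)
print(records)
# [{'id': '1', 'name': 'Alice'}, {'id': '2', 'name': 'Bob'}]
```

Note that `id` comes back as the string `'1'`, not the integer `1`, which mirrors the idea of deriving string fields from the header.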

CSVRecordSetWriter Controller Service

This controller service writes the contents of a record set as CSV data.

Table 3. CSVRecordSetWriter Controller Service properties

| Property | Description | Example value for ingest data flow |
|---|---|---|
| Schema Write Strategy | Specify how the schema for a record should be added to the data. | Do Not Write Schema |
| Schema Access Strategy | Specify how to obtain the schema to be used for interpreting the data. | Use 'Schema Name' Property |
| Schema Name | Specify the name of the schema to look up in the Schema Registry property. | customer |
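As a rough analogy for writing a record set as CSV, the following Python sketch serializes a list of records with the standard `csv` module. It is not the CSVRecordSetWriter implementation. The function name `write_csv_records` and the field names `id` and `name` are hypothetical stand-ins for the fields of the `customer` schema. The output contains only the header line and the data rows, with no schema information embedded.

```python
import csv
import io


def write_csv_records(records, field_names):
    """Serialize records (dicts) as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=field_names, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Hypothetical records; the real field list would come from the schema.
customers = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
output = write_csv_records(customers, ["id", "name"])
print(output)
```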

AWSIDBrokerCloudCredentialsProvider Controller Service

This controller service defines credentials for the PutS3Object processor.

Table 4. AWSIDBrokerCloudCredentialsProvider Controller Service properties

| Property | Description | Example value for ingest data flow |
|---|---|---|
| Configuration Resources | Provide the path to the core-site configuration file that contains the IDBroker-related configuration. The path is set by default, so you only have to configure the username and password. | /etc/hadoop/conf.cloudera.core_settings/core-site.xml |
| Username | Specify your username. | csso_jsmith |
| Password | Specify your password. If you use your own username, provide the CDP workload password associated with your username. | CDP workload password |
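The split between the preset path and the values you must supply can be sketched as a small Python helper. This is an illustration only: the function name `idbroker_properties` is hypothetical, the dictionary keys are the display names from the table rather than confirmed internal property keys, and the password shown is a placeholder.

```python
# Default path taken from the example value in the table above.
DEFAULT_CONFIGURATION_RESOURCES = (
    "/etc/hadoop/conf.cloudera.core_settings/core-site.xml"
)


def idbroker_properties(username, password,
                        configuration_resources=DEFAULT_CONFIGURATION_RESOURCES):
    """Build the property map, requiring only username and password."""
    if not username or not password:
        raise ValueError("Username and Password must both be set")
    return {
        "Configuration Resources": configuration_resources,
        "Username": username,
        "Password": password,
    }


# Placeholder credentials for illustration; never hard-code real passwords.
props = idbroker_properties("csso_jsmith", "example-workload-password")
print(props["Configuration Resources"])
```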

Next, configure the processors in your data flow.