Create controller services for your data flow
Learn how to create and configure controller services for an S3 ingest data flow in CDP Public Cloud. Controller services provide shared services that the processors in your data flow can use. This data flow uses the following controller services, which you will reference later when you configure your processors:
- AvroReader Controller Service
- CSVReader Controller Service
- CSVRecordSetWriter Controller Service
- AWSIDBrokerCloudCredentialsProvider Controller Service
You will use the record reader and record writer controller services when configuring the data source and merge record processors in your data flow.
You will use the AWS controller service when configuring the PutS3Object data target processor.
AvroReader Controller Service
This controller service parses Avro data and returns each Avro record as a separate record object.
Property | Description | Example value for ingest data flow |
---|---|---|
Schema Access Strategy | Specify how to obtain the schema to be used for interpreting the data. | HWX Content-Encoded Schema Reference |
Schema Registry | Specify the Controller Service to use for the Schema Registry. | CDPSchemaRegistry |
Schema Name | Specify the name of the schema to look up in the Schema Registry property. | customer |
CSVReader Controller Service
This controller service parses your CSV-formatted data, returning each row in the CSV file as a separate record.
Property | Description | Example value for ingest data flow |
---|---|---|
Schema Access Strategy | Specify how to obtain the schema to be used for interpreting the data. | Use String Fields From Header |
Treat First Line as Header | Specify whether the first line of CSV should be considered a header or a record. | true |
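The effect of these two settings can be illustrated outside NiFi with a minimal Python sketch. This is not the CSVReader implementation, only an analogy using the standard `csv` module: the first line supplies the field names and every value is kept as a string. The function name `read_csv_records` and the sample columns (`id`, `name`, `city`) are assumptions made for illustration.

```python
import csv
import io


def read_csv_records(text):
    """Return each CSV row as a dict of string values keyed by the header names.

    Hypothetical helper: it mirrors the idea of 'Use String Fields From Header'
    combined with 'Treat First Line as Header' = true, not NiFi's actual code.
    """
    # DictReader consumes the first line as the header and leaves values as str.
    reader = csv.DictReader(io.StringIO(text))
    return [dict(row) for row in reader]


# Sample data invented for this example.
sample = "id,name,city\n1,Alice,Berlin\n2,Bob,Paris\n"
records = read_csv_records(sample)

for record in records:
    print(record)
```

Note that `id` stays the string `"1"` rather than becoming an integer, because no typed schema is applied when field names are taken from the header.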
CSVRecordSetWriter Controller Service
This controller service writes the contents of a record set as CSV data.
Property | Description | Example value for ingest data flow |
---|---|---|
Schema Write Strategy | Specify how the schema for a record should be added to the data. | Do Not Write Schema |
Schema Access Strategy | Specify how to obtain the schema to be used for interpreting the data. | Use 'Schema Name' Property |
Schema Name | Specify the name of the schema to look up in the Schema Registry property. | customer |
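As a rough analogy for what a record set writer does, the following Python sketch serializes a list of records as CSV, taking the column order from a fixed list of field names. It is not the CSVRecordSetWriter implementation. The `SCHEMA_FIELDS` list is a hypothetical stand-in for the fields of the `customer` schema, whose actual definition is not given here, and writing a header line is a choice made by this sketch.

```python
import csv
import io

# Hypothetical field list standing in for the 'customer' schema (assumption).
SCHEMA_FIELDS = ["id", "name", "city"]


def write_csv_records(records, fieldnames=SCHEMA_FIELDS):
    """Serialize a list of dict records as CSV text, columns ordered by fieldnames."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()       # this sketch always emits a header line
    writer.writerows(records)  # one CSV line per record
    return buffer.getvalue()


# Sample records invented for this example; key order does not matter.
customers = [
    {"name": "Alice", "id": "1", "city": "Berlin"},
    {"name": "Bob", "id": "2", "city": "Paris"},
]
csv_text = write_csv_records(customers)
print(csv_text)
```

The column order follows the field list rather than the key order of each record, which is the point of resolving a named schema before writing.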
AWSIDBrokerCloudCredentialsProvider Controller Service
This controller service defines credentials for the PutS3Object processor.
Property | Description | Example value for ingest data flow |
---|---|---|
Configuration Resources | Provide the path to the core-site configuration file that contains the IDBroker-related configuration. The path is set by default, so you only need to configure the username and password. | /etc/hadoop/conf.cloudera.core_settings/core-site.xml |
Username | Specify your username. | csso_jsmith |
Password | Specify your password. If you use your own username, provide the CDP workload password associated with your username. | CDP workload password |