Configure the processor for your data target

Learn how you can configure the data target processor for your S3 ingest data flow. This example assumes that you are moving data to AWS S3 and shows you how to configure the corresponding processors.

  1. Launch the Configure Processor window by right clicking the processor you added for writing data to S3 (PutHDFSor PutS3Object) and selecting Configure.
    This gives you a configuration dialog with the following tabs: Settings, Scheduling, Properties, Comments.
  2. Configure the processor according to the behavior you expect in your data flow.

    Make sure that you set all required properties, as you cannot start the processor until all mandatory properties have been configured.

    Examples:

    • You can use the following properties for PutS3Object:

      Table 1. PutS3Object processor properties
      Property Description Example value for ingest data flow

      Object key

      ${filename}

      Bucket

      Provide the name of your target bucket in AWS.

      This is not the bucket related to your data lake(s). You have to create it specifically for the data ingest.

      ingest-idbroker-test

      AWS Credentials Provider Service

      Provide here the controller service that is used to obtain AWS credentials provider.

      AWSIDBrokerCloudCredentialsProviderControllerService

      You can leave all other properties as default configurations.

      For a complete list of PutS3Object properties, see the processor documentation.

    • You can use the following properties for PutHDFS:

      Table 2. PutHDFS processor properties
      Property Description Example value for ingest data flow

      Hadoop Configuration Resources

      Specify the path to the core-site.xml configuration file.

      Make sure that the default file system (fs, default.FS) points to the S3 bucket you are writing to.

      /etc/hadoop/conf.cloudera.core_settings/core-site.xml

      Kerberos Principal

      Specify the Kerberos principal (your username) to authenticate against CDP.

      srv_nifi-s3-ingest

      Kerberos Password

      Provide the password that should be used for authenticating with Kerberos.

      password

      Directory

      Provide the path to your target directory in AWS expressed in an S3A compatible path.

      s3a://your path/customer

      You can leave all other properties as default configurations.

      For a complete list of PutHDFS properties, see the processor documentation.

  3. When you have finished configuring the options you need, save the changes by clicking Apply.
    If you want to move data to a different location, review the other use cases in the Cloudera Data Flow for Data Hub library.
Your data flow is ready to ingest data into AWS S3. Start the flow.