Configure the processor for your data target

Learn how to configure a data target processor for the ADLS ingest data flow.

You can set up a data flow to move data to many locations. This example assumes that you are moving data to ADLS and shows you how to configure the corresponding processors. If you want to move data to a different location, review our other use cases in the Cloudera Data Flow for Data Hub library.

  1. Launch the Configure Processor window by right-clicking the processor you added for writing data to ADLS (PutHDFS or PutAzureDataLakeStorage) and selecting Configure.

    This opens a configuration dialog with the following tabs: Settings, Scheduling, Properties, and Comments.

  2. Configure the processor according to the behavior you expect in your data flow.

    Make sure that you set all required properties, as you cannot start the processor until all mandatory properties have been configured.

  3. When you have finished configuring the options you need, save the changes by clicking the Apply button.

    In this example, the following properties are used for PutHDFS:

    Table 1. PutHDFS processor properties

    Hadoop Configuration Resources
      Description: Specify the path to the core-site.xml configuration file. Make sure that the default file system (fs.defaultFS) points to the ADLS file system you are writing to.
      Example value: /etc/hadoop/conf.cloudera.core_settings/core-site.xml

    Kerberos Principal
      Description: Specify the Kerberos principal (your CDP username) to authenticate against CDP.
      Example value: srv_nifi-adls-ingest

    Kerberos Password
      Description: Provide the password to use for Kerberos authentication.
      Example value: password

    Directory
      Description: Provide the path to your target directory in Azure, expressed in an abfs-compatible path format.
      Example value: abfs://<YourFileSystem>@<YourStorageAccount>.dfs.core.windows.net/<TargetPathWithinFileSystem>

    You can leave all other properties at their default values.

    For a complete list of PutHDFS properties, see the processor documentation.
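    The fs.defaultFS requirement described above can be sketched as a core-site.xml fragment. The placeholder names follow the Directory example in the table and are not real values:

    ```xml
    <!-- Illustrative fragment of core-site.xml; replace the placeholders
         with your own ADLS file system and storage account names. -->
    <property>
      <name>fs.defaultFS</name>
      <value>abfs://<YourFileSystem>@<YourStorageAccount>.dfs.core.windows.net</value>
    </property>
    ```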

    If you want to use the PutAzureDataLakeStorage processor to store the data in ADLS, you have to configure your Azure connection in a secure way by providing:

    • Storage Account Name - the name of the Azure storage account that holds the containers you want to write to
    • Storage Account Key or SAS Token - the authentication key or token that allows you to write data to the Azure storage account
    • Filesystem Name - the name of the file system (container) in your Azure storage account that you want to write to
    • Directory Name - the name of the folder within your file system that you want to write to

    Make sure that you set all required properties, as you cannot start the processor until all mandatory properties have been configured.
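To see how the separate PutAzureDataLakeStorage values relate to the abfs path that the PutHDFS Directory property expects, the following sketch composes them into a Directory-style URI. The helper name and sample values are hypothetical, not part of NiFi or Azure.

```python
# Hypothetical helper (not a NiFi or Azure API): compose the abfs:// URI
# used by the PutHDFS "Directory" property from the values that
# PutAzureDataLakeStorage takes as separate properties.
def build_abfs_directory(file_system: str, storage_account: str, directory: str) -> str:
    """Return abfs://<FileSystem>@<StorageAccount>.dfs.core.windows.net/<Directory>."""
    return (
        f"abfs://{file_system}@{storage_account}"
        f".dfs.core.windows.net/{directory.lstrip('/')}"
    )

# Sample placeholder values mirroring the properties listed above.
print(build_abfs_directory("ingest-fs", "mystorageacct", "landing/raw"))
# abfs://ingest-fs@mystorageacct.dfs.core.windows.net/landing/raw
```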

Your data flow is ready to ingest data into ADLS. Start the flow.