Deploying the Non-CDP S3 to CDP S3 ReadyFlow

Learn how to use the Deployment wizard to deploy the Non-CDP S3 to CDP S3 ReadyFlow with the information you collected from the prerequisites checklist.

The CDF Catalog is where you manage the flow definition lifecycle, from initial import, through versioning, to deploying a flow definition.

  1. In DataFlow, from the left navigation pane, click Catalog.
    Flow definitions available for you to deploy are displayed, one definition per row.
  2. Launch the Deployment wizard.
    1. Click the row to display the flow definition details and versions.
    2. Click a row representing a flow definition version to display flow definition version details and the Deploy button.
    3. Click Deploy to launch the Deployment wizard.
  3. Select the environment to which you want to deploy this version of your flow definition, and click Continue.
  4. In the Overview, give your flow deployment a unique name.

    You can use this name to distinguish between different versions of a flow definition, flow definitions deployed to different environments, and so on.

  5. In NiFi Configuration:
    1. Select a NiFi Runtime Version for your flow deployment. Cloudera recommends that you always use the latest available version, if possible.
    2. Autostart Behavior is on by default, allowing your flow to start automatically after successful deployment. Clear the selection if you do not want the flow to start automatically.
  6. In Parameters, specify parameter values such as connection strings and usernames, and upload files such as truststores.
  7. Specify your Sizing & Scaling configurations.
    NiFi node sizing
    You can adjust the size of your cluster from Extra Small to Large.
    Number of NiFi nodes
    • You can set whether you want to automatically scale your cluster depending on resource demands. When you enable autoscaling, the minimum number of NiFi nodes is used as the initial size, and the workload scales up or down depending on resource demands.
    • You can set the number of nodes from 1 to 32.
  8. In Key Performance Indicators, you can set up your metrics system with specific KPIs to track the performance of a deployed flow. You can also define when and how to receive alerts about the KPI metrics tracking.

    See Working with KPIs for more information about the KPIs available and how you can monitor them.

  9. Review the summary of the information you provided in the Deployment wizard and make any necessary edits by clicking Previous. When you are finished, complete your flow deployment by clicking Deploy.

After you click Deploy, you are redirected to the Alerts tab in the Flow Deployment Detail view, where you can track the progress of the deployment.
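
The node-count limits from the Sizing & Scaling step can be sketched as a small pre-flight check to run before you open the wizard. This is only an illustrative sketch and not part of Cloudera DataFlow: the `SizingConfig` class and its field names are hypothetical. The only facts taken from the steps above are the 1 to 32 node range and that, with autoscaling enabled, the minimum node count is the initial size. The rule that a deployment without autoscaling uses one fixed node count is an assumption.

```python
from dataclasses import dataclass

MIN_NODES = 1   # lower bound stated in the Sizing & Scaling step
MAX_NODES = 32  # upper bound stated in the Sizing & Scaling step


@dataclass
class SizingConfig:
    """Hypothetical container for the node-count choices made in the wizard."""
    min_nodes: int
    max_nodes: int
    autoscaling: bool = False

    def validate(self) -> None:
        """Raise ValueError if the node counts fall outside the documented range."""
        for label, value in (("min_nodes", self.min_nodes),
                             ("max_nodes", self.max_nodes)):
            if not MIN_NODES <= value <= MAX_NODES:
                raise ValueError(
                    f"{label} must be between {MIN_NODES} and {MAX_NODES}, got {value}"
                )
        if self.min_nodes > self.max_nodes:
            raise ValueError("min_nodes must not exceed max_nodes")
        # Assumption: without autoscaling the cluster has one fixed size.
        if not self.autoscaling and self.min_nodes != self.max_nodes:
            raise ValueError("without autoscaling, min_nodes must equal max_nodes")

    def initial_nodes(self) -> int:
        """With autoscaling enabled, the minimum node count is the initial size."""
        return self.min_nodes
```

For example, `SizingConfig(min_nodes=2, max_nodes=8, autoscaling=True)` passes validation and reports an initial size of 2 nodes, while a maximum of 33 is rejected.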

For the Non-CDP S3 to CDP S3 ReadyFlow, the following parameters are required. Use the information you collected in the Meeting the prerequisites section.

Non-CDP S3 to CDP S3 ReadyFlow configuration parameters
Parameter Name
    Description

CDP Workload User
    Specify the CDP machine user or workload username that you want to use to authenticate to the object stores. Ensure this user has the appropriate access rights to the object store locations in Ranger or IDBroker.

CDP Workload User Password
    Specify the password of the CDP machine user or workload user you are using to authenticate against the object stores (via IDBroker).

Destination S3 Bucket
    Specify the name of the destination (CDP managed) S3 bucket you want to write to.

    The full path is constructed from:

    s3a://#{Destination S3 Bucket}/#{Destination S3 Path}

Destination S3 Path
    Specify the path within the destination bucket where you want to write to. Make sure that the path starts with "/".

    The full path is constructed from:

    s3a://#{Destination S3 Bucket}/#{Destination S3 Path}

Preserve S3 Prefix
    Specify whether to preserve or ignore the folder hierarchy of the source files. Set to "TRUE" to preserve folder hierarchy in the destination path. Set to "FALSE" to flatten the hierarchy and only keep the file in the destination path.

Source S3 Access Key ID
    Specify the source (external) S3 access key ID.

Source S3 Bucket
    Specify the name of the source (external) S3 bucket you want to read from.

Source S3 Path
    Specify the name of the S3 path you want to read from.

Source S3 Region
    Specify the AWS region in which your source (external) bucket was created.

Source S3 Secret Access Key
    Specify the source (external) S3 secret access key.
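
To make the parameter descriptions above more concrete, the following sketch checks a set of collected values before you enter them in the wizard and illustrates the described effect of Preserve S3 Prefix. It is an illustration only, not the ReadyFlow's implementation: the function names are hypothetical, the parameter names are copied from the list above, and accepting only the exact strings "TRUE" and "FALSE" is an assumption. The example object key is made up.

```python
import posixpath

# Parameter names copied from the list above.
REQUIRED_PARAMETERS = (
    "CDP Workload User",
    "CDP Workload User Password",
    "Destination S3 Bucket",
    "Destination S3 Path",
    "Preserve S3 Prefix",
    "Source S3 Access Key ID",
    "Source S3 Bucket",
    "Source S3 Path",
    "Source S3 Region",
    "Source S3 Secret Access Key",
)


def missing_parameters(values: dict) -> list:
    """Return the required parameter names that are absent or empty."""
    return [name for name in REQUIRED_PARAMETERS if not values.get(name)]


def check_destination_path(path: str) -> str:
    """Enforce the documented rule that Destination S3 Path starts with "/"."""
    if not path.startswith("/"):
        raise ValueError('Destination S3 Path must start with "/"')
    return path


def relative_destination(source_key: str, preserve_prefix: str) -> str:
    """Return where an object lands relative to Destination S3 Path.

    "TRUE" keeps the source folder hierarchy; "FALSE" keeps only the file name.
    """
    # Assumption: only the exact values shown in the description are accepted.
    if preserve_prefix not in ("TRUE", "FALSE"):
        raise ValueError(f'expected "TRUE" or "FALSE", got {preserve_prefix!r}')
    if preserve_prefix == "TRUE":
        return source_key
    return posixpath.basename(source_key)
```

With the hypothetical source key `logs/2024/app.log`, `relative_destination` returns `logs/2024/app.log` for "TRUE" and `app.log` for "FALSE".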