Build the data flow

Learn how you can build a data flow using the object store processors to fetch data from one bucket, and after data transformation, ingest the data into another bucket. This involves opening Apache NiFi in your Flow Management cluster, adding processors and other data flow objects to your canvas, and connecting your data flow elements.

Use the ListCDPObjectStore and FetchCDPObjectStore processors to list and fetch data from your source bucket. Then, after transforming the data, use the PutCDPObjectStore to push the transformed data into the target bucket. Use the DeleteCDPObjectStore to delete your data in the source bucket.

Regardless of the type of flow you are building, the first steps in building your data flow are generally the same. Open NiFi, add your processors to the canvas, and connect the processors to create the flow.

  1. Open NiFi 
in 
Data Hub.
    1. To access the NiFi service in your Flow Management 
Data Hub cluster, navigate to Management Console service > Data Hub Clusters.
    2. Click the tile representing the Flow Management Data Hub cluster you want to work with.
    3. Click NiFi in the Services section of the cluster overview page to access the NiFi UI.
    You will be logged into NiFi automatically with your CDP credentials.
  2. Create the data flow by adding the ListCDPObjectStore processors onto the canvas.
    1. Drag and drop the processor icon into the canvas.
      In the dialog box you can choose which processor you want to add.
    2. Select the processor of your choice from the list.
    3. Click Add or double-click the required processor type to add it to the canvas.
  3. Add the FetchCDPObjectStore processor.
  4. Add any processors you require to transform your data.
  5. Add the PutCDPObjectStore processor to write data to the target bucket.
  6. Add the DeleteCDPObjectStore processor to write data to the target bucket.
  7. Connect the processors to create the data flow by clicking the connection icon in the first processor, and dragging and dropping it on the second processor.

    A Create Connection dialog appears with two tabs: Details and Settings. You can configure the connection's name, flowfile expiration time period, thresholds for back pressure, load balance strategy and prioritization.

This is an example data flow with the ListCDPObjectStore, FetchCDPObjectStore, PutCDPObjectStore, and DeleteCDPObjectStore processors:

Configure each object store processor in your data flow.