Build the data flow
Learn how you can build a data flow using the object store processors to fetch data from one bucket, and after data transformation, ingest the data into another bucket. This involves opening Apache NiFi in your Flow Management cluster, adding processors and other data flow objects to your canvas, and connecting your data flow elements.
Use the ListCDPObjectStore and FetchCDPObjectStore processors to list and fetch data from your source bucket. Then, after transforming the data, use the PutCDPObjectStore to push the transformed data into the target bucket. Use the DeleteCDPObjectStore to delete your data in the source bucket.
Regardless of the type of flow you are building, the first steps in building your data flow are generally the same. Open NiFi, add your processors to the canvas, and connect the processors to create the flow.
You will be logged into NiFi automatically with your CDP credentials.
- To access the NiFi service in your Flow Management Data Hub cluster, navigate to .
- Click the tile representing the Flow Management Data Hub cluster you want to work with.
Click NiFi in the
Services section of the cluster overview page
to access the NiFi UI.
Create the data flow by adding the ListCDPObjectStore
processors onto the canvas.
Drag and drop the processor icon into the canvas.
In the dialog box you can choose which processor you want to add.
- Select the processor of your choice from the list.
- Click Add or double-click the required processor type to add it to the canvas.
- Drag and drop the processor icon into the canvas.
- Add the FetchCDPObjectStore processor.
- Add any processors you require to transform your data.
- Add the PutCDPObjectStore processor to write data to the target bucket.
- Add the DeleteCDPObjectStore processor to write data to the target bucket.
Connect the processors to create the data flow by clicking the connection icon
in the first processor, and dragging and dropping it on the second
A Create Connection dialog appears with two tabs: Details and Settings. You can configure the connection's name, flowfile expiration time period, thresholds for back pressure, load balance strategy and prioritization.
Configure each object store processor in your data flow.