Build the data flow

From the Apache NiFi canvas, set up the elements of your data flow. This involves opening NiFi in CDP Public Cloud, adding processors to your NiFi canvas, and connecting the processors.

When you are building a data flow to ingest data into Apache Hive following are the preferred Hive processors

  • Select_Hive_QL_3 – To get data out of Hive tables.

  • PutHive3Streaming – To write data to managed and external Hive tables.

  1. Open NiFi in CDP Public Cloud.
    1. To aces the NiFi service in your Flow Management cluster, navigate to Management Console service > Data Hub Clusters.
    2. Click the tile representing the Flow Management cluster with which you want to work.
    3. Click the NiFi icon in the Services section of the Cluster overview page to access the NiFi UI.
  2. Add the NiFi Processors to your canvas.
    1. Select the Processor icon from the Cloudera Flow Management actions pane, and drag a processor to the Canvas.
    2. Use the Add Processor filter box to search for the processor you want to add, and then click Add.
    3. Add each of the processors you want to use for your data flow.
  3. Connect the two processors to create a flow.
    1. Click the connection icon in the first processor, and drag it to the second processor.
    2. A Create Connection dialog displays. It has Details and Settings tabs. You can configure the connection's name, FlowFile expiration time period, thresholds for back pressure, load balance strategy, and prioritization.
    3. Click Add to close the dialog box and add the connection to your flow.
      Optionally, you can add success and failure funnels to your data flow, which help you see where flow files are routed when your data flow is running.

Your data flow may look similar to the following:

Create the Controller service for your data flow. You will need these services later on as you configure your data flow target processor.