Define your CDP Private Cloud Base dataflow

To move data between cloud environments using NiFi site-to-site communication, you require a dataflow in your CDP Private Cloud Base cluster that can send and receive data from the CDP Public Cloud cluster. To create this dataflow, connect a processor to a Remote Process Group configured with HTTP and enable transmission.

  • You have defined your CDP Public Cloud dataflow and configured Ranger policies for site-to site communication.
  • You have the public FQDNs for your CDP Public Cloud cluster nodes.
  1. In your CDP Private Cloud Base cluster, launch the NiFi UI and drag a GenerateFlowFile processor onto the canvas.
    For this use case, GenerateFlowFile creates 1MB files every 10 seconds.
  2. Drag a Remote Process Group onto the NiFi canvas, configure HTTP protocol, and specify one or more of the NiFi nodes running on your CDP Public Cloud cluster.
    After the site-to-site connection is initiated, the source NiFi cluster is aware of the topology of the remote NiFi cluster and of any increase or decrease of the size of the remote cluster. However, it is recommended that you specify at least 2 nodes to ensure higher availability when the site-to-site connection is initiated.
  3. Right-click the Remote Process group and select Enable transmission.
  4. Connect the GenerateFlowFile processor to the Remote Process Group and select the Input Port that you created and started on the remote cluster in CDP Public Cloud:
  5. You can also define a connection from the Remote Process Group to another component to download data made available by the remote cluster running in the CDP Public Cloud environment. In the this example, the Remote Process Group is connected to a funnel.

After you have defined the dataflow for your CDP Private Cloud Base cluster, start the CDP Private Cloud Base dataflow and confirm that the data is moving back and forth between the environments:

In the CDP Private Cloud Base environment, your dataflow looks similar to the following:

In the CDP Public Cloud environment, your dataflow will look similar to the following: