Defining your Cloudera Private Cloud Base data flow

To move data between cloud environments using NiFi site-to-site communication, you need a data flow in your Cloudera Private Cloud Base cluster that can send data to and receive data from the CDP Public Cloud cluster. To create this data flow, connect a processor to a Remote Process Group that uses the HTTP transport protocol, and then enable transmission.

Before you begin:
  • You have defined your CDP Public Cloud data flow and configured Ranger policies for site-to-site communication.
  • You have the public FQDNs for your CDP Public Cloud cluster nodes.
  1. In your Cloudera Private Cloud Base cluster, launch the NiFi UI and drag a GenerateFlowFile processor onto the canvas.
    For this use case, GenerateFlowFile is configured to create a 1 MB FlowFile every 10 seconds (the File Size property is set to 1MB and the Run Schedule is set to 10 sec).
  2. Drag a Remote Process Group onto the NiFi canvas, set its Transport Protocol to HTTP, and in the URLs field specify one or more of the NiFi nodes running on your CDP Public Cloud cluster.
    After the site-to-site connection is established, the source NiFi cluster learns the topology of the remote NiFi cluster and keeps track of nodes being added to or removed from it. The URLs you specify, however, are used to initiate the connection, so it is recommended that you specify at least two nodes: the connection can then still be initiated if one of those nodes is unavailable.
  3. Right-click the Remote Process Group and select Enable transmission.
  4. Connect the GenerateFlowFile processor to the Remote Process Group and select the Input Port that you created and started on the remote cluster in CDP Public Cloud.
  5. Optionally, define a connection from the Remote Process Group to another component to receive data that the remote cluster in the CDP Public Cloud environment makes available through its Output Port. In this example, the Remote Process Group is connected to a funnel.
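The steps above are performed in the NiFi UI. For readers who script their flows instead, the following is a minimal sketch of the request bodies that roughly correspond to steps 1 to 4, assuming the NiFi 1.x REST API. The endpoint paths in the comments and the field names (`schedulingPeriod`, `targetUris`, `transportProtocol`, `REMOTE_INPUT_PORT`, the `TRANSMITTING` run state) are assumptions based on that API and should be checked against the REST API documentation for your NiFi version. All IDs and host names are hypothetical placeholders, and the sketch only builds the payloads; it sends nothing.

```python
"""Hypothetical sketch: NiFi 1.x REST API request bodies for the site-to-site flow.

Field names and endpoints are assumptions; verify them for your NiFi version.
"""
import json
import warnings

GENERATE_FLOWFILE = "org.apache.nifi.processors.standard.GenerateFlowFile"


def generate_flowfile_body(file_size: str = "1MB", period: str = "10 sec") -> dict:
    # Step 1: assumed body for POST /nifi-api/process-groups/{id}/processors.
    return {
        "revision": {"version": 0},  # new components start at revision 0
        "component": {
            "type": GENERATE_FLOWFILE,
            "config": {
                "schedulingStrategy": "TIMER_DRIVEN",
                "schedulingPeriod": period,             # Run Schedule
                "properties": {"File Size": file_size},  # GenerateFlowFile property
            },
        },
    }


def remote_process_group_body(target_uris: list[str]) -> dict:
    # Step 2: assumed body for POST /nifi-api/process-groups/{id}/remote-process-groups.
    if len(target_uris) < 2:
        # One URL works, but two or more keep the initial connection available.
        warnings.warn("specify at least two remote NiFi nodes for higher availability")
    return {
        "revision": {"version": 0},
        "component": {
            "targetUris": ",".join(target_uris),  # comma-separated, like the URLs field
            "transportProtocol": "HTTP",
        },
    }


def enable_transmission_body(revision_version: int) -> dict:
    # Step 3: assumed body for PUT /nifi-api/remote-process-groups/{id}/run-status.
    return {"revision": {"version": revision_version}, "state": "TRANSMITTING"}


def connection_body(processor_id: str, group_id: str, rpg_id: str, remote_port_id: str) -> dict:
    # Step 4: assumed body for POST /nifi-api/process-groups/{id}/connections.
    return {
        "revision": {"version": 0},
        "component": {
            "source": {"id": processor_id, "groupId": group_id, "type": "PROCESSOR"},
            # Remote ports are addressed through the Remote Process Group that contains them.
            "destination": {"id": remote_port_id, "groupId": rpg_id, "type": "REMOTE_INPUT_PORT"},
            "selectedRelationships": ["success"],
        },
    }


if __name__ == "__main__":
    # Hypothetical host names standing in for the public FQDNs of the remote nodes.
    uris = ["https://nifi-node1.example.com:8443/nifi", "https://nifi-node2.example.com:8443/nifi"]
    print(json.dumps(remote_process_group_body(uris), indent=2))
```

Keeping the payload builders separate from any HTTP client makes it easy to inspect the bodies before sending them with whatever authenticated client your cluster requires.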

After you have defined the data flow for your Cloudera Private Cloud Base cluster, start the data flow and confirm that data is moving in both directions between the two environments.
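As a rough sanity check, the expected volume can be derived from the GenerateFlowFile schedule and compared with the Sent and Received counts that the Remote Process Group displays (a rolling five-minute window in NiFi 1.x). The sketch below is hypothetical: it assumes a Batch Size of 1, and it accounts for the fact that in a cluster the processor runs on every node unless it is scheduled on the primary node only. The `flowFilesSent` and `flowFilesReceived` keys are assumed names modeled on the Remote Process Group status in the NiFi REST API; verify them before relying on this.

```python
"""Hypothetical helpers for confirming that site-to-site data is flowing."""


def expected_flowfiles(window_s: int, period_s: int, nodes: int = 1, batch_size: int = 1) -> int:
    # One trigger every `period_s` seconds on each node, `batch_size` FlowFiles per trigger.
    return nodes * batch_size * (window_s // period_s)


def is_moving_both_ways(snapshot: dict, expect_receive: bool = True) -> bool:
    # `snapshot` is assumed to hold the Remote Process Group's aggregate status counts.
    sent = snapshot.get("flowFilesSent", 0)
    received = snapshot.get("flowFilesReceived", 0)
    if sent <= 0:
        return False  # nothing is leaving the Private Cloud Base cluster
    # Receiving only matters if step 5 (a connection out of the RPG) was configured.
    return received > 0 if expect_receive else True


if __name__ == "__main__":
    # 1 MB every 10 seconds over a 5-minute window on a 3-node cluster.
    print(expected_flowfiles(300, 10, nodes=3))
```

With 1 MB FlowFiles, the expected count also gives an approximate data volume for the same window.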

In the Cloudera Private Cloud Base environment, your data flow looks similar to the following:

In the CDP Public Cloud environment, your data flow looks similar to the following: