To move data between cloud environments using NiFi site-to-site communication, you
need a data flow in your CDP Private Cloud Base cluster that can
send data to and receive data from the CDP Public Cloud cluster. To create this data flow, connect a
processor to a Remote Process Group configured to use the HTTP protocol, and enable
transmission.
You have defined your CDP Public Cloud data flow and configured Ranger
policies for site-to-site communication.
You have the public FQDNs for your CDP Public Cloud cluster nodes.
In your CDP Private Cloud Base cluster, launch the NiFi
UI and drag a GenerateFlowFile processor onto the canvas.
For this use case, GenerateFlowFile creates a 1 MB file every
10 seconds.
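The following is a minimal sketch of the settings that the use case above implies, together with the data volume they produce. It assumes the File Size and Batch Size properties and the Run Schedule scheduling setting of GenerateFlowFile; the dictionary and the expected_mb_per_hour helper are hypothetical names used only for illustration, and the result applies to each node on which the processor runs.

```python
# Values taken from the use case: one 1 MB file every 10 seconds.
FILE_SIZE_MB = 1
RUN_SCHEDULE_SEC = 10

# Illustrative summary of the processor configuration (not an export format).
generate_flowfile_settings = {
    "File Size": f"{FILE_SIZE_MB} MB",          # Properties tab
    "Batch Size": "1",                          # one FlowFile per run
    "Run Schedule": f"{RUN_SCHEDULE_SEC} sec",  # Scheduling tab
}


def expected_mb_per_hour(file_size_mb, run_schedule_sec, batch_size=1):
    """Return the volume generated per hour by one running processor instance."""
    runs_per_hour = 3600 / run_schedule_sec
    return file_size_mb * batch_size * runs_per_hour


print(expected_mb_per_hour(FILE_SIZE_MB, RUN_SCHEDULE_SEC))  # 360.0
```

At this rate the test flow stays small enough to observe in the UI without putting noticeable load on the site-to-site link.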
Drag a Remote Process Group onto the NiFi canvas, select HTTP as the transport protocol, and
specify one or more of the NiFi nodes running on your CDP Public Cloud cluster.
After the site-to-site connection is initiated, the source NiFi cluster is
aware of the topology of the remote NiFi cluster and of any increase or decrease
in the size of the remote cluster. However, it is recommended that you specify
at least two nodes, so that the site-to-site connection can still be initiated
if one of the nodes is unavailable.
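The Remote Process Group accepts a comma-separated list of URLs, so the value for several remote nodes can be assembled from their public FQDNs. The sketch below is one way to do that; the build_rpg_urls helper, the example host names, port 8443, and the /nifi path are assumptions for illustration, so substitute the values that apply to your CDP Public Cloud cluster.

```python
import warnings


def build_rpg_urls(fqdns, port=8443, path="/nifi"):
    """Join the public FQDNs of the remote NiFi nodes into one URLs value.

    The port and path defaults are assumptions, not values from this procedure.
    """
    # Drop blanks and duplicates while keeping the original order.
    hosts = list(dict.fromkeys(h.strip() for h in fqdns if h and h.strip()))
    if not hosts:
        raise ValueError("at least one remote NiFi node is required")
    if len(hosts) < 2:
        # A single node works, but two or more are recommended for availability.
        warnings.warn("only one remote node specified; two or more are recommended")
    return ",".join(f"https://{host}:{port}{path}" for host in hosts)


# Hypothetical host names for two nodes of the remote cluster.
print(build_rpg_urls(["nifi0.example.com", "nifi1.example.com"]))
# https://nifi0.example.com:8443/nifi,https://nifi1.example.com:8443/nifi
```

Paste the resulting string into the URLs field of the Remote Process Group.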
Right-click the Remote Process Group and select Enable
transmission.
Connect the GenerateFlowFile processor to the Remote Process
Group and select the Input Port that you created and
started on the remote cluster in CDP Public Cloud:
You can also define a connection from the Remote Process Group to another
component to download data made available by the remote cluster running in the
CDP Public Cloud environment. In this example, the Remote Process Group is
connected to a funnel.
After you have defined the data flow for your CDP Private Cloud Base cluster, start it and confirm that data is
moving in both directions between the environments:
In the CDP Private Cloud Base environment, your data
flow looks similar to the following:
In the CDP Public Cloud environment, your data flow looks similar to the
following: