To move data between cloud environments using NiFi site-to-site communication, you
need a dataflow in your CDP Private Cloud Base cluster that can
send data to and receive data from the CDP Public Cloud cluster. To create this dataflow,
connect a processor to a Remote Process Group configured to use the HTTP transport
protocol, and enable transmission.
You have defined your CDP Public Cloud dataflow and configured Ranger
policies for site-to-site communication.
You have the public FQDNs for your CDP Public Cloud cluster nodes.
In your CDP Private Cloud Base cluster, launch the NiFi
UI and drag a GenerateFlowFile processor onto the canvas.
For this use case, GenerateFlowFile creates a 1 MB file every
10 seconds.
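The GenerateFlowFile settings described above can also be written down as data, which may help if you script the setup instead of using the UI. The following is a minimal sketch that only builds a request body in the shape the NiFi REST API generally expects for creating a processor; it does not contact any cluster. The helper name generate_flowfile_payload and the canvas position are hypothetical, and the "File Size" and scheduling values are assumptions chosen to match the 1 MB every 10 seconds use case.

```python
import json


def generate_flowfile_payload(file_size="1 MB", run_schedule="10 sec", x=0.0, y=0.0):
    """Build a processor-creation body for GenerateFlowFile (hypothetical helper).

    Assumption: file_size maps to the processor's "File Size" property and
    run_schedule maps to the timer-driven scheduling period.
    """
    return {
        # A new component starts at revision version 0.
        "revision": {"version": 0},
        "component": {
            "type": "org.apache.nifi.processors.standard.GenerateFlowFile",
            "position": {"x": x, "y": y},
            "config": {
                "schedulingStrategy": "TIMER_DRIVEN",
                "schedulingPeriod": run_schedule,
                "properties": {"File Size": file_size},
            },
        },
    }


payload = generate_flowfile_payload()
print(json.dumps(payload, indent=2))
```

Keeping the size and schedule as parameters makes it easy to adjust the test load later without touching the rest of the body.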
Drag a Remote Process Group onto the NiFi canvas, set the transport protocol to HTTP, and
specify one or more of the NiFi nodes running on your CDP Public Cloud cluster.
After the site-to-site connection is initiated, the source NiFi cluster is aware of
the topology of the remote NiFi cluster and of any increase or decrease in its
size, so a single node is enough to establish the connection. However, it is
recommended that you specify at least two nodes so that the connection can still be
initiated if one of them is unavailable.
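The recommendation to list at least two remote nodes can be illustrated with a short sketch that joins several node addresses into the comma-separated URL list a Remote Process Group accepts. It only builds the data and does not call any cluster. The helper name, the example.com host names, the https scheme, port 8443, and the /nifi path are all assumptions for illustration; substitute the public FQDNs and port of your own CDP Public Cloud NiFi nodes.

```python
def remote_process_group_payload(node_fqdns, port=8443, x=0.0, y=0.0):
    """Build a Remote Process Group creation body (hypothetical helper).

    Assumption: each remote node is reachable at https://<fqdn>:<port>/nifi.
    """
    if not node_fqdns:
        raise ValueError("specify at least one remote NiFi node")
    # Several URLs are passed as one comma-separated string.
    target_uris = ",".join(f"https://{fqdn}:{port}/nifi" for fqdn in node_fqdns)
    return {
        "revision": {"version": 0},
        "component": {
            "targetUris": target_uris,
            "transportProtocol": "HTTP",  # site-to-site over HTTP rather than RAW
            "position": {"x": x, "y": y},
        },
    }


# Placeholder host names; two nodes, per the availability recommendation.
rpg = remote_process_group_payload(["nifi0.example.com", "nifi1.example.com"])
print(rpg["component"]["targetUris"])
```

Listing two nodes means the initial connection can fall back to the second address if the first is down; after that, the source cluster learns the full remote topology on its own.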
Right-click the Remote Process Group and select Enable
transmission.
Connect the GenerateFlowFile processor to the Remote Process
Group and select the Input Port that you created and
started on the remote cluster in CDP Public Cloud:
You can also define a connection from the Remote Process Group to another
component to download data made available by the remote cluster running in the
CDP Public Cloud environment. In this example, the Remote Process Group is
connected to a funnel.
After you have defined the dataflow for your CDP Private Cloud Base
cluster, start the CDP Private Cloud Base dataflow and confirm
that the data is moving back and forth between the environments:
In the CDP Private Cloud Base environment, your dataflow looks similar
to the following:
In the CDP Public Cloud environment, your dataflow looks similar
to the following: