Understanding the use case

You can use the Apache NiFi site-to-site functionality to move data between a Public Cloud and a CDP Private Cloud Base environment. To do this, set up a cluster in each environment, prepare your network and truststore configurations, and then define your CDP Private Cloud Base and CDP Public Cloud data flows and Apache Ranger configuration for site-to-site functionality.

Moving data between CDP Public Cloud and CDP Private Cloud Base clusters is a common use case when there is a need for a lot of temporary compute resources that can be quickly provisioned in the cloud.

Imagine you have a large dataset on-premises and you wish to perform heavy computations on the dataset. You can use the following workflow to design a data flow that:

  • Moves the dataset from your CDP Private Cloud Base environment to your CDP Public Cloud environment
  • Pushes the data to the appropriate destination
  • Triggers the workload that processes the data while leveraging the auto-scaling capabilities that CDP Public Cloud provides
  • Returns the results in your CDP Private Cloud Base environment

All of this is powered by CDP Private Cloud Base and CDP Public Cloud distributions, while ensuring consistent security policies at a fine-grained level with Apache Ranger, and data management and data lineage with Apache Atlas across the environments.