Understanding the use case

You can use the Apache NiFi site-to-site functionality to move data between a Cloudera on cloud environment and a Cloudera Base on premises environment. To do this, set up a cluster in each environment, prepare your network and truststore configurations, and then define the data flow in each environment along with the Apache Ranger configuration that site-to-site functionality requires.

Moving data between Cloudera on cloud and Cloudera Base on premises clusters is a common use case when you need a large amount of temporary compute capacity that can be provisioned quickly in the cloud.

Imagine you have a large dataset on premises and you want to run heavy computations on it. You can use the following workflow to design a data flow that:

  • Moves the dataset from your Cloudera Base on premises environment to your Cloudera on cloud environment
  • Pushes the data to the appropriate destination
  • Triggers the workload that processes the data while leveraging the auto-scaling capabilities that Cloudera on cloud provides
  • Returns the results to your Cloudera Base on premises environment

All of this is powered by the Cloudera Base on premises and Cloudera on cloud distributions. Apache Ranger enforces consistent, fine-grained security policies, and Apache Atlas provides data management and data lineage across both environments.