Use Replication Manager to migrate to CDP Public Cloud
Replication Manager is a service that copies and migrates data from CDH 5.13 and above
clusters (HDFS, Hive, and HBase data) and from CDP Private Cloud Base 7.1.4 and above clusters
(HDFS, Hive external tables, and HBase data) to CDP Public Cloud clusters. The supported
cloud storage services include Amazon S3 and Microsoft Azure ADLS Gen2 (ABFS). Using Replication
Manager to replicate from HDP clusters to CDP Public Cloud on Azure is a beta feature and is
not available for general use.
About Replication Manager
Replication Manager copies and migrates data from CDH and CDP Private Cloud Base clusters to CDP Public Cloud clusters. You can also migrate HDFS data from cloud storage to CDH clusters.

Access Replication Manager service
You can access the Replication Manager service by logging in to Cloudera Data Platform.

How replication policies work
In Replication Manager, you create replication policies to define the rules applied to your replication jobs. A policy can specify which cluster is the source and which is the destination, what data is replicated, the day and time the replication job runs, the frequency of job runs, and bandwidth restrictions.

Using HDFS replication policies
You can use HDFS replication policies in CDP Public Cloud Replication Manager to replicate HDFS data from a CDH or CDP Private Cloud Base cluster to cloud storage, and from cloud storage to a CDH cluster. To use an on-premises cluster (CDH or CDP Private Cloud Base) in a replication policy, you must register it as a classic cluster in the Management Console. To use cloud storage for data replication, you must register the cloud credentials in Replication Manager so that the service can access the cloud storage.
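To make the policy rules listed under "How replication policies work" more concrete, the following is a minimal sketch of the information an HDFS replication policy carries when replicating from an on-premises cluster to cloud storage. It is an illustration only: the class name, field names, bandwidth unit, and example values are all hypothetical and are not Replication Manager API parameters. The only external facts it relies on are the Hadoop URI schemes for Amazon S3 (s3a://) and ADLS Gen2 (abfs://, abfss://).

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical model of the rules a replication policy carries.
# Field names are illustrative; they are NOT Replication Manager API parameters.
CLOUD_SCHEMES = ("s3a://", "abfs://", "abfss://")  # Hadoop URI schemes for S3 and ADLS Gen2


@dataclass(frozen=True)
class HdfsPolicySketch:
    name: str
    source_cluster: str                       # on-premises cluster registered as a classic cluster
    source_path: str                          # absolute HDFS path to replicate
    destination_uri: str                      # cloud storage location
    cloud_credential: str                     # credential registered in Replication Manager
    start_time: str                           # day and time of the first job run (free-form here)
    frequency_hours: Optional[int] = None     # None means the job runs once
    max_bandwidth_mbps: Optional[int] = None  # None means no bandwidth restriction (unit assumed)

    def problems(self) -> List[str]:
        """Return human-readable problems; an empty list means the sketch is consistent."""
        found = []
        if not self.source_path.startswith("/"):
            found.append("source_path must be an absolute HDFS path")
        if not self.destination_uri.startswith(CLOUD_SCHEMES):
            found.append("destination_uri must be an S3 or ABFS URI")
        if self.frequency_hours is not None and self.frequency_hours <= 0:
            found.append("frequency_hours must be positive")
        if self.max_bandwidth_mbps is not None and self.max_bandwidth_mbps <= 0:
            found.append("max_bandwidth_mbps must be positive")
        return found


# Example values are made up for illustration.
policy = HdfsPolicySketch(
    name="nightly-sales",
    source_cluster="onprem-cdh",
    source_path="/data/sales",
    destination_uri="s3a://example-bucket/sales",
    cloud_credential="example-aws-credential",
    start_time="2024-01-01T02:00",
    frequency_hours=24,
)
print(policy.problems())
```

The sketch covers only the on-premises-to-cloud direction. Replication from cloud storage to a CDH cluster would swap the roles of the HDFS path and the cloud URI.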
Using Hive replication policies
To create a Hive replication policy, you must configure the required policy in Ranger, register the on-premises cluster (CDH or CDP Private Cloud Base) as a classic cluster in Management Console, register cloud account credentials in the Replication Manager service, verify cluster access, and configure the minimum ports required for replication. The replication load is handled by the source on-premises cluster. You can replicate on-premises data to the cloud with a single cluster if the Metastore is running on the cloud.

Using HBase replication policies
To create an HBase replication policy, you must register the on-premises cluster (CDH or CDP Private Cloud Base) as a classic cluster in Management Console, register cloud account credentials in the Replication Manager service, verify cluster access, and configure the minimum ports required for replication. When you create the first HBase replication policy to replicate HBase data from a source cluster to a target cluster, Replication Manager performs the first-time setup configuration steps and then replicates the data.

Troubleshooting replication policies in CDP Public Cloud
The scenarios in this topic help you troubleshoot issues in the Replication Manager service in CDP Public Cloud.

Appendix
Before you create replication policies, you must register the Amazon S3 or Azure cloud credentials to use as cloud storage in CDP Public Cloud Replication Manager, and register the on-premises clusters (CDH or CDP Private Cloud Base) as classic clusters in the Management Console. You can also configure an SSL/TLS certificate exchange between the two Cloudera Manager instances that manage the source and target clusters, respectively, to replicate data securely.