Use Replication Manager to migrate to CDP Public Cloud
Replication Manager is a service that copies and migrates data from CDH 5.13 and above
clusters (HDFS, Hive, and HBase data) and CDP Private Cloud Base 7.1.4 and above clusters
(HDFS, Hive external tables, and HBase data) to CDP Public Cloud clusters. The supported
Public Cloud services include Amazon S3 and Microsoft Azure ADLS Gen2 (ABFS). Using
Replication Manager to replicate from HDP clusters to CDP Public Cloud on Azure is a beta
feature and is not available for general use.
About Replication Manager
Replication Manager is a service in CDP Public Cloud. You can create replication policies in Replication Manager to copy and migrate data from CDH (version 5.13 and higher) clusters (HDFS, Hive, and HBase data) and CDP Private Cloud Base (version 7.1.4 and higher) clusters (HDFS, Hive external tables, and HBase data) to CDP Public Cloud clusters. You can also replicate HDFS data from cloud storage to classic clusters (CDH or CDP Private Cloud Base clusters), and Hive external tables to Data Hubs. The supported Public Cloud services include Amazon S3 and Microsoft Azure ADLS Gen2 (ABFS). Replicating Hive managed tables using Replication Manager from HDP clusters to CDP Public Cloud is a beta feature and is not available for general use.

Access Replication Manager service in CDP Public Cloud
You can access the Replication Manager service by logging into Cloudera Data Platform.

How replication policies work
In CDP Public Cloud Replication Manager, you create replication policies to establish the rules you want applied to your replication jobs. The policy rules you set can include which cluster is the source and which is the destination, what data is replicated, what day and time the replication job occurs, the frequency of job runs, and bandwidth restrictions.

Using HDFS replication policies
You can use the HDFS replication policies in CDP Public Cloud Replication Manager to replicate HDFS data. The HDFS replication policies can replicate HDFS data and metadata from classic clusters (CDH, CDP Private Cloud Base, and HDP) to CDP Public Cloud storage buckets such as S3 and ABFS, and from cloud storage to classic clusters (CDH or CDP Private Cloud Base clusters). To use an on-premises cluster (CDH or CDP Private Cloud Base cluster) in the replication policy, you must register it as a classic cluster in the Management Console.
To use cloud storage for data replication, you must register the cloud credentials in Replication Manager so that the Replication Manager service can access the cloud storage. You must also verify cluster access and configure minimum ports for replication before you create HDFS replication policies.

Using Hive replication policies
To create a Hive replication policy in CDP Public Cloud Replication Manager, you must configure the required Ranger policy in Ranger, register the on-premises cluster (CDH or CDP Private Cloud Base) as a classic cluster in Management Console, register cloud account credentials in the Replication Manager service, verify cluster access, and configure minimum ports for replication. The replication load happens on the source on-premises cluster. You can replicate on-premises data to the cloud with a single cluster if the Metastore is running on the cloud.

Using HBase replication policies
To create an HBase replication policy in CDP Public Cloud Replication Manager, you must register the on-premises cluster (CDH or CDP Private Cloud Base) as a classic cluster in Management Console, register cloud account credentials in the Replication Manager service, verify cluster access, and configure minimum ports for replication.

Troubleshooting replication policies in CDP Public Cloud
The troubleshooting scenarios in this topic help you resolve issues in the Replication Manager service in CDP Public Cloud.

Appendix
Before you create replication policies, you must register the Amazon S3 or Azure credentials for the cloud storage that you use in CDP Public Cloud Replication Manager, and register the on-premises clusters (CDH or CDP Private Cloud Base) as classic clusters in the Management Console. You can also configure an SSL/TLS certificate exchange between the two Cloudera Manager instances that manage the source and target clusters, respectively, to replicate data securely.
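
The policy rules and per-service prerequisites summarized in the topics above can be pictured as a small data model. The following Python sketch is only an illustration for planning purposes: the class, field, and function names are hypothetical and are not part of any Replication Manager API or CLI, and the bandwidth unit, cluster name, and bucket path are assumptions. The prerequisite steps themselves are taken from the HDFS, Hive, and HBase summaries.

```python
from dataclasses import dataclass
from typing import Optional

# Steps that the summaries list for HDFS, Hive, and HBase policies alike.
COMMON_STEPS = (
    "register the on-premises cluster as a classic cluster in Management Console",
    "register cloud credentials in Replication Manager",
    "verify cluster access",
    "configure minimum ports for replication",
)

# Hive additionally requires a Ranger policy, per the Hive summary.
PREREQUISITES = {
    "hdfs": COMMON_STEPS,
    "hive": ("configure the required Ranger policy in Ranger",) + COMMON_STEPS,
    "hbase": COMMON_STEPS,
}


@dataclass
class ReplicationPolicy:
    """Hypothetical container for the rules a replication policy can set."""
    service: str                 # "hdfs", "hive", or "hbase"
    source: str                  # source cluster
    destination: str             # destination cluster or cloud storage
    data: str                    # what data is replicated
    start_time: str              # day and time of the job
    frequency: str               # how often the job runs
    max_bandwidth_mbps: Optional[int] = None  # assumed unit for the bandwidth limit


def missing_prerequisites(policy, completed):
    """Return the prerequisite steps for this policy type that are not yet done."""
    return [step for step in PREREQUISITES[policy.service] if step not in completed]


# Example with made-up values: a daily Hive policy with one step already completed.
example = ReplicationPolicy(
    service="hive",
    source="onprem-cdh",
    destination="s3a://example-bucket/warehouse",
    data="sales_db",
    start_time="Monday 02:00",
    frequency="daily",
    max_bandwidth_mbps=100,
)
todo = missing_prerequisites(example, completed={"verify cluster access"})
```

In this example, `todo` holds the four remaining steps for the Hive policy, starting with the Ranger policy. A checklist of this kind is one way to confirm that every prerequisite is complete before you create the policy in Replication Manager.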