Introduction to Replication Manager
Replication Manager is a service that copies and migrates data from CDH clusters (HDFS, Hive, and HBase data) and CDP Private Cloud Base clusters (HBase data) to CDP Public Cloud. You can also use this service to replicate data across data centers for disaster recovery scenarios.
You can use the Replication Manager service to replicate data from CDH clusters to CDP Public Cloud clusters that use Amazon S3 or Microsoft Azure ADLS Gen2 (ABFS) for storage. For information about the support matrix, see Support matrix for Replication Manager on CDP Public Cloud.
The Replication Manager supports the following features:
- HDFS replication policies:
Replicates HDFS data. You must create schedules to replicate data incrementally.
- Hive replication policies:
Replicates data stored in Hive tables, Hive metadata, data in Hive metastore, and Impala metadata (catalog server metadata) associated with Impala tables registered in the Hive metastore.
Replicates Hive managed tables and external tables. ACID tables and managed tables in Hive are converted to external tables after replication.
Supports table-level replication.
Supports migration of Sentry permissions to Ranger. To perform the Sentry policy replication, you must be running the Sentry service on CDH 5.12 or higher, or any CDH 6.x version. The Ranger version running on your cloud cluster must be 3.1.
You must create schedules to replicate data incrementally.
- Apache HBase replication policies. HBase replication policies support the following replication scenarios:
- From CDP Private Cloud Base cluster to Data Hub cluster.
- From CDH cluster to Data Hub cluster.
- From CDH cluster to COD cluster.
- From COD cluster to COD cluster.
You can use HBase replication policies to perform active-active disaster recovery with conflict resolution (which enables other disaster recovery use cases and uses resources efficiently), or to replicate data in Cloudera Operational Database (COD) or HBase. You can copy or replicate HBase data between different environments within a Virtual Private Cloud (VPC) using HBase replication.
Any data change in the source cluster is pushed to the target cluster automatically without user intervention.
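
The prerequisites listed above can be sketched as a small pre-flight check. This is only an illustration in Python, not part of Replication Manager or any Cloudera API: the names `SUPPORTED_HBASE_SCENARIOS`, `hbase_scenario_supported`, and `sentry_migration_supported` are hypothetical, and the cluster labels are plain strings taken from this section. The sketch treats any CDH version other than 5.12 or higher and 6.x as unsupported for Sentry migration, which is an assumption because this section does not mention other versions.

```python
# Hypothetical helpers for illustration only; not a Cloudera API.

# Source/target pairs listed in this section for HBase replication policies.
SUPPORTED_HBASE_SCENARIOS = {
    ("CDP Private Cloud Base", "Data Hub"),
    ("CDH", "Data Hub"),
    ("CDH", "COD"),
    ("COD", "COD"),
}


def hbase_scenario_supported(source: str, target: str) -> bool:
    """Return True if the (source, target) pair is one of the listed scenarios."""
    return (source, target) in SUPPORTED_HBASE_SCENARIOS


def sentry_migration_supported(cdh_version: str) -> bool:
    """Return True for CDH 5.12 or higher, or any CDH 6.x version."""
    parts = cdh_version.split(".")
    major = int(parts[0])
    # Treat a missing minor component (for example "6") as minor version 0.
    minor = int(parts[1]) if len(parts) > 1 else 0
    if major == 5:
        return minor >= 12
    # Assumption: versions other than 5.x and 6.x are not covered here.
    return major == 6
```

For example, `hbase_scenario_supported("CDH", "COD")` returns `True`, while `sentry_migration_supported("5.11.2")` returns `False` because the minor version is below 12.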