Introduction to Replication Manager
Replication Manager is a service that copies and migrates data from CDH 5.13 and above clusters (HDFS, Hive, and HBase data) and CDP Private Cloud Base 7.1.4 and above clusters (HDFS, Hive external tables, and HBase data) to CDP Public Cloud clusters. The supported Public Cloud services include Amazon S3 and Microsoft Azure ADLS Gen2 (ABFS). Replication from HDP clusters to CDP Public Cloud on Azure using Replication Manager is a beta feature and is not available for general use.
For information about the support matrix, see Support matrix for Replication Manager on CDP Public Cloud.
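The minimum source versions stated above can be expressed as a simple pre-check. The following is a minimal sketch, not part of Replication Manager: the platform labels and the function names `parse_version` and `is_supported_source` are hypothetical, and the support matrix remains the authoritative reference.

```python
from typing import Tuple

# Minimum source versions taken from the introduction above.
# The dictionary keys are hypothetical labels chosen for this sketch,
# not identifiers used by Replication Manager.
MIN_SOURCE_VERSION = {
    "CDH": (5, 13),
    "CDP_PRIVATE_CLOUD_BASE": (7, 1, 4),
}


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '7.1.4' into (7, 1, 4)."""
    return tuple(int(part) for part in version.split("."))


def is_supported_source(platform: str, version: str) -> bool:
    """Return True if the source platform meets the stated minimum version."""
    minimum = MIN_SOURCE_VERSION.get(platform)
    if minimum is None:
        # Platforms without a stated minimum (for example HDP, which is
        # beta only) are treated as not generally supported in this sketch.
        return False
    return parse_version(version) >= minimum
```

For example, `is_supported_source("CDH", "6.3.4")` evaluates to `True`, while `is_supported_source("CDP_PRIVATE_CLOUD_BASE", "7.1.3")` evaluates to `False`.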
Replication Manager supports the following features:
- HDFS replication policies:
  - Replicate HDFS data. You must create schedules to replicate data incrementally.
- Hive replication policies:
  - Replicate data stored in Hive tables, Hive metadata, data in the Hive metastore, and Impala metadata (catalog server metadata) associated with Impala tables registered in the Hive metastore.
  - Replicate Hive managed tables and external tables. ACID tables and managed tables in Hive are converted to external tables after replication.
  - Support table-level replication.
  - Support migration of Sentry permissions to Ranger. To replicate Sentry policies, you must be running the Sentry service on CDH 5.12 or higher, or on any CDH 6.x version.
  - You must create schedules to replicate data incrementally.
- Apache HBase replication policies:
  HBase replication policies support the following replication scenarios:
  - From CDP Private Cloud Base cluster to Data Lake cluster.
  - From CDH cluster to Data Lake cluster.
  - From CDH cluster to Cloudera Operational Database (COD) cluster.
  - From COD cluster to COD cluster.
You can use HBase replication policies to perform active-active disaster recovery with conflict resolution, which enables other disaster recovery use cases and makes efficient use of resources, or to replicate HBase data in CDP Private Cloud Base clusters, CDH clusters, or COD. You can also use HBase replication policies to copy or replicate HBase data between different environments within a Virtual Private Cloud (VPC).
Any data change in the source cluster is pushed to the target cluster automatically without user intervention.
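
The HBase replication scenarios listed above can be captured as a small lookup of source and target pairs. This is an illustrative sketch only: the cluster labels and the function `is_listed_hbase_scenario` are hypothetical names and do not correspond to any Replication Manager API.

```python
# Source/target pairs for HBase replication policies, as listed above.
# The string labels are hypothetical and chosen for this sketch.
LISTED_HBASE_SCENARIOS = frozenset({
    ("CDP_PRIVATE_CLOUD_BASE", "DATA_LAKE"),
    ("CDH", "DATA_LAKE"),
    ("CDH", "COD"),
    ("COD", "COD"),
})


def is_listed_hbase_scenario(source: str, target: str) -> bool:
    """Return True if the source/target pair appears in the list above."""
    return (source, target) in LISTED_HBASE_SCENARIOS
```

A pair that returns `False` here is simply not among the scenarios named in this introduction; consult the support matrix before drawing conclusions about it.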