Introduction to Replication Manager

Replication Manager is a service that copies and migrates data from CDH 5.13 and higher clusters (HDFS, Hive, and HBase data) and CDP Private Cloud Base 7.1.4 and higher clusters (HDFS, Hive external tables, and HBase data) to CDP Public Cloud clusters. The supported Public Cloud storage services include Amazon S3 and Microsoft Azure ADLS Gen2 (ABFS). Replication from HDP clusters to CDP Public Cloud on Azure using Replication Manager is a beta feature and is not available for general use.

For more information, see Support matrix for Replication Manager on CDP Public Cloud.
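
The minimum source versions stated above can be expressed as a small lookup. The following Python sketch is illustrative only: the names SUPPORTED_SOURCES, parse_version, and replicable_data are hypothetical and are not part of any Cloudera API. It assumes plain dotted numeric version strings and encodes only the two minimums named in this introduction, not the full support matrix.

```python
# Minimum source versions and replicable data types, as stated in the
# introduction. All names here are hypothetical helpers for illustration.
SUPPORTED_SOURCES = {
    "CDH": ((5, 13), ["HDFS", "Hive", "HBase"]),
    "CDP Private Cloud Base": ((7, 1, 4), ["HDFS", "Hive external tables", "HBase"]),
}


def parse_version(text):
    """Turn a dotted numeric version such as '7.1.4' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def replicable_data(platform, version):
    """Return the data types that can be replicated from the given source.

    Returns an empty list when the platform is not listed above or the
    version is below the stated minimum.
    """
    if platform not in SUPPORTED_SOURCES:
        return []
    minimum, data_types = SUPPORTED_SOURCES[platform]
    if parse_version(version) >= minimum:
        return list(data_types)
    return []


if __name__ == "__main__":
    print(replicable_data("CDH", "6.3.4"))
    print(replicable_data("CDP Private Cloud Base", "7.1.3"))
```

Treat the result as a first check only; the support matrix remains the authoritative source for version and feature combinations.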

Replication Manager supports the following features:

HDFS replication policies
Replicates HDFS data. You must create schedules to replicate data incrementally.
Hive replication policies
  • Replicates data stored in Hive tables, Hive metadata, data in Hive metastore, and Impala metadata (catalog server metadata) associated with Impala tables registered in the Hive metastore.

  • Replicates Hive managed tables and external tables. ACID tables and managed tables in Hive are converted to external tables after replication.

  • Supports table-level replication.

  • Supports migration of Sentry permissions to Ranger. To replicate Sentry policies, you must be running the Sentry service on CDH 5.12 or higher, or on any CDH 6.x version.

Apache HBase replication policies
You can replicate HBase data from a source classic cluster (CDH or CDP Private Cloud Base cluster), Cloudera Operational Database (COD), or Data Hub to a target Data Hub or COD cluster.
You can use HBase replication policies to set up active-active disaster recovery with conflict resolution (which enables other disaster recovery use cases and uses resources efficiently), or to replicate HBase data in CDP Private Cloud Base clusters, CDH clusters, or COD. You can also use HBase replication policies to copy or replicate HBase data between different environments within a Virtual Private Cloud (VPC).

Any data change in the source cluster is pushed to the target cluster automatically without user intervention.

CDP CLI for HDFS and Hive replication policies
You can also use CDP CLI commands to create HDFS and Hive replication policies. The CDP CLI commands for Replication Manager are under the replicationmanager CDP CLI option. For more information, see CDP CLI for Replication Manager.
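
As a starting point, the commands available under the replicationmanager option can be listed from a script. The following Python sketch assumes that the CDP CLI is installed as cdp, is configured with valid credentials, and follows the usual "cdp <option> help" convention. The function names replication_manager_command and show_help are hypothetical. The sketch deliberately avoids naming specific policy subcommands; check the help output or CDP CLI for Replication Manager for those.

```python
import shutil
import subprocess


def replication_manager_command(*args):
    """Build the argument list for a command under 'cdp replicationmanager'."""
    return ["cdp", "replicationmanager", *args]


def show_help():
    """Run 'cdp replicationmanager help' and return its output.

    Returns None when the cdp executable is not found on PATH.
    """
    if shutil.which("cdp") is None:
        return None
    result = subprocess.run(
        replication_manager_command("help"),
        capture_output=True,
        text=True,
        check=False,  # inspect result.returncode yourself if needed
    )
    return result.stdout


if __name__ == "__main__":
    output = show_help()
    print(output if output is not None else "cdp CLI not found on PATH")
```

Building the argument list separately from running it makes it easy to log or review a command before it is executed.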