About Replication Manager

Replication Manager is a service that copies and migrates data from CDH 5.13 and higher clusters (HDFS, Hive, and HBase data) and from CDP Private Cloud Base 7.1.4 and higher clusters (HDFS, Hive external tables, and HBase data) to CDP Public Cloud clusters. The supported cloud storage services include Amazon S3 and Microsoft Azure ADLS Gen2 (ABFS). You can also migrate HDFS data from cloud storage to CDH clusters. Replication from HDP clusters to CDP Public Cloud on Azure is a beta feature and is not available for general use.

For information about the support matrix, see Support matrix for Replication Manager on CDP Public Cloud.

You can create replication policies to replicate data. Before you create a replication policy, you must prepare the clusters, register the on-premises clusters (CDH or CDP Private Cloud Base) in Management Console, register the cloud storage in Replication Manager, and verify that the minimum required ports are configured.

After you create a replication policy, you can perform various actions on it. You can also monitor its replication jobs, view the job history, and perform actions on individual replication jobs.

Replication Manager supports the following features:

HDFS replication policies
Replicates HDFS data. You must create schedules to replicate data incrementally.
Hive replication policies
  • Replicates data stored in Hive tables, Hive metadata, data in Hive metastore, and Impala metadata (catalog server metadata) associated with Impala tables registered in the Hive metastore.

  • Replicates Hive managed tables and external tables. ACID tables and managed tables in Hive are converted to external tables after replication.

  • Supports table-level replication.

  • Supports migration of Sentry permissions to Ranger. To replicate Sentry policies, the Sentry service must be running on CDH 5.12 or higher, or on any CDH 6.x version.

You must create schedules to replicate data incrementally.

Apache HBase replication policies
You can replicate HBase data from a source classic cluster (CDH or CDP Private Cloud Base cluster), Cloudera Operational Database (COD), or Data Hub to a target Data Hub or COD cluster.
You can use HBase replication policies to perform active-active disaster recovery with conflict resolution, which enables other disaster recovery use cases and provides efficient utilization of resources, or to replicate HBase data in CDP Private Cloud Base clusters, CDH clusters, or COD. You can also use HBase replication policies to copy or replicate HBase data between different environments within a Virtual Private Cloud (VPC).

Any data change in the source cluster is pushed to the target cluster automatically without user intervention.

CDP CLI commands for HDFS and Hive replication policies
You can also use CDP CLI commands for HDFS and Hive replication policies. The CDP CLI commands for Replication Manager are under the replicationmanager CDP CLI option. For more information, see CDP CLI for Replication Manager.
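
As an illustration of how commands are grouped under the replicationmanager option, the following minimal Python sketch assembles a command line of the form cdp replicationmanager <subcommand>. The helper names build_rm_command and run_rm_command are hypothetical, and the subcommand name list-policies is an assumption; confirm the available subcommands with the CLI help output or the CDP CLI reference before relying on it.

```python
import shutil
import subprocess


def build_rm_command(subcommand, *args):
    """Return the argument list for a `cdp replicationmanager` subcommand.

    `subcommand` is passed through unchanged; this helper does not know
    which subcommands the installed CDP CLI actually provides.
    """
    return ["cdp", "replicationmanager", subcommand, *args]


def run_rm_command(subcommand, *args):
    """Run the command and return its standard output.

    Returns None when the `cdp` executable is not found on PATH, so the
    sketch can be imported on machines without the CDP CLI installed.
    """
    if shutil.which("cdp") is None:
        return None
    result = subprocess.run(
        build_rm_command(subcommand, *args),
        capture_output=True,
        text=True,
        check=False,  # leave error handling to the caller
    )
    return result.stdout


if __name__ == "__main__":
    # "list-policies" is an assumed subcommand name used for illustration.
    print(" ".join(build_rm_command("list-policies")))
```

Building the argument list separately from running it keeps the command easy to inspect or log before anything is executed against a cluster.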