Replication Manager in CDP Private Cloud Base
Replication Manager is a service in Cloudera Manager. You can create replication policies in this service to replicate data across data centers for various use cases which include disaster recovery scenarios, running hybrid workloads, migrating data to/from cloud, or a generic backup/restore scenario. You can also create HDFS or HBase snapshot policies to take snapshots of HDFS directories and HBase tables respectively.
Cloudera Manager provides the following key functionalities in the Cloudera Manager Admin Console that can be leveraged by Replication Manager:
- Select datasets that are critical for your business operations.
- Monitor and track progress of your snapshots and replication jobs through a central console and easily identify issues or files that failed to be transferred.
- Issue Alert when a snapshot or replication job fails or is aborted so that the problem can be diagnosed quickly.
You can also use Cloudera Manager to schedule, save, and restore snapshots of HDFS directories and HBase tables.
Replication Manager provides the following functionalities that you can use to accomplish your data replication goals:
HDFS replication policies
These policies replicate HDFS data and metadata from CDH (version 5.10 and higher) clusters to CDP Private Cloud Base (version 7.0.3 and higher) clusters.
Some use cases where you can use HDFS replication policies include:
- copying data from legacy on-premises systems to Amazon S3 or Microsoft ADLS Gen2 (ABFS) cloud buckets or from cloud buckets to on-premise systems.
-
replicating required data to another cluster to run load-intensive workflows on it which optimizes the primary cluster performance.
-
deploying a complete backup-restore solution for your enterprise.
Hive external table replication policies
These policies replicate HDFS, Hive external tables (without manual translation of Hive datasets to HDFS datasets, or vice versa), Hive metastore data, Impala metadata (catalog server metadata) associated with Impala tables registered in the Hive metastore, Impala data, and Sentry permissions to Ranger from CDH (version 5.10 and higher) clusters to CDP Private Cloud Base (version 7.0.3 and higher) clusters. In this instance, applications that depend on external table definitions stored in Hive, operate on both replica and source as the table definitions are updated.
Some use cases where you might find these replication policies useful is to:
- backup legacy data for future use or archive cold data
- replicate or move data to cloud clusters to run analytics
- implement a complete backup and disaster recovery solution
Hive ACID table replication policies
These policies replicate HDFS, Hive managed (ACID) data and metadata between CDP Private Cloud Base 7.1.8 and higher clusters using Cloudera Manager version 7.7.1 or higher.
Some use cases where these replication policies can be used by security-conscious organizations such as financial organizations and others is to:
- replicate non-sensitive data to cloud deployments to use as a backup
- migrate data to another cluster to run load-intensive workflows
- use the failover functionality to make the disaster recovery cluster as your primary cluster so that the data ingestion being performed by a replication policy is uninterrupted
Ozone replication policies
You can create Ozone replication policies to replicate data in Ozone buckets between CDP Private Cloud Base 7.1.8 clusters or higher using Cloudera Manager 7.7.1 or higher.
- FSO buckets in source and target clusters using ofs protocol. Supports incremental replication using file checksums.
- legacy buckets in source and target clusters using ofs protocol. Supports incremental replication using file checksums.
- OBS buckets in source and target clusters that support S3A filesystem using the S3A scheme or replication protocol.
You can use these policies to replicate or migrate the required Ozone data to another cluster to run load-intensive workloads, back up data, or for backup-restore use cases.
HDFS and HBase snapshot policies
These policies take regular point-in-time snapshots of HDFS directories and HBase tables respectively.
Snapshots act as a backup, and you can restore an HDFS directory or a HBase table to a previous version or to another location on the same HDFS or HBase service as necessary. Snapshots are also used by replication policies. The first replication policy run replicates all the data and metadata from the chosen directories. The subsequent replication policy runs leverage HDFS snapshot diffs to replicate the changed data.