Hive external table replication policies enable you to copy (replicate) your Hive
metastore and data from one cluster to another and synchronize the Hive metastore and data set
on the 'destination' cluster with the source, based on a specified replication policy.
Replication Manager requires a valid license. To understand more about Cloudera license
requirements, see Managing Licenses.
Before you create replication policies, ensure that Replication Manager supports the
source cluster and the target cluster. For information about the clusters and replication
scenarios that Replication Manager supports, see Support matrix for Replication Manager on CDP Private Cloud Base.
The destination cluster must be managed by the Cloudera Manager Server where the
replication is being set up, and the source cluster can be managed by that same server
or by a peer Cloudera Manager Server.
Configuration notes:
If the hadoop.proxyuser.hive.groups configuration has been changed to
restrict access to the Hive Metastore Server to certain users or groups, the list of
groups must also include the hdfs group or a group containing the hdfs user.
Otherwise, Hive/Impala replication does not work. You can specify this
configuration either on the Hive service as an override or in the
core-site HDFS configuration. This applies to the configuration settings on both the source
and destination clusters.
If you configured HDFS ACL synchronization on the target cluster for the directory where
HDFS data is copied during Hive/Impala replication, the permissions that were copied during
replication are overwritten by the HDFS ACL synchronization and are not preserved.
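The first configuration note above can be checked before you create a policy. The following is a minimal sketch, not a Replication Manager API: the function name hdfs_allowed and the groups_with_hdfs_user parameter are hypothetical, and the sketch assumes that you have already read the hadoop.proxyuser.hive.groups value from the Hive override or the core-site configuration as a comma-separated string. It treats the Hadoop wildcard value * as unrestricted.

```python
def hdfs_allowed(groups_value, groups_with_hdfs_user=()):
    """Return True if a hadoop.proxyuser.hive.groups value admits the hdfs user.

    groups_value: comma-separated value of hadoop.proxyuser.hive.groups.
    groups_with_hdfs_user: hypothetical list of other group names that you
        know contain the hdfs user (this sketch cannot look that up itself).
    """
    # Split on commas and ignore surrounding whitespace and empty entries.
    groups = {g.strip() for g in groups_value.split(",") if g.strip()}
    if "*" in groups:
        # Wildcard: access is not restricted to specific groups.
        return True
    accepted = {"hdfs", *groups_with_hdfs_user}
    return bool(groups & accepted)


print(hdfs_allowed("hive,hue"))        # False: hdfs group is missing
print(hdfs_allowed("hive,hue,hdfs"))   # True
```

Run the same check against the value on both the source and destination clusters, because the note applies to both.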
To replicate Hive/Impala data to and from S3 or ADLS, you must have the appropriate
credentials to access the S3 or ADLS account. Additionally, you must create the buckets in S3
or the data lake store in ADLS. When you replicate data to cloud storage, Replication Manager
backs up file metadata, including extended attributes and ACLs.
Replication Manager supports the following replication scenarios:
Replicate to and from Amazon S3 from CDH 5.14+ and Cloudera Manager 5.13+.
Replication Manager does not support S3 as a source or destination when S3 is
configured to use SSE-KMS.
Replicate to and from Microsoft ADLS Gen1 from CDH 5.13+ and Cloudera Manager
5.15, 5.16, 6.1+.
Replicate to Microsoft ADLS Gen2 (ABFS) from CDH 5.13+ and Cloudera Manager
6.1+.
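The version requirements in the scenarios above can be encoded as a small lookup. This is an illustrative sketch only: the function names parse and is_supported, the store labels "s3", "adls_gen1", and "adls_gen2", and the direction values "to" and "from" are assumptions made for this example and are not part of Replication Manager. Versions are compared on major and minor numbers only.

```python
def parse(version):
    """Turn a version string such as "5.14.2" into a (major, minor) tuple."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def is_supported(store, direction, cdh, cm, sse_kms=False):
    """Check a cloud replication scenario against the list above.

    store: "s3", "adls_gen1", or "adls_gen2" (hypothetical labels).
    direction: "to" or "from" the cloud store.
    cdh, cm: CDH and Cloudera Manager version strings.
    sse_kms: True if the S3 bucket is configured to use SSE-KMS.
    """
    cdh_v, cm_v = parse(cdh), parse(cm)
    if store == "s3":
        # S3 with SSE-KMS is not supported as a source or destination.
        return not sse_kms and cdh_v >= (5, 14) and cm_v >= (5, 13)
    if store == "adls_gen1":
        # Cloudera Manager 5.15, 5.16, or 6.1 and later (6.0 is not listed).
        cm_ok = cm_v in {(5, 15), (5, 16)} or cm_v >= (6, 1)
        return cdh_v >= (5, 13) and cm_ok
    if store == "adls_gen2":
        # Only replication to ADLS Gen2 is listed, not from it.
        return direction == "to" and cdh_v >= (5, 13) and cm_v >= (6, 1)
    return False


print(is_supported("adls_gen1", "to", "5.13.0", "6.0.1"))  # False
print(is_supported("adls_gen2", "to", "6.1.0", "6.1.0"))   # True
```

Treat the support matrix as the authoritative source. The sketch only mirrors the three scenarios listed in this section.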