Hive external table replication policies
Hive external table replication policies enable you to copy (replicate) your Hive metastore and data from one cluster to another, and to keep the Hive metastore and data set on the destination cluster synchronized with the source according to a specified replication policy. You can also use CDP Private Cloud Base Replication Manager to replicate Hive/Impala data to and from S3 or ADLS; however, you cannot replicate data from one S3 or ADLS instance to another using Replication Manager.
The destination cluster must be managed by the Cloudera Manager Server where the replication is being set up, and the source cluster can be managed by that same server or by a peer Cloudera Manager Server.
- Configuration notes
- If the hadoop.proxyuser.hive.groups configuration has been changed to restrict access to the Hive Metastore Server to certain users or groups, the hdfs group, or a group containing the hdfs user, must also be included in the list of groups specified for Hive/Impala replication to work. This configuration can be specified either on the Hive service as an override or in the core-site HDFS configuration. It applies to configuration settings on both the source and destination clusters.
- If you configured HDFS ACL synchronization on the target cluster for the directory where HDFS data is copied during Hive/Impala replication, the permissions that were copied during replication are overwritten by the HDFS ACL synchronization and are not preserved.
Note: If your deployment includes tables backed by Kudu, Replication Manager filters out Kudu tables from a Hive external table replication to prevent data loss or corruption.
- If the
- Create Hive ACID table replication policy for the database to replicate the managed data.
- After the replication completes, create the Hive external table replication policy to replicate the external tables in the database.
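The hadoop.proxyuser.hive.groups requirement in the configuration notes can be sanity-checked before a policy is created. The following is a minimal sketch, not part of Replication Manager: the function names are hypothetical, the groups of the hdfs user are assumed to be supplied by the caller, and the value is assumed to follow the usual Hadoop convention of a comma-separated list in which "*" means no restriction.

```python
def parse_groups(value):
    """Split a comma-separated proxyuser groups value into a set of names."""
    return {group.strip() for group in value.split(",") if group.strip()}


def replication_allowed(proxyuser_groups, hdfs_user_groups):
    """Return True if the hdfs user is covered by the configured groups.

    proxyuser_groups: the value of hadoop.proxyuser.hive.groups.
    hdfs_user_groups: set of groups the hdfs user belongs to (assumed input).
    """
    allowed = parse_groups(proxyuser_groups)
    if "*" in allowed:
        # Wildcard: access has not been restricted to specific groups.
        return True
    # Otherwise at least one group of the hdfs user must be listed.
    return bool(allowed & set(hdfs_user_groups))
```

Because the setting applies to both clusters, a check like this would be run once against the source value and once against the destination value.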
- Replicate to and from Amazon S3 from CDH 5.14+ and Cloudera Manager version 5.13+.
Replication Manager does not support S3 as a source or destination when S3 is configured to use SSE-KMS.
- Replicate to and from Microsoft ADLS Gen1 from CDH 5.13+ and Cloudera Manager 5.15, 5.16, 6.1+.
- Replicate to Microsoft ADLS Gen2 (ABFS) from CDH 5.13+ and Cloudera Manager 6.1+.
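The version requirements listed above can be encoded as a small lookup, which may help when auditing several clusters at once. This is an illustrative sketch only: the storage keys ("s3", "adls_gen1", "adls_gen2") and the function names are hypothetical, "+" is read as "this version or later", and the check covers versions and the SSE-KMS restriction only, not replication direction.

```python
def parse_version(text):
    """Turn 'major.minor[.patch]' into a tuple of ints for comparison."""
    return tuple(int(part) for part in text.split("."))


def is_supported(storage, cdh, cm, sse_kms=False):
    """Check a CDH / Cloudera Manager pair against the listed minimums."""
    cdh_v, cm_v = parse_version(cdh), parse_version(cm)
    if storage == "s3":
        # S3 with SSE-KMS is not supported as a source or destination.
        if sse_kms:
            return False
        return cdh_v >= (5, 14) and cm_v >= (5, 13)
    if storage == "adls_gen1":
        # Cloudera Manager 5.15, 5.16, or 6.1 and later (6.0 is not listed).
        cm_ok = cm_v[:2] in {(5, 15), (5, 16)} or cm_v >= (6, 1)
        return cdh_v >= (5, 13) and cm_ok
    if storage == "adls_gen2":
        return cdh_v >= (5, 13) and cm_v >= (6, 1)
    raise ValueError("unknown storage type: " + storage)
```

For example, `is_supported("adls_gen1", "6.0.0", "6.0.1")` evaluates to False under this reading, because Cloudera Manager 6.0 is absent from the ADLS Gen1 list.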