Hive replication policies

You can use Hive replication policies in CDP Private Cloud Data Services Replication Manager to replicate the Hive metastore and Hive external tables between clusters that run CDP Private Cloud Base 7.1.8 or higher.

Before you replicate using Hive replication policies, consider the following limitations and guidelines:

  • Managed table replication is not supported. Replication Manager converts a managed table on the source cluster to an external table and stores it as an external table on the target cluster.
  • If you change the hadoop.proxyuser.hive.groups configuration to restrict access to the Hive Metastore Server to certain users or groups, the list of groups must include the hdfs group, or a group that contains the hdfs user, for Hive/Impala replication to work. You can specify this configuration either as an override on the Hive service or in the core-site.xml file in the HDFS configuration on both the source and destination clusters.
  • Before you use the drop table and truncate table DDL commands, consider the following:
    • If you create a Hive replication policy for a Hive table and drop the table on the source cluster after replication, the table remains on the destination cluster. Subsequent replication jobs do not drop it.
    • If you drop a table on the destination cluster, and the table is still included in the replication job, the table is re-created on the destination during the replication.
    • If you drop a table partition or index on the source cluster, the replication job also drops it on the destination cluster.
    • If you truncate a table, and the Delete Policy option was set to Delete to Trash or Delete Permanently when the replication policy was created, the corresponding data files are deleted on the destination during subsequent replication jobs.
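
For the hadoop.proxyuser.hive.groups guideline in the list above, the following is a minimal sketch of what a restricted entry in core-site.xml might look like. The property name comes from the guideline itself; the group names hive and hue are hypothetical placeholders for whatever groups a site already allows, and hdfs is the group the guideline requires. Treat it as an illustration, not a recommended value.

```xml
<!-- Sketch only: the group list is an assumption; adapt it to your site. -->
<property>
  <name>hadoop.proxyuser.hive.groups</name>
  <!-- Comma-separated list of groups. "hive" and "hue" are placeholders;
       "hdfs" (or a group that contains the hdfs user) must be present
       for Hive/Impala replication to work. -->
  <value>hive,hue,hdfs</value>
</property>
```

If the value is set this way, the same entry would need to be present on both the source and destination clusters.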