Hive/Impala replication policy

Hive/Impala replication policies copy (replicate) the Hive metastore and data from one cluster to another and synchronize the Hive metastore and data set on the destination cluster with the source.

Minimum Required Role: Replication Administrator (also provided by Full Administrator)

The destination cluster must be managed by the Cloudera Manager Server where the replication is being set up, and the source cluster can be managed by that same server or by a peer Cloudera Manager Server.

When replicating from CDH clusters to CDP Private Cloud Base clusters, it is recommended that you define the HDFS Destination Path in the Hive replication policy. If the HDFS Destination Path is not defined and the Replicate HDFS File option is set to true, the data is replicated to the original source path. For example, replicated table data that should reside in the /warehouse/tablespace/external/hive directory is replicated to the /user/hive/warehouse location instead. Not defining the HDFS Destination Path before replication starts can also cause a large amount of HDFS space to be consumed by unwanted data movement.
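The recommendation above can be treated as a pre-flight check before a policy is saved. The following is a minimal Python sketch, assuming a hypothetical policy representation with replicate_hdfs_files and hdfs_destination_path fields; these names are illustrative only and are not Cloudera Manager API attributes.

```python
from typing import List, Optional


def destination_path_warnings(replicate_hdfs_files: bool,
                              hdfs_destination_path: Optional[str]) -> List[str]:
    """Flag the risky CDH -> CDP Private Cloud Base combination.

    Both parameters are hypothetical stand-ins for the policy's
    "Replicate HDFS File" option and "HDFS Destination Path" field.
    """
    warnings = []
    # An unset or blank destination path means data lands at the source path.
    if replicate_hdfs_files and not (hdfs_destination_path or "").strip():
        warnings.append(
            "HDFS Destination Path is not set: table data is replicated to the "
            "original source path (for example /user/hive/warehouse) instead of "
            "/warehouse/tablespace/external/hive."
        )
    return warnings
```

An empty list means the policy does not have this particular problem; it says nothing about other policy settings.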

Since Hive3 has a different default table type and warehouse directory structure, the following changes apply while replicating Hive data from CDH5 or CDH6 versions to CDP Private Cloud Base:
  • When you replicate from a CDH cluster to a CDP Private Cloud Base cluster, all tables become external tables during Hive replication. As of this release, Replication Manager does not support Hive2 -> Hive3 replication into ACID tables, so every table is replicated as an external table.
  • Replicated tables are created under the external Hive warehouse directory set by the hive.metastore.warehouse.external.dir Hive configuration parameter. Make sure that this parameter has a different value from the hive.metastore.warehouse.dir Hive configuration parameter, which sets the location of managed tables.
  • If you want to replicate the same database from Hive2 to Hive3 (which uses different paths by design), you must use the Force Overwrite option to avoid path mismatch issues.
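The warehouse-directory requirement above can be checked against a Hadoop-style hive-site.xml file. This is a minimal Python sketch, assuming you have the destination cluster's hive-site.xml contents as text; the helper names are hypothetical. It compares the values as strings (ignoring trailing slashes) and does not resolve hdfs:// URIs against plain paths.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def parse_hadoop_conf(xml_text: str) -> Dict[str, str]:
    """Parse a Hadoop-style *-site.xml document into a name -> value dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


def _strip_slash(path: str) -> str:
    # Treat "/a/b/" and "/a/b" as the same directory; keep "/" intact.
    return path.rstrip("/") or "/"


def external_dir_is_distinct(props: Dict[str, str]) -> bool:
    """True only if both warehouse parameters are set and their values differ."""
    external = props.get("hive.metastore.warehouse.external.dir")
    managed = props.get("hive.metastore.warehouse.dir")
    if not external or not managed:
        return False  # cannot confirm the requirement without both values
    return _strip_slash(external) != _strip_slash(managed)
```

A False result is a prompt to inspect the Hive configuration, not proof of a misconfiguration, because the effective values may come from defaults or safety-valve overrides not present in the file you parsed.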
Configuration notes:
  • If the hadoop.proxyuser.hive.groups configuration has been changed to restrict access to the Hive Metastore Server to certain users or groups, the hdfs group, or a group containing the hdfs user, must also be included in the list of groups for Hive/Impala replication to work. You can specify this configuration either as an override on the Hive service or in the core-site HDFS configuration. This applies to the configuration on both the source and destination clusters.
  • If you configured HDFS ACL synchronization on the target cluster for the directory where HDFS data is copied during Hive/Impala replication, the permissions copied during replication are overwritten by the HDFS ACL synchronization and are not preserved.
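The hadoop.proxyuser.hive.groups note can be verified per cluster. This is a minimal Python sketch, assuming you already have the configured value (a comma-separated group list, or "*" for any group) and the list of groups the hdfs user belongs to on that cluster, for example from the output of id -Gn hdfs. The function name is hypothetical.

```python
from typing import Iterable


def proxyuser_groups_allow_hdfs(groups_value: str,
                                hdfs_user_groups: Iterable[str]) -> bool:
    """Return True if hadoop.proxyuser.hive.groups admits the hdfs user.

    groups_value: the configured value, such as "hive,hdfs" or "*".
    hdfs_user_groups: groups the hdfs user belongs to (normally including
    the hdfs group itself), supplied by the caller.
    """
    # Hadoop trims whitespace around comma-separated entries.
    allowed = {g.strip() for g in groups_value.split(",") if g.strip()}
    if "*" in allowed:  # the wildcard permits every group
        return True
    # Either the hdfs group or any other group containing the hdfs user works.
    return bool(allowed & set(hdfs_user_groups))
```

Because the requirement applies to both clusters, run the check once with the source cluster's values and once with the destination cluster's values.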