Using Hive replication policies
Before you create a Hive replication policy in Cloudera Replication Manager, you must complete the following prerequisites:
- Configure the required Ranger policy in Ranger.
- Register the on-premises cluster (CDH or Cloudera Private Cloud Base) as a classic cluster in Cloudera Management Console.
- Register the cloud account credentials in the Replication Manager service.
- Verify cluster access.
- Configure the minimum ports required for replication.
The replication load is carried by the source on-premises cluster. You can replicate on-premises data to the cloud with a single cluster if the Metastore is running in the cloud.
You can use Hive replication policies to:
- Replicate data stored in Hive tables, Hive metadata, data in the Hive metastore, and the Impala metadata (catalog server metadata) associated with Impala tables registered in the Hive metastore.
- Migrate Sentry permissions to Ranger.
Hive metadata replication involves multiple entities. Replication Manager supports replication of Hive external tables to the target cluster and retains all the properties of the external tables. The permissions and ownership of the data files are preserved so that the relevant external processes can continue to write to them even after failover.
You can also use Cloudera CLI commands to create Hive replication policies. The Cloudera CLI commands for Replication Manager are under the replicationmanager Cloudera CLI option. For more information, see Cloudera CLI for Replication Manager.
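As a rough illustration of how the CLI route might be scripted, the following Python sketch builds a command under the replicationmanager option and runs it only if the CLI is installed. The executable name `cdp` and the `list-policies` subcommand are assumptions that are not stated on this page; confirm the available subcommands in Cloudera CLI for Replication Manager before relying on them. The helper names are hypothetical.

```python
import shutil
import subprocess


def replication_manager_command(subcommand, *args):
    """Build an argument list for a Replication Manager CLI call.

    Assumption: the Cloudera CLI executable is named `cdp`.
    """
    return ["cdp", "replicationmanager", subcommand, *args]


def run_if_available(command):
    """Run the command and return its stdout, or None if the CLI is not on PATH."""
    if shutil.which(command[0]) is None:
        return None
    result = subprocess.run(command, capture_output=True, text=True, check=False)
    return result.stdout


# Assumption: `list-policies` is an available subcommand; verify in the CLI reference.
cmd = replication_manager_command("list-policies")
print(" ".join(cmd))

if __name__ == "__main__":
    output = run_if_available(cmd)
    if output is None:
        print("Cloudera CLI not found on PATH; nothing was run.")
    else:
        print(output)
```

Building the argument list separately from running it makes it easy to log or review the exact command before it touches a cluster.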
The Apache Ranger access policy model consists of the following components:
- Specification of the resources that you can apply to a replication policy, including HDFS files and directories; Hive databases, tables, and columns; and HBase tables, column families, and columns.
- Specification of access conditions for specific users and groups.
On the target cluster, you must set a Ranger policy that allows the hdfs user to perform all operations on all databases and tables. The same user role is used to import the Hive Metastore. If the hdfs user does not have access to all Hive datasets, including all operations, the Hive import fails during the replication process.
On the target cluster, the hive user must have Ranger admin privileges. The same hive user performs the metadata import operation.
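To make the two components of the policy model concrete, the sketch below assembles a policy body for the hdfs user in the JSON shape used by Ranger policies: a resources section (all databases, tables, and columns) and an access condition for one user. The service name `cm_hive`, the policy name, and the helper function are illustrative assumptions, not values taken from this page. The sketch only prints the JSON and does not contact Ranger.

```python
import json


def build_hive_policy(service_name, policy_name, user):
    """Return a policy body that grants `user` all operations on all Hive resources."""
    return {
        "service": service_name,  # assumption: name of the Hive service in Ranger
        "name": policy_name,      # hypothetical policy name
        "isEnabled": True,
        # Resource specification: every database, table, and column.
        "resources": {
            "database": {"values": ["*"]},
            "table": {"values": ["*"]},
            "column": {"values": ["*"]},
        },
        # Access condition: one user, all operations allowed.
        "policyItems": [
            {
                "users": [user],
                "accesses": [{"type": "all", "isAllowed": True}],
            }
        ],
    }


policy = build_hive_policy("cm_hive", "replication-hdfs-all", "hdfs")
print(json.dumps(policy, indent=2))
```

The same policy can be created in the Ranger Admin UI; a generated body like this is mainly useful for reviewing the intended grant or for keeping it under version control.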
For more information about using Hive replication policies to replicate data from CDH clusters to Cloudera Public Cloud, see the blog post Migrate Hive data from CDH to Cloudera Public Cloud.