Hive cloud replication
DLM supports replication of the Hive database from a cluster with underlying HDFS to another cluster with cloud storage. It uses push-based replication, with the replication job running on the cluster with HDFS.
Hive stores its metadata in Hive Metastore, but the underlying data is stored in HDFS or cloud
storage. In a Hadoop cluster with Hive service, the Hive warehouse directory can be configured
with either HDFS or cloud storage.
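As a rough illustration of the distinction above, the following sketch classifies a warehouse directory URI by its scheme. It assumes the location comes from Hive's `hive.metastore.warehouse.dir` property. The function name `warehouse_storage_kind` and the scheme lists are hypothetical choices for this example, not part of DLM.

```python
from urllib.parse import urlparse

# Assumed scheme lists, for illustration only; adjust them to the
# file system connectors actually configured on your cluster.
HDFS_SCHEMES = {"hdfs"}
CLOUD_SCHEMES = {"s3a", "wasb", "wasbs", "abfs", "abfss", "gs"}


def warehouse_storage_kind(warehouse_dir: str) -> str:
    """Classify a Hive warehouse directory URI as 'hdfs', 'cloud', or 'unknown'.

    A path with no scheme (for example '/apps/hive/warehouse') is resolved
    by Hadoop against the cluster's default file system, which this sketch
    cannot see, so it is reported as 'unknown'.
    """
    scheme = urlparse(warehouse_dir).scheme.lower()
    if scheme in HDFS_SCHEMES:
        return "hdfs"
    if scheme in CLOUD_SCHEMES:
        return "cloud"
    return "unknown"


if __name__ == "__main__":
    # Hypothetical example locations.
    for location in ("hdfs://nn1:8020/apps/hive/warehouse",
                     "s3a://example-bucket/warehouse",
                     "/apps/hive/warehouse"):
        print(location, "->", warehouse_storage_kind(location))
```

In terms of this section, a cluster whose warehouse directory classifies as `hdfs` could act as a replication source, and one that classifies as `cloud` as a destination.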
Note: Hive replication from cloud storage to HDFS is currently not supported.
- You can rename the dataset in the policy that is replicated.
- You can create a pull-based policy on the source cluster to move data from the target back into the source cluster Hive database.
- DLM does not manage Ranger policies or any PII or other secure data that is replicated from an on-premises cluster to Amazon S3. You must manage these items outside of DLM.
- Hive replication from an HDFS-based cluster to a cloud storage-based cluster requires the following:
  - Source cluster: The cluster with a Hive warehouse directory on local HDFS. This can be an on-premises cluster or an IaaS cluster with data on local HDFS. The required services are HDFS, YARN, Hive, Ranger, Knox, and DLM Engine.
  - Destination cluster: The cluster with data on cloud storage. The cluster minimally requires Hive Metastore, Ranger, Knox, and DLM Engine.
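The service requirements listed above can be expressed as a simple pre-flight check. This is a minimal sketch, not a DLM API: the service names are taken from the list above, while `missing_services`, `check_cluster_pair`, and the way the installed services are supplied are assumptions made for illustration.

```python
# Required services, as listed in the requirements above.
SOURCE_REQUIRED = frozenset(
    {"HDFS", "YARN", "Hive", "Ranger", "Knox", "DLM Engine"}
)
DESTINATION_REQUIRED = frozenset(
    {"Hive Metastore", "Ranger", "Knox", "DLM Engine"}
)


def missing_services(installed, required):
    """Return the required services absent from `installed`, sorted by name."""
    return sorted(set(required) - set(installed))


def check_cluster_pair(source_services, destination_services):
    """Report missing services for a source/destination cluster pair.

    Both arguments are iterables of service names. How you obtain them
    (for example, from your cluster manager) is outside this sketch.
    """
    return {
        "source": missing_services(source_services, SOURCE_REQUIRED),
        "destination": missing_services(
            destination_services, DESTINATION_REQUIRED
        ),
    }


if __name__ == "__main__":
    # Hypothetical inventory: the source cluster lacks DLM Engine.
    report = check_cluster_pair(
        {"HDFS", "YARN", "Hive", "Ranger", "Knox"},
        {"Hive Metastore", "Ranger", "Knox", "DLM Engine"},
    )
    print(report)
```

An empty list for both keys means the pair meets the minimum service requirements. The check does not verify versions or configuration.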