Hive replication from an on-premises cluster to cloud storage requires a minimal cluster
on the target with metadata services such as HMS, Ranger, Atlas, and the DLM engine. HMS must be
configured with the Hive warehouse directory on cloud storage. Configure the
following:
- Hive data locations - The Hive metastore requires the following configurations to
point Hive data to cloud storage. Note that both hive.metastore.warehouse.dir
and hive.repl.replica.functions.root.dir must be configured in the same
bucket.
hive.metastore.warehouse.dir=<cloud storage>
hive.repl.replica.functions.root.dir=<cloud storage>
hive.warehouse.subdir.inherit.perms=false
- Cloud access credentials - When the Hive metastore is configured with the Hive
warehouse directory on cloud storage, Hive also requires credentials to
access the cloud storage. These can be set up with one of the following
configurations:
- Access key and secret key
- Session token
- For IaaS clusters, set up instance profiles
- Cloud encryption configurations - If the bucket is encrypted, set up the bucket encryption details.
Note: Set all these configurations in hive-site.xml.
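As an illustration only, the settings above might look like the following in hive-site.xml. This is a minimal sketch that assumes Amazon S3 accessed through the Hadoop S3A connector. The bucket name `example-bucket`, both paths, and the credential values are hypothetical placeholders, and the `fs.s3a.*` properties are the standard S3A credential settings rather than anything this procedure prescribes. Encryption properties are left out because they depend on how the bucket is encrypted.

```xml
<!-- hive-site.xml sketch. Bucket name, paths, and credential values are hypothetical. -->
<configuration>
  <!-- Hive data locations: both directories live in the same bucket -->
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>s3a://example-bucket/hive/warehouse</value>
  </property>
  <property>
    <name>hive.repl.replica.functions.root.dir</name>
    <value>s3a://example-bucket/hive/repl-functions</value>
  </property>
  <property>
    <name>hive.warehouse.subdir.inherit.perms</name>
    <value>false</value>
  </property>

  <!-- Cloud access credentials (assumption: S3A with an access key and secret key) -->
  <property>
    <name>fs.s3a.access.key</name>
    <value>YOUR_ACCESS_KEY</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>YOUR_SECRET_KEY</value>
  </property>
</configuration>
```

If temporary credentials are used instead, S3A also reads `fs.s3a.session.token`. On IaaS clusters that rely on instance profiles, the key properties would typically be omitted altogether.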