Replicating Hive Metadata from On-premises to Amazon S3
To replicate Hive metadata from an on-premises cluster to Amazon S3, you must create a new data replication policy.
Before you create the replication policy, register your Amazon S3 cloud account with the Replication Manager service. Before you start Hive replication, make sure that you have reviewed "Requirements while using CDH on-premise clusters".
- Go to Management Console > Replication Manager > Policies and click Add Policy.
- On the Create Replication Policy page, select HIVE as the service.
- Enter the Hive replication Policy Name and Description. Click Next.
- Select Source Cluster from the drop-down.
Enter the value for Source Databases and
You can click the icon to include additional databases and tables.
Enter the value for Source User.
This user must have the permissions required to replicate data.
The Destination Data Lake page appears.
Select the Destination Data Lake cluster from the
The Warehouse Path and the Hive External Table Base Directory path are listed. For example: S3://bucket_name/path
Select Cloud Credential from the drop-down.
- Enter the Username.
Click Validate Policy.
Replication Manager verifies the data and displays the status "Validate Policy Source and Destination information".
Click Next to proceed to Schedule the
The replication policy schedule page provides two options:
- Run Now (Default) - The replication policy is submitted and processed immediately.
- Schedule Run - The replication policy is scheduled to run at a specified time interval.
The Additional Settings page appears. On this page you can enter values for:
- YARN Queue Name
- Maximum Maps Slots
- Maximum Bandwidth
After the replication policy is created successfully, view the status of the new replication job on the Policies page. Verify that the job starts and runs as expected.
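
It can help to collect the values for the wizard before you open it. The following Python sketch is only a local checklist and does not call Replication Manager. The class name HiveReplicationPolicy, its field names, the is_s3_path helper, and the checks themselves are hypothetical and chosen for illustration; they mirror the fields named in the steps above and are not the validation that Validate Policy performs.

```python
from dataclasses import dataclass
from typing import List, Optional
from urllib.parse import urlparse


def is_s3_path(path: str) -> bool:
    """Return True if the path has the shape S3://bucket_name/path (scheme is case-insensitive)."""
    parsed = urlparse(path)
    return parsed.scheme.lower() == "s3" and bool(parsed.netloc)


@dataclass
class HiveReplicationPolicy:
    """Hypothetical container for the values entered in the wizard."""
    name: str
    source_cluster: str
    source_databases: List[str]
    source_user: str
    destination_data_lake: str
    warehouse_path: str
    cloud_credential: str
    description: str = ""
    run_now: bool = True  # Run Now is the default schedule option
    yarn_queue_name: Optional[str] = None
    max_map_slots: Optional[int] = None
    max_bandwidth: Optional[int] = None

    def problems(self) -> List[str]:
        """List obvious omissions; an empty list means nothing was flagged locally."""
        issues = []
        required = {
            "Policy Name": self.name,
            "Source Cluster": self.source_cluster,
            "Source User": self.source_user,
            "Destination Data Lake": self.destination_data_lake,
            "Cloud Credential": self.cloud_credential,
        }
        for label, value in required.items():
            if not value.strip():
                issues.append(f"{label} is empty")
        if not self.source_databases:
            issues.append("no source databases selected")
        if not is_s3_path(self.warehouse_path):
            issues.append(f"Warehouse Path is not an S3 path: {self.warehouse_path!r}")
        optional_limits = (
            ("Maximum Maps Slots", self.max_map_slots),
            ("Maximum Bandwidth", self.max_bandwidth),
        )
        for label, value in optional_limits:
            if value is not None and value <= 0:
                issues.append(f"{label} must be positive")
        return issues


# Example with placeholder values only
policy = HiveReplicationPolicy(
    name="hive-to-s3",
    source_cluster="onprem-cluster",
    source_databases=["sales_db"],
    source_user="hive",
    destination_data_lake="datalake-1",
    warehouse_path="S3://bucket_name/path",
    cloud_credential="aws-credential",
)
print(policy.problems())
```

An empty list here means only that the checklist found nothing missing. The policy still has to pass Validate Policy in the wizard.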