Replicating Hive Metadata from On-premise to Amazon S3
You must create a new data replication policy to replicate Hive metadata from an on-premises cluster to Amazon S3.
Before you create a new replication policy, you must register the cloud account with the Replication Manager service.
- Navigate to Management Console > Replication Manager > Policies and click Add Policy.
- Select HIVE as the service in the Create Replication Policy page.
- Enter the Hive replication Policy Name and Description. Click Next.
- Select Source Cluster from the drop-down.
Enter the value for Source Databases and Tables.
You can click the icon to include additional databases and tables.
Enter the value for the Source user.
This user must have the necessary permissions to replicate data.
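The source fields amount to a cluster, a user, and one or more database/table rows. The following is a minimal sketch, assuming each row is a single (database, table) pair; the class and its field names are hypothetical and are not part of Replication Manager. It only illustrates how repeated rows collapse into a list of qualified names.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SourceSelection:
    """Hypothetical record of the source fields entered in the policy wizard.

    This is not a Replication Manager API; all names are illustrative.
    """

    source_cluster: str
    source_user: str
    # Each entry is one (database, table) row entered in the wizard.
    entries: List[Tuple[str, str]] = field(default_factory=list)

    def add(self, database: str, table: str) -> None:
        """Add one database/table row, like clicking the icon to add another row."""
        database, table = database.strip(), table.strip()
        if not database or not table:
            raise ValueError("database and table names must not be empty")
        self.entries.append((database, table))

    def qualified_names(self) -> List[str]:
        """Return rows as 'database.table' strings, without duplicates, in entry order."""
        names: List[str] = []
        for database, table in self.entries:
            name = f"{database}.{table}"
            if name not in names:
                names.append(name)
        return names


if __name__ == "__main__":
    selection = SourceSelection(source_cluster="onprem-cluster", source_user="hive")
    selection.add("sales", "orders")
    selection.add("sales", "orders")  # duplicate row is ignored
    selection.add("hr", "employees")
    print(selection.qualified_names())  # ['sales.orders', 'hr.employees']
```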
The Destination Data Lake page appears.
Select the Destination Data Lake cluster from the drop-down.
The Warehouse Path and the Hive External Table Base Directory path are listed, for example: s3://bucket_name/path
Select Cloud Credential from the drop-down.
- Enter the Username.
Click Validate Policy.
Replication Manager validates the source and destination information and displays the validation status.
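The warehouse and external table base directory paths on the Destination Data Lake page take the form s3://bucket_name/path. If you want to check such a path locally before validating the policy, a minimal sketch using only the Python standard library could split it into bucket and key prefix. The helper name is hypothetical, and accepting the s3a:// scheme (used by the Hadoop S3A connector) is an assumption about what you may encounter.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_path(path: str) -> Tuple[str, str]:
    """Split an S3 URI such as s3://bucket_name/path into (bucket, key_prefix).

    urlparse lowercases the scheme, so "S3://" and "s3://" are treated alike.
    Accepting "s3a" (the Hadoop S3A connector scheme) is an assumption.
    """
    parsed = urlparse(path)
    if parsed.scheme not in ("s3", "s3a"):
        raise ValueError(f"not an S3 URI: {path!r}")
    if not parsed.netloc:
        raise ValueError(f"missing bucket name: {path!r}")
    # The key prefix is everything after the bucket, without the leading slash.
    return parsed.netloc, parsed.path.lstrip("/")


if __name__ == "__main__":
    print(split_s3_path("S3://bucket_name/path"))  # ('bucket_name', 'path')
```

This only checks the URI shape; it does not confirm that the bucket exists or that the cloud credential can reach it, which is what Validate Policy is for.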
Click Next to schedule the replication policy.
The schedule page for the replication policy provides two options:
- Run Now (Default) - The replication policy is immediately submitted and processed.
- Schedule Run - The replication policy can be scheduled to run at a specified time interval.
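To reason about when a scheduled policy will next run, it can help to model a fixed-interval schedule. The following is a minimal sketch, assuming a start time and a constant interval; the function is hypothetical, and Replication Manager's own scheduling rules (frequency units, time zones, handling of missed runs) may differ.

```python
from datetime import datetime, timedelta
from typing import List


def next_runs(start: datetime, interval: timedelta, now: datetime, count: int = 3) -> List[datetime]:
    """Return the next `count` run times of a fixed-interval schedule, at or after `now`.

    Illustrative model only: Replication Manager's own scheduling rules
    may differ from this.
    """
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    if count < 0:
        raise ValueError("count must not be negative")
    if now <= start:
        first = start
    else:
        # Ceiling division: number of whole intervals needed to reach or pass `now`.
        steps = -(-(now - start) // interval)
        first = start + steps * interval
    return [first + i * interval for i in range(count)]


if __name__ == "__main__":
    runs = next_runs(datetime(2024, 1, 1), timedelta(hours=6), datetime(2024, 1, 1, 7))
    print([t.isoformat() for t in runs])
    # ['2024-01-01T12:00:00', '2024-01-01T18:00:00', '2024-01-02T00:00:00']
```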
The Additional Settings page appears. On this page you can enter values for:
- YARN Queue Name
- Maximum Maps Slots
- Maximum Bandwidth
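The three additional settings can be sanity-checked before you enter them. The following is a minimal sketch, assuming only that the queue name is text and the other two settings are whole numbers; the class, its field names, and the lower bounds it enforces are hypothetical and are not taken from Replication Manager. The unit of maximum bandwidth is whatever the UI shows and is not assumed here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AdditionalSettings:
    """Hypothetical local model of the Additional Settings page.

    Field names are illustrative, not Replication Manager API names.
    """

    yarn_queue_name: str
    max_map_slots: int
    max_bandwidth: int  # unit as displayed in the UI; not assumed here

    def problems(self) -> List[str]:
        """Return human-readable problems; an empty list means the values look sane."""
        issues: List[str] = []
        if not self.yarn_queue_name.strip():
            issues.append("YARN queue name must not be empty")
        if self.max_map_slots < 1:
            issues.append("maximum map slots must be at least 1")
        if self.max_bandwidth < 1:
            issues.append("maximum bandwidth must be at least 1")
        return issues


if __name__ == "__main__":
    print(AdditionalSettings("default", 20, 100).problems())  # []
    print(AdditionalSettings(" ", 0, 100).problems())
```

Returning a list of problems rather than raising on the first one lets you see every out-of-range value at once.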
After the replication policy is created successfully, view the status of its replication job on the Policies page. Verify that the job starts and runs as expected.