Replicating Hive Metadata from On-Premises to Amazon S3

To replicate Hive metadata from an on-premises cluster to Amazon S3, you must create a new data replication policy.

Before you create the replication policy, you must register your Amazon S3 cloud account with the Replication Manager service.

  1. Go to Management Console > Replication Manager > Policies, and click Add Policy.
  2. On the Create Replication Policy page, select HIVE as the service.
  3. Enter a Policy Name and Description for the Hive replication policy, and then click Next.
  4. Select Source Cluster from the drop-down.
  5. Enter the value for Source Databases and Tables.

    You can click the icon to include additional databases and tables.

  6. Enter the value for Source User.
    This user must have the permissions required to replicate data.
  7. Click Next.
    The Destination Data Lake page appears.
  8. Select the Destination Data Lake cluster from the drop-down.

    The Warehouse Path and the Hive External Table Base Directory path are listed. For example: S3://bucket_name/path

  9. Select Cloud Credential from the drop-down.
  10. Enter the Username.
  11. Click Validate Policy.
    Replication Manager verifies the source and destination information for the policy and shows the validation status.
  12. Click Next to schedule the replication policy.
    The schedule page provides two options:
    • Run Now (Default) - The replication policy is submitted and processed immediately.
    • Schedule Run - The replication policy is scheduled to run at a specified time interval.
  13. Click Next.
    The Additional Settings page appears. On this page you can enter values for:
    • YARN Queue Name
    • Maximum Maps Slots
    • Maximum Bandwidth
  14. Click Create.
    After the replication policy is created successfully, view the status of the new replication job on the Policies page. Verify that the job starts and runs as expected.
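
The values entered in steps 3 through 13 can be gathered into a checklist before you open the wizard. The following Python sketch is illustrative only: it does not call any Replication Manager API, and the class name, field names, and the choice of which values are treated as required are assumptions made for this example, not part of the product.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class HiveToS3PolicyDraft:
    """Hypothetical checklist of the values entered in the wizard.

    This is not a Replication Manager object; names are illustrative.
    """
    policy_name: str = ""
    description: str = ""
    source_cluster: str = ""
    source_databases_and_tables: List[str] = field(default_factory=list)
    source_user: str = ""
    destination_data_lake: str = ""
    cloud_credential: str = ""
    username: str = ""
    run_now: bool = True  # Run Now is the default schedule option
    # Additional Settings; treated as optional in this sketch (assumption).
    yarn_queue_name: Optional[str] = None
    maximum_maps_slots: Optional[int] = None
    maximum_bandwidth: Optional[int] = None


# Assumption: values this sketch expects to be filled in before Create.
REQUIRED = (
    "policy_name",
    "source_cluster",
    "source_databases_and_tables",
    "source_user",
    "destination_data_lake",
    "cloud_credential",
    "username",
)


def missing_values(draft: HiveToS3PolicyDraft) -> List[str]:
    """Return the names of required values that are still empty."""
    return [name for name in REQUIRED if not getattr(draft, name)]


# Example with placeholder values; destination details are not yet filled in.
draft = HiveToS3PolicyDraft(
    policy_name="hive-to-s3",
    source_cluster="onprem-cluster",
    source_databases_and_tables=["sales_db"],
    source_user="hive",
)
print(missing_values(draft))
# prints ['destination_data_lake', 'cloud_credential', 'username']
```

An empty result from missing_values means that every value this sketch treats as required has been collected. The Validate Policy step in the wizard remains the authoritative check.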