Creating a Hive replication policy
To replicate Hive metadata from an on-premises cluster to the cloud, you must first set the required policy in Ranger and then create the Hive replication policy in Replication Manager.
- Log in to Ranger Admin UI.
- In the Hadoop_SQL section, grant the hdfs user permission to the "all-database, table, column" policy.
- On the Add Policy. , click
In the Create Replication Policy wizard, select
- Enter the Hive replication Policy Name and Description. Click Next.
- Select Source Cluster from the drop-down.
Enter the value for Source Databases and
You can click the icon to include additional databases and tables.
- Enter the value for Source User. Ensure that the user has the necessary permissions to replicate data.
- Click Next.
Select the Destination Data Lake cluster from the drop-down.
The Warehouse Path and the Hive External Table Base Directory path for the Data Lake appear. For example:
Select Cloud Credential from the drop-down.
- Enter the Username.
Click Validate Policy.
Replication Manager validates the source and destination information for the policy.
- Click Next to schedule the replication policy.
On the Schedule page, choose one of the following options:
- Run Now (Default) - The replication policy is immediately submitted and processed.
- Schedule Run - The replication policy can be scheduled to run at a specified time interval.
In the Repeat field, you can choose one of the following options:
- Does Not Repeat
- Custom - In the Custom Recurrence dialog box, choose the time, date, and the frequency to run the policy.
- Click Next.
On the Additional Settings page, enter the values as required:
- YARN Queue Name - If you are using Capacity Scheduler queues to limit resource consumption, enter the name of the YARN queue for the cluster to which the replication job is submitted. The default value for this field is default.
- Maximum Maps Slots - Use this option to set the maximum number of map tasks (simultaneous copies) per replication job. The default value is 20.
- Maximum Bandwidth - Use this option to throttle each map task to the specified bandwidth, so that the net bandwidth each task uses tends toward that value. The default value is 100 MB per second for each mapper.
Choose one of the following Sentry permissions:
- Include Sentry Permissions with Metadata - Select this option to migrate Sentry permissions during the replication job.
- Exclude Sentry Permissions from Metadata (Default) - Select this option if you do not want to migrate Sentry permissions during the replication job.
- Skip URI Privileges - Select this option if you do not want to include URI privileges when you migrate Sentry permissions. During migration, the URI privileges are translated to point to an equivalent location in S3. If the resources have a different location in Amazon S3, do not migrate the URI privileges because the URI privileges might not be valid.
- Click Create.
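Before you fill in the Additional Settings page, it can help to write down the values you plan to enter and estimate how much bandwidth the job could use. The following is a minimal sketch in Python. It is only a planning aid, not a Replication Manager API: the class name, field names, and method names are hypothetical, and the defaults are the ones listed above. The bandwidth figure is a rough upper bound that assumes all map slots run at the same time and each one reaches its throttle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdditionalSettings:
    """Hypothetical record of the Additional Settings page values.

    This class does not call Replication Manager; it only mirrors the
    documented fields so that the values can be checked before entry.
    """

    yarn_queue_name: str = "default"           # documented default queue
    max_map_slots: int = 20                    # documented default
    max_bandwidth_mb_per_s: int = 100          # documented default, per mapper
    include_sentry_permissions: bool = False   # "Exclude" is the documented default
    skip_uri_privileges: bool = False          # assumption: off unless selected

    def validate(self) -> None:
        """Raise ValueError if a value cannot be a sensible entry."""
        if not self.yarn_queue_name.strip():
            raise ValueError("YARN queue name must not be empty")
        if self.max_map_slots < 1:
            raise ValueError("Maximum Maps Slots must be at least 1")
        if self.max_bandwidth_mb_per_s < 1:
            raise ValueError("Maximum Bandwidth must be at least 1 MB per second")

    def aggregate_bandwidth_ceiling_mb_per_s(self) -> int:
        """Rough upper bound: every map slot busy and fully using its throttle."""
        return self.max_map_slots * self.max_bandwidth_mb_per_s


if __name__ == "__main__":
    settings = AdditionalSettings()
    settings.validate()
    # With the documented defaults: 20 map slots * 100 MB per second each.
    print(settings.aggregate_bandwidth_ceiling_mb_per_s())  # 2000
```

With the default values, the estimate is 2000 MB per second across the whole job. If that is more than the link between the on-premises cluster and the cloud can carry, lowering either Maximum Maps Slots or Maximum Bandwidth reduces the ceiling proportionally.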