Creating Hive ACID table replication policy

You can create a Hive ACID replication policy after you set up the environment and configure the required parameters.

Verify whether the prerequisites are met. For more information, see Prepare to create Hive ACID table replication policies.

Go to the Replication > Replication Policies page in the Cloudera Manager for the target cluster where the peer is set up.
Click Create Replication Policy.
Select Hive ACID Table Replication Policy.
The following sample image shows the Hive ACID Table Replication Policy option on the Replication Policies page in Cloudera Manager:

The Create Hive ACID Table Replication Policy wizard appears.

In the General tab, configure the following options:


Option	Action to perform
Policy Name	Enter a unique name for the replication policy.
Source	Choose the source cluster.
Destination	Choose the target cluster.
Destination Staging Path	Enter a valid path to the staging location. The path must have the required permissions for the hive user. For example: /user/hive/data. Hive ACID table replication policy uses a common staging location on the source or target cluster. After you enter a valid path to the staging location, the hive.repl.rootdir parameter is configured with this path. The REPL DUMP command dumps data in the staging location in the source cluster. The REPL LOAD command reads the data from the staging location. The REPL DUMP command runs in the source cluster and the REPL LOAD command runs in the target cluster. important Do not change or delete this path until the replication policy is in force and do not edit or modify the path during the replication life cycle. If you edit, change, or delete the path, replication errors occur or the replication is incomplete.
Source Database	Enter the source database name.
Schedule	Schedule the replication job as necessary. The Hive ACID table replication policy uses the Hive scheduler to schedule the replication policies. You can choose the following schedule options: Every - Choose an interval. The schedule starts at the next exact interval based on the server time. If you want to start it immediately, issue a Run Now operation. For example, if you are in California and the destination cluster is in the Europe (Frankfurt) region, the schedule is set based on the server time, that is, Frankfurt local time. Unix Cron Expression - Enter a valid Unix cron expression. You can generate the Unix cron expression in the following page, and then use it in the Create Hive ACID Table Replication Policy wizard: https://www.freeformatter.com/cron-expression-generator-quartz.html For example, you can use 0 /30 ? * * to schedule the policy to run every 30 minutes, 0 0 /4 ? * to schedule the run for every 4 hours, or 0 0 0 * * ? every day at midnight.
Run as Username	Enter hive.

Click the Resources tab to configure the following options:


Option	Description
Scheduler Pool	(Optional) Enter the name of a resource pool. The value you enter is used by the MapReduce Service when Cloudera Manager runs the MapReduce job for the replication. You can use one of the following property: MapReduce – Fair scheduler: mapred.fairscheduler.pool MapReduce – Capacity scheduler: queue.name YARN – mapreduce.job.queuename note The DistCp job running on the target cluster or a YARN uses the mapreduce.job.queuename property.
Maximum Number of Copy Mappers	Enter the maximum number of simultaneous copy mappers for DistCp. The default value is 20.
Maximum Bandwidth per Copy Mapper	Enter the bandwidth limit for each mapper. Default is 100 MB. The total bandwidth used by the replication policy is equal to Maximum Bandwidth multiplied by Maximum Map Slots. Therefore, you must ensure that the bandwidth and map slots you choose do not impact other tasks or network resources in the target cluster. tip The Throughput field on the Cloudera Manager > Replication Policies page shows the maximum bandwidth set for the replication policy during replication policy creation.

Select the Advanced tab to configure the following options:


Option	Description
Policy Description	(Optional). Enter a brief description for the replication policy.
Overrides	Enter key-value pairs to override parameters for Hive ACID table replication configuration. For example, when you use HA-based clusters, you can enter the relevant key-value pairs. During the replication process, the Replication Manager overrides the key-values pairs. You can also pass the DistCp argument.

Click Save to run the Hive ACID table replication policy.

When a non-recoverable error appears with the FAILED_ADMIN status for a replication job, perform the following steps to fix the error:

Go to the error log path.
Search for the file named _non_recoverable.
Check the error stack that is printed in the _non_recoverable file.
Fix the error.
Delete the _non_recoverable file. For the next replication jobs in the policy to function normally, the _non_recoverable file must be deleted.