Manage and monitor Hive replication policies

After you create a Hive replication policy in Cloudera Replication Manager, you can perform and monitor various tasks related to the replication policy. You can view the job progress and replication logs. You can edit the advanced options to optimize a job run. You can suspend a job and also activate a suspended job. You can edit the replication policy as necessary.

On the Replication Policies page, you can perform the following actions and tasks on a replication policy and its jobs.

When you click Actions for a Hive replication policy, the following actions appear:


Action	Description
Edit*	Changes the replication policy options for non-expired policies that are in the active or suspended state. The replication policy replicates data based on the schedule you choose, and you can edit the policy to better align with changing requirements. For example, you might want to change the frequency of a policy depending on the size and importance of the data being replicated. Note: A replication policy is associated with a cluster or a cluster pair, so you cannot change the clusters in the policy. Optionally, expand a replication policy on the Replication Policies page to edit its options, which include frequency (the start time cannot be modified if the policy has already started), queue name, maximum bandwidth, and maximum map slots. Tip: To optimize replication policy performance, configure the queue name, maximum bandwidth, and maximum map slots as necessary.
Delete	Deletes the replication policy permanently.
Suspend	Suspends a running replication policy. You can activate the suspended replication policy later, if required.
View Log	Download, copy, or open the log. The log shows a brief output of the stdout and stderr logs of a single step of the latest replication policy job run. You can also view the current job status in the Replication Manager > Overview > Issues & Updates > Job Status column. If the job failed, click Failed to view the log details about the job.
View Command Details	Opens the latest Hive replication policy job page. The steps and substeps appear in a tree view. The failed steps are expanded by default, showing the last 15 lines of the log. You can also view the command details for a Hive replication policy on the Overview > Issues & Updates panel. Tip: To view the complete log for all the jobs, go to the target cluster Cloudera Manager > Running Commands page.
Collect diagnostic bundle	Generates a diagnostic bundle for the replication policy. You can download the bundle as a ZIP file to your machine. Ensure that you are logged into the Cloudera Manager instances for both the source and target clusters before you download the bundle in Replication Manager.
* To view and use replication policies with an empty name in Replication Manager, note the following behavior: If the API version is lower than 51, an existing replication policy with an empty name can be used and updated. However, if you edit the replication policy and provide a name for it in API version 51 or higher, you must ensure that the name conforms to the validation rules. If the API version is 51 or higher, you must provide a unique name for the replication policy to continue using it, because API version 51 and higher enforces the validation rules on all replication policies. To pass the name validation, the replication policy name must be unique. The name can contain letters, numbers, and the _ / - characters. It must not contain the characters % . ; \ or any character that is not ASCII printable, that is, any ASCII character less than 32 or greater than or equal to 127.
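
The naming rules in the footnote above can be checked before you submit a name. The following Python sketch is illustrative only and is not part of Replication Manager: the function names are hypothetical, and the check covers only the explicitly excluded characters (% . ; \ and characters outside the ASCII printable range) plus uniqueness against a set of names that you supply. It does not enforce the "letters, numbers, and the _ / - characters" list, so the product may reject names that this sketch accepts.

```python
# Characters that the footnote says a replication policy name must not contain.
FORBIDDEN_CHARS = set("%.;\\")


def find_invalid_characters(name: str) -> list[str]:
    """Return, in order, the characters in `name` that the documented rules exclude."""
    invalid = []
    for ch in name:
        code = ord(ch)
        # Excluded: % . ; \ and anything below ASCII 32 or at/above ASCII 127.
        if ch in FORBIDDEN_CHARS or code < 32 or code >= 127:
            invalid.append(ch)
    return invalid


def is_valid_policy_name(name: str, existing_names: set[str]) -> bool:
    """Hypothetical helper: the name is non-empty, unique, and has no excluded characters."""
    if not name:
        return False  # API version 51 and higher requires a name
    if name in existing_names:
        return False  # the name must be unique
    return not find_invalid_characters(name)
```

For example, `is_valid_policy_name("hive_daily-01", set())` returns `True`, while a name such as `hive.daily` fails because it contains a period.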

When you expand the policy details, the Job History panel appears.

You can view the following details on the panel:

Previous jobs, current job, and one future scheduled job if any.

Job details which include:


Job details	Description
Started	Timestamp when the job started.
Ended	Timestamp when the job ended.
Duration	Time taken to complete the job.
Tables	Number of imported or exported tables.
Progress	Current status of a running job.
Expected	Remaining number of files and bytes expected to be copied for a running job.
Copied	Number of files and bytes copied for a running job and completed job.
Failed	Number of files and bytes that failed to be copied for a completed job.
Deleted	Number of files deleted for a completed job.
Skipped	Remaining number of files and bytes skipped from copying for a running job and completed job.

Click Actions to:
- Abort the job.
- Re-run an aborted or failed job.
- View Log for the job. You can download, copy, or open it to track the job and to troubleshoot any issues for the job.

When you click a job on the Job History panel, the following tabs appear:


Tab	Description
General	Shows the following job details: Started at timestamp; Duration taken to complete the job; HDFS Replication Report to download the job statistics in CSV format; Hive Replication Report to download the job statistics in CSV format; Hive Export/Import, which is the number of external tables exported or imported using Hive replication; Number of Errors encountered during the replication job; Impala UDFs, which is the number of tables exported or imported using Impala; Job status Message.
Command Details	Shows the details about the commands that ran on the source Cloudera Manager for the job, along with the timestamp.
Setup Error	Shows the stack trace for the commands that ran on the source Cloudera Manager for the failed job.

You can download the following CSV reports from the General > HDFS Replication Report field to track the replication jobs and to troubleshoot issues:


Report	Description
Listing	Lists all the files and directories copied during the replication job.
Status	Shows the complete status report of each file as one of the following: Error, an error occurred and the file was not copied; Deleted, the file was deleted; Skipped, the file is up to date and its replication was skipped.
Error Status	Status report of all the copied files with errors. Each file shows the status, path, and message for the copied files with errors.
Skipped Status	Status report of all skipped files. Each file lists the status, path, and message for the databases and tables that were skipped.
Deleted Status	Status report of all deleted files. Each file lists the status, path, and message for the databases and tables that were deleted.
Performance	Summary report about the performance of the running replication job which includes the last performance sample for each mapper that is working on the replication job.
Full Performance	Performance report of the job which includes the samples taken for all mappers during the replication job.
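
After you download a status report, you can summarize it with a short script. The following Python sketch counts the rows per status value. It assumes that the CSV file has a header row with a column named status, alongside path and message columns as described above. The actual column names and status values in the downloaded report might differ, and the sample rows are invented for illustration.

```python
import csv
import io
from collections import Counter


def count_by_status(csv_text: str, status_column: str = "status") -> Counter:
    """Count report rows per status value.

    `status_column` is an assumed header name; adjust it to match the
    header row of the report that you downloaded.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[status_column] for row in reader)


# Invented sample data; a real report is read from the downloaded file instead.
sample_report = (
    "status,path,message\n"
    "ERROR,/warehouse/db1/t1/part-0,example error message\n"
    "SKIPPED,/warehouse/db1/t2/part-0,example skip message\n"
    "SKIPPED,/warehouse/db1/t2/part-1,example skip message\n"
)

summary = count_by_status(sample_report)
for status, count in sorted(summary.items()):
    print(f"{status}: {count}")
```

To use the sketch with a real report, read the downloaded file into a string, for example with `open(path, newline="").read()`, and pass it to `count_by_status`.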

You can download the following CSV reports from the General > Hive Replication Report field to track the replication jobs and to troubleshoot issues:


Report	Description
Hive Result	List of replicated tables.
Hive Performance	Performance report for Hive replication.
