Migrating Hive Workloads to Cloudera Private Cloud

Troubleshooting Hive replication using REPL

You need to know how to recover from the FAILED_ADMIN state that stops the replication process.

Problem: A non-recoverable error appears for a replication job and the status says FAILED_ADMIN. How do you recover a schedule from the FAILED_ADMIN state?

Solution: Perform the following steps to recover a replication schedule from this state:

Navigate to the error log path.
Search for the file _non_recoverable.
Open the file and look for information about the error that caused the replication to fail.
Fix the error.
Delete the _non_recoverable file.
The _non_recoverable file from the last replication command execution must be deleted; otherwise, the next replication attempt also fails.
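
The steps above can be sketched as a small Python helper. This is a minimal sketch, not an official tool: it assumes the error log path is on HDFS, that the hdfs command-line client is on the PATH, and that the caller has permission to read and delete the file. The function names (find_marker_paths, run_hdfs, show_and_clear_markers) are hypothetical and chosen for illustration only.

```python
import subprocess
from typing import List

MARKER = "_non_recoverable"


def find_marker_paths(ls_output: str, marker: str = MARKER) -> List[str]:
    """Pick marker-file paths out of `hdfs dfs -ls -R` output.

    Each entry line has 8 whitespace-separated columns:
    permissions, replication, owner, group, size, date, time, path.
    """
    paths = []
    for line in ls_output.splitlines():
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue  # skips "Found N items" headers and blank lines
        path = fields[7]
        if path.rsplit("/", 1)[-1] == marker:
            paths.append(path)
    return paths


def run_hdfs(*args: str) -> str:
    """Run `hdfs dfs <args>` and return its stdout (raises on a non-zero exit)."""
    result = subprocess.run(
        ["hdfs", "dfs", *args], check=True, capture_output=True, text=True
    )
    return result.stdout


def show_and_clear_markers(root: str, delete: bool = False) -> List[str]:
    """Print every marker file under `root`; delete it only if `delete` is True."""
    paths = find_marker_paths(run_hdfs("-ls", "-R", root))
    for path in paths:
        print(f"--- {path}")
        print(run_hdfs("-cat", path))  # the error details to fix first
        if delete:
            run_hdfs("-rm", path)
    return paths
```

A reasonable way to use it is to call show_and_clear_markers(root) first with the error log path as root, fix the reported error, and only then call it again with delete=True. Keeping deletion behind an explicit flag avoids removing the file before the underlying error has been addressed.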

Problem: Notification events are missing in the metastore.

Solution: If notification events are not present in the metastore during replication, the replication might enter the FAILED_ADMIN state. This happens when the notifications have been deleted from the metastore. In this case, the workaround is to start a fresh bootstrap phase of replication, as follows:

Drop the target database using beeline.
Remove the dump directory on HDFS for the required policy. The path of the _non_recoverable error file contains the dump directory path.
The replication continues where it stopped.
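
The workaround above can be sketched in Python as a set of command builders. This is a sketch under stated assumptions rather than a definitive procedure: it assumes the dump directory is the parent directory of the _non_recoverable file (verify this against the actual path on your cluster), that the beeline and hdfs clients are on the PATH, and that dropping the database with CASCADE is acceptable. The function names, the JDBC URL, and the database name are hypothetical.

```python
import posixpath
from typing import List


def dump_dir_from_marker(marker_path: str) -> str:
    """Assumption: the dump directory is the directory holding the marker file."""
    return posixpath.dirname(marker_path.rstrip("/"))


def drop_database_cmd(jdbc_url: str, database: str) -> List[str]:
    """Build a beeline command that drops the target database."""
    # Allow only letters, digits, and underscores so the name is safe to embed.
    if not database.replace("_", "").isalnum():
        raise ValueError(f"unexpected database name: {database!r}")
    # CASCADE is an assumption: it also drops the tables in the database.
    query = f"DROP DATABASE IF EXISTS `{database}` CASCADE;"
    return ["beeline", "-u", jdbc_url, "-e", query]


def remove_dump_dir_cmd(dump_dir: str) -> List[str]:
    """Build an hdfs command that recursively removes the dump directory."""
    if dump_dir in ("", "/"):
        raise ValueError("refusing to remove an empty or root path")
    return ["hdfs", "dfs", "-rm", "-r", dump_dir]
```

Each builder returns an argument list that can be reviewed and then passed to subprocess.run(cmd, check=True). Building the commands separately from running them makes it easier to confirm the database name and dump directory before anything destructive happens.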