Troubleshooting Hive replication
This section describes problems that you might encounter during Hive replication, and the solution or workaround for each.
Problem: In Cloudera Manager, the history of a schedule shows a run as FAILED, and its status shows SKIPPED. SKIPPED runs are listed as FAILED runs.
Solution: A FAILED run with SKIPPED status might indicate an issue with the dump schedule on the source cluster. If the dump completes after the load starts, there might be no data to load. The first (bootstrap) run of a schedule takes longer than the subsequent (incremental) runs. As a result, the load query on the target cluster might start before the source cluster finishes the dump operation, and the load then fails.
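The timing condition described above can be modeled in a few lines. This is a simplified, hypothetical model, not Cloudera Manager or Hive logic: the Run class, its fields, load_outcome, and the example times are illustrative. It marks a load as SKIPPED whenever the load starts before the dump it depends on has finished.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Run:
    """One scheduled dump/load cycle (hypothetical model, not a Cloudera API)."""
    dump_start: datetime
    dump_duration: timedelta
    load_start: datetime

    @property
    def dump_end(self) -> datetime:
        return self.dump_start + self.dump_duration


def load_outcome(run: Run) -> str:
    """Classify the load under the simplified timing model.

    If the load starts before the dump has finished, the model assumes the
    load finds no new data and reports SKIPPED; otherwise it reports LOADED.
    """
    if run.load_start < run.dump_end:
        return "SKIPPED"
    return "LOADED"


# Hypothetical times: a long bootstrap dump overlaps the load,
# while a short incremental dump finishes before its load starts.
bootstrap = Run(datetime(2024, 1, 1, 0, 0), timedelta(minutes=90), datetime(2024, 1, 1, 0, 30))
incremental = Run(datetime(2024, 1, 1, 1, 0), timedelta(minutes=5), datetime(2024, 1, 1, 1, 30))
print(load_outcome(bootstrap), load_outcome(incremental))  # SKIPPED LOADED
```

The model only shows why the first run is the one most likely to be skipped: the same load offset that is long enough for an incremental dump can be too short for a bootstrap dump.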
Problem: A non-recoverable error appears for a replication job and the status says FAILED_ADMIN. How do you recover a schedule from the FAILED_ADMIN state?
Solution: Perform the following steps to recover a replication schedule from this state:
- Navigate to the error log path.
- Search for the file _non_recoverable.
- Open the file, and look for information about the error that caused the replication to fail.
- Fix the error.
- Delete the _non_recoverable file.
The _non_recoverable file from the last replication command execution must be deleted; otherwise, later replication attempts do not run correctly.
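The locate, read, and delete steps above can be partly scripted. The following minimal sketch assumes that the error log directory is reachable as a local file system path; for an error log path on HDFS, the same steps map to hdfs dfs -ls -R, hdfs dfs -cat, and hdfs dfs -rm. The function names are hypothetical. Fixing the underlying error remains a manual step, which is why deletion only happens when error_fixed is true.

```python
from __future__ import annotations

from pathlib import Path

MARKER_NAME = "_non_recoverable"


def find_markers(error_log_dir: str) -> list[Path]:
    """Return every _non_recoverable file found anywhere under error_log_dir."""
    return sorted(p for p in Path(error_log_dir).rglob(MARKER_NAME) if p.is_file())


def read_and_clear(marker: Path, error_fixed: bool) -> str:
    """Return the marker's text; delete the file only after the error is fixed."""
    text = marker.read_text(encoding="utf-8", errors="replace")
    if error_fixed:
        # A leftover marker would interfere with the next replication attempt.
        marker.unlink()
    return text
```

A typical session calls find_markers on the error log path, reads each marker with error_fixed=False to diagnose the failure, fixes the error, and then calls read_and_clear again with error_fixed=True.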
Problem: Notification events are missing from the metastore.
Solution: If notification events are not present in the metastore during replication, the replication might enter the FAILED_ADMIN state. This can happen when notification events are deleted from the metastore. In this case, the workaround is to start a fresh bootstrap phase of replication, as follows:
- Drop the target database by using Beeline.
- Remove the dump directory on HDFS for the affected policy. The path of the _non_recoverable error file contains the dump directory path.
The replication schedule then continues, starting with a fresh bootstrap run.
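The two workaround steps above correspond to one Beeline call and one HDFS shell call. The following sketch only builds and optionally runs those commands; the function names, the JDBC URL, the database name, and the paths in the usage line are hypothetical, and the use of DROP DATABASE IF EXISTS ... CASCADE is an assumption about how you want to drop the target database. Dry run is the default because both commands are destructive.

```python
from __future__ import annotations

import re
import shlex
import subprocess


def fresh_bootstrap_commands(
    jdbc_url: str, target_db: str, dump_dir: str, non_recoverable_path: str
) -> list[list[str]]:
    """Build the commands that drop the target database and remove the dump directory."""
    # Accept only a plain Hive identifier before placing it in a SQL statement.
    if not re.fullmatch(r"[A-Za-z0-9_]+", target_db):
        raise ValueError(f"unexpected database name: {target_db!r}")
    # Never remove the file system root by accident.
    if dump_dir.strip("/") == "":
        raise ValueError("dump_dir must not be empty or the root directory")
    # The _non_recoverable error file path should contain the dump directory path.
    if dump_dir.rstrip("/") + "/" not in non_recoverable_path:
        raise ValueError("dump_dir does not appear in the _non_recoverable file path")
    drop_db = ["beeline", "-u", jdbc_url, "-e", f"DROP DATABASE IF EXISTS {target_db} CASCADE;"]
    remove_dump = ["hdfs", "dfs", "-rm", "-r", dump_dir]
    return [drop_db, remove_dump]


def run_commands(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print the commands (default) or execute them, stopping at the first failure."""
    for argv in commands:
        if dry_run:
            print(shlex.join(argv))
        else:
            subprocess.run(argv, check=True)
```

For example, run_commands(fresh_bootstrap_commands("jdbc:hive2://hs2.example.com:10000/default", "sales_db", "/user/hive/repl/dump1", "/user/hive/repl/dump1/1/_non_recoverable")) prints both commands without running them.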
Problem: The location of a replicated database or table on the target cluster is not what you expect.
Solution: The following use cases show how the default location and custom locations for databases and tables are handled during Hive replication:
- If the source database properties location and managedlocation are set to the default location (<dbname>.db, in lowercase), the target database properties location and managedlocation are also set to the default location after replication.
- If the source database properties location and managedlocation are set to custom locations, the target database properties location and managedlocation retain the corresponding custom locations on the target cluster after replication.
By default, the custom location is retained on the target cluster. You can disable this behaviour by setting the hive.repl.retain.custom.db.locations.on.target policy-level configuration property to false. When you disable this property and run Hive replication, the replicated database locations on the target cluster are set to the default locations, regardless of whether the database locations on the source cluster are default or custom locations.
- After replication, a replicated managed table inherits the managedlocation property of its parent database, regardless of whether that property is set to the default location or a custom location on the source cluster.
- After replication, a replicated external table derives its location from the value of the hive.repl.replica.external.table.base.dir property and the external table location on the source cluster.
For example, if an external table ext_tab1 is located at /ext_loc/ext_tab1/ on the source cluster and the hive.repl.replica.external.table.base.dir property is set to /ext_base1 on the target, the location for ext_tab1 on the target cluster is /ext_base1/ext_loc/ext_tab1. The hive.repl.replica.external.table.base.dir property is derived from the value you set for External Table Base Directory in the Hive replication policy.
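The location rules above can be summarized as three small functions. This is a hedged sketch that mirrors only the rules listed in this section, not Hive's implementation: the function names and warehouse root values are hypothetical, and locations are assumed to be plain POSIX-style paths rather than full hdfs:// URIs. Managed tables are not modeled separately because they follow the managedlocation of their parent database on the target. Call target_db_location once for location and once for managedlocation, each with its own warehouse root.

```python
import posixpath


def default_db_location(warehouse_root: str, db_name: str) -> str:
    """Default database directory: <dbname>.db in lowercase, under a warehouse root."""
    return posixpath.join(warehouse_root, f"{db_name}.db".lower())


def target_db_location(
    source_location: str,
    source_is_default: bool,
    target_warehouse_root: str,
    db_name: str,
    retain_custom: bool = True,  # mirrors hive.repl.retain.custom.db.locations.on.target
) -> str:
    """Location (or managedlocation) of the replicated database on the target cluster."""
    if source_is_default or not retain_custom:
        return default_db_location(target_warehouse_root, db_name)
    # The custom location is retained on the target cluster.
    return source_location


def target_external_table_location(base_dir: str, source_table_location: str) -> str:
    """Prefix the source table path with the external table base directory."""
    joined = base_dir.rstrip("/") + "/" + source_table_location.lstrip("/")
    return posixpath.normpath(joined)  # also drops any trailing slash
```

With the values from the example above, target_external_table_location("/ext_base1", "/ext_loc/ext_tab1/") returns /ext_base1/ext_loc/ext_tab1.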