Validating external table replication
Before migrating to CDP, validate external table replication. Run the external table validation commands on the CDP cluster.
Command Syntax
hive --replMigration -dumpFilePath <path to external table info file> \
[-dirLevelCheck] [-fileLevelCheck] [-verifyOpenFiles] [-verifyChecksum] [-filters] \
[-conf] [-queueSize] [-numThreads] [-checksumQueueSize] [-checksumNumThreads]
Options
-dumpFilePath: The fully qualified path to the external table info file.
-dirLevelCheck: Validate at the directory level.
-fileLevelCheck: Validate at the file level.
-verifyOpenFiles: Verify that there are no open files on the source path. Requires superuser privileges.
-verifyChecksum: Validate the checksum of each file. Cannot be used with -dirLevelCheck. Fails if the source and target are in different encryption zones or use different checksum algorithms.
-filters: Comma-separated list of filters. Cannot be used with -dirLevelCheck.
-conf: Semicolon-separated list of additional configurations, in key1=value1;key2=value2 format.
-queueSize: Queue size for the thread pool executor used for table-level validation. Default: 200
-numThreads: Number of threads for the thread pool executor used for table-level validation. Default: 10
-checksumQueueSize: Queue size for the thread pool executor used for checksum computation. Default: 200
-checksumNumThreads: Number of threads for the thread pool executor used for checksum computation. Default: 5
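As a sketch of how these options combine on the command line, the hypothetical Bash wrapper below assembles a file-level validation run with checksum verification. DUMP_FILE, the -conf key/value pairs, the check_opts helper, and the --run switch are illustrative assumptions, not part of the hive CLI. It also assumes that -conf takes its value as the next argument, the same way -dumpFilePath does. check_opts rejects the combinations listed above as unsupported (-verifyChecksum or -filters together with -dirLevelCheck), and the -conf value stays quoted because the shell would otherwise treat each ';' as a command separator.

```shell
#!/usr/bin/env bash
# Sketch only: a hypothetical wrapper around `hive --replMigration`.
# DUMP_FILE and the -conf pairs are placeholders; replace them with your own values.
set -euo pipefail

DUMP_FILE="/tmp/external_table_info"   # hypothetical path to the external table info file

# check_opts (hypothetical helper): fail if -verifyChecksum or -filters
# is combined with -dirLevelCheck, which the option reference disallows.
check_opts() {
  local arg dir=0
  for arg in "$@"; do
    if [[ "$arg" == "-dirLevelCheck" ]]; then dir=1; fi
  done
  if (( dir == 1 )); then
    for arg in "$@"; do
      if [[ "$arg" == "-verifyChecksum" || "$arg" == "-filters" ]]; then
        echo "error: $arg cannot be used with -dirLevelCheck" >&2
        return 1
      fi
    done
  fi
}

# Keep the -conf value quoted: an unquoted ';' would end the hive command
# and run the remaining text as a separate shell command.
CONF="key1=value1;key2=value2"         # hypothetical key=value pairs

# -verifyChecksum cannot be combined with -dirLevelCheck, so use -fileLevelCheck.
opts=(-fileLevelCheck -verifyChecksum -conf "$CONF")
check_opts "${opts[@]}"
cmd=(hive --replMigration -dumpFilePath "$DUMP_FILE" "${opts[@]}")

# Print the assembled command for review; pass --run to execute it on the CDP cluster.
printf '%s\n' "${cmd[*]}"
if [[ "${1:-}" == "--run" ]]; then
  "${cmd[@]}"
fi
```

Run the script without arguments to review the assembled command, then with --run to execute it. Building the command as a Bash array keeps each option and value as a single argument, so the semicolons inside the -conf value reach hive intact.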
Running the external table validation commands can take a significant amount of time, during which writes to the HDP database can occur. After validating external table replication, check the checkpoint ID again to determine whether any writes occurred.
To validate external table replication, perform the following steps: