HMS Mirror migrates only Hive metastore (HMS) metadata, not the actual Hive data. To
move the data, the tool also generates a script for running the Hadoop DistCp tool,
which you use to migrate the actual data from HDP to Cloudera.
DistCp is fully documented in HDP to Cloudera SaaS HDFS Migration. When you run the
HMS Mirror tool, it generates a number of DistCp files:
The distcp_script.sh file contains a template and instructions for running the
script to migrate the actual Hive data. The template looks something like this:
In this task, you customize the DistCp command by substituting values specific to your
migration for the variables in the template.

Before you begin, give the user who is migrating the data read, write, and execute
permissions on both the HDP and CDP clusters.
1. Using the Sources column of the distcp_workbook.md file, identify the source files
   to migrate, which are listed in the <dbname>_1_distcp_source.txt file.
2. On the source HDP cluster, export the environment variable HCFS_BASE_DIR, which
   represents the path to the source files. ${HCFS_BASE_DIR} must be accessible to
   the user running DistCp.
3. Export the DISTCP_OPTS environment variable to customize job settings. For
   example, you might adjust memory settings for large jobs.
4. Using the information from steps 1-3 plus your S3 access key and secret key,
   customize the DistCp template.
5. On the HDP source cluster edge node, run the customized DistCp commands.
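The steps above can be sketched as a small bash script. This is a minimal sketch under stated assumptions, not the template that HMS Mirror generates: the paths, the database name, the S3_ACCESS_KEY and S3_SECRET_KEY variable names, the DRY_RUN switch, and the build_distcp_cmd and run_distcp helpers are all hypothetical, and it assumes DISTCP_OPTS is expanded into the hadoop distcp command line. It passes the source list to DistCp with the -f option and the S3 credentials as the S3A properties fs.s3a.access.key and fs.s3a.secret.key.

```shell
#!/usr/bin/env bash
# Minimal sketch of steps 2-5. Every value below is a hypothetical
# placeholder; substitute the values for your own migration.
set -eo pipefail

# Step 2: directory holding the <dbname>_1_distcp_source.txt files
# (hypothetical path). The list file must already be uploaded there and be
# readable by the user running DistCp.
export HCFS_BASE_DIR="${HCFS_BASE_DIR:-/user/migrator/hms-mirror}"

# Step 3: extra DistCp job options; the memory value is only an example.
export DISTCP_OPTS="${DISTCP_OPTS:--Dmapreduce.map.memory.mb=4096}"

DISTCP_CMD=()

# Step 4: assemble the command for one database into the DISTCP_CMD array.
#   $1 = database name, $2 = target URI
build_distcp_cmd() {
  local db="$1" target="$2"
  local -a opts=()
  # Stop with an error if the base directory was never set.
  : "${HCFS_BASE_DIR:?HCFS_BASE_DIR must be set}"
  # Split DISTCP_OPTS on whitespace without glob expansion.
  read -r -a opts <<< "${DISTCP_OPTS:-}"
  DISTCP_CMD=(hadoop distcp)
  # Hypothetical S3_ACCESS_KEY / S3_SECRET_KEY variables supply the keys.
  # Generic -D options go first, before DistCp's own options such as -m.
  # Keys passed this way are visible in the process list and job
  # configuration; a Hadoop credential provider avoids that.
  if [ -n "${S3_ACCESS_KEY:-}" ]; then
    DISTCP_CMD+=("-Dfs.s3a.access.key=${S3_ACCESS_KEY}"
                 "-Dfs.s3a.secret.key=${S3_SECRET_KEY:?S3_SECRET_KEY must be set}")
  fi
  DISTCP_CMD+=("${opts[@]}")
  # -f reads the list of source paths from a file instead of the command line.
  DISTCP_CMD+=(-f "${HCFS_BASE_DIR}/${db}_1_distcp_source.txt" "$target")
}

# Step 5: print the command for review (default) or run it when DRY_RUN=0.
run_distcp() {
  build_distcp_cmd "$@"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "${DISTCP_CMD[@]}"
  else
    printf '%q ' "${DISTCP_CMD[@]}"
    printf '\n'
  fi
}
```

For example, running run_distcp sales s3a://example-bucket/warehouse/sales.db prints the assembled command so you can check each substituted value; running it again with DRY_RUN=0 on the edge node launches the job. Printing first is a deliberate choice, since a wrong target URI is easier to catch before a MapReduce job starts copying data.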