Migrating actual Hive data

Although HMS Mirror migrates only Hive metastore (HMS) metadata, not the actual Hive data, the tool generates a script for running the Hadoop DistCp tool. Using DistCp, you migrate actual data from HDP to CDP One.

DistCp is fully documented in HDP to CDP SaaS HDFS Migration. When you ran the HMS Mirror tool, it generated a number of DistCp files:
The distcp_script.sh contains a template and instructions for running the script to migrate the actual Hive data. The template looks something like this:
hadoop distcp ${DISTCP_OPTS} -f ${HCFS_BASE_DIR}/migrate_db_1_distcp_source.txt s3a://myserver//apps/hive/warehouse/migrate_db.db
In this task, you customize the DistCp command to substitute values specific to your migration for the variables above.
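The variables in the template are ordinary shell variables, so you can preview the expanded command before running anything. The following is a minimal sketch, not part of the generated script: the HCFS_BASE_DIR path and the DISTCP_OPTS memory setting are hypothetical values, and preview_distcp is an illustrative helper name.

```shell
# Hypothetical values; replace them with ones for your environment.
export HCFS_BASE_DIR="/user/hive_migrator/distcp"      # assumed location of the *_distcp_source.txt files
export DISTCP_OPTS="-Dmapreduce.map.memory.mb=4096"    # example job setting for a large copy

# Print the expanded template instead of running it (nothing is copied).
preview_distcp() {
  echo "hadoop distcp ${DISTCP_OPTS} -f ${HCFS_BASE_DIR}/migrate_db_1_distcp_source.txt s3a://myserver//apps/hive/warehouse/migrate_db.db"
}

preview_distcp
```

If the printed command looks right, you can run it on the source cluster without the echo wrapper.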
Give the user who migrates the data read, write, and execute permissions on both the HDP and CDP clusters.
  1. In the distcp_workbook.md file, find the Sources column and note the names of the source files to migrate from <dbname>_1_distcp_source.txt.
  2. On the source HDP cluster, export the environment variable HCFS_BASE_DIR that represents the path to the source files.
    ${HCFS_BASE_DIR} must be available to the user running DistCp.
  3. Export the DISTCP_OPTS environment variable to customize job settings.
    For example, you might adjust memory settings for large jobs.
  4. Using information from steps 1-3 plus your S3 access key and secret key, customize the DistCp template.
  5. On the HDP source cluster edge node, run the customized DistCp commands.
    For example:
    hadoop distcp -Dfs.s3a.access.key=AKIAR2YKDFBDE64CJ5XG -Dfs.s3a.secret.key=<mykey> hdfs://ctr-e172-1620330694487-659732-01-000002.hwx.site:8020/apps/hive/warehouse/migrate_db.db/managed_table_1  s3a://myserver//apps/hive/warehouse/migrate_db.db
    hadoop distcp -Dfs.s3a.access.key=AKIAR2YKDFBDE64CJ5XG -Dfs.s3a.secret.key=<mykey> hdfs://ctr-e172-1620330694487-659732-01-000002.hwx.site:8020/apps/hive/warehouse/migrate_db.db/file_format_table  s3a://myserver//apps/hive/warehouse/migrate_db.db
    hadoop distcp -Dfs.s3a.access.key=AKIAR2YKDFBDE64CJ5XG -Dfs.s3a.secret.key=<mykey> hdfs://ctr-e172-1620330694487-659732-01-000002.hwx.site:8020/apps/hive/warehouse/migrate_db.db/external_table_1  s3a://myserver//apps/hive/warehouse/migrate_db.db
  6. Check the status of the migration in HTML reports generated by DistCp to determine success or failure.
  7. If an error occurs, check $HOME/.hms-mirror/logs/hms-mirror.log.
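When a database has many tables, typing one command per source path is error-prone. The sketch below prints one DistCp command per source path read from standard input, assuming one path per line. It is a dry run only: build_distcp_cmds, S3_ACCESS_KEY, and S3_SECRET_KEY are hypothetical names, and the host name and credential values are placeholders.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints commands for review; nothing is copied.

# Read source paths from stdin (one per line) and print a distcp
# command that copies each one to the target given as $1.
build_distcp_cmds() {
  local target="$1"
  local src
  while IFS= read -r src; do
    # Skip blank lines in the source list.
    [ -z "$src" ] && continue
    printf 'hadoop distcp -Dfs.s3a.access.key=%s -Dfs.s3a.secret.key=%s %s %s\n' \
      "$S3_ACCESS_KEY" "$S3_SECRET_KEY" "$src" "$target"
  done
}

# Placeholder credentials and source paths (assumptions, not real values).
S3_ACCESS_KEY="<myaccesskey>"
S3_SECRET_KEY="<mykey>"

printf '%s\n' \
  "hdfs://source-namenode:8020/apps/hive/warehouse/migrate_db.db/managed_table_1" \
  "hdfs://source-namenode:8020/apps/hive/warehouse/migrate_db.db/external_table_1" |
  build_distcp_cmds "s3a://myserver//apps/hive/warehouse/migrate_db.db"
```

Review the printed commands, then run them on the HDP source cluster edge node as described in step 5.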