Migrating data from CDH to CDP Public Cloud
Follow these steps to migrate your data and metadata from a CDH cluster to a CDP Public Cloud cluster.
Registering source and target clusters for data migration
Register your source and target clusters for migration.
Performing the data migration
This section describes the steps required to perform the data migration and how to approach the migration workflow.
Running the initial cluster data scan
The Assessment stage includes an initial cluster data scan that must complete successfully before you can proceed with the migration.
The initial cluster data scan collects Hive and HDFS data, then runs a tool to detect any problems with the Hive tables.
Click the Play button at the bottom of Initial Cluster Data Scan in the Assessment box to start the cluster scan. When the scan completes, a green checkmark appears next to the scan link and you can view the validation results in the Output tab.

Working with the master table
The master table displays the data sets found by the source cluster scan and lets you manage them. In the master table, you can review potential issues with the Hive tables, and create and assign labels that help you migrate related data sets together.
Reviewing the data sets
In the Datasets tab of the Master Table page, you can review all of the Hive tables identified in the initial cluster scan, as well as the associated HDFS locations.
In the SRE column, you can review any potential issues that may arise with a given table during the migration.

You can click the link in the SRE column to view warnings and recommended actions to take before you proceed with the migration. These results are produced by the Hive-SRE tool. For more information, see the Git repository.
Using labels
In the Labels tab of the Master Table page, you can create and assign labels that allow you to classify related data sets and migrate them together. Each label can then serve as the basis for its own replication policy, so several labels result in multiple replication policies. There is no limit to the number of labels that you can create. These labels belong to the master table and can be reused in subsequent migrations from the same source.
Selecting and submitting the data for replication
Once you have labeled all of the data sets that you want to include in a particular migration, you can use those labels to create the data replication policies that perform the actual data migration. In the background, CMA uses Replication Manager to create replication policies from the data sets that you have labeled.
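The relationship between labels and replication policies can be pictured with a small sketch. This is only a conceptual illustration, not CMA or Replication Manager code: the table names, HDFS paths, label names, and the group_by_label helper are all hypothetical, and the real tools manage this through the Master Table page rather than a script. It assumes one policy per label and shows that unlabeled data sets are left out.

```python
from collections import defaultdict

# Hypothetical data sets as they might appear in the master table.
# Table names, HDFS locations, and labels are invented for illustration.
datasets = [
    {"table": "sales.orders", "hdfs_location": "/warehouse/sales.db/orders", "labels": ["finance"]},
    {"table": "sales.invoices", "hdfs_location": "/warehouse/sales.db/invoices", "labels": ["finance"]},
    {"table": "web.clicks", "hdfs_location": "/warehouse/web.db/clicks", "labels": ["analytics"]},
    {"table": "tmp.scratch", "hdfs_location": "/warehouse/tmp.db/scratch", "labels": []},
]


def group_by_label(items):
    """Group table names by label; data sets without a label are skipped."""
    groups = defaultdict(list)
    for item in items:
        for label in item["labels"]:
            groups[label].append(item["table"])
    return dict(groups)


# Each label maps to the set of tables that would be migrated together.
policies = group_by_label(datasets)
for label, tables in policies.items():
    print(f"policy for label '{label}': {tables}")
```

In this sketch the two finance tables end up in one group and the analytics table in another, while tmp.scratch is not part of any group because it has no label.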