Migrating workloads from CDH to CDP Public Cloud
Follow these steps to migrate your workloads from a CDH cluster to a CDP Public Cloud cluster.
Registering source and target clusters for workload migration
Register your source and target clusters for migration.
- Click New Migration on the left navigation pane.
- Select Cloudera Distributed Hadoop 6 from the drop-down menu.
- Register the source CDH cluster by providing its Cloudera Manager URL, the Cloudera Manager admin user, and the Cloudera Manager admin password.
Click Connect. If the connection is successful, click
Select Migrating Workload to CDP Public Cloud. Click
A page for connecting to the CDP Control Plane is displayed.
Enter the Control Plane URL, admin user and password. Click
A green checkmark appears upon a successful connection to the Control Plane.
Select the target cluster and click Scan.
- Select one of the radio buttons: enter the SSH user and port, or click Choose File to upload the SSH key for the source cluster nodes.
Provide the S3 bucket access key, S3 bucket secret key, and the Hive query
The following HDFS paths are supported:
- With default namespace:
- With specified namespace:
- With namenode address:
- With default namespace:
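The three addressing forms listed above can be told apart by the authority part of the URI. The following is a minimal sketch, assuming the standard HDFS URI layout (`hdfs:///path`, `hdfs://nameservice/path`, `hdfs://host:port/path`). The function name `classify_hdfs_path` and the example paths are hypothetical and are not taken from the product.

```python
from urllib.parse import urlparse


def classify_hdfs_path(path: str) -> str:
    """Return which addressing form an hdfs:// URI uses.

    Assumption: a host with an explicit port is a namenode address, and a
    bare authority is a namespace (nameservice) name. A namenode address
    written without a port cannot be distinguished from a namespace here.
    """
    parsed = urlparse(path)
    if parsed.scheme != "hdfs":
        raise ValueError(f"not an hdfs:// URI: {path!r}")
    if not parsed.netloc:
        # hdfs:///dir relies on the cluster's default file system
        return "default namespace"
    if parsed.port is not None:
        # hdfs://host:8020/dir points at a specific namenode
        return "namenode address"
    # hdfs://ns1/dir names a namespace explicitly
    return "specified namespace"


if __name__ == "__main__":
    # Illustrative placeholder paths only
    for example in (
        "hdfs:///warehouse/sales",
        "hdfs://ns1/warehouse/sales",
        "hdfs://nn.example.com:8020/warehouse/sales",
    ):
        print(example, "->", classify_hdfs_path(example))
```

A check like this may be useful for validating a list of paths before entering them in the wizard; it does not contact the cluster.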
The Overview page appears.
The Migration Creation status page appears. When all of the migration creation steps are completed, click the Go To Migrations link. In-progress migrations appear as blue tiles on the Migrations page.
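Before registering the source cluster, it can help to confirm outside the wizard that the Cloudera Manager URL and admin credentials are valid. The following is a minimal sketch, assuming that Cloudera Manager exposes its REST API at `/api/version` and accepts HTTP basic authentication. The function names, host name, and credentials are hypothetical placeholders.

```python
import base64
import urllib.request


def build_version_request(cm_url: str, user: str, password: str) -> urllib.request.Request:
    """Build a GET request for the Cloudera Manager API version endpoint.

    Assumption: the endpoint is <cm_url>/api/version and basic auth is enabled.
    """
    url = cm_url.rstrip("/") + "/api/version"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def check_cloudera_manager(cm_url: str, user: str, password: str, timeout: float = 10.0) -> str:
    """Return the response body of the version endpoint.

    Raises urllib.error.HTTPError for rejected credentials and
    urllib.error.URLError if the host is unreachable.
    """
    request = build_version_request(cm_url, user, password)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8").strip()


if __name__ == "__main__":
    # Placeholder values; replace with the real Cloudera Manager URL and admin account.
    print(check_cloudera_manager("https://cm.example.com:7183", "admin", "secret"))
```

If the call fails here, the Connect step in the wizard is likely to fail for the same reason, so this narrows down whether the problem is the URL, the credentials, or network reachability.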
Performing the workload migration
This section helps you understand the steps required to perform the workload migration and how to approach the migration workflow.
Running the initial cluster workload scan
The Assessment stage includes an initial cluster data scan that must complete successfully before you can proceed with the migration.
The initial cluster data scan collects Hive and HDFS data, then runs a tool to detect any problems with the Hive tables.
Click the Play button at the bottom of Initial Cluster Data Scan in the Assessment box to start the cluster scan. When the scan completes, a green checkmark appears next to the scan link and you can view the validation results in the Output tab.
This step collects the source files and analyzes them with the Hive3 parser tool. The results are also displayed in the Master Table under the Sources tab.
Working with the master table
The master table is used to display and manage data sets based on the source cluster scan. In the master table, you can review potential issues with the Hive tables and create and assign labels that help you migrate related data sets together.
Reviewing the data sets
In the Sources tab of the Master Table page, you can review all of the Hive tables identified in the initial cluster scan and the potential problems that may arise during the migration.
Click the Eye icon to review potential issues identified by the Hive3 parser. You can click the Download button to export these issues as a .csv file, if needed. Alternatively, you can click the Edit icon within a row to manually edit the Hive SQL based on the recommendations given by the Hive3 parser tool.
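Once the issues are exported, the .csv file can be summarized with a short script to see which kinds of problems dominate. The following is a minimal sketch. The layout of the exported file is not described here, so the column name is passed in by the caller, and the file name and column name in the usage example are hypothetical.

```python
import csv
from collections import Counter


def count_by_column(csv_path: str, column: str) -> Counter:
    """Count how many rows of a CSV file share each value of one column.

    Assumption: the export has a header row; `column` must match one of
    its header names exactly.
    """
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        if reader.fieldnames is None or column not in reader.fieldnames:
            raise KeyError(f"column {column!r} not found in {reader.fieldnames}")
        return Counter(row[column] for row in reader)


if __name__ == "__main__":
    # Hypothetical file and column names; check the header of the real export first.
    for value, rows in count_by_column("hive_issues.csv", "issue_type").most_common():
        print(f"{rows:5d}  {value}")
```

Grouping by a table or database column instead may help decide which data sets to label and migrate together.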