Migrating workloads from CDH to CDP Public Cloud

Follow these steps to migrate your workloads from a CDH cluster to a CDP Public Cloud cluster.

Registering source and target clusters for workload migration

Register your source and target clusters for migration.

  1. Click New Migration on the left navigation pane.
  2. Select Cloudera Distributed Hadoop 6 from the drop-down menu.
  3. Register the source CDH cluster by providing its Cloudera Manager URL, Cloudera Manager admin user, and Cloudera Manager admin password.
  4. Click Connect. If the connection is successful, click Next.
  5. Select Migrating Workload to CDP Public Cloud. Click Next.

    A page for connecting to the CDP Control Plane is displayed.

  6. Enter the Control Plane URL, admin user and password. Click Connect.

    A green checkmark appears upon a successful connection to the Control Plane.

  7. Select the target cluster and click Scan.
  8. Select one of the radio buttons: either enter the SSH user and port, or click Choose File to upload the SSH key for the source cluster nodes.
  9. Provide the S3 bucket access key, S3 bucket secret key, and the Hive query parser input.
    The following HDFS paths are supported:
    • With default namespace: hdfs:///dir/, hdfs:///dir/file
    • With specified namespace: hdfs://namespace1/dir, hdfs://namespace1/dir/file
    • With namenode address: hdfs://nameNodeHost:port/dir, hdfs://nameNodeHost:port/dir/file
    The following local file formats are supported:
    • nodeFQDN:/your/local/dir
    • nodeFQDN:/your/local/dir/sqlFile
  10. Click Next.
    The Overview page appears.
  11. Click Create.
    The Migration Creation status page appears. When all of the migration creation steps are completed, click the Go To Migrations link. In-progress migrations appear as blue tiles on the Migrations page.
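Before registering the migration, it can be useful to check that your Hive query parser input matches one of the supported formats listed in step 9. The following sketch validates a path against those formats; the regexes and function name are illustrative assumptions, not part of the migration tool:

```python
import re

# Patterns for the path formats accepted as Hive query parser input
# (illustrative sketch based on the documented formats).
HDFS_PATTERN = re.compile(
    r"^hdfs://"
    r"(?:"
    r""                         # default namespace:  hdfs:///dir
    r"|[A-Za-z0-9._-]+"         # named namespace:    hdfs://namespace1/dir
    r"|[A-Za-z0-9._-]+:\d+"     # namenode address:   hdfs://nameNodeHost:8020/dir
    r")"
    r"/\S*$"
)
# nodeFQDN:/your/local/dir -- a second "/" right after ":" would indicate
# a URI scheme rather than a local path, so it is rejected.
LOCAL_PATTERN = re.compile(r"^[A-Za-z0-9._-]+:/(?!/)\S+$")

def is_supported_input(path: str) -> bool:
    """Return True if the path matches one of the documented input formats."""
    return bool(HDFS_PATTERN.match(path) or LOCAL_PATTERN.match(path))
```

For example, `hdfs://namespace1/dir/file` and `node1.example.com:/your/local/dir` pass, while an `s3a://` URI does not.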

Performing the workload migration

This section helps you understand the steps required to perform the workload migration and how to approach the migration workflow.

Running the initial cluster workload scan

The Assessment stage includes an initial cluster data scan that must complete successfully before you can proceed with the migration.

The initial cluster data scan collects Hive and HDFS data, then runs a tool to detect any problems with the Hive tables.

Click the Play button at the bottom of Initial Cluster Data Scan in the Assessment box to start the cluster scan. When the scan completes, a green checkmark appears next to the scan link and you can view the validation results in the Output tab.

This step collects the source files and analyzes them with the Hive3 parser tool. The results are also displayed in the Master Table under the Sources tab.
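To give a sense of the kind of problem the Hive3 parser detects, the sketch below scans SQL text for constructs that Hive 3 no longer supports, such as index statements (indexes were removed in Hive 3). The construct list and function are illustrative assumptions; the real parser performs a full syntactic and semantic analysis:

```python
# Illustrative scan for Hive constructs that Hive 3 no longer accepts.
# The sample list is hand-picked for demonstration only.
REMOVED_CONSTRUCTS = {
    "CREATE INDEX": "Hive 3 removed index support; consider materialized views instead.",
    "DROP INDEX": "Hive 3 removed index support; drop statements are no longer valid.",
}

def find_issues(sql_text: str) -> list:
    """Return a human-readable note for each flagged construct in the SQL."""
    upper = sql_text.upper()
    return [note for construct, note in REMOVED_CONSTRUCTS.items() if construct in upper]
```

A naive substring match like this can produce false positives (for example, inside string literals), which is one reason the results of the real parser are surfaced for manual review in the master table.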

Working with the master table

The master table is used to display and manage data sets based on the source cluster scan. In the master table, you can review potential issues with the Hive tables and create and assign labels that help you migrate related data sets together.

Reviewing the data sets

In the Sources tab of the Master Table page, you can review all of the Hive tables identified in the initial cluster scan and the potential problems that may arise during the migration.

Click the Eye icon to review potential issues identified by the Hive3 parser. You can click the Download button to export these issues as a .csv file, if needed. Alternatively, you can click the Edit icon within a row to manually edit the Hive SQL based on the recommendations given by the Hive3 parser tool.

When you edit the SQL statement directly, an editor is displayed where the statements can be corrected and saved.
Because the Hive query refactoring requires manual steps, you can access the corresponding instructions through the documentation linked on the Hive Query Refactoring page.
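One common class of manual fix is backtick-quoting a column name that collides with a keyword reserved in newer Hive versions. The sketch below shows such a rewrite; the keyword sample and function are illustrative assumptions, so always follow the recommendations actually reported by the Hive3 parser:

```python
import re

# Illustrative: backtick-quote identifiers that collide with keywords
# reserved in newer Hive versions. The keyword sample is an assumption
# for demonstration; consult the Hive3 parser output for the real list.
NEWLY_RESERVED = {"TIME", "NUMERIC"}

def quote_reserved(sql: str) -> str:
    """Wrap bare occurrences of the sampled reserved words in backticks."""
    def repl(match):
        word = match.group(0)
        return f"`{word}`" if word.upper() in NEWLY_RESERVED else word
    return re.sub(r"\b\w+\b", repl, sql)
```

This is a naive token-level rewrite that does not parse the SQL, so the result should still be reviewed manually in the editor before saving.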