Reviewing prerequisites before migration
Before migrating from CDH 6 to CDP Public Cloud, review the following prerequisites and make sure each one is met.
- For data and metadata migration, you need a Data Lake cluster already created in a CDP Public Cloud environment. To create a Data Lake cluster, you can follow the process in Registering an AWS environment.
- For a Hive workload migration, you need a Data Engineering Data Hub already created in a CDP Public Cloud environment. To create a Data Engineering Data Hub cluster, you can follow the process in Creating a cluster on AWS.
- You must use the Cluster Connectivity Manager to manually register the source CDH cluster as a classic cluster in the CDP Control Plane. For the procedure, see Adding a CDH cluster (CCMv2).
- Gather the following information before you begin the migration:
  - For the source CDH cluster: the Cloudera Manager (CM) URL; the admin username and password; the SSH user and port, or the private key, for the source nodes
  - For the target CDP cluster/environment: the CDP Control Plane URL; the admin username and password; the SSH user and port, or the private key
  - For S3: the S3 bucket access key and secret key; the S3 credential name. You might also need the S3 bucket base path for HDFS files and the S3 bucket path for Hive external tables. These paths should auto-fill from the selected target cluster, but you can change them if needed.
- The CM node of the source CDH cluster must have Python 3.9.8 or higher installed.
- Redaction must be turned off in Cloudera Manager. For the procedure, see the Cloudera Manager documentation on redaction.
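
The Python requirement above can be checked on the CM node before you start. The following is a minimal sketch, not part of the migration tooling: the function name meets_minimum and the constant MIN_VERSION are illustrative, and the script assumes you run it with the same interpreter that the migration will use.

```python
import sys

# Minimum interpreter version stated in the prerequisites (3.9.8 or higher).
MIN_VERSION = (3, 9, 8)


def meets_minimum(version_info=None, minimum=MIN_VERSION):
    """Return True if the (major, minor, micro) version is at least `minimum`."""
    if version_info is None:
        # Default to the interpreter that is running this script.
        version_info = sys.version_info
    return tuple(version_info[:3]) >= tuple(minimum)


# Report the result for the current interpreter.
current = ".".join(str(part) for part in sys.version_info[:3])
required = ".".join(str(part) for part in MIN_VERSION)
if meets_minimum():
    print(f"OK: Python {current} satisfies the {required} minimum")
else:
    print(f"FAIL: Python {current} is older than the {required} minimum")
```

Comparing version tuples rather than version strings avoids ordering mistakes such as treating 3.10 as older than 3.9.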