You need to know the prerequisites for migrating HDFS and Hive data from CDH to CDP.
You must meet the following general prerequisites before starting the migration process:
- AWS is your Cloud provider. Migrating to Azure / GCP is currently not supported.
- Classic clusters (CDH/HDP/CDP Private Cloud Base) are on-premises clusters that host the data to be replicated to the CDP deployment. These clusters are on-premises versions of Cloudera Data Platform and must be registered on the Management Console before they can be used for data migration purposes. For more information, see Support Matrix for Replication Manager.
- Validate the CDH source cluster as per cluster requirements.
- CDH clusters must have been created using Cloudera Manager.
- Clusters that are not managed by Cloudera Manager cannot be registered to CDP.
- You must log in as the
adminuser to Cloudera Manager in the CDH cluster.
- Replication user needs to have
hadoop adminprivilege on source cluster.
- On the source cluster which is running CDH, make sure the user we specify
on the replication manager source cluster side has
- Your source cluster supports compatible versions of Cloudera Manager and Cloudera Runtime shown in the following table:
|Source cluster||Earliest supported Cloudera Manager||Earliest supported Cloudera Runtime|
The only required migration tool is Cloudera Replication Manager, which is pre-installed in Cloudera Manager and licensed for use in Data Hub.
Specifically note that:
- Your CDH cluster is registered on the CDP Management Console. For more information, see Adding a CDH cluster.
- Open communication ports between the source CDH cluster and CDP list.
- Set up line-of-sight from the CDH cluster to the AWS endpoint.
- Ensure adequate network connectivity between the CDH cluster and AWS endpoint.
- Create an external account on the CDH cluster that has a non-expiring access credentials and secret key pair.
You must have the license to perform your tasks in Replication Manager. To understand more about Cloudera license requirements, see Managing Licenses.
- Allocate storage on HDFS to create HDFS snapshots that allows administrators to modify the active file-system.
- You must be a part of the super-group on each host on the CDH cluster.