Migration prerequisites
You need to know the prerequisites for migrating HDFS and Hive data from CDH to Cloudera.
You must meet the following general prerequisites before starting the migration process:
- AWS must be your cloud provider. Migrating to Azure or GCP is currently not supported.
- Classic clusters (CDH, HDP, or Cloudera Private Cloud Base) are on-premises clusters that host the data to be replicated to the Cloudera deployment. You must register these clusters on the Management Console before you can use them for data migration. For more information, see Support Matrix for Cloudera Replication Manager.
- Validate the CDH source cluster against the cluster requirements.
- CDH clusters must have been created using Cloudera Manager.
- Clusters that are not managed by Cloudera Manager cannot be registered to Cloudera.
- You must log in to Cloudera Manager on the CDH cluster as the admin user.
- The replication user that you specify for the source cluster in Replication Manager must have Hadoop admin privileges on the source CDH cluster.
- Your source cluster must run compatible versions of Cloudera Manager and Cloudera Runtime, as shown in the following table:
Source cluster | Earliest supported Cloudera Manager | Earliest supported Cloudera Runtime
---|---|---
CDH 5 | 6.3.0 | 5.1.0 |
CDH 6 | 6.3.0 | 6.1.0 |
CDH 6 | 7.3.1 | 6.3.3 |
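As a quick pre-flight check, the version minimums in the table above can be encoded in a small Python helper. This is a sketch under stated assumptions: the names `MINIMUMS`, `parse_version`, and `is_supported` are hypothetical, version strings are assumed to be plain dotted numbers (for example, `6.3.4`), and each table row is treated as an alternative minimum pair for its source release.

```python
from typing import Dict, List, Tuple

Version = Tuple[int, ...]

# Minimum (Cloudera Manager, Cloudera Runtime) pairs copied from the table.
# Assumption: each row is an alternative minimum, so a cluster passes if it
# meets both minimums of at least one row for its source release.
MINIMUMS: Dict[str, List[Tuple[str, str]]] = {
    "CDH 5": [("6.3.0", "5.1.0")],
    "CDH 6": [("6.3.0", "6.1.0"), ("7.3.1", "6.3.3")],
}


def parse_version(text: str) -> Version:
    """Turn a dotted numeric version such as '6.3.4' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def is_supported(source: str, cm_version: str, runtime_version: str) -> bool:
    """Return True if both versions meet at least one minimum pair for `source`."""
    cm = parse_version(cm_version)
    runtime = parse_version(runtime_version)
    for min_cm, min_runtime in MINIMUMS.get(source, []):
        if cm >= parse_version(min_cm) and runtime >= parse_version(min_runtime):
            return True
    # Unknown source releases, or versions below every minimum, fail the check.
    return False
```

For example, `is_supported("CDH 6", "6.3.4", "6.3.4")` returns `True`, while a CDH 5 cluster managed by Cloudera Manager 6.2.0 fails the check. Treat the result as a convenience only and confirm against the Support Matrix for Cloudera Replication Manager.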
The only required migration tool is Cloudera Replication Manager, which is pre-installed in Cloudera Manager and licensed for use in Cloudera Data Hub.
In particular, complete the following tasks:
- Register your CDH cluster on the CDP Management Console. For more information, see Adding a CDH cluster.
- Open the required communication ports between the source CDH cluster and Cloudera.
- Set up line-of-sight from the CDH cluster to the AWS endpoint.
- Ensure adequate network connectivity between the CDH cluster and AWS endpoint.
- Create an external account on the CDH cluster that uses non-expiring credentials, that is, an access key and secret key pair that does not expire.
- You must have a license to perform your tasks in Cloudera Replication Manager. For more information about Cloudera license requirements, see Managing Licenses.
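- Allocate HDFS storage for HDFS snapshots. Snapshots are read-only, point-in-time copies of a directory, so replication can copy a consistent image of the data while administrators and users continue to modify the active file system.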
- You must be a member of the supergroup on each host of the CDH cluster.
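To confirm line-of-sight from a CDH host to the AWS endpoint, you can attempt a plain TCP connection from each host. The following Python sketch assumes that the endpoint is reached directly on port 443 without an HTTP proxy; the function name `can_reach` and the example endpoint are illustrative assumptions, not part of Replication Manager. A successful connection only shows that the port is reachable; it does not validate credentials or measure bandwidth.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the name and tries each returned address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts, which are
        # all OSError subclasses in Python 3.
        return False
```

For example, run `can_reach("s3.us-east-1.amazonaws.com")` on each CDH host, substituting the endpoint for your own AWS region (the region shown here is only an example).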
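To check the supergroup requirement host by host, the following Python sketch lists a user's primary and supplementary groups with the standard `pwd`, `grp`, and `os.getgrouplist` calls. It assumes that the group is named `supergroup`, the HDFS default for `dfs.permissions.superusergroup`; substitute your cluster's value if it differs. The names `SUPERUSER_GROUP`, `local_group_names`, and `in_superuser_group` are hypothetical.

```python
import grp
import os
import pwd
from typing import Set

# Assumption: the cluster keeps the HDFS default superuser group name.
SUPERUSER_GROUP = "supergroup"


def local_group_names(user: str) -> Set[str]:
    """Return the names of the primary and supplementary groups of `user` on this host.

    Raises KeyError if `user` does not exist on this host.
    """
    primary_gid = pwd.getpwnam(user).pw_gid
    names: Set[str] = set()
    for gid in os.getgrouplist(user, primary_gid):
        try:
            names.add(grp.getgrgid(gid).gr_name)
        except KeyError:
            # The group ID has no name in the group database; skip it.
            continue
    return names


def in_superuser_group(user: str, group: str = SUPERUSER_GROUP) -> bool:
    """Return True if `user` belongs to `group` on this host."""
    return group in local_group_names(user)
```

Run the check on every host of the CDH cluster; for a hypothetical user `alice`, `in_superuser_group("alice")` should return `True` on each host.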