Preparing to create an HDFS replication policy
Before you create an HDFS replication policy in Cloudera Replication Manager to replicate HDFS data, register the on-premises cluster (CDH or Cloudera Private Cloud Base) as a classic cluster in the Cloudera Management Console, register your cloud account credentials in the Replication Manager service, verify cluster access, and ensure that the minimum required ports are open for replication.
-
Do the source cluster and target cluster meet the requirements to create an
HDFS replication policy?
The following image shows a high-level view of the support matrix for HDFS replication policies. For the complete list of supported clusters and scenarios, see the Support matrix for Cloudera Replication Manager:
-
Is the required on-premises cluster (CDH cluster or Cloudera Private Cloud Base cluster) registered as a
classic cluster on the Management Console?
CDH clusters and Cloudera Private Cloud Base clusters are managed by Cloudera Manager. To make these on-premises clusters available to Replication Manager, you must register them as classic clusters on the Management Console. After registration, you can use them for data migration.
For information about registering an on-premises cluster as a classic cluster, see Adding a CDH cluster and Adding a Cloudera Private Cloud Base cluster.
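Before you register a cluster, you can confirm that Cloudera Manager manages it and note its exact name and version. The following is a minimal sketch against the Cloudera Manager REST API; the host, port, API version, and credentials are placeholders for illustration, not values from this document.

```python
# Minimal sketch: list the clusters that a Cloudera Manager instance manages
# before registering one as a classic cluster. The host, port, credentials,
# and API version below are placeholders; adjust them for your deployment.
import requests

CM_HOST = "cm.example.com"   # placeholder Cloudera Manager host
CM_PORT = 7180               # default non-TLS CM port; 7183 with TLS
API_VERSION = "v41"          # use the version your CM release supports

resp = requests.get(
    f"http://{CM_HOST}:{CM_PORT}/api/{API_VERSION}/clusters",
    auth=("admin", "admin_password"),  # placeholder credentials
    timeout=30,
)
resp.raise_for_status()
for cluster in resp.json().get("items", []):
    print(cluster["name"], cluster.get("fullVersion", "unknown version"))
```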
-
Is an external account available in the Cloudera Manager instance that has
access to the bucket or container that you are using in the HDFS replication
policy?
For more information, see Role-based credential on AWS, App-based credential on Azure, and Cloudera Manager documentation.
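As a quick sanity check, you can verify from a cluster host that the Hadoop client can reach the bucket or container before you create the policy. This is a minimal sketch that assumes the hadoop CLI is on the PATH; the s3a:// and abfs:// URIs are placeholders, not values from this document.

```python
# Minimal sketch: from a cluster host, verify that the bucket or container
# used in the replication policy is reachable through the Hadoop client.
# s3a:// covers AWS S3 and abfs:// covers Azure ADLS Gen2.
import subprocess

TARGETS = [
    "s3a://my-replication-bucket/",                                # placeholder
    "abfs://my-container@mystorageaccount.dfs.core.windows.net/",  # placeholder
]

for uri in TARGETS:
    result = subprocess.run(
        ["hadoop", "fs", "-ls", uri],
        capture_output=True, text=True,
    )
    status = "reachable" if result.returncode == 0 else "NOT reachable"
    print(f"{uri}: {status}")
    if result.returncode != 0:
        print(result.stderr.strip())
```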
-
Do you have the required cluster access to create replication policies?
Power users, the user who onboarded the source and target clusters, and users with the ClassicClusterAdmin or ClassicClusterUser resource role can create replication policies on clusters to which they have access. For more information, see Understanding account roles and resource roles.
-
Do you have the required cluster access to view the replication policies?
Existing HDFS replication policies are visible to users who have access to the source cluster in the replication policy. A warning appears if you do not have access to the source cluster.
If you can view a policy, you can also perform other actions on it, including policy update and policy delete operations.
-
Is the required cloud credential that you want to use in the replication policy
registered with the Replication Manager service?
For more information, see Working with cloud credentials.
-
Are the following ports open and available for Replication Manager?
Table 1. Minimum ports required for HDFS replication policies

Connectivity required | Default Port | Type | Description
Data transfer from classic cluster hosts to cloud storage | 80 or 443 (TLS) | Outbound | Outgoing port. All classic cluster nodes must be able to access the S3/ADLS Gen2 endpoint.
Classic cluster | 6000-6049 for CCMv1; 443 for CCMv2 | Outbound | Connects the source classic cluster to the Cloudera Management Console through Cluster Connectivity Manager (CCM). For more information, see Outbound network access for CCM and CCM overview.
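To confirm that these outbound ports are actually open from a classic cluster host, you can run a quick socket test such as the following sketch. The endpoint hostnames are placeholders for illustration; substitute your storage endpoint and the CCM endpoint that your Management Console uses.

```python
# Minimal sketch: confirm outbound reachability of the endpoints and ports
# from Table 1. The hostnames below are placeholders, not real endpoints.
import socket

CHECKS = [
    ("s3.amazonaws.com", 443),           # cloud storage endpoint (placeholder)
    ("ccm.example.cloudera.com", 443),   # CCMv2 endpoint (placeholder)
    # For CCMv1, also test ports 6000-6049 against your CCM endpoint.
]

for host, port in CHECKS:
    try:
        with socket.create_connection((host, port), timeout=10):
            print(f"{host}:{port} open")
    except OSError as exc:
        print(f"{host}:{port} unreachable ({exc})")
```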
Consider the following best practices while using Cloudera Public Cloud on Microsoft Azure ADLS Gen2 (ABFS):
- Ensure that the on-premises cluster (port 443) can access the https://login.microsoftonline.com endpoint. This is because the Hadoop client in the on-premises cluster (CDH/Cloudera Private Cloud Base) connects to the endpoint to acquire the access tokens before it connects to Azure ADLS storage; a minimal reachability check appears after this list. For more information, see the General Azure guidelines row in the Azure-specific endpoints table.
- Ensure that the steps mentioned in the General Azure guidelines and Azure Data Lake Storage Gen 2 rows in the Azure-specific endpoints table are complete so that the endpoint connects to the target path successfully.
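The following is a minimal sketch of that reachability check, run from an on-premises cluster node. It only confirms that the token endpoint answers over HTTPS and assumes that any required proxy configuration is already in place.

```python
# Minimal sketch: verify that an on-premises cluster host can reach the
# Azure AD endpoint that the Hadoop client uses to acquire access tokens
# before it connects to ADLS Gen2 storage.
import requests

try:
    resp = requests.head("https://login.microsoftonline.com", timeout=10)
    print(f"login.microsoftonline.com reachable (HTTP {resp.status_code})")
except requests.RequestException as exc:
    print(f"login.microsoftonline.com NOT reachable: {exc}")
```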
The following system architecture diagram shows the interaction between components during HDFS replication using HDFS replication policies: