Preparing to create an HDFS replication policy

Before you create an HDFS replication policy, register the on-premises cluster (CDH or CDP Private Cloud Base) as a classic cluster in Management Console, register your cloud credentials with the Replication Manager service, verify that you have the required cluster access, and open the minimum ports required for replication.

  • Do the source cluster and target cluster meet the requirements to create an HDFS replication policy?
  • Is the required on-premises cluster (CDH cluster or CDP Private Cloud Base cluster) registered as a classic cluster on the Management Console?
    CDH clusters and CDP Private Cloud Base clusters are managed by Cloudera Manager. To enable these on-premises clusters for Replication Manager, you must register them as classic clusters on the Management Console. After registration, you can use them for data migration purposes.

    For information about registering an on-premises cluster as a classic cluster, see Adding a CDH cluster and Adding a CDP Private Cloud Base cluster.

  • Does the Cloudera Manager instance have an external account with access to the bucket or container that you plan to use in the HDFS replication policy?
    For more information, see Role-based credential on AWS, App-based credential on Azure, and Cloudera Manager documentation.
  • Do you have the required cluster access to create replication policies?
    Power users, the user who onboarded the source and target clusters, and users with the ClassicClusterAdmin or ClassicClusterUser resource role can create replication policies on the clusters to which they have access. For more information, see Understanding account roles and resource roles.
  • Do you have the required cluster access to view the replication policies?
    Existing HDFS replication policies are visible to users who have access to the source cluster in the replication policy. A warning appears if you do not have access to the source cluster.

    If you can view a policy, you can also perform other actions on it, including updating and deleting it.

  • Is the required cloud credential that you want to use in the replication policy registered with the Replication Manager service?
    For more information, see Working with cloud credentials.
  • Do you need to replicate data securely? If so, ensure that SSL/TLS certificate exchange is configured between the two Cloudera Manager instances that manage the source and target clusters. For more information, see Configuring SSL/TLS certificate exchange between two Cloudera Manager instances.
  • Are the following ports (the minimum port configuration) open and available for Replication Manager?
    Table 1. Minimum ports required for HDFS replication policies

    Port      | Service                                               | Description
    9000      | Cloudera Manager Agent                                | Open on the source and target clusters to retrieve diagnostic and log information.
    6000-6049 | CCM                                                   | Required for SSL connections to the Control Plane via Cluster Connectivity Manager (CCM) to communicate with Replication Manager.
    80 or 443 | Data transfer from secondary node for AWS / ADLS Gen2 | Open on all the HDFS nodes for AWS and ADLS Gen2.
    8032      | YARN Resource Manager                                 | Open on the source and target clusters to access the YARN ResourceManager.

    The following network security diagram shows the minimum port configuration required to create replication policies:

    Figure 1. Network security diagram for Replication Manager in CDP Public Cloud

    For more information, see Ports for Replication Manager on CDP Public Cloud.
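Before you create a policy, you can spot-check whether the ports in Table 1 are reachable. The following Python sketch is not a Cloudera or Replication Manager tool. It only attempts a TCP connection to each host and port pair that you supply, from the machine where it runs. The helper names (port_is_open, check_ports, MIN_PORTS) are hypothetical, and the port selection is an assumption. A successful connection shows that the port accepts connections, not that the expected service is listening there. The sketch also does not cover the CCM port range 6000-6049.

```python
import socket
from typing import Iterable, List, Tuple

# Hypothetical subset of Table 1. 80 and 443 are alternatives for cloud
# storage traffic; this sketch assumes 443.
MIN_PORTS = {
    9000: "Cloudera Manager Agent",
    8032: "YARN ResourceManager",
    443: "Data transfer for AWS / ADLS Gen2",
}


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and connects; the socket is
        # closed immediately because only reachability is being tested.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


def check_ports(
    targets: Iterable[Tuple[str, int]], timeout: float = 3.0
) -> List[Tuple[str, int, bool]]:
    """Probe each (host, port) pair and report whether it accepted a connection."""
    return [(host, port, port_is_open(host, port, timeout)) for host, port in targets]
```

For example, you could run check_ports([("source-rm.example.com", 8032), ("target-rm.example.com", 8032)]) from a gateway host, where both hostnames are placeholders for your own ResourceManager hosts. Any pair that returns False points to a firewall rule or security group that needs review.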

After the cluster and cloud storage requirements are met, you can create an HDFS replication policy.