Preparing clusters for HDFS replication policies

Before you create an HDFS replication policy, you must verify whether Cloudera Private Cloud Data Services Replication Manager supports the clusters for replication, ensure the required ports are open, and if the required Cloudera Private Cloud Base clusters are available as clusters in the Management Console. You can also enable Kerberos authentication for clusters supporting Kerberos.

Complete the following checklist to ensure that the clusters are ready to be used in an HDFS replication policy:
  • Have you added the required source and target clusters on the Management Console > Clusters page?
  • Are snapshots enabled in Cloudera Manager for the source cluster?
  • Are the following ports open and available for Replication Manager?
    Port Service Description
    7180 or 7183 Cloudera Manager Admin Console HTTP Open on the source cluster to enable source Cloudera Manager to communicate with the target Cloudera Manager.
    80 or 443 Data transfer from secondary node for AWS / ADLS Gen2 Outgoing port. Open on all the HDFS nodes for AWS and ADLS Gen2.
  • Do the clusters support Kerberos?
    1. Enable Kerberos authentication and configure additional settings on the clusters to enable replication. Perform the following steps to accomplish this task:
      1. Enable Kerberos authentication. For more information, see Enabling Kerberos authentication for Cloudera.
      2. Enable replication between clusters using Kerberos authentication. For more information, see Configuring kerberized clusters for replication.
    2. Add the destination principal as a proxy user on the source cluster if you are using different Kerberos principals for the source and destination clusters.

      For example, if you are using the hdfssrc principal on the source cluster and the hdfsdest principal on the destination cluster, add the following key-value pairs to the HDFS service Cluster-wide Advanced Configuration Snippet (Safety Valve) for core-site.xml property on the source cluster and then restart the HDFS service:

      • Name = hadoop.proxyuser.hdfsdest.groups; Value = *

      • Name = hadoop.proxyuser.hdfsdest.hosts; Value = *

  • Do you need to replicate data securely? If so, ensure that the SSL/TLS certificate exchange between two Cloudera Manager instances that manage source and target clusters respectively is configured. For more information, see Configuring SSL/TLS certificate exchange between two Cloudera Manager instances.
Cloudera Manager provides data analytics that you can download and use to diagnose HDFS replication performance.