Ports for Replication Manager on CDP Public Cloud

Before you create replication policies, you must ensure that the required ports are open and available for data replication. You can verify the mandatory ports using the Replication Manager network security diagram.

HDFS replication policies

The following ports must be open and available for Replication Manager for HDFS replication policies:
Table 1. Minimum ports required for HDFS replication policies
| Connectivity required | Default port | Type | Description |
|---|---|---|---|
| Data transfer from classic cluster hosts to cloud storage | 80 or 443 (TLS) | Outbound | Outgoing port. All classic cluster nodes must be able to access S3/ADLS Gen2 endpoint. |
| Classic cluster | 6000-6049 for CCMv1; 443 for CCMv2 | Outbound | Connecting source classic cluster to the CDP Management Console through Cluster Connectivity Manager (CCM). For more information, see Outbound network access for CCM, and CCM overview. |
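
The outbound requirements for HDFS replication policies can be spot-checked from a classic cluster node before you create a policy. The following is a minimal sketch, not a Cloudera-provided tool: the helper names `can_connect` and `report` are hypothetical, and you must supply your own cloud storage and CCM hostnames.

```python
import socket
from typing import Dict, Iterable, Tuple


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


def report(checks: Iterable[Tuple[str, int]], timeout: float = 3.0) -> Dict[str, bool]:
    """Map each 'host:port' target to whether it accepted a TCP connection."""
    return {f"{host}:{port}": can_connect(host, port, timeout) for host, port in checks}
```

For example, calling `report([("<your-storage-endpoint>", 443), ("<your-ccm-endpoint>", 443)])` on each classic cluster node would show whether the TLS storage port and the CCMv2 port are reachable. A successful TCP connection only shows that the port is open; it does not validate credentials or storage permissions.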

The following system architecture diagram shows the interaction between components during HDFS replication using HDFS replication policies:
Figure 1. System architecture diagram for HDFS replication in CDP Public Cloud Replication Manager

Hive replication policies

The following ports must be open and available for Replication Manager for Hive replication policies:
Table 2. Minimum ports required for Hive replication policies
| Connectivity required for | Default port | Port opened toward | Description |
|---|---|---|---|
| Data transfer from classic cluster hosts to cloud storage | 80 or 443 (TLS) | Outbound towards cloud provider public hosts (for example, AWS) | Outgoing port. All classic cluster nodes must be able to access S3/ADLS Gen2 endpoint. |
| Cloudera Manager Admin Console HTTP | 7180 or 7183 (when TLS enabled) | Inbound towards destination CDP Public Cloud Cloudera Manager host | Incoming port. Open on the source cluster to enable the target Cloudera Manager in cloud to communicate to the on-premises Cloudera Manager. |
| Classic cluster | 6000-6049 for CCMv1; 443 for CCMv2 | Outbound towards CDP Public Cloud Control Plane (CCM and cluster proxy) | Connecting the source classic cluster to the CDP Management Console through Cluster Connectivity Manager (CCM). For more information, see Outbound network access for CCM, and CCM overview. |
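
Which ports apply to a Hive replication policy depends on the CCM version and on whether TLS is enabled. The sketch below encodes the values from Table 2 as a small lookup so that a firewall request can be generated consistently. The names `CCM_PORTS`, `hive_outbound_ports`, and `cloudera_manager_port`, and the `"v1"`/`"v2"` keys, are illustrative assumptions and not part of any Cloudera API.

```python
from typing import List

# Port values copied from the Hive replication table above.
CCM_PORTS = {
    "v1": range(6000, 6050),  # 6000-6049 inclusive
    "v2": range(443, 444),    # 443 only
}


def hive_outbound_ports(ccm_version: str, storage_tls: bool = True) -> List[int]:
    """Ports the classic cluster must reach outbound for a Hive policy."""
    if ccm_version not in CCM_PORTS:
        raise ValueError(f"unknown CCM version: {ccm_version!r}")
    storage_port = 443 if storage_tls else 80
    # A set removes the duplicate when storage and CCMv2 both use 443.
    return sorted({storage_port, *CCM_PORTS[ccm_version]})


def cloudera_manager_port(tls_enabled: bool) -> int:
    """Cloudera Manager Admin Console port: 7183 with TLS, otherwise 7180."""
    return 7183 if tls_enabled else 7180
```

With CCMv2 and TLS storage access, the outbound list collapses to the single port 443, which is one reason a CCMv2 setup needs fewer firewall rules than CCMv1.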

The following system architecture diagram shows the interaction between components during Hive replication using Hive replication policies:
Figure 2. System architecture diagram for Hive replication in CDP Public Cloud Replication Manager

HBase replication policies

The following ports must be open and available for Replication Manager for HBase replication policies:

Table 3. Minimum ports required for HBase replication policies
| Service | Ports | Ports opened toward | Description |
|---|---|---|---|
| Destination hosts of the AWS cluster or ADLS cluster (target cluster), and the Cloudera Manager server port on the source cluster | 2181 and 16020 | Outbound in destination ZooKeeper and HBase RegionServer host(s) | Verify whether port 16020 for the worker security group and port 2181 for the worker, master, and leader groups are open for connection from the source cluster to the destination cluster on AWS or Azure. This ensures that the source HBase service can communicate with the ZooKeeper and HBase services on the destination hosts without interruption. For more information, see Ports for HBase replication. |
| HMaster | 16000 | Outbound in destination HBase Master host(s) | Open the port on the Master Nodes (HBase Master Node and any backup HBase Master node). Before you select the Validate Replication option during the first HBase replication policy creation between two specific clusters, you must ensure that the port is open on the target cluster. |
| Cloudera Manager Admin Console HTTP | 7180 or 7183 | Inbound towards destination CDP Public Cloud Cloudera Manager host | Open on the source cluster to enable the Data Lake Cloudera Manager to communicate to the on-premises Cloudera Manager. Connects to destination SDX Data Lake Cloudera Manager. |
| Cluster Connectivity Manager (CCM) for CCMv1 | 6000-6049 | For CDP Public Cloud Control Plane (CCM and Cluster Proxy) | Required for SSL connections to the Control Plane via CCM to communicate with Replication Manager. |
| Data transfer from secondary node for AWS / ADLS Gen2 for CCMv2 | 80 or 443 | Outbound towards data transfer from secondary node for AWS / ADLS Gen2 | Outgoing port. Open on all the HDFS nodes for AWS and ADLS Gen2. |
| Data Lake cluster | 8443 | On destination CDP Public Cloud Cloudera Manager/Knox host (applicable when Knox is available on the on-premises source cluster) | Outgoing port. Configure the port on the Data Lake cluster as the outgoing port for CDP Management Console to communicate with Cloudera Manager and Knox. |
Figure 3. System architecture diagram for HBase replication in CDP Public Cloud Replication Manager
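
Because the HBase ports map to specific roles on the destination cluster (2181 for ZooKeeper, 16020 for RegionServer, 16000 for HBase Master), it can help to derive the list of host and port pairs from a role inventory before testing them from the source cluster. The following is a rough sketch under stated assumptions: the role keys, `hbase_checks`, `run_checks`, and the inventory layout are hypothetical, and the hostnames you pass in are your own.

```python
import socket
from typing import Callable, Dict, List, Tuple

# Role-to-port mapping taken from the HBase replication table above.
HBASE_ROLE_PORTS = {
    "zookeeper": 2181,
    "regionserver": 16020,
    "hmaster": 16000,
}

Check = Tuple[str, int]


def hbase_checks(inventory: Dict[str, List[str]]) -> List[Check]:
    """Expand {role: [hosts]} into (host, port) pairs, in inventory order."""
    checks: List[Check] = []
    for role, hosts in inventory.items():
        port = HBASE_ROLE_PORTS[role]  # raises KeyError for an unknown role
        checks.extend((host, port) for host in hosts)
    return checks


def tcp_probe(host: str, port: int, timeout: float = 3.0) -> bool:
    """Default probe: True if a TCP connection succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def run_checks(checks: List[Check],
               probe: Callable[[str, int], bool] = tcp_probe) -> List[Check]:
    """Return the (host, port) pairs that the probe could not reach."""
    return [(host, port) for host, port in checks if not probe(host, port)]
```

Running `run_checks(hbase_checks(inventory))` from a source cluster host returns the destination endpoints that are still blocked. An empty list suggests the security group rules are in place before you select the Validate Replication option.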

Best practices

Consider the following best practices when you use CDP Public Cloud on Microsoft Azure ADLS Gen2 (ABFS):
  • Ensure that the on-premises cluster can reach the https://login.microsoftonline.com endpoint on port 443. The Hadoop client in the on-premises cluster (CDH or CDP Private Cloud Base) connects to this endpoint to acquire access tokens before it connects to Azure ADLS storage. For more information, see the General Azure guidelines row in the Azure-specific endpoints table.
  • Ensure that the steps mentioned in the General Azure guidelines and Azure Data Lake Storage Gen 2 rows in the Azure-specific endpoints table are complete so that the endpoint connects to the target path successfully.
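
The first best practice above can be checked from an on-premises host with a TLS handshake against the token endpoint. This is a minimal sketch that assumes direct outbound access without an HTTP proxy; `endpoint_address` and `tls_reachable` are hypothetical helper names, and a successful handshake shows only network reachability, not that token acquisition works.

```python
import socket
import ssl
from typing import Tuple
from urllib.parse import urlsplit

DEFAULT_PORTS = {"https": 443, "http": 80}


def endpoint_address(url: str) -> Tuple[str, int]:
    """Extract (host, port) from a URL, defaulting the port by scheme."""
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError(f"no hostname in URL: {url!r}")
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    if port is None:
        raise ValueError(f"unsupported scheme in URL: {url!r}")
    return parts.hostname, port


def tls_reachable(url: str, timeout: float = 5.0) -> bool:
    """Return True if a verified TLS handshake with the endpoint succeeds."""
    host, port = endpoint_address(url)
    context = ssl.create_default_context()  # verifies certificate and hostname
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=host):
                return True
    except OSError:
        # ssl.SSLError and socket timeouts are both subclasses of OSError.
        return False
```

For example, `tls_reachable("https://login.microsoftonline.com")` returning False on a cluster node points to a firewall rule, a DNS problem, or a TLS-intercepting proxy between the cluster and the endpoint.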