Port and network requirements for Replication Manager on CDP Private Cloud Base

Before you create replication policies in Replication Manager, ensure that the network and security requirements for the clusters are complete. You must also ensure that the required ports are open and accessible on the source hosts and CDP Private Cloud Base hosts to allow communication between the source and destination Cloudera Manager servers and the HDFS, Hive, MapReduce, and YARN hosts. Ensure that the ports on the source and target cluster are connected.

Network and security requirements

You must ensure that the networking and security requirements for CDP Private Cloud Base are complete. For example, the cluster hosts must have a working network name resolution system, a correctly formatted /etc/hosts file, and must have properly configured the forward and reverse host resolution through DNS. For more information about the networking and security requirements, see Networking and security requirements for CDP Private Cloud Base.

Services and default port

The following table shows a list of services that Replication Manager requires, their default ports, and a brief description, and then a sample snippet is provided to illustrate the mapping of ports between the source and target clusters to use them in CDP Private Cloud Base Replication Manager:

Service Default Port On-premises source hosts Description
Cloudera Manager HTTP (Web UI) 7180 All Management Nodes (CM*) Used for control flow. Open on specific source and destination IP address and not on all source IP addresses to communicate to the peer (source) Cloudera Manager. After you configure the source and destination clusters, the destination Cloudera Manager connects to source Cloudera Manager on port 7180/7183 during peering.
HDFS NameNode 8020 All Primary Nodes Used for data flow by HDFS and Hive/Impala replication to communicate from destination HDFS and MapReduce hosts to source HDFS NameNode(s).
HDFS DataNode 50010 / 9866 is used for DataNode HTTP server port. All Secondary Nodes Used for data flow by HDFS and Hive/Impala replication to communicate from destination HDFS and MapReduce hosts to source HDFS DataNode(s).
NameNode WebHDFS 9870 Used for data flow for Apache Hadoop HttpFS service to provide HTTP access to HDFS. HttpFS has a REST HTTP API supporting all HDFS filesystem operations (both read and write). For more information, see Using HttpFS.
YARN Resource Manager 8032 All Primary Nodes Used for data flow to access the YARN ResourceManager. For more information, see YARN Configuration Properties.
Hive Metastore 9083 All Management Nodes (CM*) Used for data flow for Hive/Impala replication to query or access Hive Metastore. For more information, see Configure metastore location and HTTP mode.
Impala Catalog Server 26000 All Management Nodes (CM*) Internal use only for data flow during Hive/Impala replication. The catalog service uses this port to communicate with the Impala daemons.
Ranger KMS 9292 All Primary Nodes Used for data flow during replication of encrypted data. For more information, see Migrating Keys.
Kerberos KDC Server and KRB5 services 88 All Used for authentication flow by Replication Manager when Kerberos authentication is enabled on the clusters.

Open the port on all the hosts on the destination cluster.

*Cloudera Manager

Sample snippet to illustrate ports mapping on source and target clusters

Some ports must be open on specific hosts of source and target clusters to facilitate and optimize the performance of Replication Manager. The following sample snippet lists the ports that are required to be open on specific hosts and how to map/connect it to other hosts to use these clusters in replication policies.

On the target cluster:

Target_CM* :7180 --> Source_CM :7180
Target_CM :7183 --> Source_CM :7183
Target_CM :9000 --> Source_agents :9000**
Target_CM :8020 --> Source_NameNodes :8020
Target_CM :50010 --> Source_DataNodes :50010
Target_CM :1004 --> Source_DataNodes :1004
Target_CM :50070 --> Source_NameNodes :50070*** 
Target_CM :8032 --> Source_ResourceManager :8032
Target_NameNodes :8020 --> Source_NameNodes :8020
Target_NameNodes :50070 --> Source_NameNodes :50070
Target_NameNodes :50010 --> DR DataNodes :50010
Target_NameNodes :1004 --> DR DataNodes :1004
Target_DataNodes :50010 --> DR DataNodes :50010
Target_DataNodes :1004 --> DR DataNodes :1004
Target_ResourceManager :8032 --> Source__ResourceManager :8032
Target_DataeNodes :8020 --> Source_NameNodes :8020
Target_CM :1006 --> Source_DataNodes :1006***
Target_NameNodes :1006 --> Source_DataNodes :1006
Target_DataNodes :1006 --> Source_DataNodes :1006
Target_CM :14000 --> Source_HttpFS :14000

On the source cluster:

Source_CM :7180 --> Target_CM :7180
Source_CM :7183 --> Target_CM :7183
Source_CM :9000 --> Target_agents :9000
Source_CM :8020 --> Target_NameNodes :8020
Source_CM :50010 --> Target_DataNodes :50010
Source_CM :1004 --> Target_DataNodes :1004
Source_CM :50070 --> Target_webHDFS :50070
Source_CM :8032 --> Target_ResourceManager :8032
Source_NameNodes :8020 --> Target_NameNodes :8020
Source_NameNodes :50070 --> Target_NameNodes :50070
Source_NameNodes :50010 --> Target_DataNodes :50010
Source_NameNodes :1004 --> Target_DataNodes :1004
Source_DataNodes :50010 --> Target_DataNodes :50010
Source_DataNodes :1004 --> Target_DataNodes :1004
Source_ResourceManager :8032 --> Target_ResourceManager :8032
Source_DataeNodes :8020 --> Target_NameNodes :8020
Source_CM :1006 --> Target_DataNodes :1006
Source_NameNodes :1006 --> Target_DataNodes :1006
Source_DataNodes :1006 --> Target_DataNodes :1006
Source_CM :14000 --> Target_HttpFS :14000


*Cloudera Manager
**Cloudera Manager agent uses port 9000
***WebHDFS NameNode uses port 50070 and WebHDFS DataNode uses port 1006