Port and network requirements for Replication Manager on CDP Private Cloud Base
Before you create replication policies in Replication Manager, ensure that the clusters meet the network and security requirements. You must also ensure that the required ports are open and accessible on the source hosts and the CDP Private Cloud Base hosts so that the source and destination Cloudera Manager servers and the HDFS, Hive, MapReduce, and YARN hosts can communicate. Verify that the source and target clusters can reach each other on these ports.
Network and security requirements
You must ensure that the clusters meet the networking and security requirements for CDP Private Cloud Base. For example, the cluster hosts must have a working network name resolution system, a correctly formatted /etc/hosts file, and properly configured forward and reverse host resolution through DNS. For more information about the networking and security requirements, see Networking and security requirements for CDP Private Cloud Base.
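The forward and reverse resolution requirement can be spot-checked from any cluster host. The following is a minimal sketch, not a Cloudera-provided tool: the function name `check_forward_reverse` and the host name in the usage comment are hypothetical, and the resolver arguments exist only so that the check can be exercised without a live DNS server.

```python
import socket
from typing import Callable, List, Tuple


def check_forward_reverse(
    hostname: str,
    forward: Callable[[str], str] = socket.gethostbyname,
    reverse: Callable[[str], Tuple[str, List[str], List[str]]] = socket.gethostbyaddr,
) -> bool:
    """Return True when hostname -> IP -> hostname resolves consistently.

    `forward` and `reverse` default to the standard library resolvers,
    which consult /etc/hosts and DNS according to the host configuration.
    """
    try:
        ip_address = forward(hostname)               # forward lookup
        name, aliases, _addresses = reverse(ip_address)  # reverse lookup
    except OSError:
        # socket.gaierror and socket.herror are subclasses of OSError.
        return False
    known_names = {name.lower()} | {alias.lower() for alias in aliases}
    return hostname.lower() in known_names


# Example usage (hypothetical host name):
#   check_forward_reverse("namenode1.example.com")
```

A host that returns False here either fails to resolve or resolves back to a different name, which is worth investigating before you configure replication.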
Services and default ports
The following table lists the services that Replication Manager requires, along with their default ports and a brief description. The sample snippet that follows the table illustrates how the ports map between the source and target clusters in CDP Private Cloud Base Replication Manager.
Service | Default Port | On-premises source hosts | Description |
---|---|---|---|
Cloudera Manager HTTP (Web UI) | 7180 | All Management Nodes (CM*) | Used for control flow. Open the port only for the specific source and destination IP addresses, not for all source IP addresses, so that the destination can communicate with the peer (source) Cloudera Manager. After you configure the source and destination clusters, the destination Cloudera Manager connects to the source Cloudera Manager on port 7180/7183 during peering. |
HDFS NameNode | 8020 | All Primary Nodes | Used for data flow by HDFS and Hive/Impala replication to communicate from destination HDFS and MapReduce hosts to source HDFS NameNode(s). |
HDFS DataNode | 50010 / 9866 (DataNode data transfer port) | All Secondary Nodes | Used for data flow by HDFS and Hive/Impala replication to communicate from destination HDFS and MapReduce hosts to source HDFS DataNode(s). |
NameNode WebHDFS | 9870 | | Used for data flow for Apache Hadoop HttpFS service to provide HTTP access to HDFS. HttpFS has a REST HTTP API supporting all HDFS filesystem operations (both read and write). For more information, see Using HttpFS. |
YARN Resource Manager | 8032 | All Primary Nodes | Used for data flow to access the YARN ResourceManager. For more information, see YARN Configuration Properties. |
Hive Metastore | 9083 | All Management Nodes (CM*) | Used for data flow for Hive/Impala replication to query or access Hive Metastore. For more information, see Configure metastore location and HTTP mode. |
Impala Catalog Server | 26000 | All Management Nodes (CM*) | Internal use only for data flow during Hive/Impala replication. The catalog service uses this port to communicate with the Impala daemons. |
Ranger KMS | 9292 | All Primary Nodes | Used for data flow during replication of encrypted data. For more information, see Migrating Keys. |
Kerberos KDC Server and KRB5 services | 88 | All | Used for authentication flow by Replication Manager when Kerberos authentication is enabled on the clusters. Open the port on all the hosts on the destination cluster. |
*Cloudera Manager
For information about ports required for Ozone replication policies, see Ports used by Apache Ozone.
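Before you create a replication policy, it can help to confirm from a destination host that the source ports listed in the table accept TCP connections. The sketch below is illustrative only: `DEFAULT_PORTS` and `is_port_open` are hypothetical names, the port values are copied from the table and must be adjusted if your clusters override the defaults, and the check covers TCP only (Kerberos can also use UDP on port 88).

```python
import socket

# Default ports copied from the table above. These are assumptions about
# your deployment; replace them with the values configured on your clusters.
DEFAULT_PORTS = {
    "Cloudera Manager HTTP (Web UI)": 7180,
    "HDFS NameNode": 8020,
    "HDFS DataNode": 9866,  # the table also lists 50010
    "NameNode WebHDFS": 9870,
    "YARN Resource Manager": 8032,
    "Hive Metastore": 9083,
    "Impala Catalog Server": 26000,
    "Ranger KMS": 9292,
    "Kerberos KDC": 88,
}


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Example usage (hypothetical host name), run from a destination cluster host:
#   is_port_open("source-namenode.example.com", DEFAULT_PORTS["HDFS NameNode"])
```

A successful connection shows only that the port is reachable through the network and firewall. It does not confirm that the expected service is listening or that authentication succeeds.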
Sample snippet to illustrate ports mapping on source and target clusters
Some ports must be open on specific hosts of the source and target clusters to facilitate replication and optimize the performance of Replication Manager. The following sample snippet lists the ports that must be open on specific hosts and shows how to map them to hosts on the other cluster so that you can use these clusters in replication policies.
On the target cluster:
Target_CM* :7180 --> Source_CM :7180
Target_CM :7183 --> Source_CM :7183
Target_CM :9000 --> Source_agents :9000**
Target_CM :8020 --> Source_NameNodes :8020
Target_CM :50010 --> Source_DataNodes :50010
Target_CM :1004 --> Source_DataNodes :1004
Target_CM :50070 --> Source_NameNodes :50070***
Target_CM :8032 --> Source_ResourceManager :8032
Target_NameNodes :8020 --> Source_NameNodes :8020
Target_NameNodes :50070 --> Source_NameNodes :50070
Target_NameNodes :50010 --> DR DataNodes :50010
Target_NameNodes :1004 --> DR DataNodes :1004
Target_DataNodes :50010 --> DR DataNodes :50010
Target_DataNodes :1004 --> DR DataNodes :1004
Target_ResourceManager :8032 --> Source_ResourceManager :8032
Target_DataNodes :8020 --> Source_NameNodes :8020
Target_CM :1006 --> Source_DataNodes :1006***
Target_NameNodes :1006 --> Source_DataNodes :1006
Target_DataNodes :1006 --> Source_DataNodes :1006
Target_CM :14000 --> Source_HttpFS :14000
On the source cluster:
Source_CM :7180 --> Target_CM :7180
Source_CM :7183 --> Target_CM :7183
Source_CM :9000 --> Target_agents :9000
Source_CM :8020 --> Target_NameNodes :8020
Source_CM :50010 --> Target_DataNodes :50010
Source_CM :1004 --> Target_DataNodes :1004
Source_CM :50070 --> Target_webHDFS :50070
Source_CM :8032 --> Target_ResourceManager :8032
Source_NameNodes :8020 --> Target_NameNodes :8020
Source_NameNodes :50070 --> Target_NameNodes :50070
Source_NameNodes :50010 --> Target_DataNodes :50010
Source_NameNodes :1004 --> Target_DataNodes :1004
Source_DataNodes :50010 --> Target_DataNodes :50010
Source_DataNodes :1004 --> Target_DataNodes :1004
Source_ResourceManager :8032 --> Target_ResourceManager :8032
Source_DataNodes :8020 --> Target_NameNodes :8020
Source_CM :1006 --> Target_DataNodes :1006
Source_NameNodes :1006 --> Target_DataNodes :1006
Source_DataNodes :1006 --> Target_DataNodes :1006
Source_CM :14000 --> Target_HttpFS :14000
*Cloudera Manager
**Cloudera Manager agent uses port 9000
***WebHDFS NameNode uses port 50070 and WebHDFS DataNode uses port 1006
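If you keep a mapping list like the one above in a text file, you can turn each line into structured data and feed it into firewall reviews or connectivity checks. The following sketch assumes the line format shown in the snippet (`role :port --> role :port`, with optional trailing footnote asterisks). `PortMapping` and `parse_mapping` are hypothetical helper names, not part of Replication Manager.

```python
import re
from typing import NamedTuple, Optional


class PortMapping(NamedTuple):
    """One line of the mapping snippet, for example Target_CM :7180 --> Source_CM :7180."""
    from_role: str
    from_port: int
    to_role: str
    to_port: int


# role :port --> role :port, allowing footnote asterisks after a role or port.
_MAPPING_LINE = re.compile(
    r"^\s*(\S.*?)\s*:(\d+)\s*-->\s*(\S.*?)\s*:(\d+)\**\s*$"
)


def parse_mapping(line: str) -> Optional[PortMapping]:
    """Parse one mapping line; return None for headings and footnotes."""
    match = _MAPPING_LINE.match(line)
    if match is None:
        return None
    from_role, from_port, to_role, to_port = match.groups()
    return PortMapping(
        from_role.rstrip("*"),  # drop footnote markers such as Target_CM*
        int(from_port),
        to_role.rstrip("*"),
        int(to_port),
    )


# Example usage:
#   parse_mapping("Target_CM :9000 --> Source_agents :9000**")
```

Lines that do not match the pattern, such as "On the target cluster:" or the footnotes, return None, so you can pass the whole snippet through the parser line by line and keep only the mappings.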