Ports for Cloudera Replication Manager
Before you create replication policies, you must ensure that the required ports are open and available for data replication. You can verify the mandatory ports using the Replication Manager network security diagram.
HDFS replication polices
Connectivity required | Default Port | Type | Description |
---|---|---|---|
Data transfer from classic cluster hosts to cloud storage | 80 or 443 (TLS) | Outbound | Outgoing port. All classic cluster nodes must be able to access S3/ADLS Gen2 endpoint. |
Classic cluster | 6000-6049 for CCMv1 443 for CCMv2 |
Outbound | Connecting source classic cluster to the Cloudera Management
Console through Cluster Connectivity Manager (CCM). For more information, see Outbound network access for CCM, and CCM overview. |
Hive replication polices
Connectivity required for | Default Port | Port opened toward | Description |
---|---|---|---|
Data transfer from classic cluster hosts to cloud storage | 80 or 443 (TLS) | Outbound towards cloud provider public hosts (For example, AWS) | Outgoing port. All classic cluster nodes must be able to access S3/ADLS Gen2 endpoint. |
Cloudera Manager Admin Console HTTP | 7180 or 7183 (when TLS enabled) | Inbound towards destination Cloudera Public Cloud Cloudera Manager host | Incoming port. Open on the source cluster to enable the target Cloudera Manager in cloud to communicate to the on-premises Cloudera Manager. |
Classic cluster | 6000-6049 for CCMv1 443 for CCMv2 |
Outbound towards Cloudera Public Cloud Control Plane (CCM and cluster proxy) |
Connecting the source classic cluster to the Cloudera Management Console through Cluster Connectivity Manager (CCM) For more information, see Outbound network access for CCM, and CCM overview. |
HBase replication policies
The following ports must be open and available for Replication Manager for HBase replication policies:
Service | Ports | Ports opened toward | Description |
---|---|---|---|
Destination hosts of the AWS cluster or ADLS cluster (target cluster), and the Cloudera Manager server port on the source cluster | 2181 and 16020 | Outbound in destination Zookeeper and HBase RegionServer host(s) | Verify whether the ports 16020 for worker security group and 2181 for worker, master, and leader groups are open for connection from the source cluster to the destination cluster on AWS or Azure. This ensures that the source HBase service can communicate with Zookeeper and HBase services on the destination hosts uninterruptedly. For more information, see Ports for HBase replication. |
HMaster | 16000 | Outbound in destination HBase Master host(s) | Open the port on the Master Nodes (HBase Master Node and any
back-up HBase Master node). Before you select the Validate Replication option during the first HBase replication policy creation between two specific clusters, you must ensure that the port is open on the target cluster. |
Cloudera Manager Admin Console HTTP | 7180 or 7183 | Inbound towards destination Cloudera Public Cloud Cloudera Manager host | Open on the source cluster to enable Data lake Cloudera Manager to communicate to the on-premises Cloudera Manager. Connects to destination SDX Data Lake Cloudera Manager. |
Cluster Connectivity Manager (CCM) for CCMv1 |
6000-6049 | For Cloudera Public Cloud Control Plane (CCM and Cluster Proxy) | Required for SSL connections to the Control Plane via CCM to communicate with Replication Manager. |
Data transfer from secondary node for AWS / ADLS Gen2 for CCMv2 |
80 or 443 | Outbound towards data transfer from secondary node for AWS / ADLS Gen2 | Outgoing port. Open on all the HDFS nodes for AWS and ADLS Gen2. |
Data Lake cluster | 8443 | On destination Cloudera Public Cloud Cloudera Manager/Knox host.
(applicable when Knox is available on the on-premises source cluster) |
Outgoing port. Configure the port on the Data Lake cluster as the outgoing port for Cloudera Management Console to communicate with Cloudera Manager and Knox. |
Best practices
- Ensure that the on-premises cluster (port 443) can access the https://login.microsoftonline.com endpoint. This is because the Hadoop client in the on-premises cluster (CDH/Cloudera Private Cloud Base) connects to the endpoint to acquire the access tokens before it connects to Azure ADLS storage. For more information, see the General Azure guidelines row in the Azure-specific endpoints table.
- Ensure that the steps mentioned in the General Azure guidelines and Azure Data Lake Storage Gen 2 rows in the Azure-specific endpoints table are complete so that the endpoint connects to the target path successfully.