Preparing to create an HBase replication policy
Before you create HBase replication policies, you must prepare the clusters, register cloud storage in Replication Manager, and verify cluster access.
- Do the source cluster and target cluster meet the requirements to create an HBase replication policy?
For more information, see Support Matrix.
- Is the source CDH cluster or source CDP Private Cloud Base cluster registered as a classic cluster on the Management Console?
CDH clusters and CDP Private Cloud Base clusters are managed by Cloudera Manager. To enable these on-premises clusters for Replication Manager, you must register them as classic clusters on the Management Console. After registration, you can use them for data migration purposes.
For information about registering an on-premises cluster as a classic cluster, see Add a CDH cluster and Adding a CDP Private Cloud Base cluster.
- Are the following steps complete on the CDP Private Cloud Base source cluster or CDH source cluster (these steps are not required for COD sources)?
- Have you installed the HBase replication plugin parcel in the CDH source clusters?
Applicable for CDH versions 7.2.x that are lower than 7.2.2, versions 7.1.x that are lower than 7.1.5, and for versions lower than 7.x. For more information, see Cloudera Replication Plugin.
- Have you created the /user/hbase folder for the hbase user in HDFS in the source cluster?
Applicable for Cloudera Manager versions 7.4.3 or lower.
This folder allows the HBase replication policy to replicate the existing data in the source cluster.
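The /user/hbase step above can be sketched as a pair of HDFS shell commands. This is a minimal sketch, not the documented procedure: it assumes the standard `hdfs dfs -mkdir` and `hdfs dfs -chown` commands, that the HDFS superuser account is named `hdfs`, and that the owning group is `hbase`. Adjust these assumptions to match your cluster. The helper name `hbase_home_commands` is hypothetical.

```python
import shlex

HBASE_HOME = "/user/hbase"  # folder named in the checklist item


def hbase_home_commands(owner="hbase", group="hbase", superuser="hdfs"):
    """Build the commands that create /user/hbase and give it to the hbase user.

    Assumptions (not taken from the checklist): the HDFS superuser is "hdfs"
    and the owning group is "hbase".
    """
    return [
        # Create the folder; -p makes the command safe to re-run.
        ["sudo", "-u", superuser, "hdfs", "dfs", "-mkdir", "-p", HBASE_HOME],
        # Hand ownership of the folder to the hbase user.
        ["sudo", "-u", superuser, "hdfs", "dfs", "-chown",
         f"{owner}:{group}", HBASE_HOME],
    ]


if __name__ == "__main__":
    # Print the commands for review instead of running them.
    for cmd in hbase_home_commands():
        print(shlex.join(cmd))
```

The script only prints the commands so that you can review them before running them on a source cluster host that has the HDFS client configured.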
- Is the required cloud credential that you want to use in the replication policy registered with the Replication Manager service?
For more information, see Working with Cloud Credentials.
- Have you assigned the Storage Blob Data Owner or Storage Blob Data Contributor role to the managed identity of the source on the destination storage data container (and vice versa for bidirectional replication) when you are using COD on Microsoft Azure?
These roles allow a snapshot to be written to the destination cluster container.
- Is the required target cluster (Data Hub or COD) available and healthy?
- Do you have the required cluster access to create or view replication policies?
- Does DNS resolution work as expected between the source and destination clusters?
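One way to spot-check the DNS item above is to resolve each peer hostname from a host on the other cluster. The following is a minimal sketch using Python's standard `socket.getaddrinfo`. The hostnames are hypothetical placeholders, and the helper name `resolves` is an assumption for illustration.

```python
import socket


def resolves(hostname):
    """Return the sorted IP addresses for hostname, or [] if the lookup fails."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # the address string is the first element of sockaddr.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # Hypothetical hostnames; replace with your own source and destination hosts.
    for host in ["source-cm.example.com", "target-master0.example.com"]:
        addrs = resolves(host)
        print(host, "->", ", ".join(addrs) if addrs else "NOT RESOLVED")
```

Run the check in both directions (from a source host against destination hostnames, and the reverse), because a lookup that succeeds on one side says nothing about the other.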
- Is the outgoing SSH port open on the Cloudera Manager host?
- Do you need to replicate data securely? If so, ensure that SSL/TLS certificate exchange is configured between the two Cloudera Manager instances that manage the source and target clusters. For more information, see Configuring SSL/TLS certificate exchange between two Cloudera Manager instances.
- Are the following ports (minimum port configuration) open and available for Replication Manager:
Table 1. Minimum ports required for HBase replication policies

| Ports | Service | Description |
|---|---|---|
| 2181 and 16020 | Destination hosts of the AWS cluster or ADLS cluster (target cluster), and the Cloudera Manager server port on the source cluster | Verify whether port 16020 for the worker security group and port 2181 for the worker, master, and leader groups are open for connection from the source cluster to the destination cluster on AWS or Azure. This ensures that the source HBase service can communicate without interruption with the ZooKeeper and HBase services on the destination hosts. For more information, see Ports for HBase replication. |
| 7180 or 7183 | Cloudera Manager Admin Console HTTP | Open on the source cluster to enable the Data Lake Cloudera Manager to communicate with the on-premises Cloudera Manager. Connects to the destination SDX Data Lake Cloudera Manager. |
| 9000 | Cloudera Manager Agent | Open on the source and target cluster to retrieve diagnostic and log information. |
| 6000-6049 | Cluster Connectivity Manager (CCM) | Required for SSL connections to the Control Plane via CCM to communicate with Replication Manager. |
| 80 or 443 | Data transfer from secondary node for AWS / ADLS Gen2 | Outgoing port. Open on all the HDFS nodes for AWS and ADLS Gen2. |
| 8443 | Data Lake cluster | Outgoing port. Configure the port on the Data Lake cluster as the outgoing port for CDP Management Console to communicate with Cloudera Manager and Knox. |
| 8032 | YARN Resource Manager | Open on the source and target cluster to access the YARN ResourceManager. |

The following network security diagram shows the minimum port configuration required to create replication policies:
Figure 1. Network security diagram for Replication Manager in CDP Public Cloud

For more information, see Ports for Replication Manager on CDP Public Cloud.
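The port requirements above can be spot-checked with a plain TCP connection attempt. The following is a minimal sketch using Python's standard `socket.create_connection`. The hostname is a hypothetical placeholder, and the helper names `port_open` and `check_ports` are assumptions for illustration. A successful connection only shows that the port is reachable from the host where the script runs, so run it from a source cluster host when you check the destination ports.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_ports(host, ports, timeout=3.0):
    """Map each port to whether it accepted a TCP connection."""
    return {port: port_open(host, port, timeout) for port in ports}


if __name__ == "__main__":
    # Hypothetical destination worker host; 2181 and 16020 come from Table 1.
    results = check_ports("target-worker0.example.com", [2181, 16020])
    for port, ok in results.items():
        print(port, "open" if ok else "unreachable")
```

A TCP check does not validate security group rules for other source hosts or confirm that the expected service is the one listening, so treat the result as a quick first signal only.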