Preparing to create an HBase replication policy

Before you create HBase replication policies, you must prepare the clusters, register cloud storage in Replication Manager, and verify cluster access.

  • Do the source cluster and target cluster meet the requirements to create an HBase replication policy?
    For more information, see Support Matrix.
  • Is the source CDH cluster or source CDP Private Cloud Base cluster registered as a classic cluster on the Management Console?
    CDH clusters and CDP Private Cloud Base clusters are managed by Cloudera Manager. To enable these on-premises clusters for Replication Manager, you must register them as classic clusters on the Management Console. After registration, you can use them for data migration purposes.

    For information about registering an on-premises cluster as a classic cluster, see Add a CDH cluster and Adding a CDP Private Cloud Base cluster.

  • Are the following steps complete on the CDP Private Cloud Base source cluster or CDH source cluster (these steps are not required for COD sources)?
    1. Have you installed the HBase replication plugin parcel in the CDH source clusters?

      Applicable for CDH versions 7.2.x that are lower than 7.2.2, versions 7.1.x that are lower than 7.1.5, and for versions lower than 7.x. For more information, see Cloudera Replication Plugin.

    2. Have you created the /user/hbase folder for the hbase user in HDFS in the source cluster?

      Applicable for Cloudera Manager versions 7.4.3 or lower.

      Creating this folder allows the HBase replication policy to replicate the existing data in the source cluster.

  • Is the required cloud credential that you want to use in the replication policy registered with the Replication Manager service?
    For more information, see Working with Cloud Credentials.
  • When you use COD on Microsoft Azure, have you assigned the Storage Blob Data Owner or Storage Blob Data Contributor role to the managed identity of the source cluster on the destination storage data container, and vice versa for bidirectional replication?
    These roles allow a snapshot to be written to the destination cluster container.
  • Is the required target cluster (Data Hub or COD) available and healthy?
  • Do you have the required cluster access to create or view replication policies?
  • Does DNS resolution work as expected between the source and destination clusters?
  • Is the outgoing SSH port open on the Cloudera Manager host?
  • Do you need to replicate data securely? If so, ensure that SSL/TLS certificate exchange is configured between the two Cloudera Manager instances that manage the source and target clusters. For more information, see Configuring SSL/TLS certificate exchange between two Cloudera Manager instances.
  • Are the following ports (minimum port configuration) open and available for Replication Manager?
    Table 1. Minimum ports required for HBase replication policies
    Ports | Service | Description
    2181 and 16020 | Destination hosts of the AWS cluster or ADLS cluster (target cluster), and the Cloudera Manager server port on the source cluster | Verify that port 16020 (worker security group) and port 2181 (worker, master, and leader groups) are open for connections from the source cluster to the destination cluster on AWS or Azure. This ensures that the source HBase service can communicate with the ZooKeeper and HBase services on the destination hosts without interruption. For more information, see Ports for HBase replication.
    7180 or 7183 | Cloudera Manager Admin Console HTTP | Open on the source cluster to enable the Data Lake Cloudera Manager to communicate with the on-premises Cloudera Manager. Connects to the destination SDX Data Lake Cloudera Manager.
    9000 | Cloudera Manager Agent | Open on the source and target clusters to retrieve diagnostic and log information.
    6000-6049 | Cluster Connectivity Manager (CCM) | Required for SSL connections to the Control Plane through CCM to communicate with Replication Manager.
    80 or 443 | Data transfer from secondary node for AWS / ADLS Gen2 | Outgoing port. Open on all the HDFS nodes for AWS and ADLS Gen2.
    8443 | Data Lake cluster | Outgoing port. Configure the port on the Data Lake cluster as the outgoing port for CDP Management Console to communicate with Cloudera Manager and Knox.
    8032 | YARN Resource Manager | Open on the source and target clusters to access the YARN ResourceManager.

    The following network security diagram shows the minimum port configuration required to create replication policies:

    Figure 1. Network security diagram for Replication Manager in CDP Public Cloud

    For more information, see Ports for Replication Manager on CDP Public Cloud.
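
The DNS and port items in the checklist above can be spot-checked with a short script. The following is a minimal sketch in Python that uses only the standard socket module; it is not part of Replication Manager. The function names are illustrative, and the hostnames you pass in are your own. A successful TCP connection shows only that the port is reachable from the machine that runs the script, so run it from a host on the side that initiates the connection (for example, a source cluster host for ports 2181 and 16020).

```python
import socket
from typing import Dict, List


def expand_ports(spec: str) -> List[int]:
    """Expand a port spec such as "2181" or "6000-6049" into a list of ints."""
    if "-" in spec:
        low, high = spec.split("-", 1)
        return list(range(int(low), int(high) + 1))
    return [int(spec)]


def resolves(hostname: str) -> bool:
    """Return True if the hostname resolves to at least one address."""
    try:
        return bool(socket.getaddrinfo(hostname, None))
    except socket.gaierror:
        return False


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_ports(host: str, ports: List[int], timeout: float = 3.0) -> Dict[int, bool]:
    """Map each port to True (reachable) or False (not reachable)."""
    return {port: port_open(host, port, timeout) for port in ports}
```

For example, calling check_ports("worker1.example.internal", [2181, 16020]) from a source cluster host returns a dictionary that shows which of the two ports accepted a connection. The hostname here is a placeholder, not a value from this documentation.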

After the cluster and cloud storage requirements are met, you can create an HBase replication policy.
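
For the /user/hbase prerequisite in the checklist, the exact commands are not shown in this section. The sketch below is an assumption about one way to do it: it uses the standard HDFS shell subcommands mkdir and chown, and assumes that the folder should be owned by the hbase user. The function names and the dry-run behavior are illustrative. Running the commands for real typically requires HDFS superuser privileges on a source cluster host.

```python
import subprocess
from typing import List

HBASE_HOME_DIR = "/user/hbase"  # path named in the checklist


def build_commands(path: str = HBASE_HOME_DIR, owner: str = "hbase") -> List[List[str]]:
    """Return the HDFS shell commands that create the folder and set its owner."""
    return [
        ["hdfs", "dfs", "-mkdir", "-p", path],   # create the folder if it is missing
        ["hdfs", "dfs", "-chown", owner, path],  # assumption: owned by the hbase user
    ]


def create_hbase_home(dry_run: bool = True) -> List[str]:
    """Return the rendered commands; execute them only when dry_run is False."""
    rendered = []
    for cmd in build_commands():
        rendered.append(" ".join(cmd))
        if not dry_run:
            # Raises CalledProcessError if the hdfs command exits with an error.
            subprocess.run(cmd, check=True)
    return rendered
```

With the default dry_run=True, the function only returns the command strings so that you can review them before you run anything against the cluster.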