Creating an HBase replication policy

You can replicate HBase data from a CDH cluster or a Cloudera Operational Database (COD) cluster to a COD cluster, and from a CDP Private Cloud Base cluster or a CDH cluster to a Data Hub cluster.

On destination clusters in AWS or Azure (ADLS) environments, make sure that port 16020 is open in the worker security group and port 2181 is open in the worker, master, and leader security groups for connections from the source cluster. The source HBase service needs these ports to reach the ZooKeeper and HBase services on the destination hosts. For more information, see Ports for HBase replication.
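Before creating the policy, you can confirm these port rules from a source host. The following is a minimal Bash sketch, not a Cloudera tool: it assumes Bash with /dev/tcp support and the coreutils timeout command, and port_open and check_dest_host are hypothetical helper names. It only shows whether a TCP connection opens; it does not check that ZooKeeper or HBase responds correctly.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Replication Manager): run on a source host
# to check that destination ports used by HBase replication accept TCP connections.

# Succeeds if a TCP connection to host:port opens within 3 seconds.
port_open() {
  local host="$1" port="$2"
  timeout 3 bash -c "</dev/tcp/${host}/${port}" 2>/dev/null
}

# check_dest_host ROLE HOST
# Worker hosts need ZooKeeper (2181) and the RegionServer port (16020);
# master and leader hosts need only ZooKeeper (2181).
check_dest_host() {
  local role="$1" host="$2" port rc=0
  local -a ports=(2181)
  if [[ "$role" == "worker" ]]; then
    ports+=(16020)
  fi
  for port in "${ports[@]}"; do
    if port_open "$host" "$port"; then
      echo "OK      ${host}:${port}"
    else
      echo "BLOCKED ${host}:${port}"
      rc=1
    fi
  done
  return "$rc"
}
```

For example, run check_dest_host worker <destination-worker-host> for each destination worker host and check_dest_host master <destination-master-host> for each master or leader host. A BLOCKED line suggests that a security group rule, a firewall, or name resolution is in the way.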

You must also ensure that DNS resolution works as expected between the source and destination clusters.
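DNS resolution can be spot-checked in a similar way. This minimal sketch assumes a Linux host that provides the getent command; resolves and check_dns are hypothetical helper names. It checks forward lookups only: run it on source hosts with the destination host names and, where you have shell access, on destination hosts with the source host names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that host names resolve from this host.
# getent consults the system name service configuration (/etc/nsswitch.conf),
# typically /etc/hosts and DNS.

# Prints "OK <host> -> <address>" or "UNRESOLVED <host>".
resolves() {
  local host="$1" entry
  if entry=$(getent hosts "$host"); then
    # getent prints "<address> <names...>"; keep only the first address.
    echo "OK ${host} -> ${entry%% *}"
  else
    echo "UNRESOLVED ${host}"
    return 1
  fi
}

# check_dns HOST...: prints one line per host, fails if any host does not resolve.
check_dns() {
  local host rc=0
  for host in "$@"; do
    resolves "$host" || rc=1
  done
  return "$rc"
}
```

For example: check_dns <destination-host-1> <destination-host-2>.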

Perform the following steps on the CDP Private Cloud Base source cluster or CDH source cluster (these steps are not required for COD sources):

  1. (Applicable only to source clusters running version 7.2.x lower than 7.2.2, version 7.1.x lower than 7.1.5, or a version lower than 7.x.) Install the HBase replication plugin parcel. For information about the parcel, contact Cloudera Support.

  2. (Applicable only to Cloudera Manager 7.4.3 or lower.) Create the /user/hbase directory for the hbase user in HDFS on the source cluster by running the following commands:

    sudo -u hdfs hdfs dfs -mkdir /user/hbase
    sudo -u hdfs hdfs dfs -chown hbase:hbase /user/hbase

These commands allow the HBase replication policy to replicate the existing data in the source cluster.
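Where step 2 applies, the two commands above can be wrapped so that they are safe to re-run and confirm the result. This is a minimal sketch, assuming the same sudo access to the hdfs user that those commands rely on (on a Kerberos-enabled cluster, a valid ticket is also needed, which is not shown); ensure_hbase_home is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: re-runnable version of the /user/hbase setup commands.
# Creates the directory only if it is missing, sets hbase:hbase ownership,
# and succeeds only if the directory ends up owned by hbase:hbase.

ensure_hbase_home() {
  local dir=/user/hbase owner
  # -test -d returns 0 if the path exists and is a directory.
  if ! sudo -u hdfs hdfs dfs -test -d "$dir"; then
    sudo -u hdfs hdfs dfs -mkdir "$dir" || return 1
  fi
  sudo -u hdfs hdfs dfs -chown hbase:hbase "$dir" || return 1
  # -stat "%u:%g" prints "<owner>:<group>" for the path.
  owner=$(sudo -u hdfs hdfs dfs -stat '%u:%g' "$dir") || return 1
  echo "${dir} is owned by ${owner}"
  [[ "$owner" == "hbase:hbase" ]]
}
```

Running it a second time leaves the existing directory in place and only re-applies and re-checks the ownership.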

To create the replication policy, perform the following steps:

  1. Open the Replication Manager service in the Cloudera Data Platform web interface.
  2. Click Replication Policies.
  3. Click Create Policy.
  4. On the General page of the Create Replication Policy wizard, enter a name for the replication policy.
  5. Optionally, add a description for the policy.
  6. Choose HBase.
    [Image: The General page in the Create Replication Policy wizard, where you choose the HBase option to create an HBase replication policy.]
  7. Click Next.
  8. On the Select Source page, choose the values for the following options:
    1. Source Cluster or Database. Choose a source cluster.
    2. Source Tables. Enter a table name that you want to replicate. Click the Add icon to add more table names.
    3. Perform Initial Snapshot. Select this option to replicate the data that already exists in the source tables.
    4. Cloud Credential. This option appears when you choose a CDP Private Cloud Base cluster or CDH cluster as the source cluster and select Perform Initial Snapshot.
      Click Add Cloud Credential. In the Add Cloud Credential dialog box, enter a unique name for the cloud credential. Choose one of the following cloud storage types and enter the required options:
      • S3 - Choose an authentication type, enter an access key and secret key, and click Validate to validate the credentials.
      • ADLS - Enter the client ID, tenant ID, and secret key.

      Click Save.

    [Image: The Select Source page in the Create Replication Policy wizard, where you choose the source cluster and source tables, select Perform Initial Snapshot to replicate existing data, and add cloud credentials.]
  9. Click Next.
  10. On the Select Destination page, choose the values for the following options:
    1. Destination Data Hub or COD. Choose a Data Hub cluster or a COD cluster.
    2. Optional: Select Set HBase Replication Machine User, and enter the username and password. If you enter an existing user, make sure that the password is correct: if the password is incorrect, the policy is created successfully but the data is not replicated.
      Depending on the options that you select and the username that you enter, Replication Manager does one of the following:
      • If Set HBase Replication Machine User is not selected, an HBase replication machine user is created automatically with an auto-generated username.

      • If Set HBase Replication Machine User is selected and Create User If Does Not Exist is not selected, the username that you enter must exist in the CDP User Management System (UMS); otherwise, an error message appears.

      • If both Set HBase Replication Machine User and Create User If Does Not Exist are selected and the username does not exist in UMS, Replication Manager creates the user.

    3. Optional: Click Sync Replication User. Replication Manager validates the username against UMS and synchronizes the username and password to the environment of the destination cluster, and also to the environment of the source cluster if the source is a COD cluster.

    [Image: The Select Destination page in the Create Replication Policy wizard for a CDH or CDP Private Cloud Base source cluster, where you choose the destination cluster.]

    [Image: The Select Destination page in the Create Replication Policy wizard for a COD source cluster, where you choose the destination cluster.]
  11. Click Next.
  12. On the Initial Snapshot Settings page, configure the following options for the source cluster:
    1. YARN Queue Name - If you use Capacity Scheduler queues to limit resource consumption, enter the name of the YARN queue to which the replication job is submitted. The default value is default.
    2. Maximum Maps Slots - Use this option to set the maximum number of map tasks (simultaneous copies) per replication job. The default value is 20.
    [Image: The Initial Snapshot Settings page in the Create Replication Policy wizard. You can keep the default values for YARN Queue Name and Maximum Maps Slots, or enter new values as required.]
  13. Click Create.
  14. If you chose a CDH cluster or CDP Private Cloud Base cluster as the source cluster and a Data Hub or COD cluster as the destination cluster, perform the following steps on the source cluster:
    • Restart the HBase service when the policy status on the Replication Policies page shows Manual restart (src) / restarting (dest) or Waiting for service restart on source.
    • After the service restart is complete, click Continue setup for the replication policy on the Replication Policies page.
    [Image: The Continue setup option for the HBase replication policy on the Replication Policies page.]
    When you create the first replication policy between a source cluster and a destination cluster (the policy is then in the setup/service restart state), Replication Manager creates and runs two schedules, or jobs. The first schedule shows the progress of the service configuration and service restart, and the second shows the progress of the policy creation. Subsequent replication policies between the same source and destination clusters create only one job. If both clusters are COD clusters, Replication Manager restarts the HBase services on both clusters.
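If you prefer to script the HBase restart on a CDH or CDP Private Cloud Base source cluster instead of using the Cloudera Manager UI, the Cloudera Manager REST API exposes a service restart command. This is a minimal sketch, assuming curl, Cloudera Manager on its default TLS port 7183, and an account that is allowed to restart services; CM_URL, CM_USER, CM_PASS, CLUSTER, SERVICE, and restart_hbase are placeholder names to adapt. The cluster and service values must be the internal Cloudera Manager names, which can differ from the display names. After the restart completes, you still click Continue setup on the Replication Policies page.

```shell
#!/usr/bin/env bash
# Hypothetical helper: restart the source cluster's HBase service through the
# Cloudera Manager REST API. All values below are placeholders to adapt.
CM_URL="${CM_URL:-https://cm-host.example.com:7183}"
CM_USER="${CM_USER:-admin}"
CM_PASS="${CM_PASS:-}"
CLUSTER="${CLUSTER:-Cluster 1}"
SERVICE="${SERVICE:-hbase}"

restart_hbase() {
  local api_version cluster_path
  # GET /api/version returns the highest API version the server supports.
  api_version=$(curl -fsS -u "${CM_USER}:${CM_PASS}" "${CM_URL}/api/version") || return 1
  # Cluster names can contain spaces; percent-encode them for the URL path.
  cluster_path=${CLUSTER// /%20}
  # Starts an asynchronous restart command; the JSON response describes it.
  curl -fsS -X POST -u "${CM_USER}:${CM_PASS}" \
    "${CM_URL}/api/${api_version}/clusters/${cluster_path}/services/${SERVICE}/commands/restart"
}
```

The restart runs asynchronously, so wait until Cloudera Manager reports the command as finished before continuing the policy setup.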

To verify whether a replication policy is running, you can either click the replication policy on the Replication Policies page, or click Running Commands in Cloudera Manager.
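From the command line, you can also inspect replication on the source cluster with the HBase shell. This sketch assumes that the policy is implemented with HBase's built-in peer-based replication, that the HBase client is configured on the host, and, on a secure cluster, that you hold a Kerberos ticket for a user with HBase admin rights; show_replication_state is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print HBase replication peers and replication status.
# list_peers and status 'replication' are standard HBase shell commands;
# -n runs the shell non-interactively, reading commands from stdin.
show_replication_state() {
  printf '%s\n' "list_peers" "status 'replication'" | hbase shell -n
}
```

Under the assumption above, a peer that points to the destination cluster should appear in the list_peers output, and status 'replication' reports the replication source and sink metrics for each RegionServer.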

To view the job progress of an HBase replication policy, click the replication policy on the Replication Policies page.

[Image: The Job History page of an HBase replication policy between COD clusters.]