Creating HBase replication policy

You can replicate HBase data from a source classic cluster (CDH or CDP Private Cloud Base cluster), COD, or Data Hub to a target Data Hub or COD cluster in CDP Public Cloud Replication Manager.

Ensure that the HBase replication policy prerequisites are complete.
  1. On the Management Console > Replication Manager > Replication Policies page, click Add Policy.
    The Create Replication Policy wizard appears.
  2. On the General page, choose or enter the following information:
    Option Description
    HBase Creates an HBase replication policy.
    Policy Name Enter a unique name for the replication policy.
    Description Optional. Enter a brief description about the replication policy.

    The following image shows a sample General page in the Create Replication Policy wizard:

    The image shows the General page in the Create Replication Policy wizard. Choose HBase option to continue creating a HBase replication policy.
  3. Click Next.
  4. On the Select Source page, enter or choose the options as required:
    Option Action
    Source Cluster or Database Choose a source cluster.
    Source Tables Enter a table name that you want to replicate. Click the Add icon to add more table names.
    Perform Initial Snapshot Select the option to replicate existing data.
    Credentials are available in source cluster HDFS service configuration setting You can choose this option when you want to use a Google Cloud account. You can use this option for S3 and ADLS accounts as well. Before you use this option, ensure that the advanced configuration settings in Preparing to create an HBase replication policy are configured.
    Credentials from External Account Choose this option for S3 and ADLS storage options. This option appears when you choose a CDP Private Cloud Base cluster or CDH cluster as the source cluster and you choose Perform Initial Snapshot.

    Click Add Cloud Credential. In the Add Cloud Credential dialog box, enter a unique name for the cloud credential.

    Click Save after you choose one of the following cloud storage types and enter the required options:
    • S3 - Choose an authentication type, enter an access key and secret key.
    • ADLS - Enter the client ID, tenant ID, and secret key.
    Replicate Database

    Replication Manager replicates:

    • current and future data from the existing tables if you choose the Perform Initial Snapshot option. Otherwise, only the future data from the existing tables is replicated.
    • data from the future tables that are created after policy creation.

      To replicate data from the future tables successfully, you must create similar empty tables on the target cluster. You can perform this action when you create or add a table to the database on the source cluster.

    You can choose the Replicate Database option only if the following conditions are true:
    • Target Cloudera Manager version is 7.11.0 or higher.
    • Source cluster version is CDH 6.x or higher.

      CDH 5.16.2 and higher versions also support the Replicate Database option after you upgrade the source cluster Cloudera Manager.

    • No existing HBase replication policies exist between the source and target clusters.
    Replicate all user tables Appears after you choose the Replicate Database option. Choose this option to replicate all the HBase tables in the database. This action sets the replication scope to 1 for all the tables and then replicates the tables to the target cluster.
    Replicate only tables where replication is already enabled Appears after you choose the Replicate Database option. Choose this option to replicate only those tables for which the replication scope is already set to 1. This provides you a choice to replicate only the required tables in a database.

    This option is supported only if the target cluster CDP version is 7.2.17.300 using Cloudera Manager 7.11.0-h3 or higher versions or CDP version 7.2.16.500 using Cloudera Manager 7.9.0-h7 or higher versions.

    The following sample image shows the Select Source page in the Create Replication Policy wizard:

    The image shows the Select Source page in the Create Replication Policy wizard. You can choose a source cluster, source tables to replicate, choose the Perform Initial Snapshot option to replicate existing data, and add cloud credentials.
  5. Click Next.
  6. On the Select Destination page, enter or choose the options as required:
    Option Description
    Destination Data Hub or COD Choose a Data Hub cluster or COD.
    Set HBase Replication Machine User Optional. Choose the option and then enter the username and password. Ensure that you enter the correct password for an existing user because if the password is incorrect, the data is not replicated even though the policy is created successfully.

    Based on the username and password that you enter, one of the following possible scenarios is implemented by Replication Manager:

    • If Set HBase Replication Machine User is not selected, an HBase replication machine user is created automatically with an auto-generated username.
    • If Set HBase Replication Machine User is selected and Create User If Does Not Exist is not selected, ensure that the username you enter exists in the CDP User Management System (UMS), otherwise an error message appears.
    • If Set HBase Replication Machine User is selected and Create User If Does Not Exist is selected and the username does not exist in UMS, the username is created.
    Sync Replication User Optional. Replication Manager validates the existing username with the UMS and synchronizes the new username and password to the destination cluster’s environment (and to the source’s as well if the source is COD).

    The following options appear if the source Cloudera Manager is 7.6.0 or higher:

    Option Description
    Rolling HBase Service Restart on Source [Appears if you select COD or Data Hub as the source cluster] Select to enable automatic rolling restart* of HBase service on the source cluster after the HBase replication policy first-time setup steps are complete. Otherwise, Cloudera Manager performs an automatic full restart* of the service.
    Rolling HBase Service Restart on Destination Select to enable automatic rolling restart* of HBase service on the target cluster as a rolling restart* after the HBase replication policy first-time setup steps are complete. Otherwise, Cloudera Manager performs an automatic full restart of the service.
    I want to force the setup of this HBase replication policy Choose to run the first-time setup configuration between the selected source and destination clusters.

    This option appears when the selected source or target cluster is part of an existing cluster pair, and one of the following is true about the cluster pair:

    • No HBase replication policies exist between them.
    • The other cluster in the pair is currently unreachable.

    When you select the option, you acknowledge that the existing pairing for the selected source or target cluster will be cleared and the first-time setup will be initiated with the chosen new source or destination cluster.

    Validate Replication Select the option to notify Replication Manager to verify the provided details so that the replication is initiated after the policy creation is complete.

    Before you select the Validate Replication option during the first HBase replication policy creation between two specific clusters, you must ensure that the 16000 port is open on the target cluster.

    *During rolling restart, one node is restarted at a time and this continues until all the nodes in the cluster are restarted. This type of restart ensures that there is no disruption of service. During full restart, all the nodes are shut down at once and restarted simultaneously.
  7. Click Next.
  8. On the Initial Snapshot Settings page, configure the following options for the source cluster:
    Option Description
    YARN Queue Name Enter the name of the YARN queue for the cluster to which the replication job is submitted only if you are using Capacity Scheduler queues to limit resource consumption. The default value for this field is default.
    Maximum Maps Slots Configure the maximum number of map tasks (simultaneous copies) per replication job. The default value is 20.
    Maximum Bandwidth Adjust this setting so that each map task is throttled to consume only the specified bandwidth.

    Each map task ((simultaneous copy) is restricted to consume only the specified bandwidth. This is not always exact. The map throttles back its bandwidth consumption during a copy in such a way that the net bandwidth used tends towards the specified value. You can adjust this setting so that each map task is throttled to consume only the specified bandwidth so that the net bandwidth used tends towards the specified value. The default value for the bandwidth is 100MB per second for each mapper.

    You can adjust the setting only if the source and destination Cloudera Manager instances support this option.

    The following sample image shows the Initial Snapshot Settings page in the Create Replication Policy wizard:

    The image shows the Initial Snapshot Settings page in the Create Replication Policy wizard. You can retain the default values for YARN queue name and maximum maps slots or enter a new value, as required.
  9. Click Create.
  10. Restart the HBase service on the on-premises source cluster when the policy status on the Replication Policies page shows Manual restart (src) / restarting (dest) or Manual HBase restart needed on source. After the service restart is complete, the setup continues automatically for the replication policy. You do not need to restart the HBase service if the source is COD or Data Hub.

    The following image shows the Continue setup option for the HBase replication policy on the Replication Policies page:

    The image shows the Continue setup option for the HBase replication policy on the Replication Policies page.
After you create the first replication policy between a source cluster and target cluster (policy that is in setup/service restart state), Replication Manager creates and runs two schedules or jobs. The first schedule shows the service configuration and service restart progress, and the second schedule shows the policy creation progress. Subsequent replication policies between the same source cluster and target cluster creates only one job. Replication Manager restarts the HBase services on both the clusters if they are COD clusters.

To verify whether a replication policy is running, you can either click the replication policy on the Replication Policies page, or click Running Commands in Cloudera Manager.

The following sample image shows the Job History page of a HBase replication policy between COD clusters:

The image shows the job history page of a HBase replication policy between COD clusters.