Creating an HBase replication policy

You can replicate HBase data from a source classic cluster (CDH or CDP Private Cloud Base cluster), COD, or Data Hub to a target Data Hub or COD cluster.

  1. On the Management Console > Replication Manager > Replication Policies page, click Add Policy.
    The Create Replication Policy wizard appears.
  2. On the General page, choose or enter the following information:
    • HBase - Select to create an HBase replication policy.
    • Policy Name - Enter a unique name for the replication policy.
    • Description - Optional. Enter a brief description of the replication policy.

    [Image: A sample General page in the Create Replication Policy wizard, where you choose the HBase option to create an HBase replication policy.]
  3. Click Next.
  4. On the Select Source page, enter or choose the following options as required:
    • Source Cluster or Database - Choose a source cluster.
    • Source Tables - Enter the name of a table that you want to replicate. Click the Add icon to add more table names.
    • Perform Initial Snapshot - Select this option to replicate existing data.
    • Cloud Credential - Appears when you choose a CDP Private Cloud Base cluster or CDH cluster as the source cluster and select Perform Initial Snapshot.

      Click Add Cloud Credential. In the Add Cloud Credential dialog box, enter a unique name for the cloud credential, choose one of the following cloud storage types, enter the required options, and click Save:
        • S3 - Choose an authentication type, and enter an access key and secret key.
        • ADLS - Enter the client ID, tenant ID, and secret key.

    [Image: A sample Select Source page in the Create Replication Policy wizard, where you choose a source cluster and the source tables to replicate, select Perform Initial Snapshot to replicate existing data, and add cloud credentials.]
  5. Click Next.
  6. On the Select Destination page, enter or choose the following options as required:
    • Destination Data Hub or COD - Choose a Data Hub cluster or COD.
    • Set HBase Replication Machine User - Optional. Select this option, and then enter the username and password. Ensure that you enter the correct password for an existing user. If the password is incorrect, the policy is created successfully but the data is not replicated.

      Depending on the options you select and the username you enter, Replication Manager implements one of the following scenarios:

        • If Set HBase Replication Machine User is not selected, Replication Manager automatically creates an HBase replication machine user with an auto-generated username.
        • If Set HBase Replication Machine User is selected and Create User If Does Not Exist is not selected, ensure that the username you enter exists in the CDP User Management System (UMS); otherwise, an error message appears.
        • If Set HBase Replication Machine User and Create User If Does Not Exist are both selected and the username does not exist in UMS, Replication Manager creates the user.
    • Sync Replication User - Optional. Replication Manager validates the existing username with UMS and synchronizes the new username and password to the destination cluster's environment, and also to the source cluster's environment if the source is COD.

    The following options appear if the Cloudera Manager version on the source cluster is 7.6.0 or higher:

    • Rolling HBase Service Restart on Source - Appears only if you select COD or Data Hub as the source cluster. Select to have Cloudera Manager perform an automatic rolling restart* of the HBase service on the source cluster after the first-time setup steps for the HBase replication policy are complete. Otherwise, Cloudera Manager performs an automatic full restart* of the service.
    • Rolling HBase Service Restart on Destination - Select to have Cloudera Manager perform an automatic rolling restart* of the HBase service on the destination cluster after the first-time setup steps for the HBase replication policy are complete. Otherwise, Cloudera Manager performs an automatic full restart* of the service.
    • Validate Replication - Select to have Replication Manager verify that replication starts after the policy is created.

    * During a rolling restart, nodes are restarted one at a time until all the nodes in the cluster are restarted; this type of restart ensures that there is no disruption of service. During a full restart, all the nodes are shut down at once and restarted simultaneously.
    [Image: A sample Select Destination page in the Create Replication Policy wizard, where you choose the destination cluster.]
  7. Click Next.
  8. On the Initial Snapshot Settings page, configure the following options for the source cluster:
    • YARN Queue Name - If you use Capacity Scheduler queues to limit resource consumption, enter the name of the YARN queue on the cluster to which the replication job is submitted. The default value is default.
    • Maximum Maps Slots - Enter the maximum number of map tasks (simultaneous copies) per replication job. The default value is 20.

    [Image: A sample Initial Snapshot Settings page in the Create Replication Policy wizard, where you can retain the default values for YARN Queue Name and Maximum Maps Slots or enter new values.]
  9. Click Create.
  10. If the source is an on-premises cluster, restart its HBase service when the policy status on the Replication Policies page shows Manual restart (src) / restarting (dest) or Manual HBase restart needed on source. After the restart is complete, the setup of the replication policy continues automatically. You do not need to restart the HBase service if the source is COD or Data Hub.

    [Image: The Continue setup option for the HBase replication policy on the Replication Policies page.]
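The machine-user rules from step 6 can be summarized as a small decision function. The following is a hypothetical Python model of those documented scenarios only, not a Replication Manager or UMS API. The function name, parameters, and returned strings are illustrative, and the UMS lookup is represented by a plain set of usernames. The "use existing user" branch is an assumption implied by the documented requirement that the username exist in UMS.

```python
from typing import Optional, Set


def resolve_machine_user(
    set_machine_user: bool,
    create_if_missing: bool,
    username: Optional[str],
    ums_users: Set[str],
) -> str:
    """Hypothetical model of how the HBase replication machine user is chosen.

    Mirrors the documented scenarios; it does not call any real
    Replication Manager or UMS API.
    """
    if not set_machine_user:
        # Option not selected: a user with an auto-generated name is created.
        return "create auto-generated machine user"
    if username is None:
        raise ValueError("a username is required when the option is selected")
    if username in ums_users:
        # Assumption: an existing UMS user is used as entered.
        return f"use existing user {username}"
    if create_if_missing:
        # User missing from UMS and Create User If Does Not Exist selected.
        return f"create user {username}"
    # User missing from UMS and Create User If Does Not Exist not selected.
    raise LookupError(f"user {username} does not exist in UMS")
```

In this sketch the error case raises an exception, standing in for the error message that the wizard shows.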
After you create the first replication policy between a source cluster and a target cluster (that is, while the policy is in the setup or service restart state), Replication Manager creates and runs two schedules, or jobs. The first schedule shows the service configuration and service restart progress, and the second schedule shows the policy creation progress. Each subsequent replication policy between the same source and target clusters creates only one job. If both clusters are COD clusters, Replication Manager restarts the HBase services on both of them.
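The job-count rule above can be modeled as a planner that remembers which source and target cluster pairs already have a policy. This is a hypothetical illustration: the class and method names are invented, and Replication Manager does not expose them. The sketch also assumes that the single job for later policies is the policy-creation job.

```python
from typing import List, Set, Tuple


class PolicyJobPlanner:
    """Hypothetical model of how many jobs a new HBase replication policy creates."""

    def __init__(self) -> None:
        # Source/target cluster pairs that already have at least one policy.
        self._configured_pairs: Set[Tuple[str, str]] = set()

    def jobs_for_new_policy(self, source: str, target: str) -> List[str]:
        pair = (source, target)
        if pair not in self._configured_pairs:
            # First policy for this pair: service setup and restart also run.
            self._configured_pairs.add(pair)
            return ["service configuration and restart", "policy creation"]
        # Services are already configured for this pair, so only one job runs.
        return ["policy creation"]
```

Keying on the (source, target) pair reflects the documented rule that only policies between the same two clusters skip the setup job.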

To verify whether a replication policy is running, you can either click the replication policy on the Replication Policies page, or click Running Commands in Cloudera Manager.

[Image: A sample Job History page of an HBase replication policy between COD clusters.]