Creating HBase replication policy
You can replicate HBase data from a source classic cluster (CDH or CDP Private Cloud Base cluster), COD, or Data Hub to a target Data Hub or COD cluster.
Click Add Policy. The Create Replication Policy wizard appears.
On the General page, choose or enter the following options:
- HBase: Creates an HBase replication policy.
- Policy Name: Enter a unique name for the replication policy.
- Description: Optional. Enter a brief description of the replication policy.
The following image shows a sample General page in the Create Replication Policy wizard:
- Click Next.
On the Select Source page, enter or choose the options as required:
- Source Cluster or Database: Choose a source cluster.
- Source Tables: Enter the name of a table that you want to replicate. Click the Add icon to add more table names.
- Perform Initial Snapshot: Select the option to replicate existing data.
- Cloud Credential: Appears when you choose a CDP Private Cloud Base cluster or CDH cluster as the source cluster and select Perform Initial Snapshot. Click Add Cloud Credential. In the Add Cloud Credential dialog box, enter a unique name for the cloud credential, choose one of the following cloud storage types, enter the required options, and then click Save:
  - S3: Choose an authentication type, and enter an access key and secret key.
  - ADLS: Enter the client ID, tenant ID, and secret key.
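The required options per cloud storage type can be summarized as a small validation sketch. This is an illustrative Python model only, not a Replication Manager API: the field names (`auth_type`, `access_key`, `client_id`, and so on) and the function `missing_credential_fields` are hypothetical names chosen for this example.

```python
# Illustrative only: these field names are assumptions for the example,
# not identifiers used by Replication Manager.
REQUIRED_FIELDS = {
    "S3": ("auth_type", "access_key", "secret_key"),
    "ADLS": ("client_id", "tenant_id", "secret_key"),
}


def missing_credential_fields(storage_type, values):
    """Return the required fields that are absent or empty for a storage type."""
    if storage_type not in REQUIRED_FIELDS:
        raise ValueError(f"unsupported storage type: {storage_type}")
    # Preserve the documented order of the fields in the result.
    return [name for name in REQUIRED_FIELDS[storage_type] if not values.get(name)]
```

For example, calling `missing_credential_fields("ADLS", {"client_id": "c"})` reports that the tenant ID and secret key still need to be entered before the credential can be saved.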
The following sample image shows the Select Source page in the Create Replication Policy wizard:
- Click Next.
On the Select Destination page, enter or choose the options as required:
- Destination Data Hub or COD: Choose a Data Hub cluster or COD.
- Set HBase Replication Machine User: Optional. Choose the option and then enter the username and password. Ensure that you enter the correct password for an existing user; if the password is incorrect, the policy is created successfully but the data is not replicated. Depending on the options that you choose, Replication Manager handles the machine user in one of the following ways:
  - If Set HBase Replication Machine User is not selected, an HBase replication machine user is created automatically with an auto-generated username.
  - If Set HBase Replication Machine User is selected and Create User If Does Not Exist is not selected, the username that you enter must exist in the CDP User Management System (UMS); otherwise, an error message appears.
  - If Set HBase Replication Machine User is selected, Create User If Does Not Exist is selected, and the username does not exist in UMS, the user is created.
- Sync Replication User: Optional. Replication Manager validates the existing username with the UMS and synchronizes the new username and password to the destination cluster’s environment (and to the source cluster’s environment as well if the source is COD).
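The machine user scenarios described above can be modeled as a short decision function. This is a hedged sketch, not Replication Manager code: the function name `resolve_machine_user`, the `hbase-replication-` prefix for auto-generated names, and the `existing_users` set standing in for UMS are all assumptions made for illustration. The case where the user already exists and Create User If Does Not Exist is selected is not spelled out in the text; the sketch assumes the existing user is reused.

```python
import secrets


def resolve_machine_user(set_user, username, create_if_missing, existing_users):
    """Model the documented scenarios; returns (username, created)."""
    if not set_user:
        # Option not selected: a user with an auto-generated name is created.
        # The name format here is hypothetical.
        return f"hbase-replication-{secrets.token_hex(4)}", True
    if username in existing_users:
        # Assumption: an existing user is reused as-is.
        return username, False
    if create_if_missing:
        # Create User If Does Not Exist is selected and the user is missing.
        return username, True
    # User must already exist in UMS; otherwise an error is reported.
    raise LookupError(f"user {username!r} does not exist in UMS")
```

Note that the sketch does not model password validation. As the option description states, an incorrect password for an existing user does not block policy creation, so it cannot be detected at this step.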
The following options appear if the source Cloudera Manager is 7.6.0 or higher:
- Rolling HBase Service Restart on Source: Appears if you select COD or Data Hub as the source cluster. Select to enable an automatic rolling restart* of the HBase service on the source cluster after the first-time setup steps for the HBase replication policy are complete. Otherwise, Cloudera Manager performs an automatic full restart* of the service.
- Rolling HBase Service Restart on Destination: Select to enable an automatic rolling restart* of the HBase service on the target cluster after the first-time setup steps for the HBase replication policy are complete. Otherwise, Cloudera Manager performs an automatic full restart* of the service.
- Validate Replication: Select to have Replication Manager verify that replication starts after the policy creation is complete.
*During a rolling restart, one node is restarted at a time until all the nodes in the cluster are restarted. This type of restart ensures that there is no disruption of service. During a full restart, all the nodes are shut down at once and restarted simultaneously.
The following sample image shows the Select Destination page in the Create Replication Policy wizard:
- Click Next.
On the Initial Snapshot Settings page, configure the following options for the source cluster:
- YARN Queue Name: Enter the name of the YARN queue for the cluster to which the replication job is submitted. Enter a name only if you use Capacity Scheduler queues to limit resource consumption. The default value is default.
- Maximum Maps Slots: Configure the maximum number of map tasks (simultaneous copies) per replication job. The default value is 20.
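As a compact summary of these two settings and their documented defaults, the following Python sketch models them as a small immutable record. The class name `InitialSnapshotSettings`, the attribute names, and the validation rules (non-empty queue name, at least one map slot) are assumptions for illustration and do not correspond to a Replication Manager API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InitialSnapshotSettings:
    """Hypothetical model of the Initial Snapshot Settings page."""

    yarn_queue_name: str = "default"  # documented default queue
    max_map_slots: int = 20           # documented default number of map tasks

    def __post_init__(self):
        # Validation rules below are assumptions, not documented constraints.
        if not self.yarn_queue_name:
            raise ValueError("YARN queue name must not be empty")
        if self.max_map_slots < 1:
            raise ValueError("at least one map slot is required")
```

Creating `InitialSnapshotSettings()` with no arguments yields the documented defaults, which is the expected configuration when Capacity Scheduler queues are not in use.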
The following sample image shows the Initial Snapshot Settings page in the Create Replication Policy wizard:
- Click Create.
Restart the HBase service on the on-premises source cluster when the policy status on the Replication Policies page shows Manual restart (src) / restarting (dest) or Manual HBase restart needed on source. After the service restart is complete, the setup continues automatically for the replication policy. You do not need to restart the HBase service if the source is COD or Data Hub.
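The restart rule above can be expressed as a simple check. This is a minimal sketch for illustration: the status strings are taken from the text, but the function name `needs_manual_source_restart` and the source type labels passed to it are hypothetical, and the sketch does not query Replication Manager.

```python
# Status strings as they appear on the Replication Policies page.
MANUAL_RESTART_STATUSES = {
    "Manual restart (src) / restarting (dest)",
    "Manual HBase restart needed on source",
}

# Sources for which no manual HBase restart is needed (labels are assumptions).
AUTO_RESTART_SOURCES = {"COD", "Data Hub"}


def needs_manual_source_restart(policy_status, source_type):
    """Return True if the HBase service on the source must be restarted manually."""
    if source_type in AUTO_RESTART_SOURCES:
        return False
    return policy_status.strip() in MANUAL_RESTART_STATUSES
```

For an on-premises source, a status of Manual HBase restart needed on source returns `True`; any status for a COD or Data Hub source returns `False`.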
The following image shows the Continue setup option for the HBase replication policy on the Replication Policies page:
To verify whether a replication policy is running, you can either click the replication policy on the Replication Policies page, or click Running Commands in Cloudera Manager.
The following sample image shows the Job History page of an HBase replication policy between COD clusters: