Creating an HBase replication policy
You can replicate HBase data from a CDH cluster or a Cloudera Operational Database (COD) cluster to a COD cluster, and from a CDP Private Cloud Base cluster or a CDH cluster to a Data Lake cluster.
On the destination cluster, whether on AWS or ADLS, make sure that port 16020 is open in the worker security group and port 2181 is open in the worker, master, and leader security groups for connections from the source cluster. This ensures that the source HBase service can reach the ZooKeeper and HBase services on the destination hosts. For more information, see Ports for HBase replication.
You must also ensure that DNS resolution works as expected between the source and destination clusters.
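Before you create the policy, you can check both requirements from a source-cluster host. The following is a minimal bash sketch, not a Cloudera tool: the function names and the host names in the usage example are hypothetical, and it assumes bash (for /dev/tcp), the coreutils timeout command, and getent are available. The per-group port list comes from the paragraph above.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check, run from a source-cluster host.
# Function names are illustrative; this is not a Cloudera tool.

# required_ports GROUP: ports that must be reachable on destination hosts
# in the given security group, as listed in the text above.
required_ports() {
  case "$1" in
    worker) echo "16020 2181" ;;
    master | leader) echo "2181" ;;
    *) return 1 ;;
  esac
}

# resolves HOST: succeeds if HOST resolves through the system resolver.
resolves() {
  getent hosts "$1" > /dev/null
}

# port_open HOST PORT: succeeds if a TCP connection opens within 5 seconds.
# Uses bash's /dev/tcp redirection, so bash must be installed.
port_open() {
  timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2> /dev/null
}

# check_group GROUP HOST...: print DNS and port status for each HOST.
check_group() {
  local group="$1" ports host port
  shift
  ports=$(required_ports "$group") || {
    echo "unknown security group: $group" >&2
    return 1
  }
  for host in "$@"; do
    if ! resolves "$host"; then
      echo "$host: DNS resolution failed"
      continue
    fi
    for port in $ports; do   # intentional word splitting over the port list
      if port_open "$host" "$port"; then
        echo "$host:$port reachable"
      else
        echo "$host:$port NOT reachable"
      fi
    done
  done
}
```

For example, `check_group worker dest-worker-1.example.com` (a hypothetical host name) should report both 16020 and 2181 as reachable; repeat with the master and leader groups and their hosts. A successful TCP connection only shows that the port is reachable, not that the service behind it is healthy.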
Perform the following steps on the CDP Private Cloud Base source cluster or CDH source cluster (these steps are not required for COD sources):
(Applicable to CDH versions 7.2.x lower than 7.2.2, 7.1.x lower than 7.1.5, and versions lower than 7.x.) Install the HBase replication plugin parcel. For information about the parcel, contact Cloudera Support.
(Applicable only to Cloudera Manager 7.4.3 or lower.) Create the /user/hbase folder for the hbase user in HDFS on the source cluster. To create the folder, run the following commands:
sudo -u hdfs hdfs dfs -mkdir /user/hbase
sudo -u hdfs hdfs dfs -chown hbase:hbase /user/hbase
These commands allow the HBase replication policy to replicate the existing data in the source cluster.
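Both source-cluster prerequisites above can be checked from a shell. The following is a minimal sketch, not a Cloudera tool: the function names needs_replication_plugin and create_hbase_home are hypothetical. needs_replication_plugin encodes only the version ranges listed above, so any 7.x version outside them (for example 7.0.x or 7.3.x) is treated as not needing the parcel. create_hbase_home assumes you run it on a source-cluster host with sudo access and the hdfs client; it uses mkdir -p so that a rerun does not fail when /user/hbase already exists, and ends with hdfs dfs -stat to print the owner and group.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the source-cluster prerequisites.

# needs_replication_plugin VERSION
# Returns 0 if VERSION is in a range that needs the HBase replication plugin
# parcel (< 7.x, 7.1.x < 7.1.5, or 7.2.x < 7.2.2); otherwise returns 1.
needs_replication_plugin() {
  local v="$1" major minor patch
  # Split "major.minor.patch[.build...]" into its first three fields.
  major=${v%%.*}; v=${v#"$major"}; v=${v#.}
  minor=${v%%.*}; v=${v#"$minor"}; v=${v#.}
  patch=${v%%.*}
  # Drop non-numeric suffixes such as "-p1"; default missing fields to 0.
  major=${major%%[!0-9]*}; major=${major:-0}
  minor=${minor%%[!0-9]*}; minor=${minor:-0}
  patch=${patch%%[!0-9]*}; patch=${patch:-0}
  if [ "$major" -lt 7 ]; then return 0; fi
  if [ "$major" -eq 7 ] && [ "$minor" -eq 1 ] && [ "$patch" -lt 5 ]; then return 0; fi
  if [ "$major" -eq 7 ] && [ "$minor" -eq 2 ] && [ "$patch" -lt 2 ]; then return 0; fi
  return 1
}

# create_hbase_home
# Idempotent form of the two commands above: -p keeps mkdir from failing
# if /user/hbase already exists. The final -stat prints owner:group.
create_hbase_home() {
  sudo -u hdfs hdfs dfs -mkdir -p /user/hbase &&
    sudo -u hdfs hdfs dfs -chown hbase:hbase /user/hbase &&
    sudo -u hdfs hdfs dfs -stat '%u:%g' /user/hbase
}
```

For example, `needs_replication_plugin 7.1.4 && echo "install the plugin parcel"` prints the message, whereas version 7.1.5 prints nothing. If create_hbase_home succeeds, its last line of output should read hbase:hbase.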
- Open the Replication Manager service in the Cloudera Data Platform web interface.
- Click Replication Policies.
- Click Create Policy.
- In the Create Replication Policy wizard, enter a name for the replication policy on the General page.
- Optionally, add a description for the policy.
The following sample image shows the General page in the Create Replication Policy wizard:
- Click Next.
On the Select Source page, choose the values for the following options:
The following sample image shows the Select Source page in the Create Replication Policy wizard:
- Source Cluster or Database. Choose a source cluster.
- Source Tables. Enter a table name that you want to replicate. Click the Add icon to add more table names.
- Perform Initial Snapshot. Select the option to replicate existing data.
- Cloud Credential. This option appears when you choose a CDP Private Cloud Base cluster or CDH cluster as the source cluster and select Perform Initial Snapshot.
Click Add Cloud Credential. In the Add Cloud Credential dialog box, enter a unique name for the cloud credential, and then choose one of the following cloud storage types:
- S3 - Choose an authentication type, enter an access key and secret key, and click Validate to validate the credentials.
- ADLS - Enter the client ID, tenant ID, and secret key.
- Click Next.
On the Select Destination page, choose the values for the following options:
- Destination Data Hub or COD. Choose a Data Lake cluster or COD.
- Select Set HBase Replication Machine User, and enter the username and password. Make sure that you enter the correct password for an existing user; if the password is incorrect, the policy is created but the data is not replicated.
Depending on the options you select and the username you enter, Replication Manager handles the machine user in one of the following ways:
- If Set HBase Replication Machine User is not selected, an HBase replication machine user is created automatically with an auto-generated username.
- If Set HBase Replication Machine User is selected and Create User If Does Not Exist is not selected, the username you enter must already exist in the CDP User Management System (UMS); otherwise, an error message appears.
- If both Set HBase Replication Machine User and Create User If Does Not Exist are selected and the username does not exist in UMS, Replication Manager creates the user.
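The scenarios above amount to a small decision table. The following bash function is a hypothetical model of that logic for illustration only; Replication Manager does not expose such a function, and its name and yes/no arguments are assumptions. The text does not describe the case where the user already exists and Create User If Does Not Exist is selected; the function assumes the existing user is used, as in the second scenario.

```bash
#!/usr/bin/env bash
# machine_user_outcome SET_USER CREATE_IF_MISSING USER_EXISTS  (each: yes|no)
# Hypothetical model of how the machine user is chosen; prints the outcome.
machine_user_outcome() {
  local set_user="$1" create_if_missing="$2" user_exists="$3"
  if [ "$set_user" != yes ]; then
    # Set HBase Replication Machine User is not selected.
    echo "auto-create user with generated name"
  elif [ "$user_exists" = yes ]; then
    # Assumption: an existing UMS user is used as entered.
    echo "use existing user"
  elif [ "$create_if_missing" = yes ]; then
    # Create User If Does Not Exist is selected and the user is missing.
    echo "create user"
  else
    # User is missing and creation is not allowed.
    echo "error: user not found in UMS"
  fi
}
```

For example, `machine_user_outcome yes no no` prints the error case, which matches the second scenario when the username is missing from UMS.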
Click Sync Replication User. Replication Manager validates the existing username with UMS and synchronizes the new username and password to the destination cluster’s environment, and also to the source cluster’s environment if the source is COD.
The following sample image shows the Select Destination page in the Create Replication Policy wizard when you choose a CDH or CDP Private Cloud Base source cluster:
- Click Next.
On the Initial Snapshot Settings page, configure the following options for the source cluster:
The following sample image shows the Initial Snapshot Settings page in the Create Replication Policy wizard:
- YARN Queue Name - If you are using Capacity Scheduler queues to limit resource consumption, enter the name of the YARN queue for the cluster to which the replication job is submitted. The default value for this field is default.
- Maximum Maps Slots - Use this option to set the maximum number of map tasks (simultaneous copies) per replication job. The default value is 20.
- Click Create.
If you chose a CDH cluster or CDP Private Cloud Base cluster as the source cluster and a Data Lake or COD cluster as the destination cluster, you must perform the following steps on the source cluster:
The following image shows the Continue setup option for the HBase replication policy on the Replication Policies page:
After you create the first replication policy between a source cluster and a target cluster (the policy is then in the setup/service restart state), Replication Manager creates and runs two schedules, or jobs. The first schedule shows the service configuration and service restart progress, and the second schedule shows the policy creation progress. Subsequent replication policies between the same source and target clusters create only one job.
- Restart the HBase service when the policy status on the Replication Policies page shows Manual restart (src) / restarting (dest) or Waiting for service restart on source.
- After the service restart is complete, click Continue setup for the replication policy on the Replication Policies page.
To verify whether a replication policy is running or to view the policy job progress, click the replication policy on the Replication Policies page, or click Running Commands in Cloudera Manager.
The following sample image shows the Job History page of an HBase replication policy between COD clusters: