Replicate HBase data simultaneously between multiple clusters
In Cloudera on cloud 7.2.16.500, 7.2.17.200, 7.2.18, and higher versions, you can create multiple HBase replication policies among multiple clusters to replicate HBase data. Review the limitations before you create a multi-cluster replication scenario. Multi-cluster replication supports use cases such as backing up several source clusters to a single target cluster, or replicating tables from a single source cluster to several target clusters.
Multi-cluster HBase replication overview
To replicate HBase data between two clusters, the source and target clusters must use the same credentials.jceks file, and the first-time setup configuration ensures this. Therefore, if multiple supported clusters share the same credentials.jceks file, you can use HBase replication policies to replicate HBase data between any of them.
The following image shows a sample multi-cluster HBase data replication scenario and a few possible replication directions:
Cloudera does not recommend manually replacing the credentials.jceks file to create a multi-cluster HBase replication scenario. When you create the first HBase replication policy between a pair of clusters, Replication Manager triggers the first-time setup, which synchronizes the credentials.jceks file in both clusters as required for HBase data replication.
Limitations
- An HBase replication policy in a multi-cluster HBase replication setup fails if you use clusters that are part of another independent replication setup, because those clusters use a different credentials.jceks file. To use these clusters, you must break the cluster pairing and then create the required HBase replication policies.
Monitor the multi-cluster replication network as it grows so that no cluster becomes disconnected from it. This ensures that all clusters use the same credentials.jceks file, that the replication setup stays consistent, and that you do not have to reset any existing replication scenario.
- The Replication Manager UI prevents you from creating an HBase replication policy if the cluster you choose as the source or target cluster is currently undergoing the first-time setup process. In this case, wait a few minutes for the first-time setup to complete, and then create the HBase replication policy.
The first-time setup configuration starts when you create the first HBase replication policy between two clusters. HBase data replication starts after the configuration completes.
- You must meet the following conditions to use the IDBroker credentials for creating multiple HBase replication policies across multiple clusters when the target COD clusters are in separate AWS accounts, or when a single AWS role does not have access to the required S3 buckets of all HBase target clusters:
  - Use Cloudera on cloud 7.2.18.200 or higher versions.
  - Choose the Perform Initial Snapshot option, and then specify the custom username in the Export snapshot user field on the Select Source page during the HBase replication policy creation process.
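The version condition above is a numeric comparison of dotted version strings, which plain string comparison gets wrong (as text, "7.2.9" sorts after "7.2.18"). The following Python sketch is a hypothetical helper, not a Cloudera tool or API; it assumes the version is available as a plain dotted numeric string such as 7.2.18.200, without build suffixes.

```python
# Hypothetical helper (not part of Cloudera or Replication Manager): checks whether
# a version string meets the 7.2.18.200 minimum required to use IDBroker credentials
# with multiple HBase replication policies. Assumes plain dotted numeric versions.

MIN_IDBROKER_VERSION = (7, 2, 18, 200)


def parse_version(version: str) -> tuple[int, ...]:
    """Convert a dotted version string such as '7.2.18.200' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def supports_idbroker_multi_cluster(version: str) -> bool:
    """Return True if the version is 7.2.18.200 or higher."""
    # Tuples compare element by element, and a shorter tuple that is a prefix of a
    # longer one compares as lower, so '7.2.18' is treated as below '7.2.18.200'.
    return parse_version(version) >= MIN_IDBROKER_VERSION


if __name__ == "__main__":
    for v in ["7.2.17.200", "7.2.18", "7.2.18.200", "7.2.18.300", "7.3.1"]:
        print(v, supports_idbroker_multi_cluster(v))
```

Comparing integer tuples rather than strings keeps each version component numeric, so 18 is correctly treated as greater than 9.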
Example
The following example shows how the order in which you pair clusters determines whether you can create an HBase replication policy:
Assume that you have the following clusters:
- A – An on-premises classic cluster
- B – An on-premises classic cluster
- C – Cloud Operational Database (COD)
- D – Cloud Operational Database (COD)
Problem: You created the HBase replication policies A → C and B → D, but you cannot create the policies A → D or B → C.
Cause: The HBase replication policy A → C uses one JCEKS file in clusters A and C, and the HBase replication policy B → D uses a different JCEKS file in clusters B and D. Therefore, when you try to create the HBase replication policy A → D, the policy creation fails because the two clusters already have different JCEKS files. The same applies to B → C.
Solution: Create the HBase replication policies in the following order so that the first-time setup propagates the same JCEKS file to each new cluster as you add it to the replication setup:
- A → C
- B → C
- B → D
- A → D
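The ordering rule in this example can be checked with a small model. The following Python sketch is a hypothetical simplification, not Replication Manager's API or implementation. It assumes that each cluster holds at most one credentials.jceks file, that the first-time setup copies an existing file to a cluster that has none, and that policy creation fails when both clusters already hold different files.

```python
# Hypothetical model (not Replication Manager code) of how credentials.jceks files
# spread across clusters as HBase replication policies are created. Rules assumed:
#   * neither cluster has a file      -> the first-time setup creates one shared file
#   * only one cluster has a file     -> that file is synchronized to the other cluster
#   * both clusters share one file    -> the policy can be created
#   * both clusters have different files -> the policy creation fails
from itertools import count


class JceksModel:
    def __init__(self) -> None:
        self.file_of: dict[str, int] = {}  # cluster name -> JCEKS file id
        self._ids = count(1)               # generates new JCEKS file ids

    def create_policy(self, source: str, target: str) -> bool:
        src = self.file_of.get(source)
        tgt = self.file_of.get(target)
        if src is not None and tgt is not None:
            # Both clusters already belong to a replication setup.
            return src == tgt
        # First-time setup: reuse whichever file exists, or create a new one.
        shared = src if src is not None else tgt
        if shared is None:
            shared = next(self._ids)
        self.file_of[source] = shared
        self.file_of[target] = shared
        return True


def run(order: list[tuple[str, str]]) -> list[bool]:
    """Create the policies in the given order and report which ones succeed."""
    model = JceksModel()
    return [model.create_policy(s, t) for s, t in order]


if __name__ == "__main__":
    # Problem order: A -> C and B -> D first create two distinct JCEKS files.
    print(run([("A", "C"), ("B", "D"), ("A", "D"), ("B", "C")]))
    # Solution order: every cluster ends up with the file created by A -> C.
    print(run([("A", "C"), ("B", "C"), ("B", "D"), ("A", "D")]))
```

Running the sketch prints `[True, True, False, False]` for the problem order and `[True, True, True, True]` for the solution order, which matches the cause and solution described above.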
Use cases
Multi-cluster HBase replication supports the following sample use cases:
- Multiple source clusters and a single target cluster – For example, in a disaster-recovery scenario, you want to use a single COD cluster to back up all the HBase data. The following image illustrates this scenario:
  Figure 2. Multiple source clusters to a single target cluster
- Single source cluster and multiple target clusters – For example, all the HBase data is located in one cluster, but you want to replicate only specific HBase tables to different environments, such as quality engineering or experimentation, to fulfill specific requirements. The following image illustrates this scenario:
  Figure 3. Single source cluster to multiple target clusters
