HBase replication policy
You can replicate HBase data using HBase replication or you can create an HBase replication policy. An HBase replication policy replicates the data at table granularity.
The replication policy replicates HBase data from the source cluster successfully only if a similar schema exists on the target cluster. Before you create a replication policy, make a list of the tables and their column families in the source cluster. In the target cluster, create these tables and column families with the same names, without adding any data. For information about how to create empty tables, see Creating empty tables on the destination cluster.
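The preparation step above can be sketched as a small script that turns a list of source tables and column families into HBase shell `create` commands for the target cluster. This is a minimal sketch, not part of Replication Manager: the function name, the input format, and the column family names (`details`, `status`, `profile`) are assumptions made for illustration. The generated commands create each table with only its column family names, so any non-default column family settings would need to be added by hand.

```python
def hbase_create_statements(source_schema):
    """Build HBase shell `create` commands for empty target tables.

    source_schema maps a table name to an iterable of column family names.
    (Hypothetical input format; collect it however suits your environment.)
    """
    statements = []
    for table in sorted(source_schema):
        families = sorted(source_schema[table])
        if not families:
            # HBase tables need at least one column family.
            raise ValueError(f"table {table!r} has no column families")
        quoted = ", ".join(f"'{cf}'" for cf in families)
        statements.append(f"create '{table}', {quoted}")
    return statements


# Hypothetical schema for the Orders and Customers tables.
source_schema = {
    "Orders": ["details", "status"],
    "Customers": ["profile"],
}
for line in hbase_create_statements(source_schema):
    print(line)
```

You could paste the printed commands into the HBase shell on the target cluster to create the empty tables.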
Replication scenarios and considerations
Before you copy HBase data using HBase replication policies, review the following replication scenarios and considerations:
- Before you replicate HBase data, create an empty table in the target cluster that has the same column families with the same names as the table and column families you want to replicate.
- After you create the replication policy, ensure that you maintain similar tables and column families in both clusters. This is because replication fails if there is a mismatch between the source and target tables and column families. For example, replication fails if you add a column family or column qualifier in a replicating table in the source cluster but do not add it to the corresponding table in the target cluster.
- After you create a replication policy, you can delete one or more tables from the policy.
- After you create a replication policy, if you change the schema in the source cluster, you can choose one of the following methods to continue replicating data successfully:
  - Create a similar schema in the target cluster before you change the schema in the source cluster.
  - Suspend or delete the original replication policy, make the necessary changes to the schema on the target cluster, and then create another replication policy. In the wizard, do not choose the Perform Initial Snapshot option. This prevents the existing data from being replicated.
- You can pause or stop replicating data, if required. To pause replication, suspend the replication policy and resume it later. To stop replication entirely, delete the replication policy. To stop replicating data for a single table, remove the table from the replication policy.
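Because replication fails when the source and target schemas diverge, it can help to compare the two schemas before you create or resume a policy. The following is a minimal sketch of such a check. The function name and the dictionary-based schema format are assumptions made for illustration, and the column family names are hypothetical; how you collect the schema from each cluster is left open.

```python
def schema_mismatches(source, target):
    """Return {table: sorted column families missing on the target}.

    source and target map table names to collections of column family names.
    A table that is absent from the target is reported with all of its
    source column families.
    """
    missing = {}
    for table, families in source.items():
        gap = set(families) - set(target.get(table, ()))
        if gap:
            missing[table] = sorted(gap)
    return missing


# Hypothetical schemas: the target lacks the 'status' family in Orders.
source = {"Orders": {"details", "status"}, "Customers": {"profile"}}
target = {"Orders": {"details"}, "Customers": {"profile"}}
print(schema_mismatches(source, target))
```

An empty result means every source table and column family has a counterpart on the target. The check is one-directional by design: extra tables or column families on the target are not reported.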
When you create a replication policy, you can choose the Perform Initial Snapshot option to migrate both the data that existed before you created the policy and the data that is generated after you create it. If you do not choose the option, the policy migrates only the data that is generated after you create the policy.
For example, suppose you have two tables named Orders and Customers in the source cluster and you want to copy the data from these tables from March 1, 2021 onwards. To accomplish this task, you create an HBase replication policy without choosing the Perform Initial Snapshot option in the Create Replication Policy wizard on March 1, 2021. The data that you create, update, or delete in the source cluster after you create the policy is automatically replicated to the target cluster.
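The effect of the Perform Initial Snapshot option in this example can be modeled with a short function. This is only an illustration of the rule described above, not Replication Manager code: the function name and parameters are hypothetical, and the model simplifies time to a single timestamp comparison.

```python
from datetime import datetime


def is_replicated(written_at, policy_created_at, perform_initial_snapshot):
    """Return True if data written at `written_at` reaches the target cluster.

    Simplified model: with the initial snapshot, existing and new data are
    both migrated; without it, only data written after policy creation is.
    """
    if perform_initial_snapshot:
        return True
    return written_at > policy_created_at


policy_created_at = datetime(2021, 3, 1)
# Policy created without the Perform Initial Snapshot option:
print(is_replicated(datetime(2021, 2, 15), policy_created_at, False))
print(is_replicated(datetime(2021, 3, 2), policy_created_at, False))
```

Under these assumptions, the February 15 write stays only on the source cluster, while the March 2 write is replicated to the target.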