Methods to replicate HBase data

You can replicate the existing HBase data, the generated HBase data, or both, depending on your requirements. You can also choose to replicate all the HBase tables in a database or only the required tables.

Replicate only the generated data from chosen tables

In this method, you choose one or more tables during the replication policy creation process. Replication Manager replicates only the data that is generated after policy creation.

Replicate existing data and generated data from chosen tables

In this method, you choose one or more tables, and also choose the Select Source > Perform Initial Snapshot option during the HBase replication policy creation process. Replication Manager replicates the existing data and the data that is generated after policy creation.

For example, the source cluster has two tables named 'Orders' and 'Customers'. If you want to copy only the data generated in these tables from March 1, 2021 onwards, you create an HBase replication policy on March 1, 2021 without choosing the Perform Initial Snapshot option in the Create Replication Policy wizard. The data that you create, update, or delete in the source cluster after policy creation is automatically replicated to the target cluster. To also replicate the data that existed in these tables before March 1, 2021, you choose the Perform Initial Snapshot option.

Replicate existing tables and future tables in the database

In this method, you choose the Select Source > Replicate Database option during the HBase replication policy creation process. Replication Manager replicates the generated data from the existing tables and the data from future tables that are created after the HBase replication policy is created.

To replicate data from future tables successfully, you must create a similar empty table on the target cluster for each one. You can do this when you create or add the table to the database on the source cluster.
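The requirement above amounts to keeping the target's table list in step with the source's. The following is a minimal sketch of that check, assuming you already have the table names from each cluster as plain lists; the function name `missing_on_target` and the table name 'Invoices' are hypothetical and not part of Replication Manager.

```python
from typing import Iterable, List


def missing_on_target(source_tables: Iterable[str],
                      target_tables: Iterable[str]) -> List[str]:
    """Return source table names that have no counterpart on the target."""
    # Set difference: names present on the source but absent on the target.
    return sorted(set(source_tables) - set(target_tables))


# 'Invoices' stands for a table added to the source after policy creation.
to_create = missing_on_target(["Orders", "Customers", "Invoices"],
                              ["Orders", "Customers"])
print(to_create)
```

Each name in the result is a table for which a similar empty table would still need to be created on the target cluster.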

You can choose the Replicate Database option only if the following conditions are true:
  • Target Cloudera Manager version is 7.11.0 or higher.
  • Source cluster’s CDH version is 6.x or higher.

    CDH 5.16.2 and higher versions also support the Replicate Database option after you upgrade Cloudera Manager on the source cluster.

  • No HBase replication policies exist between the source and target clusters.
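The three conditions above can be read as a single check. The following is a rough sketch, not Replication Manager code: the `ClusterPair` fields and the function names are hypothetical, the `source_cm_upgraded` flag is an assumed stand-in for "after you upgrade the source cluster Cloudera Manager", and the version parser handles only plain dotted versions (no hotfix suffixes such as "-h3").

```python
from dataclasses import dataclass
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '7.11.0' into a comparable tuple of ints."""
    return tuple(int(part) for part in text.split("."))


@dataclass
class ClusterPair:
    target_cm_version: str        # Cloudera Manager version on the target
    source_cdh_version: str       # CDH version on the source cluster
    source_cm_upgraded: bool      # assumption: source Cloudera Manager was upgraded
    existing_hbase_policies: int  # HBase policies between source and target


def can_replicate_database(pair: ClusterPair) -> bool:
    """Apply the three listed conditions for the Replicate Database option."""
    target_ok = parse_version(pair.target_cm_version) >= (7, 11, 0)
    cdh = parse_version(pair.source_cdh_version)
    # CDH 6.x or higher qualifies directly; CDH 5.16.2 and higher qualifies
    # only after the source cluster's Cloudera Manager is upgraded.
    source_ok = cdh >= (6,) or (cdh >= (5, 16, 2) and pair.source_cm_upgraded)
    return target_ok and source_ok and pair.existing_hbase_policies == 0
```

For example, `can_replicate_database(ClusterPair("7.11.0", "6.3.4", False, 0))` evaluates to `True`, while the same pair with one existing HBase replication policy evaluates to `False`.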

After you select the Select Source > Replicate Database option in the HBase replication policy wizard, you can choose one of the following options to determine the tables in the database to replicate:

  • Replicate all user tables - Replicates all the HBase tables in the database after the replication scope of the tables is set to 1.
  • Replicate only tables where replication is already enabled - Replicates only those tables for which the replication scope is already set to 1.

    This option is supported only if the target cluster runs one of the following: CDP version 7.2.17.300 with Cloudera Manager 7.11.0-h3 or higher, CDP version 7.2.16.500 with Cloudera Manager 7.9.0-h7 or higher, or CDP version 7.12.0.0.
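The difference between the two options comes down to whether the current replication scope filters the table list. The sketch below illustrates this under a simplifying assumption: it tracks one scope value per table, whereas HBase stores the replication scope per column family. The function name, the `only_enabled` flag, and the sample scope values are hypothetical.

```python
from typing import Dict, List


def tables_to_replicate(scopes: Dict[str, int], only_enabled: bool) -> List[str]:
    """Pick tables for a Replicate Database policy.

    scopes maps a table name to its current replication scope (0 or 1).
    """
    if only_enabled:
        # "Replicate only tables where replication is already enabled"
        return sorted(name for name, scope in scopes.items() if scope == 1)
    # "Replicate all user tables": every table qualifies, whatever its
    # current scope, because the scope is set to 1 before replication.
    return sorted(scopes)


current_scopes = {"Orders": 1, "Customers": 0}  # hypothetical values
print(tables_to_replicate(current_scopes, only_enabled=False))
print(tables_to_replicate(current_scopes, only_enabled=True))
```

With these sample values, the first call returns both tables and the second returns only 'Orders'.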

Replicate existing data and generated data from chosen tables and future tables

In this method, you choose the Perform Initial Snapshot and Replicate Database options on the Select Source page during the HBase replication policy creation process. You can also choose to replicate all the tables in the database or only those tables for which the replication scope is already set to 1. Replication Manager replicates the existing and generated data from the existing tables in addition to the data in future tables.
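The four methods in this topic differ only in whether the Perform Initial Snapshot and Replicate Database options are chosen. As an illustrative summary (the function and its wording are hypothetical, not a Replication Manager API), the combinations can be expressed as:

```python
def describe_method(initial_snapshot: bool, replicate_database: bool) -> str:
    """Summarize what a policy replicates for a given pair of wizard options."""
    # Perform Initial Snapshot adds the data that exists before policy creation.
    data = "existing and generated data" if initial_snapshot else "generated data only"
    # Replicate Database extends the policy from chosen tables to the database,
    # including tables created after the policy.
    tables = ("existing and future tables in the database"
              if replicate_database else "chosen tables")
    return f"{data} from {tables}"


for snapshot in (False, True):
    for database in (False, True):
        print(snapshot, database, "->", describe_method(snapshot, database))
```

Each of the four printed lines corresponds to one of the methods described above.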