Restore data from a replica

Recover HBase data from a replicated cluster in a disaster recovery scenario.

One of the main reasons for replication is the ability to restore data, whether during disaster recovery or for other reasons. During restoration, the source and sink roles are reversed: the source is the replica cluster, and the sink is the cluster that needs restoration. This reversal can be confusing, especially in the middle of a disaster recovery scenario. The following image illustrates the role reversal between normal production and disaster recovery.

HBase replication roles in normal production versus disaster recovery

1. On the sink, change the value of the column family property REPLICATION_SCOPE to 0 for each column family to be restored, so that its data is not replicated during the restore operation.
2. On the source, change the value of the column family property REPLICATION_SCOPE to 1 for each column family to be restored, so that its data is replicated.
3. Use the CopyTable or distcp command to import the data from the backup to the sink cluster.
4. Add the sink as a replication peer to the source, using the add_peer command.
5. If you used distcp in step 3, restart or rolling restart both clusters so that the RegionServers pick up the new files.
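
The commands behind these steps can be sketched as a dry run. The helpers below only print the statements so that they can be reviewed before being pasted into the HBase shell or a terminal on the appropriate cluster. The table name `orders`, the column family `cf1`, the peer ID `1`, and the ZooKeeper cluster key are hypothetical placeholders, and the exact `add_peer` syntax varies between HBase versions, so treat this as a starting point rather than a definitive procedure.

```shell
#!/usr/bin/env bash
# Dry-run sketch: every function only prints a command; nothing is executed.
# All table, family, peer, and host names used below are hypothetical.

# HBase shell statement that sets REPLICATION_SCOPE on one column family.
scope_stmt() {  # usage: scope_stmt TABLE FAMILY SCOPE
  printf "alter '%s', {NAME => '%s', REPLICATION_SCOPE => '%s'}\n" "$1" "$2" "$3"
}

# HBase shell statement that registers a replication peer by cluster key.
# Older releases take the cluster key as a plain second argument instead.
add_peer_stmt() {  # usage: add_peer_stmt PEER_ID CLUSTER_KEY
  printf "add_peer '%s', CLUSTER_KEY => \"%s\"\n" "$1" "$2"
}

# Command line for a CopyTable job that writes the table to another cluster.
copytable_cmd() {  # usage: copytable_cmd CLUSTER_KEY TABLE
  printf "hbase org.apache.hadoop.hbase.mapreduce.CopyTable --peer.adr=%s %s\n" "$1" "$2"
}

# Assumed cluster key of the sink (the cluster being restored).
SINK_KEY='zk1.example.com:2181:/hbase'

# Run in the HBase shell on the sink: stop replicating the family.
scope_stmt orders cf1 0
# Run in the HBase shell on the source (the replica): replicate the family.
scope_stmt orders cf1 1
# Run from a terminal on the source: copy the existing data to the sink.
copytable_cmd "$SINK_KEY" orders
# Run in the HBase shell on the source: add the sink as a peer.
add_peer_stmt 1 "$SINK_KEY"
```

Printing the statements instead of executing them makes it easier to confirm which cluster each one targets, which matters here because the source and sink roles are reversed.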

When restoration is complete, do the following:

Change the REPLICATION_SCOPE values back to the values they had before you initiated the restoration.
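
Reverting the scopes is easier if the original values were written down before the restoration began. The following sketch assumes a hypothetical record of one `TABLE FAMILY SCOPE` triple per line, kept by the operator, and prints the HBase shell statement that restores each value. Neither the record format nor the helper name comes from HBase itself.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints HBase shell statements; nothing is executed.
# Input format (an assumption): "TABLE FAMILY SCOPE", one triple per line.

restore_scope_stmts() {  # usage: restore_scope_stmts < saved_scopes.txt
  while read -r table family scope; do
    # Skip blank lines in the saved record.
    if [ -n "$table" ]; then
      printf "alter '%s', {NAME => '%s', REPLICATION_SCOPE => '%s'}\n" \
        "$table" "$family" "$scope"
    fi
  done
}

# Example with hypothetical saved values for two column families.
printf 'orders cf1 1\norders cf2 0\n' | restore_scope_stmts
```

Each printed statement is run in the HBase shell on the cluster whose value it restores.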