Recover HBase data from a replicated cluster in a disaster recovery scenario.
One of the main reasons for replications is to be able to restore data, whether
during disaster recovery or for other reasons. During restoration, the source
and sink roles are reversed. The source is the replica cluster, and the sink
is the cluster that needs restoration. This can be confusing, especially if you are
in the middle of a disaster recovery scenario. The following image illustrates the
role reversal between normal production and disaster recovery.
-
Change the value of the column family property
REPLICATION_SCOPE
on the sink to 0
for
each column to be restored, so that its data will not be replicated during the
restore operation.
-
Change the value of the column family property
REPLICATION_SCOPE
on the source to 1
for
each column to be restored, so that its data will be replicated.
-
Use the
copyTable
or distcp
commands to
import the data from the backup to the sink cluster.
-
Add the sink as a replication peer to the source, using the
add_peer
command.
-
If you used
distcp
in step 3,restart or
rolling restart both clusters.
The RegionServers pick up the new files.
When restoration is complete, do the following:
-
Change the
REPLICATION_SCOPE
values back to their values
before initiating the restoration.