You can initiate replication when data already exist by taking advantage of the
accumulation that happens when a replication peer is disabled.
You may need to start replication from some point in the past. For example, suppose you have a primary HBase cluster in one location and are setting up a disaster-recovery (DR) cluster in another. To initialize the DR cluster, you need to copy over the existing data from the primary to the DR cluster, so that when you need to switch to the DR cluster you have a full copy of the data generated by the primary cluster. Once that is done, replication of new data can proceed as normal.
- Start replication.
- Add the destination cluster as a peer.
- Immediately disable it using `disable_peer`. This causes the source HBase cluster to temporarily spool updates to the replicated tables while the next steps are completed.
- Run `hbase shell` and issue `snapshot 'myTable', 'myTableSnapshot-122112'` for each table on the source cluster. The snapshot command flushes the table's in-memory data to disk, so the snapshot includes recent writes.
- Export each snapshot from the source cluster and stage it on the destination cluster. On the source cluster, for example, run `hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot MySnapshot -copy-to hdfs://yourserver:8020/hbase_root_dir -mappers 16` for each table snapshot.
- Import and restore the snapshot on the destination cluster.
  - On the destination cluster, run `hbase shell` and issue `restore_snapshot` for each table.
If you are replicating data from or to a secure cluster, see Configure Replication.
- Run `enable_peer` to re-enable the destination cluster. Re-enabling the peer causes the source HBase cluster to send the temporarily spooled updates to the destination cluster.
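
The order of the steps above can be sketched as a small dry-run script. It only prints the commands for each step and does not contact HBase. The peer ID `1`, the variable names (`PEER_ID`, `TABLES`, `SUFFIX`, `DEST_ROOT`, `MAPPERS`), and the helper functions are illustrative assumptions; the table name, snapshot naming pattern, destination URL, and mapper count are the placeholder values from the examples above. Adding the peer is left out because its arguments depend on your HBase version.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands for each step. It does not run HBase.
# All values below are placeholders (assumptions); replace them with your own.
PEER_ID="${PEER_ID:-1}"                 # hypothetical ID of the peer you added
TABLES="${TABLES:-myTable}"             # space-separated list of tables
SUFFIX="${SUFFIX:-122112}"              # snapshot suffix, as in the example above
DEST_ROOT="${DEST_ROOT:-hdfs://yourserver:8020/hbase_root_dir}"
MAPPERS="${MAPPERS:-16}"

# Build one snapshot name so the snapshot, export, and restore steps agree.
snapshot_name() { printf '%sSnapshot-%s' "$1" "$SUFFIX"; }

# hbase shell command that snapshots one table on the source cluster.
snapshot_cmd() { printf "snapshot '%s', '%s'" "$1" "$(snapshot_name "$1")"; }

# OS shell command that exports one snapshot to the destination cluster.
export_cmd() {
  printf 'hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot %s -copy-to %s -mappers %s' \
    "$(snapshot_name "$1")" "$DEST_ROOT" "$MAPPERS"
}

# hbase shell command that restores one snapshot on the destination cluster.
restore_cmd() { printf "restore_snapshot '%s'" "$(snapshot_name "$1")"; }

# Print the whole procedure in order, labeled by where each command runs.
print_plan() {
  echo "# Source cluster, hbase shell: disable the peer so updates accumulate"
  echo "disable_peer '${PEER_ID}'"
  for t in $TABLES; do
    echo "# Source cluster, hbase shell"
    snapshot_cmd "$t"; echo
    echo "# Source cluster, OS shell"
    export_cmd "$t"; echo
    echo "# Destination cluster, hbase shell"
    restore_cmd "$t"; echo
  done
  echo "# Source cluster, hbase shell: send the accumulated updates"
  echo "enable_peer '${PEER_ID}'"
}

print_plan
```

To plan for several tables, set `TABLES` before running the script, for example `TABLES="orders customers" sh plan.sh` (the file name and table names are hypothetical). The output mixes `hbase shell` and OS shell commands, so treat it as a checklist to run by hand rather than something to pipe into a shell.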