Initiate replication when data already exist

You can initiate replication when data already exist by taking advantage of the accumulation that happens when a replication peer is disabled.

You may need to start replication from some point in the past. For example, suppose you have a primary HBase cluster in one location and are setting up a disaster-recovery (DR) cluster in another. To initialize the DR cluster, you need to copy over the existing data from the primary to the DR cluster, so that when you need to switch to the DR cluster you have a full copy of the data generated by the primary cluster. Once that is done, replication of new data can proceed as normal.

  1. Start replication.
  2. Add the destination cluster as a peer.
  3. Immediately disable the peer using disable_peer. This causes the source HBase cluster to temporarily spool updates to the replicated tables while you complete the next steps.
  4. Run hbase shell and issue `snapshot 'myTable', 'myTableSnapshot-122112'` for each table on the source cluster, substituting your own table and snapshot names. The snapshot command flushes the table's in-memory data to disk before the snapshot is taken.
  5. Export each snapshot from the source cluster and stage it on the destination cluster. On the source cluster, for example, run `hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot myTableSnapshot-122112 -copy-to hdfs://yourserver:8020/hbase_root_dir -mappers 16` for each table snapshot.
  6. Import and restore the snapshot on the destination cluster.
    • On the destination cluster, run hbase shell and issue a restore_snapshot for each table.

    If you are replicating data from or to a secure cluster, see Configure Replication.

  7. Run enable_peer to re-enable the destination cluster. Re-enabling the peer causes the source HBase cluster to send the updates it spooled while the peer was disabled to the destination cluster.
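
Steps 2 through 7 can be sketched as a small dry-run script that prints the commands in order instead of running them, so you can review them before issuing them on the right cluster. This is only an illustration under stated assumptions: the peer ID `1`, the ZooKeeper cluster key `zk1.example.com:2181:/hbase`, the single table `myTable`, and the helper names `snapshot_name` and `print_plan` are all hypothetical, and the `add_peer` argument form shown (`CLUSTER_KEY => ...`) may differ in your HBase release.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands for steps 2-7; it does not run HBase.
# All values below are placeholders (assumptions), not values from a real cluster.
set -eu

PEER_ID="1"                                  # hypothetical peer ID
CLUSTER_KEY="zk1.example.com:2181:/hbase"    # hypothetical destination cluster key
DEST_ROOT="hdfs://yourserver:8020/hbase_root_dir"
TABLES="myTable"                             # space-separated list of tables
SUFFIX="122112"                              # suffix used in snapshot names
MAPPERS=16

# Build a snapshot name such as myTableSnapshot-122112.
snapshot_name() {
  printf '%sSnapshot-%s' "$1" "$2"
}

# Print every command in the order the procedure requires.
print_plan() {
  echo "# On the source cluster, in hbase shell:"
  echo "add_peer '$PEER_ID', CLUSTER_KEY => \"$CLUSTER_KEY\""
  echo "disable_peer '$PEER_ID'"
  for t in $TABLES; do
    echo "snapshot '$t', '$(snapshot_name "$t" "$SUFFIX")'"
  done

  echo "# On the source cluster, from the command line:"
  for t in $TABLES; do
    echo "hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot $(snapshot_name "$t" "$SUFFIX") -copy-to $DEST_ROOT -mappers $MAPPERS"
  done

  echo "# On the destination cluster, in hbase shell:"
  for t in $TABLES; do
    echo "restore_snapshot '$(snapshot_name "$t" "$SUFFIX")'"
  done

  echo "# Back on the source cluster, in hbase shell:"
  echo "enable_peer '$PEER_ID'"
}

print_plan
```

To cover several tables, list them in `TABLES` separated by spaces. The script prints the commands rather than executing them because the steps run in different places (source shell, source command line, destination shell), and the peer should stay disabled until every snapshot has been restored.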