Keep replicas current

You can choose from two different mechanisms for keeping replicas up to date: using a timer or using replication.

Using a Timer: In this mode, replicas are refreshed at a time interval controlled by the configuration option hbase.regionserver.storefile.refresh.period.

Using Replication: In this mode, replicas are kept current between a source and sink cluster using HBase Async WAL replication. This works similarly to HBase’s multi-datacenter replication, but instead the data from a region is replicated to the secondary regions. Each secondary replica always receives and observes the writes in the same order that the primary region committed them. In some sense, this design can be thought of as in-cluster replication, where instead of replicating to a different datacenter, the data goes to secondary regions to keep secondary region’s in-memory state up-to-date. The data files are shared between the primary region and other replicas, so that no extra storage overhead exists. However, the secondary regions contain recent non-flushed data in their memstores, which increases the memory overhead. The primary region writes flush, compaction, and bulk load events to its WAL as well, which are also replicated through WAL replication to secondaries. When they observe the flush or compaction or bulk load event, the secondary regions replay the event to pick up the new files and drop the old ones.

Committing writes in the same order as in primary ensures that the secondaries do not diverge from the primary regions data, however; the data might still be stale in secondary regions because the log replication is asynchronous.

Async WAL Replication is disabled by default. You can enable this feature by setting hbase.region.replica.replication.enabled to true.