Apache Hadoop High Availability

Propagating Writes to Region Replicas

As discussed in the introduction, writes go only to the primary region replica.

The following two mechanisms are used to propagate writes from the primary replica to secondary replicas.

Note

By default, HBase tables do not use High Availability features. After configuring your cluster for High Availability, designate tables as HA by setting region replication to a value greater than 1 at table creation time. For more information, see Creating Highly-Available HBase Tables.

For read-only tables, you do not need to use any of the following methods. Disabling and enabling the table should make the data available in all region replicas.

StoreFile Refresher

The first mechanism is the store file refresher, which was introduced in Phase 1 (Apache HBase 1.0.0 and HDP 2.1).

Store file refresher is a thread per RegionServer that runs periodically and performs a refresh operation for the store files of the primary region on behalf of the secondary region replicas. If enabled, the refresher ensures that the secondary region replicas see new flushed, compacted, or bulk-loaded files from the primary region in a timely manner. However, this also means that only flushed data can be read back from the secondary region replicas, and only after the refresher has run, so the secondaries lag behind the primary for a longer time.

To enable this feature, configure hbase.regionserver.storefile.refresh.period to a value greater than zero. For more information about these properties, see Configuring HA Reads for HBase.
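For example, a minimal hbase-site.xml sketch that turns the refresher on with a 30-second period (the value is interpreted in milliseconds; the interval shown is illustrative, not a recommendation):

  <property>
    <name>hbase.regionserver.storefile.refresh.period</name>
    <!-- Refresh the secondary replicas' view of the primary's store files
         every 30 seconds; a value of 0 disables the refresher. -->
    <value>30000</value>
  </property>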

Async WAL Replication

The second mechanism for propagating writes to secondaries is done via the Async WAL Replication feature. This feature is only available in HA Phase 2 (starting with HDP 2.2).

Async WAL replication works similarly to HBase’s multi-datacenter replication, but the data from a region is replicated to its secondary regions. Each secondary replica always receives writes in the same order that the primary region committed them. In some sense, this design can be thought of as "in-cluster replication"; instead of replicating to a different datacenter, the data goes to secondary regions.

This process keeps the secondary region’s in-memory state up to date. Data files are shared between the primary region and the other replicas, so there is no extra storage overhead. However, secondary regions have recent non-flushed data in their MemStores, which increases memory overhead. The primary region writes flush, compaction, and bulk load events to its WAL as well, which are also replicated through WAL replication to secondaries. When secondary replicas detect a flush/compaction or bulk load event, they replay the event to pick up the new files and drop the old ones.

Committing writes in the same order as in the primary region ensures that the secondaries won’t diverge from the primary region's data, but because the log replication is asynchronous, the data might still be stale in secondary regions. Because this feature works as a replication endpoint, performance and latency characteristics should be similar to inter-cluster replication.

Async WAL Replication is disabled by default. To enable this feature, set hbase.region.replica.replication.enabled to true. For more information about these properties, see Creating Highly-Available HBase Tables.
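As a sketch, the property can be set cluster-wide in hbase-site.xml as follows:

  <property>
    <name>hbase.region.replica.replication.enabled</name>
    <!-- Enable the in-cluster replication endpoint that ships primary-region
         WAL edits to the secondary region replicas. -->
    <value>true</value>
  </property>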

When you create a table with High Availability enabled, the Async WAL Replication feature adds a new replication peer (named region_replica_replication).

Once this feature is enabled, disabling it requires the following two steps:

  • Set hbase.region.replica.replication.enabled to false in hbase-site.xml (see the snippet after this list).

  • In your cluster, disable the replication peer named region_replica_replication, using hbase shell or the ReplicationAdmin class:

    hbase> disable_peer 'region_replica_replication'
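For the first step, a minimal hbase-site.xml sketch looks like this:

  <property>
    <name>hbase.region.replica.replication.enabled</name>
    <!-- Turn off Async WAL Replication; the region_replica_replication peer
         must also be disabled, as described in the second step above. -->
    <value>false</value>
  </property>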

Store File TTL

In both of the write-propagation approaches described above (Phase 1 and Phase 2), store files of the primary replica are opened in the secondaries independently of the primary region. As a result, the secondaries might still refer, for reading, to store files that the primary region has already compacted and archived.

Both features use HFileLinks to refer to files, but there is no guarantee that a file will not be deleted prematurely. To prevent I/O exceptions for requests to replicas, set the configuration property hbase.master.hfilecleaner.ttl to a sufficiently large value, such as 1 hour.
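For example, a 1-hour TTL (the value is in milliseconds) could be expressed in hbase-site.xml as follows; the exact value is a judgment call for your workload:

  <property>
    <name>hbase.master.hfilecleaner.ttl</name>
    <!-- Keep archived HFiles around for at least one hour (3,600,000 ms) so
         that secondary replicas do not hit files that were already deleted. -->
    <value>3600000</value>
  </property>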

Region Replication for the META Table’s Region

Currently, Async WAL Replication is not done for the META table’s WAL; the META table’s secondary replicas still refresh themselves from the persistent store files. To ensure that the META store files are refreshed, set hbase.regionserver.meta.storefile.refresh.period to a non-zero value. This property is configured separately from hbase.regionserver.storefile.refresh.period.
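For instance, the following hbase-site.xml sketch refreshes the META secondaries' store files every 30 seconds (milliseconds; an illustrative value, not a recommendation):

  <property>
    <name>hbase.regionserver.meta.storefile.refresh.period</name>
    <!-- Refresh store files for the META table's secondary replicas every
         30 seconds; a value of 0 disables the refresh. -->
    <value>30000</value>
  </property>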