With timeline consistency, HBase introduces a consistency definition that can be provided per get or scan read operation:
public enum Consistency { STRONG, TIMELINE }
Consistency.STRONG
is the default consistency model provided by HBase. If a table has region
replication set to 1, or has region replicas but the reads are done with time consistency enabled, the read is always
performed by the primary regions. This preserves previous behavior and the client receives the latest data.
If a read is performed with Consistency.TIMELINE
, then the read
RPC will be sent to the primary Region Server first. After a short interval, such as the
default setting of 10 milleseconds for the hbase.client.primaryCallTimeout.get
property, the parallel RPC for secondary region replicas is sent if the primary region replica
does not respond. HBase returns the result from the RPC that finishes first. If the response
is from the primary region replica, the data is current. You can use the
Result.isStale()
API to determine the state of the resulting data:
If the result is from a primary region,
Result.isStale()
is set to false.If the result is from a secondary region,
Result.isStale()
is set to true.
TIMELINE
consistency as implemented by HBase differs from pure eventual consistency in
the following respects:
Single homed and ordered updates: Whether region replication is enabled or not, on the write side, there is still only one defined replica, the primary, that can accept writes. This replica is responsible for ordering the edits and preventing conflicts. This guarantees that two different writes are not committed at the same time by different replicas, resulting in divergent data. With this approach, there is no need to do read-repair or last-timestamp-wins types of conflict resolution.
The secondary replicas also apply edits in the order that the primary committed them. Thus the secondaries contain a snapshot of the primary data at any point in time. This is similar to RDBMS replications and HBase multi-datacenter replication, but takes place in a single cluster.
On the read side, the client can detect whether the read is coming from up-to-date data or stale data. Also, the client can issue reads with different consistency requirements on a per-operation basis to ensure its own semantic guarantees.
The client might still read stale data if it receives data from one secondary replica first, followed by reads from another secondary replica. There is no stickiness to region replicas, nor is there a transaction ID-based guarantee. If required, this can be implemented later.
Memory Accounting
Secondary region replicas refer to data files in the primary region replica, but they have their own MemStores
in HA Phase 2 and use block cache as well. However, secondary region replicas cannot flush
data when there is memory pressure for their MemStores
. They can only free up MemStore
memory when the primary
region does a flush and the flush is replicated to the secondary.
Because a Region Server can host primary replicas for some regions and secondaries for others, secondary replicas might generate extra flushes to primary regions in the same host. In extreme situations, there might be no memory for new writes from the primary, by way of write ahead log (WAL) replication.
To resolve this situation, the secondary replica is allowed to do a store file refresh, which is a file system list
operation to pick up new files from the primary and possibly dropping its MemStore
. This refresh will only be performed
if the MemStore
size of the biggest secondary region replica is at least
hbase.region.replica.storefile.refresh.memstore.multiplier
times bigger than the biggest MemStore
of a primary replica. The default value for hbase.region.replica.storefile.refresh.memstore.multiplier
is 4.
Note | |
---|---|
If this operation is performed, the secondary replica might obtain partial row updates across column families because column families are flushed independently. Hortonworks recommends that you configure HBase to perform this operation infrequently. You can disable this feature by setting the value to a large number, but this might cause replication to be blocked without resolution. |
Secondary Replica Failover
When a secondary region replica first comes online, or after a secondary region fails over, it may have contain edits
from its MemStore
. The secondary replica must ensure that it accesses stale data that has been overwritten before
serving requests after assignment. Therefore, the secondary waits until it detects a full flush cycle, consisting of start flush and commit flush,
or a region open event replicated from the primary replica.
Until the flush cycle occurs, the secondary region replica rejects all read requests by way of an IOException with the following message:
The region's reads are disabled
Other replicas might be available to read, thus not causing any impact for the RPC with TIMELINE consistency.
To facilitate faster recovery, the secondary region triggers a flush request from
the primary when it is opened. The configuration property
hbase.region.replica.wait.for.primary.flush
, which is enabled by default, can be
used to disable this feature if needed.