4. Configuring HA Reads for HBase

To enable high availability for HBase reads, specify the following server-side and client-side configuration properties in your hbase-site.xml configuration file, and then restart the HBase Master and Region Servers.

The following table describes server-side properties. Set these properties for all servers in your HBase cluster that will use region replicas.

Property | Example value | Description

hbase.regionserver.storefile.refresh.period

30000

Specifies the period (in milliseconds) for refreshing the store files for secondary regions. The default value is 0, which indicates that the feature is disabled. Secondary regions receive new files from the primary region after the secondary replica refreshes the list of files in the region.

Note: Refreshing too frequently might put extra pressure on the NameNode. If files cannot be refreshed for longer than the HFile TTL, which is specified with hbase.master.hfilecleaner.ttl, requests are rejected.

The refresh period should be a non-zero number if META replicas are enabled (see hbase.meta.replica.count).

If you specify a refresh period, we recommend configuring the HFile TTL to a larger value than its default.

hbase.region.replica.replication.enabled

true

Determines whether asynchronous WAL replication is enabled. The value can be true or false. The default is false.

If this property is enabled, a replication peer named region_replica_replication is created. The replication peer replicates changes to region replicas for any tables that have region replication set to a value greater than 1.

After enabling this property, disabling it requires setting it to false and disabling the replication peer using the shell or the ReplicationAdmin Java class. When replication is explicitly disabled and then re-enabled, you must set hbase.replication to true.

hbase.master.hfilecleaner.ttl

3600000

Specifies the period (in milliseconds) to keep store files in the archive folder before deleting them from the file system.

hbase.master.loadbalancer.class

org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer

Specifies the Java class used for balancing the load of all HBase clients.

The default value is org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer, which is the only load balancer that supports reading data from Region Servers in secondary mode.

hbase.meta.replica.count

3

Region replication count for the meta regions. The default value is 1.

hbase.regionserver.storefile.refresh.all

false

Determines whether all store files will be refreshed, as opposed to just META tables. The default is true.

Set this value to false when hbase.region.replica.replication.enabled is true. Set it to true if META replicas are enabled (that is, if hbase.meta.replica.count is greater than 1).

hbase.region.replica.wait.for.primary.flush

true

Specifies whether to wait for a full flush cycle from the primary before starting to serve data in a secondary replica.

Disabling this feature might cause secondary replicas to read stale data when a region is transitioning to another Region Server.

hbase.region.replica.storefile.refresh.memstore.multiplier

4

Multiplier for a “store file refresh” operation for the secondary region replica.

If a Region Server is under memory pressure, the secondary region refreshes its store files when the MemStore size of the biggest secondary replica is larger than this multiplier times the MemStore size of the biggest primary replica.

To disable this feature (not recommended), set this property to a large value.
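As an illustration, the server-side properties above could appear in hbase-site.xml roughly as follows. This is a minimal sketch that reuses the example values from the table, not tuned recommendations. It assumes asynchronous WAL replication is enabled and leaves hbase.meta.replica.count and the load balancer class at their defaults.

```xml
<!-- Sketch of server-side region replica settings for hbase-site.xml.
     Values are the example values from the table, not recommendations. -->
<configuration>
  <!-- Refresh store files for secondary regions every 30 seconds. -->
  <property>
    <name>hbase.regionserver.storefile.refresh.period</name>
    <value>30000</value>
  </property>
  <!-- Enable asynchronous WAL replication to region replicas. -->
  <property>
    <name>hbase.region.replica.replication.enabled</name>
    <value>true</value>
  </property>
  <!-- Keep archived store files for one hour before deletion. -->
  <property>
    <name>hbase.master.hfilecleaner.ttl</name>
    <value>3600000</value>
  </property>
  <!-- Set to false here because WAL replication is enabled above. -->
  <property>
    <name>hbase.regionserver.storefile.refresh.all</name>
    <value>false</value>
  </property>
  <!-- Wait for a full flush from the primary before serving reads. -->
  <property>
    <name>hbase.region.replica.wait.for.primary.flush</name>
    <value>true</value>
  </property>
  <!-- Multiplier that triggers a store file refresh under memory pressure. -->
  <property>
    <name>hbase.region.replica.storefile.refresh.memstore.multiplier</name>
    <value>4</value>
  </property>
</configuration>
```

If you enable META replicas, revisit hbase.regionserver.storefile.refresh.all and the refresh period as described in the table.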

The following table lists client-side properties. Set these properties for all clients (applications) and servers (in your HBase cluster) that will use region replicas.

Property | Example value | Description

hbase.ipc.client.specificThreadForWriting

true

Specifies whether to enable interruption of RPC threads at the client side. This is required for region replicas with fallback RPCs to secondary regions.

hbase.client.primaryCallTimeout.get

10000

Specifies the timeout (in microseconds) before secondary fallback RPCs are submitted for get requests with Consistency.TIMELINE to the secondary replicas of the regions. The default value is 10 ms (10000 microseconds).

Setting this to a smaller value increases the number of RPCs, but lowers 99th-percentile latencies.

hbase.client.primaryCallTimeout.multiget

10000

Specifies the timeout (in microseconds) before secondary fallback RPCs are submitted for multi-get requests (HTable.get(List<Get>)) with Consistency.TIMELINE to the secondary replicas of the regions. The default value is 10 ms (10000 microseconds).

Setting this to a smaller value increases the number of RPCs, but lowers 99th-percentile latencies.

hbase.client.primaryCallTimeout.scan

1000000

Specifies the timeout (in microseconds) before secondary fallback RPCs are submitted for scan requests with Consistency.TIMELINE to the secondary replicas of the regions. The default value is 1 second (1000000 microseconds).

Setting this to a smaller value increases the number of RPCs, but lowers 99th-percentile latencies.

hbase.meta.replicas.use

true

Specifies whether to use META table replicas. The default value is false.
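The client-side properties could be expressed in hbase-site.xml along the following lines. This is a sketch using the example values from the table. It assumes the same file is deployed to both clients and servers, and it leaves hbase.meta.replicas.use at its default because that setting only makes sense when META replicas are configured on the servers.

```xml
<!-- Sketch of client-side region replica settings for hbase-site.xml.
     Timeouts are in microseconds, as described in the table. -->
<configuration>
  <!-- Allow client RPC threads to be interrupted (needed for fallback RPCs). -->
  <property>
    <name>hbase.ipc.client.specificThreadForWriting</name>
    <value>true</value>
  </property>
  <!-- Wait 10 ms on the primary before sending fallback get RPCs. -->
  <property>
    <name>hbase.client.primaryCallTimeout.get</name>
    <value>10000</value>
  </property>
  <!-- Wait 10 ms on the primary before sending fallback multi-get RPCs. -->
  <property>
    <name>hbase.client.primaryCallTimeout.multiget</name>
    <value>10000</value>
  </property>
  <!-- Wait 1 second on the primary before sending fallback scan RPCs. -->
  <property>
    <name>hbase.client.primaryCallTimeout.scan</name>
    <value>1000000</value>
  </property>
</configuration>
```

Lowering the three timeout values trades more RPC traffic for lower tail latency, so they are worth adjusting only after measuring your own workload.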