Hadoop High Availability

Configuring HA Reads for HBase

To enable High Availability for HBase reads, specify the following server-side and client-side configuration properties in your hbase-site.xml configuration file, and then restart the HBase Master and Region Servers.

The following table describes server-side properties. Set these properties for all servers in your HBase cluster that will use region replicas.

Property | Example value | Description

hbase.regionserver.storefile.refresh.period

30000

Specifies the period (in milliseconds) for refreshing the store files for secondary regions. The default value is 0, which indicates that the feature is disabled. Secondary regions receive new files from the primary region after the secondary replica refreshes the list of files in the region.

Note: Refreshing too frequently might put extra load on the NameNode. If files cannot be refreshed for longer than the HFile TTL, which is specified with hbase.master.hfilecleaner.ttl, requests are rejected.

The refresh period should be a non-zero number if META replicas are enabled (see hbase.meta.replica.count).

If you specify a refresh period, we recommend configuring the HFile TTL to a value larger than its default.

hbase.region.replica.replication.enabled

true

Determines whether asynchronous WAL replication is enabled. The value can be true or false. The default is false.

If this property is enabled, a replication peer named region_replica_replication is created. The replication peer replicates changes to region replicas for any tables that have region replication set to a value greater than 1.

After you enable this property, disabling it requires two steps: set the property to false, and disable the replication peer using the shell or the ReplicationAdmin Java class. When replication is explicitly disabled and then re-enabled, you must set hbase.replication to true.

hbase.master.hfilecleaner.ttl

3600000

Specifies the period (in milliseconds) to keep store files in the archive folder before deleting them from the file system.

hbase.master.loadbalancer.class

org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer

Specifies the Java class that the HBase Master uses to balance regions across Region Servers.

The default value is org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer, which is the only load balancer that supports reading data from Region Servers in secondary mode.

hbase.meta.replica.count

3

Region replication count for the meta regions. The default value is 1.

hbase.regionserver.meta.storefile.refresh.period

30000

Specifies the period (in milliseconds) for refreshing the store files for the secondary regions of the HBase META table. If this is set to 0, the feature is disabled.

Secondary regions see files that were newly flushed or compacted by the primary region only after they refresh the list of files in the region. There is no notification mechanism.

Note: Refreshing the secondary region too frequently might put extra load on the NameNode. Requests are rejected if the files cannot be refreshed for longer than the HFile TTL, which is specified with hbase.master.hfilecleaner.ttl. When you use this setting, we recommend configuring the HFile TTL to a larger value.

If META replicas are enabled (that is, hbase.meta.replica.count is set to a value greater than 1), set this property to a non-zero number.

hbase.region.replica.wait.for.primary.flush

true

Specifies whether to wait for a full flush cycle from the primary before starting to serve data in a secondary replica.

Disabling this feature might cause secondary replicas to read stale data when a region is transitioning to another region server.

hbase.region.replica.storefile.refresh.memstore.multiplier

4

Multiplier for a “store file refresh” operation for the secondary region replica.

When a Region Server is under memory pressure, this multiplier determines whether it refreshes the store files of a secondary region instead of flushing a primary region. With the default value (4), the refresh is chosen when the memstore of the biggest secondary region replica is at least 4 times bigger than the memstore of the biggest primary region.

Disabling this feature is not recommended. However, if you want to do so, set this property to a large value.
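As an illustration, the server-side properties described above could be written in hbase-site.xml roughly as follows. This is a minimal sketch that uses only the example values from the table, not recommended production settings; adjust each value for your cluster and merge the properties into your existing configuration element.

```xml
<!-- hbase-site.xml, server side: example values taken from the table above -->
<configuration>
  <!-- Refresh store files of secondary regions every 30 seconds (0 disables) -->
  <property>
    <name>hbase.regionserver.storefile.refresh.period</name>
    <value>30000</value>
  </property>
  <!-- Enable asynchronous WAL replication to region replicas -->
  <property>
    <name>hbase.region.replica.replication.enabled</name>
    <value>true</value>
  </property>
  <!-- Keep archived store files for 1 hour (3600000 ms) before deleting them -->
  <property>
    <name>hbase.master.hfilecleaner.ttl</name>
    <value>3600000</value>
  </property>
  <!-- The only load balancer that supports region replicas -->
  <property>
    <name>hbase.master.loadbalancer.class</name>
    <value>org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer</value>
  </property>
  <!-- Three replicas of the META regions -->
  <property>
    <name>hbase.meta.replica.count</name>
    <value>3</value>
  </property>
  <!-- Must be non-zero because META replicas are enabled above -->
  <property>
    <name>hbase.regionserver.meta.storefile.refresh.period</name>
    <value>30000</value>
  </property>
  <!-- Wait for a full primary flush before secondaries start serving data -->
  <property>
    <name>hbase.region.replica.wait.for.primary.flush</name>
    <value>true</value>
  </property>
  <!-- Multiplier for the store file refresh of secondary replicas -->
  <property>
    <name>hbase.region.replica.storefile.refresh.memstore.multiplier</name>
    <value>4</value>
  </property>
</configuration>
```

After editing the file on every server that will host region replicas, restart the HBase Master and Region Servers so that the settings take effect.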

The following table lists client-side properties. Set these properties for all clients (applications) and servers (in your HBase cluster) that will use region replicas.

Property | Example value | Description

hbase.ipc.client.specificThreadForWriting

true

Specifies whether to enable interruption of RPC threads at the client side. This is required for region replicas with fallback RPCs to secondary regions.

hbase.client.primaryCallTimeout.get

10000

Specifies the timeout (in microseconds) before secondary fallback RPCs are submitted to the secondary replicas of the regions for get requests with Consistency.TIMELINE. The default value is 10 ms (10000 microseconds).

Setting this to a smaller value increases the number of RPCs, but lowers 99th-percentile latencies.

hbase.client.primaryCallTimeout.multiget

10000

Specifies the timeout (in microseconds) before secondary fallback RPCs are submitted to the secondary replicas of the regions for multi-get requests (HTable.get(List<Get>)) with Consistency.TIMELINE. The default value is 10 ms (10000 microseconds).

Setting this to a smaller value increases the number of RPCs, but lowers 99th-percentile latencies.

hbase.client.primaryCallTimeout.scan

1000000

Specifies the timeout (in microseconds) before secondary fallback RPCs are submitted to the secondary replicas of the regions for scan requests with Consistency.TIMELINE. The default value is 1 second (1000000 microseconds).

Setting this to a smaller value increases the number of RPCs, but lowers 99th-percentile latencies.

hbase.meta.replicas.use

true

Specifies whether to use META table replicas. The default value is false.
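For reference, the client-side properties above might look like the following in the hbase-site.xml file used by client applications and servers. This is only a sketch built from the example values in the table; the timeouts in particular are a trade-off between extra RPCs and tail latency, so treat them as starting points rather than recommendations.

```xml
<!-- hbase-site.xml, client side: example values taken from the table above -->
<configuration>
  <!-- Allow client RPC threads to be interrupted; needed for fallback RPCs -->
  <property>
    <name>hbase.ipc.client.specificThreadForWriting</name>
    <value>true</value>
  </property>
  <!-- Wait 10000 microseconds (10 ms) before sending a get to secondaries -->
  <property>
    <name>hbase.client.primaryCallTimeout.get</name>
    <value>10000</value>
  </property>
  <!-- Wait 10000 microseconds (10 ms) before sending a multi-get to secondaries -->
  <property>
    <name>hbase.client.primaryCallTimeout.multiget</name>
    <value>10000</value>
  </property>
  <!-- Wait 1000000 microseconds (1 second) before sending a scan to secondaries -->
  <property>
    <name>hbase.client.primaryCallTimeout.scan</name>
    <value>1000000</value>
  </property>
  <!-- Read from META table replicas -->
  <property>
    <name>hbase.meta.replicas.use</name>
    <value>true</value>
  </property>
</configuration>
```

The fallback timeouts apply only to requests issued with Consistency.TIMELINE, as described in the table.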