3. Initial Configuration and Tuning

HBase administrators typically use the following methods to initially configure the cluster:

Increase the request handler thread count
Configure the size and number of WAL files
Configure compactions
Pre-split tables
Tune JVM garbage collection

Increase the Request Handler Thread Count

Administrators who expect their HBase cluster to handle a high volume of requests should increase the number of request handler threads (listeners) that each region server creates. Set the hbase.regionserver.handler.count property in the hbase-site.xml configuration file to a value higher than the default of 30.
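As an illustration of what the resulting hbase-site.xml entry looks like, the following minimal Python sketch renders a property element. The helper name hbase_property and the value 60 are assumptions chosen only for the example; the right handler count depends on the workload.

```python
import xml.etree.ElementTree as ET


def hbase_property(name, value):
    """Render one <property> element in the hbase-site.xml name/value layout."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(prop, encoding="unicode")


# 60 is an illustrative value above the default of 30, not a recommendation.
snippet = hbase_property("hbase.regionserver.handler.count", 60)
print(snippet)
```

The printed element would be placed inside the configuration element of hbase-site.xml on each region server.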

Configure the Size and Number of WAL Files

HBase uses the Write Ahead Log, or WAL, to recover memstore data not yet flushed to disk if a region server crashes. Administrators should configure these WAL files to be slightly smaller than the HDFS block size. By default, an HDFS block is 64 MB and a WAL is approximately 60 MB. Hortonworks recommends that administrators allocate enough WAL files to contain the total capacity of the memstores. Use the following formula to determine the number of WAL files needed:

(regionserver_heap_size * memstore_fraction) / default_WAL_size

For example, assume the following HBase cluster configuration:

16 GB RegionServer heap
0.4 memstore fraction
60 MB default WAL size

The formula for this configuration looks as follows:

(16384 MB * 0.4) / 60 MB = approximately 109 WAL files
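The same calculation can be sketched in Python. The function name wal_files_needed is hypothetical; the inputs are the example values listed above.

```python
def wal_files_needed(heap_mb, memstore_fraction, wal_mb):
    """Estimate the WAL file count: (heap * memstore fraction) / WAL size."""
    return (heap_mb * memstore_fraction) / wal_mb


# Example cluster: 16 GB heap, 0.4 memstore fraction, 60 MB WAL files.
estimate = wal_files_needed(16384, 0.4, 60)
print(round(estimate))  # prints 109
```

The raw result is about 109.2, so rounding up to 110 would be the more conservative choice if the goal is to cover the full memstore capacity. Either figure is well above the default of 32 for hbase.regionserver.maxlogs, so a cluster like this one would need that property raised.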

Use the following properties in the hbase-site.xml configuration file to configure the size and number of WAL files:

Table 10.3. WAL Configuration Properties

Configuration Property	Description	Default
`hbase.regionserver.maxlogs`	Sets the maximum number of WAL files.	32
`hbase.regionserver.logroll.multiplier`	Multiplier of HDFS block size.	0.95
`hbase.regionserver.hlog.blocksize`	Optional override of HDFS block size.	Value assigned to actual HDFS block size.

	Tip
	If recovery from failure takes longer than expected, try reducing the number of WAL files to improve performance.

Configure Compactions

Administrators who expect their HBase clusters to host large amounts of data should consider the effect that compactions have on write throughput. For write-intensive data request patterns, administrators should consider less frequent compactions and more store files per region. Use the hbase.hstore.compaction.min property in the hbase-site.xml configuration file to increase the minimum number of files required to trigger a compaction. Administrators who increase this value should also increase the value assigned to the hbase.hstore.blockingStoreFiles property, since more files will accumulate.
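The relationship between the two properties can be expressed as a simple sanity check. This is a sketch under an assumption not stated in the text: that hbase.hstore.blockingStoreFiles should stay strictly above hbase.hstore.compaction.min, so that a compaction can be triggered before writes are blocked. The function name and the sample values are hypothetical.

```python
def blocking_leaves_headroom(compaction_min, blocking_store_files):
    """True if store files can reach the compaction trigger before updates block."""
    return blocking_store_files > compaction_min


# Illustrative values only: both properties raised together.
print(blocking_leaves_headroom(compaction_min=6, blocking_store_files=20))

# Illustrative misconfiguration: the minimum was raised but the blocking limit was not.
print(blocking_leaves_headroom(compaction_min=10, blocking_store_files=7))
```

A check like this could run against proposed values before they are written to hbase-site.xml.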

Pre-split Tables

Administrators can pre-split tables during table creation, based on the target number of regions per region server, to avoid costly dynamic splitting as the table starts to fill up. Pre-splitting also ensures that the table's regions are distributed across many host machines, and it avoids the cost of the compactions required to rewrite the data into separate physical files during automatic splitting. If a table is expected to grow very large, administrators should create at least one region per region server. However, do not immediately split the table into the total number of desired regions; rather, choose a low to intermediate value. For multiple tables, do not create more than one region per region server, especially if you are uncertain how large the table will grow. Creating many regions for a table that will never exceed 100 MB in size is not useful; a single region can adequately service a table of this size.
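One way to choose split points is sketched below in Python. It assumes row keys begin with a uniformly distributed, fixed-width hexadecimal prefix, which is not stated in this section and will not hold for every schema. The function names, the table name example_table, and the column family cf are hypothetical. The generated line uses the HBase shell create command with its SPLITS option.

```python
def hex_split_points(num_regions, width=2):
    """Return num_regions - 1 evenly spaced split keys over a hex key space."""
    space = 16 ** width
    if not 1 <= num_regions <= space:
        raise ValueError("num_regions must be between 1 and 16**width")
    return [format(i * space // num_regions, "0{}x".format(width))
            for i in range(1, num_regions)]


def create_statement(table, family, splits):
    """Build an HBase shell create command that pre-splits the table."""
    quoted = ", ".join("'{}'".format(s) for s in splits)
    return "create '{}', '{}', SPLITS => [{}]".format(table, family, quoted)


# Four regions, for example one per region server on a four-node cluster.
splits = hex_split_points(4)
statement = create_statement("example_table", "cf", splits)
print(statement)
```

For four regions the sketch produces the split keys 40, 80, and c0. Key spaces that are not uniformly distributed need split points derived from the actual key distribution instead.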

Configure the JVM Garbage Collector

A region server cannot utilize a very large heap due to the cost of garbage collection. Administrators should specify no more than 24 GB for one region server.
