HBase administrators typically use the following methods to initially configure the cluster:
Increase the request handler thread count
Configure the size and number of WAL files
Configure compactions
Pre-split tables
Tune JVM garbage collection
Increase the Request Handler Thread Count
Administrators who expect their HBase cluster to handle a high volume of requests should increase the number of listeners (request handler threads) started by each region server. Use the hbase.regionserver.handler.count property in the hbase-site.xml configuration file to set the number higher than the default value of 30.
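As a minimal sketch, the property could be set in hbase-site.xml as shown below. The value 60 is only an illustrative assumption; the right number depends on the workload and on the memory available to each region server.

```xml
<!-- Fragment for hbase-site.xml; place inside the existing <configuration> element. -->
<property>
  <name>hbase.regionserver.handler.count</name>
  <!-- 60 is an illustrative assumption, not a value recommended by this guide. -->
  <value>60</value>
</property>
```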
Configure the Size and Number of WAL Files
HBase uses the Write Ahead Log, or WAL, to recover memstore data not yet flushed to disk if a region server crashes. Administrators should configure these WAL files to be slightly smaller than the HDFS block size. By default, an HDFS block is 64 MB and a WAL is approximately 60 MB. Hortonworks recommends that administrators allocate enough WAL files to hold the total capacity of the memstores. Use the following formula to determine the number of WAL files needed:
(regionserver_heap_size * memstore fraction) / (default_WAL_size)
For example, assume the following HBase cluster configuration:
16 GB RegionServer heap
0.4 memstore fraction
60 MB default WAL size
The formula for this configuration looks as follows:
(16384 MB * 0.4) / 60 MB = approximately 109 WAL files
Use the following properties in the hbase-site.xml configuration file to configure the size and number of WAL files:
Table 10.3. WAL Configuration Properties
Configuration Property | Description | Default |
---|---|---|
hbase.regionserver.maxlogs | Sets the maximum number of WAL files. | 32 |
hbase.regionserver.logroll.multiplier | Multiplier of HDFS block size. | 0.95 |
hbase.regionserver.hlog.blocksize | Optional override of HDFS block size. | Value assigned to actual HDFS block size. |
Tip: If recovery from failure takes longer than expected, try reducing the number of WAL files to improve performance.
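For the example configuration above (16 GB heap, 0.4 memstore fraction, 60 MB WAL), the settings might look like the following sketch. The maxlogs value of 109 comes from the worked formula, and the multiplier is simply the default restated; both should be treated as starting points rather than tuned values.

```xml
<!-- Fragment for hbase-site.xml; place inside the existing <configuration> element. -->
<property>
  <name>hbase.regionserver.maxlogs</name>
  <!-- (16384 MB * 0.4) / 60 MB is approximately 109 WAL files. -->
  <value>109</value>
</property>
<property>
  <name>hbase.regionserver.logroll.multiplier</name>
  <!-- Default value: a WAL rolls at 0.95 of the block size, about 60 MB for a 64 MB block. -->
  <value>0.95</value>
</property>
```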
Configure Compactions
Administrators who expect their HBase clusters to host large amounts of data should consider the effect that compactions have on write throughput. For write-intensive request patterns, administrators should consider less frequent compactions and more store files per region. Use the hbase.hstore.compaction.min property in the hbase-site.xml configuration file to increase the minimum number of files required to trigger a compaction. Administrators who increase this value should also increase the value assigned to the hbase.hstore.blockingStoreFiles property, since more files will accumulate.
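A sketch of how the two properties might be raised together in hbase-site.xml follows. The values 5 and 20 are illustrative assumptions chosen only to show both settings increasing together; suitable values depend on the HBase version's defaults and on the write load.

```xml
<!-- Fragment for hbase-site.xml; place inside the existing <configuration> element. -->
<property>
  <name>hbase.hstore.compaction.min</name>
  <!-- Illustrative assumption: require more store files before a compaction is triggered. -->
  <value>5</value>
</property>
<property>
  <name>hbase.hstore.blockingStoreFiles</name>
  <!-- Illustrative assumption: raised as well, because more store files accumulate per store. -->
  <value>20</value>
</property>
```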
Pre-split Tables
Administrators can pre-split tables during table creation, based on the target number of regions per region server, to avoid costly dynamic splitting as the table starts to fill up. Pre-splitting also ensures that the regions of the table are distributed across many host machines, and it avoids the cost of the compactions required to rewrite the data into separate physical files during automatic splitting. If a table is expected to grow very large, administrators should create at least one region per region server. However, do not immediately split the table into the total number of desired regions. Rather, choose a low to intermediate value. For multiple tables, do not create more than one region per region server, especially if you are uncertain how large the table will grow. Creating too many regions for a table that will never exceed 100 MB in size is not useful; a single region can adequately service a table of this size.
Configure the JVM Garbage Collector
A region server cannot utilize a very large heap due to the cost of garbage collection. Administrators should specify no more than 24 GB for one region server.