Configuration Settings for HBase
This section contains information on configuring the Linux host and HDFS for HBase.
- Using DNS with HBase
- Using the Network Time Protocol (NTP) with HBase
- Setting User Limits for HBase
- Using dfs.datanode.max.transfer.threads with HBase
- Configuring BucketCache in HBase
- Configuring Encryption in HBase
- Configure HBase Cell Level TTL
- Using Hedged Reads
- Accessing HBase by using the HBase Shell
- HBase Online Merge
- Troubleshooting HBase
- Configuring the BlockCache
Using DNS with HBase
HBase uses the local hostname to report its IP address. Both forward and reverse DNS resolution must work. If your server has multiple network interfaces, HBase uses the interface that the primary hostname resolves to. If this is insufficient, you can set hbase.regionserver.dns.interface in the hbase-site.xml file to indicate the primary interface. For this setting to work properly, your cluster configuration must be consistent and every host must have the same network interface configuration. As an alternative, you can set hbase.regionserver.dns.nameserver in the hbase-site.xml file to use a DNS name server other than the system-wide default.
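For example, either of the following entries could be added to hbase-site.xml; the interface name and name server address shown are illustrative and depend on your environment:
<property>
  <name>hbase.regionserver.dns.interface</name>
  <value>eth1</value>
</property>
<property>
  <name>hbase.regionserver.dns.nameserver</name>
  <value>10.1.1.1</value>
</property>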
Using the Network Time Protocol (NTP) with HBase
The clocks on all cluster members must be synchronized for your cluster to function correctly. Some skew is tolerable, but excessive skew can cause erratic behavior. Run NTP or another clock synchronization mechanism on your cluster. If you experience problems querying data or unusual cluster operations, verify the system time. For more information about NTP, see the NTP website.
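For example, on a host running the ntpd daemon, you can check whether the clock is synchronized and view the offset from each time source (assuming the ntpq utility is installed):
$ ntpq -p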
Setting User Limits for HBase
Because HBase is a database, it opens many files at the same time. The default setting of 1024 for the maximum number of open files on most Unix-like systems is insufficient. Any significant amount of loading will result in failures and cause error messages such as java.io.IOException...(Too many open files) to be logged in the HBase or HDFS log files. For more information about this issue, see the Apache HBase Book. You may also notice errors such as:
2010-04-06 03:04:37,542 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
2010-04-06 03:04:37,542 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-6935524980745310745_1391901
Another setting you should configure is the number of processes a user is permitted to start. The default number of processes is typically 1024. Consider raising this value if you experience OutOfMemoryException errors.
Configuring ulimit for HBase Using Cloudera Manager
Minimum Required Role: Configurator (also provided by Cluster Administrator, Full Administrator)
- Go to the HBase service.
- Click the Configuration tab.
- Select Scope > Master or Scope > RegionServer.
- Locate the Maximum Process File Descriptors property or search for it by typing its name in the Search box.
- Edit the property value.
If more than one role group applies to this configuration, edit the value for the appropriate role group. See Modifying Configuration Properties.
- Click Save Changes to commit the changes.
- Restart the role.
- Restart the service.
Configuring ulimit for HBase Using the Command Line
To change the maximum number of open files for a user, use the ulimit -n command while logged in as that user.
To set the maximum number of processes a user can start, use the ulimit -u command. You can also use the ulimit command to set many other limits. For more information, see the online documentation for your operating system, or the output of the man ulimit command.
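For example, the following commands display and raise the limits for the current session; the values shown are illustrative and match the limits.conf example in the next section:
$ ulimit -n           # display the current maximum number of open files
$ ulimit -n 32768     # set the maximum number of open files
$ ulimit -u 2048      # set the maximum number of processes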
To make the changes persistent, add the command to the user's Bash initialization file (typically ~/.bash_profile or ~/.bashrc ). Alternatively, you can configure the settings in the Pluggable Authentication Module (PAM) configuration files if your operating system uses PAM and includes the pam_limits.so shared library.
Configuring ulimit using Pluggable Authentication Modules Using the Command Line
- In the /etc/security/limits.conf file, add the following lines, adjusting the values as appropriate. This assumes that your HDFS user is called hdfs and your HBase user is called hbase.
hdfs - nofile 32768
hdfs - nproc 2048
hbase - nofile 32768
hbase - nproc 2048
To apply the changes in /etc/security/limits.conf on Ubuntu and Debian systems, add the following line in the /etc/pam.d/common-session file:
session required pam_limits.so
For more information on the ulimit command or per-user operating system limits, refer to the documentation for your operating system.
Using dfs.datanode.max.transfer.threads with HBase
A Hadoop HDFS DataNode has an upper bound on the number of files that it can serve at any one time. This upper bound is controlled by the dfs.datanode.max.transfer.threads property (the property is spelled in the code exactly as shown here). Before loading data, make sure you have configured the value of dfs.datanode.max.transfer.threads in the conf/hdfs-site.xml file (by default found at /etc/hadoop/conf/hdfs-site.xml) to at least 4096, as shown below:
<property>
  <name>dfs.datanode.max.transfer.threads</name>
  <value>4096</value>
</property>
Restart HDFS after changing the value of dfs.datanode.max.transfer.threads. If the value is set too low, strange failures can occur and an error message about exceeding the number of transfer threads is added to the DataNode logs. Other error messages about missing blocks are also logged, such as:
06/12/14 20:10:31 INFO hdfs.DFSClient: Could not obtain block blk_XXXXXXXXXXXXXXXXXXXXXX_YYYYYYYY from any node: java.io.IOException: No live nodes contain current block. Will get new block locations from namenode and retry...
Configuring BucketCache in HBase
The default BlockCache implementation in HBase is CombinedBlockCache, and the default off-heap BlockCache is BucketCache. SlabCache is now deprecated. See Configuring the HBase BlockCache for information about configuring the BlockCache using Cloudera Manager or the command line.
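As an illustrative sketch only (see the linked page for the full procedure), an off-heap BucketCache could be enabled with hbase-site.xml entries such as the following, where the cache size is given in MB and the value shown is arbitrary:
<property>
  <name>hbase.bucketcache.ioengine</name>
  <value>offheap</value>
</property>
<property>
  <name>hbase.bucketcache.size</name>
  <value>4096</value>
</property>
Note that off-heap mode also requires reserving enough direct memory for the RegionServer JVM, for example through -XX:MaxDirectMemorySize.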
Configuring Encryption in HBase
It is possible to encrypt the HBase root directory within HDFS, using HDFS Data At Rest Encryption. This provides an additional layer of protection in case the HDFS filesystem is compromised.
If you use this feature in combination with bulk-loading of HFiles, the staging directory must be located within the same encryption zone as the HBase root directory. Otherwise, you may encounter errors such as:
org.apache.hadoop.ipc.RemoteException(java.io.IOException): /tmp/output/f/5237a8430561409bb641507f0c531448 can't be moved into an encryption zone.
You can also choose to encrypt only specific column families using HBase Transparent Encryption at Rest, which encrypts individual HFiles while leaving others unencrypted. This provides a balance between data security and performance.
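For example, assuming transparent encryption has already been configured with a master key and key provider, an encrypted column family can be declared from the HBase Shell; the table and column family names here are hypothetical:
hbase> create 'mytable', {NAME => 'cf1', ENCRYPTION => 'AES'}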
Configure HBase Cell Level TTL
Cell TTLs are implemented internally as Cell Tags. Cell Tags are supported only in HFile version 3 and higher, so HFile version 3 must be enabled before cell TTLs can be used. For more information, see Enabling HFile Version 3 Using Cloudera Manager.
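As an illustrative example, once HFile version 3 is enabled, a TTL (in milliseconds) can be attached to an individual cell from the HBase Shell; the table, column family, and values here are hypothetical:
hbase> put 'mytable', 'row1', 'cf1:q1', 'myvalue', {TTL => 10000}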
Using Hedged Reads
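Hedged reads are an HDFS client feature: if a read from one block replica is slow, the client starts a second, "hedged" read against a different replica and uses whichever result returns first. To enable hedged reads for HBase, set the following properties in the RegionServer's hbase-site.xml; the thread pool size and threshold shown are illustrative starting points rather than tuned recommendations:
<property>
  <name>dfs.client.hedged.read.threadpool.size</name>
  <value>20</value>
</property>
<property>
  <name>dfs.client.hedged.read.threshold.millis</name>
  <value>10</value>
</property>
A thread pool size of 0 (the default) disables hedged reads; the threshold is the number of milliseconds to wait for the first read before starting a hedged read.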
Accessing HBase by using the HBase Shell
After you have started HBase, you can access the database in an interactive way by using the HBase Shell, a command interpreter for HBase written in Ruby. Always run HBase administrative commands such as the HBase Shell, hbck, or bulk-load commands as the HBase user (typically hbase).
$ hbase shell
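Once the shell starts, you can verify the connection with a few basic commands, for example:
hbase> status
hbase> list
hbase> exit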
Setting Virtual Machine Options for HBase Shell
HBase in CDH 5.2 and newer allows you to pass options to the Java virtual machine running the HBase Shell by using the HBASE_SHELL_OPTS environment variable. This example sets several garbage-collection logging options in the virtual machine.
$ HBASE_SHELL_OPTS="-verbose:gc -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDateStamps -XX:+PrintGCDetails -Xloggc:$HBASE_HOME/logs/gc-hbase.log" ./bin/hbase shell
Scripting with HBase Shell
A new feature of HBase Shell in CDH 5.2 is non-interactive mode. This mode allows you to use HBase Shell in scripts and allows the script to access the exit status of HBase Shell commands. To invoke non-interactive mode, use the -n or --non-interactive switch. This small example shows how to use HBase Shell in a Bash script.
#!/bin/bash
echo 'list' | hbase shell -n
status=$?
if [ $status -ne 0 ]; then
  echo "The command may have failed."
fi
Successful HBase Shell commands return an exit status of 0. However, an exit status other than 0 does not necessarily indicate a failure, but should be interpreted as unknown. For example, a command may succeed, but while waiting for the response, the client may lose connectivity. In that case, the client has no way to know the outcome of the command. In the case of a non-zero exit status, your script should check to be sure the command actually failed before taking further action.
You can also write Ruby scripts for use with HBase Shell. Example Ruby scripts are included in the hbase-examples/src/main/ruby/ directory.
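As a minimal hypothetical example, a script named list_tables.rb containing the following lines prints all user tables and exits; you can run it with hbase shell list_tables.rb:
# list_tables.rb: a hypothetical example that prints all user tables
list
exit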
HBase Online Merge
CDH 5 supports online merging of regions. HBase splits big regions automatically but does not support merging small regions automatically. To complete an online merge of two regions of a table, use the HBase shell to issue the online merge command. By default, both regions to be merged should be neighbors; that is, one end key of a region should be the start key of the other region. Although you can "force merge" any two regions of the same table, this can create overlaps and is not recommended.
The Master and RegionServer both participate in online merges. When the request to merge is sent to the Master, the Master moves the regions to be merged to the same RegionServer, usually the one where the region with the higher load resides. The Master then requests the RegionServer to merge the two regions. The RegionServer processes this request locally. Once the two regions are merged, the new region will be online and available for server requests, and the old regions are taken offline.
To merge two adjacent regions, use the following command:
hbase> merge_region 'ENCODED_REGIONNAME', 'ENCODED_REGIONNAME'
To merge regions that are not adjacent, pass true as the third parameter to force the merge:
hbase> merge_region 'ENCODED_REGIONNAME', 'ENCODED_REGIONNAME', true
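The ENCODED_REGIONNAME values can be found in the HBase Master web UI or, for example, by scanning the catalog table from the HBase Shell:
hbase> scan 'hbase:meta', {COLUMNS => 'info:regioninfo'}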