Chapter 5. Using Apache HBase and Apache Phoenix
Hortonworks Data Platform (HDP) deploys Apache HBase as a NoSQL database for your Hadoop cluster. HBase scales linearly to handle very large (petabyte-scale), column-oriented data sets. The data store is built on a key-value model that supports low-latency reads, writes, and updates in a distributed environment.
As a natively non-relational database, HBase can combine data sources that use a wide variety of different structures and schemas. It is natively integrated with HDFS for resilient data storage and is designed for hosting very large tables with sparse data.
HDP support also includes Apache Phoenix, a SQL abstraction layer for interacting with HBase. Phoenix lets you create and interact with tables in the form of typical DDL/DML statements via its standard JDBC API. For more information, see the Apache Phoenix website.
Supported JDBC client drivers can be obtained from the /usr/hdp/current/phoenix-client/phoenix-client.jar file on one of your cluster’s edge nodes or from the Hortonworks Phoenix server-client repository. If you use the repository, download the JAR file corresponding to your installed HDP version.
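As a rough sketch of how a client might locate the driver and address the cluster, the script below checks for the JAR at the path given above and assembles a Phoenix JDBC URL in the `jdbc:phoenix:<quorum>:<port>:<znode>` form. The function name `phoenix_jdbc_url`, the host `zk1.example.com`, the port, and the znode `/hbase-unsecure` are illustrative assumptions, not values taken from this guide; substitute your cluster's ZooKeeper settings.

```shell
#!/bin/sh
# Driver path from the text above; present only where the Phoenix client is installed.
PHOENIX_CLIENT_JAR=/usr/hdp/current/phoenix-client/phoenix-client.jar

# Hypothetical helper: build a URL of the form jdbc:phoenix:<quorum>:<port>:<znode>.
phoenix_jdbc_url() {
    printf 'jdbc:phoenix:%s:%s:%s\n' "$1" "$2" "$3"
}

# Placeholder connection values (assumptions); replace with your cluster's settings.
ZK_QUORUM=zk1.example.com
ZK_PORT=2181
ZNODE_PARENT=/hbase-unsecure

# Warn, but do not fail, when the JAR is missing (for example, off an edge node).
if [ -f "$PHOENIX_CLIENT_JAR" ]; then
    echo "Driver JAR found: $PHOENIX_CLIENT_JAR"
else
    echo "Driver JAR not found at $PHOENIX_CLIENT_JAR" >&2
fi

echo "JDBC URL: $(phoenix_jdbc_url "$ZK_QUORUM" "$ZK_PORT" "$ZNODE_PARENT")"
```

A JDBC client would then put the JAR on its classpath and pass the printed URL to its connection call.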
HBase Installation and Setup
You can install and configure HBase for your HDP cluster by either of the following methods:
Ambari Install Wizard: The wizard is the part of the Ambari web-based platform that guides HDP installation, including deploying the various Hadoop components such as HBase for the needs of your cluster. See the Ambari Install Guide.
Manual Installation: You can fetch one of the repositories bundled with HBase and install it on the command line. See the Non-Ambari Installation Guide.
Enabling Phoenix
To enable Phoenix:
1. Open Ambari.
2. Select Services tab > HBase > Configs tab.
3. Scroll down to the Phoenix SQL settings.
4. (Optional) Reset the Phoenix Query Timeout.
5. Click the Enable Phoenix slider button.
Cell-level Access Control Lists (ACLs)
Cell-level access control lists for HBase tables are supported in HBase 0.98 and later.
Note: This feature is a technical preview and is considered under development. Do not use this feature in your production systems. If you have questions regarding this feature, contact support by logging a case on the Hortonworks Support Portal.
Column Family Encryption
Column family encryption is supported in HBase 0.98 and later.
Note: This feature is a technical preview and is considered under development. Do not use this feature in your production systems. If you have questions regarding this feature, contact support by logging a case on the Hortonworks Support Portal.
Tuning RegionServers
To tune garbage collection (GC) in HBase RegionServers for stability, make the following configuration changes:
Specify the following settings in the HBASE_REGIONSERVER_OPTS configuration option in the /conf/hbase-env.sh file:
-XX:+UseConcMarkSweepGC -Xmn2500m (depends on the maximum heap size, but should be no less than 1g and no more than 4g) -XX:PermSize=128m -XX:MaxPermSize=128m -XX:SurvivorRatio=4 -XX:CMSInitiatingOccupancyFraction=50 -XX:+UseCMSInitiatingOccupancyOnly -XX:ErrorFile=/var/log/hbase/hs_err_pid%p.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps
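One way these options might be written in hbase-env.sh is sketched below. The flags are the ones listed above; appending to any existing value of the variable, and splitting the flags across lines, are choices made here for illustration. The -Xmn2500m value is only the example from the text and should be sized for your maximum heap.

```shell
# Fragment for /conf/hbase-env.sh (path as given in the text above).
# Appends the GC settings to whatever HBASE_REGIONSERVER_OPTS already holds.
# -Xmn2500m is the example young-generation size; keep it between 1g and 4g
# and adjust it for your maximum heap size.
export HBASE_REGIONSERVER_OPTS="${HBASE_REGIONSERVER_OPTS:-} \
  -XX:+UseConcMarkSweepGC \
  -Xmn2500m \
  -XX:PermSize=128m \
  -XX:MaxPermSize=128m \
  -XX:SurvivorRatio=4 \
  -XX:CMSInitiatingOccupancyFraction=50 \
  -XX:+UseCMSInitiatingOccupancyOnly \
  -XX:ErrorFile=/var/log/hbase/hs_err_pid%p.log \
  -XX:+PrintGCDetails \
  -XX:+PrintGCDateStamps"
```

The RegionServers need a restart before the new JVM options take effect.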
Make sure that the block cache size and the memstore size combined do not significantly exceed 0.5*MAX_HEAP, where MAX_HEAP is the value defined in the HBASE_HEAP_SIZE configuration option of the /conf/hbase-env.sh file.
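The 0.5*MAX_HEAP guideline can be checked with a little arithmetic. The sketch below assumes the block cache and memstore sizes are expressed as fractions of the heap (as with the `hfile.block.cache.size` property); the function name `check_heap_budget` and all numeric values are hypothetical. Note that it flags any excess over the budget, whereas the guideline tolerates a small one.

```shell
#!/bin/sh
# Hypothetical example values (assumptions); take yours from hbase-env.sh
# and hbase-site.xml.
HEAP_MB=16384              # maximum heap in MB, as set via HBASE_HEAP_SIZE
BLOCK_CACHE_FRACTION=0.25  # share of heap given to the block cache
MEMSTORE_FRACTION=0.25     # share of heap given to the memstore

# Report the combined size against 0.5 * max heap, ending in "ok" or "over".
check_heap_budget() {
    awk -v heap="$1" -v bc="$2" -v ms="$3" 'BEGIN {
        combined = heap * (bc + ms)   # block cache + memstore, in MB
        budget = 0.5 * heap           # the 0.5*MAX_HEAP guideline
        printf "combined=%dMB budget=%dMB ", combined, budget
        print (combined <= budget ? "ok" : "over")
    }'
}

check_heap_budget "$HEAP_MB" "$BLOCK_CACHE_FRACTION" "$MEMSTORE_FRACTION"
```

With a 16384 MB heap, two 0.25 fractions use exactly the 8192 MB budget, while two 0.4 fractions would be reported as over it.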