Data Services

Chapter 5. Using Apache HBase and Apache Phoenix

Hortonworks Data Platform (HDP) deploys Apache HBase as a NoSQL database for your Hadoop cluster. HBase scales linearly to handle very large (petabyte-scale), column-oriented data sets. The data store is built on a key-value model that supports low-latency reads, writes, and updates in a distributed environment.

As a non-relational database, HBase can combine data sources that use a wide variety of structures and schemas. It is natively integrated with HDFS for resilient data storage and is designed to host very large tables with sparse data.

HDP also includes Apache Phoenix, a SQL abstraction layer for interacting with HBase. Phoenix lets you create and work with tables using standard DDL and DML statements through its JDBC API. For more information, see the Apache Phoenix website.

Supported JDBC client drivers can be obtained from the /usr/hdp/current/phoenix-client/phoenix-client.jar file on one of your cluster’s edge nodes or from the Hortonworks Phoenix server-client repository. If you use the repository, download the JAR file corresponding to your installed HDP version.
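
As a rough illustration of how a client usually points the Phoenix JDBC driver at a cluster, the following shell sketch assembles a connection URL of the form jdbc:phoenix:<ZooKeeper quorum>:<port>:<znode>. The function name phoenix_jdbc_url, the example host names, and the defaults (port 2181 and znode /hbase-unsecure) are assumptions for illustration only; substitute the values configured for your cluster.

```shell
# Hypothetical helper: build a Phoenix JDBC URL from a ZooKeeper quorum.
# Arguments: quorum (comma-separated hosts), optional port, optional znode.
# The defaults below are assumptions, not values taken from this guide.
phoenix_jdbc_url() {
  quorum="$1"
  port="${2:-2181}"
  znode="${3:-/hbase-unsecure}"
  printf 'jdbc:phoenix:%s:%s:%s\n' "$quorum" "$port" "$znode"
}

# Example: two ZooKeeper hosts with the assumed default port and znode.
phoenix_jdbc_url "zk1.example.com,zk2.example.com"
# prints jdbc:phoenix:zk1.example.com,zk2.example.com:2181:/hbase-unsecure
```

A JDBC client would pass the resulting string as its connection URL, with phoenix-client.jar on its classpath.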

HBase Installation and Setup

You can install and configure HBase for your HDP cluster by either of the following methods:

  • Ambari Install Wizard: The wizard is the part of the Ambari web-based platform that guides HDP installation, including deployment of the Hadoop components, such as HBase, that your cluster needs. See the Ambari Install Guide.

  • Manual Installation: You can fetch one of the repositories bundled with HBase and install it on the command line. See the Non-Ambari Installation Guide.

Enabling Phoenix

To enable Phoenix:

  1. Open Ambari.

  2. Select Services tab > HBase > Configs tab.

  3. Scroll down to the Phoenix SQL settings.

  4. (Optional) Reset the Phoenix Query Timeout.

  5. Click the Enable Phoenix slider button.

Cell-level Access Control Lists (ACLs)

Cell-level access control lists for HBase tables are supported in HBase 0.98 and later.

Note

This feature is a technical preview and considered under development. Do not use this feature in your production systems. If you have questions regarding this feature, contact support by logging a case on the Hortonworks Support Portal.

Column Family Encryption

Column family encryption is supported in HBase 0.98 and later.

Note

This feature is a technical preview and considered under development. Do not use this feature in your production systems. If you have questions regarding this feature, contact support by logging a case on the Hortonworks Support Portal.

Tuning RegionServers

To tune garbage collection (GC) in HBase RegionServers for stability, make the following configuration changes:

  1. Specify the following configurations in the HBASE_REGIONSERVER_OPTS configuration option in the /conf/hbase-env.sh file:

    -XX:+UseConcMarkSweepGC
    -Xmn2500m (depends on the maximum heap size, but should be no less than 1g and no more than 4g)
    -XX:PermSize=128m
    -XX:MaxPermSize=128m
    -XX:SurvivorRatio=4
    -XX:CMSInitiatingOccupancyFraction=50
    -XX:+UseCMSInitiatingOccupancyOnly
    -XX:ErrorFile=/var/log/hbase/hs_err_pid%p.log
    -XX:+PrintGCDetails
    -XX:+PrintGCDateStamps

  2. Make sure that the block cache size and the memstore size combined do not significantly exceed 0.5*MAX_HEAP, which is defined in the HBASE_HEAP_SIZE configuration option of the /conf/hbase-env.sh file.
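
The two steps above can be sketched in shell as follows. The first part shows one way the flags from step 1 might be appended to HBASE_REGIONSERVER_OPTS inside hbase-env.sh; the -Xmn value is only an example and should be sized for your heap. The second part is a hypothetical helper, cache_budget_ok, that is not part of HBase: it assumes the block cache and memstore sizes are expressed as fractions of the heap and checks that their sum stays within 0.5.

```shell
# Fragment for hbase-env.sh: append the GC flags from step 1 to any existing value.
# -Xmn2500m is an example; choose a value between 1g and 4g based on the maximum heap size.
HBASE_REGIONSERVER_OPTS="${HBASE_REGIONSERVER_OPTS:-} \
-XX:+UseConcMarkSweepGC -Xmn2500m \
-XX:PermSize=128m -XX:MaxPermSize=128m \
-XX:SurvivorRatio=4 \
-XX:CMSInitiatingOccupancyFraction=50 -XX:+UseCMSInitiatingOccupancyOnly \
-XX:ErrorFile=/var/log/hbase/hs_err_pid%p.log \
-XX:+PrintGCDetails -XX:+PrintGCDateStamps"
export HBASE_REGIONSERVER_OPTS

# Hypothetical helper (an assumption, not an HBase tool): succeeds when the
# block cache fraction ($1) plus the memstore fraction ($2) is at most 0.5.
cache_budget_ok() {
  awk -v bc="$1" -v ms="$2" 'BEGIN { ok = (bc + ms <= 0.5); exit (ok ? 0 : 1) }'
}

# Example: 0.2 + 0.25 fits within half of the heap, so the message is printed.
if cache_budget_ok 0.2 0.25; then
  echo "block cache + memstore within 0.5 of heap"
fi
```

Because step 2 only asks that the combined size not significantly exceed the limit, the strict comparison in the helper is a conservative simplification.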