What's New in Apache HBase

Learn about the new features of HBase in Cloudera Runtime 7.2.18.

HBase supports load balancing using a cache-aware load balancer

HBase now supports a cache-aware load balancer, which considers the cache allocation of each region on the RegionServers when calculating a new assignment plan. The balancer uses the region and RegionServer cache allocation information reported by the RegionServers to calculate the percentage of HFiles cached for each region on the hosting server, and then uses that percentage as an additional factor when deciding on an optimal new assignment plan.
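The idea of using the cached percentage as a balancing factor can be illustrated with a minimal sketch. This is not the HBase balancer implementation: the RegionCacheReport type, its field names, the use of sizes as the unit, and the move_cost weighting are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class RegionCacheReport:
    """Hypothetical per-region report; names and units are illustrative."""
    region: str
    server: str
    cached_size: int  # amount of the region's HFile data cached on `server`
    total_size: int   # total amount of HFile data in the region


def cached_percent(report: RegionCacheReport) -> float:
    """Percentage of the region's HFile data cached on its hosting server."""
    if report.total_size == 0:
        return 0.0
    return 100.0 * report.cached_size / report.total_size


def move_cost(report: RegionCacheReport, cache_weight: float = 1.0) -> float:
    """Assumed cost of moving a region off its current server.

    The more of the region is already cached there, the more cache
    would be lost by moving it, so the cost grows with cached_percent.
    """
    return cache_weight * cached_percent(report) / 100.0


# Example: of two regions on the same server, the poorly cached one
# is the cheaper candidate to move.
reports = [
    RegionCacheReport("region-a", "rs1", cached_size=900, total_size=1000),
    RegionCacheReport("region-b", "rs1", cached_size=100, total_size=1000),
]
cheapest_to_move = min(reports, key=move_cost)
print(cheapest_to_move.region)  # region-b
```

In this sketch the cache factor is the only cost. A real balancer would combine it with other factors, as the description above indicates.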

HBase supports Snappy when the /tmp directory is mounted with the noexec option

In Cloudera Manager, the Snappy temporary directory configuration item is added to HBase Master and HBase RegionServer to allow Snappy compression when the /tmp directory is mounted with the noexec option.

HBase supports Netty native libraries when the /tmp directory is mounted with the noexec option

In Cloudera Manager, the Netty native library working directory configuration item is added to HBase Master and HBase RegionServer so that HBase can load the Netty native libraries when the /tmp directory is mounted with the noexec option.

HBase shows the cached percentage of region data on the RegionServer UI

An important feature for Cloudera Operational Database (COD) over S3 with ephemeral cache is warming up the cache when a region opens (also known as cache prefetch). The goal is to load most of the dataset before any client reads, so that application requests see reduced latency and optimal performance. This prefetch process takes several hours on very large datasets, and operators might want to monitor the progress of the cache loading. To support this, HBase introduces new metrics for the percentage of each region's data that is currently cached, and adds this information to the Storefile Metrics tab in the Regions section of the RegionServer UI.
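As a rough illustration of how an operator might track warm-up progress from such per-region percentages, the following sketch summarizes a set of values. The input mapping, the function name, and the 90 percent threshold are hypothetical. The sketch does not read from HBase; it assumes the percentages have already been collected, for example by copying them from the RegionServer UI.

```python
def warmup_summary(cached_pct_by_region, threshold=90.0):
    """Summarize prefetch progress from per-region cached percentages.

    cached_pct_by_region: dict mapping region name -> cached percentage
    (0-100). Returns the region count, the mean cached percentage, and
    the sorted names of regions still below `threshold`.
    """
    if not cached_pct_by_region:
        return {"regions": 0, "mean_cached_pct": 0.0, "below_threshold": []}
    values = list(cached_pct_by_region.values())
    lagging = sorted(
        name for name, pct in cached_pct_by_region.items() if pct < threshold
    )
    return {
        "regions": len(values),
        "mean_cached_pct": sum(values) / len(values),
        "below_threshold": lagging,
    }


# Example with made-up values for three regions.
summary = warmup_summary({"region-a": 100.0, "region-b": 50.0, "region-c": 90.0})
print(summary["below_threshold"])  # ['region-b']
```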

Related Apache JIRA: HBASE-28246


HBase supports disabling caching for individual column families

In some use cases, not all tables in the dataset have the same SLA requirements. If the total cache capacity is much smaller than the whole dataset, an alternative is to restrict cache usage to the tables with critical response times. In HBase, you can now implement this by disabling the cache on individual column families.

In the HBase shell, run the following alter command for each column family that does not require caching.

alter 'NAMESPACE:TABLENAME', {NAME=>'CF_NAME', BLOCKCACHE => 'false'}
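
When many column families need the same change, the alter commands can be generated rather than typed one by one. The following is a minimal sketch that only builds the command text in the form shown above. The function name and the example table and column family names are hypothetical, and the sketch does not connect to HBase.

```python
def blockcache_off_commands(table, column_families):
    """Build one HBase shell alter command per column family.

    table: table name, optionally prefixed with its namespace ('ns:table').
    column_families: iterable of column family names to exclude from caching.
    """
    return [
        f"alter '{table}', {{NAME => '{cf}', BLOCKCACHE => 'false'}}"
        for cf in column_families
    ]


# Example with made-up names; paste the printed lines into the HBase shell.
for command in blockcache_off_commands("analytics:events", ["raw", "audit"]):
    print(command)
```

Review the generated commands before running them, because the sketch does not check whether the table or column families exist.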