HBase Storefile Tracking

Cloudera Runtime 7.2.14 introduces Storefile Tracking (SFT), an optional feature delivered through the Cloudera Operational Database (COD) service.

Cloudera worked with the Apache HBase project to deliver the first version of this feature through HBASE-26067, and has delivered it as a part of CDP.

When HBase data is stored in S3, COD can dynamically scale the number of workers based on the compute resources required, rather than on the number of workers needed to host the data in HDFS. To deliver this ability to you in a reasonable timeframe, Cloudera built HBOSS. Storefile Tracking is the next evolution of HBase on S3, and it no longer requires the HBOSS solution.

Storefile Tracking prevents unwanted I/O caused by renames on S3. With HDFS, a rename is a constant-time operation, but on S3 a rename requires a full copy of the file. Because a new file is first written and then copied in full when it is renamed, using S3 doubles the I/O cost of HBase operations such as compactions, flushes, and snapshot-based operations. Storefile Tracking removes the reliance on renames for S3-backed HBase data, which should make S3 behave more like HDFS.
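The cost difference described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not how HBase or COD implements the feature: `ToyObjectStore`, `commit_with_rename`, `commit_with_tracking`, and the `data/.filelist` key are hypothetical names invented for this example. The model counts the bytes the store has to write, treating a rename as a full copy followed by a delete.

```python
class ToyObjectStore:
    """Hypothetical in-memory stand-in for an object store without native rename."""

    def __init__(self):
        self.objects = {}
        self.bytes_written = 0  # total bytes the store had to write

    def put(self, key, data):
        self.objects[key] = data
        self.bytes_written += len(data)

    def rename(self, src, dst):
        # No native rename: copy the whole object, then delete the source.
        self.put(dst, self.objects[src])
        del self.objects[src]


def commit_with_rename(store, name, data):
    """Write to a temporary key, then rename into place (rename-based commit)."""
    tmp_key = f".tmp/{name}"
    store.put(tmp_key, data)
    store.rename(tmp_key, f"data/{name}")


def commit_with_tracking(store, name, data, tracked):
    """Write straight to the final key and record it in a small file list."""
    store.put(f"data/{name}", data)
    tracked.append(name)
    # Assumption for illustration: the list of live files is one small object.
    store.put("data/.filelist", "\n".join(tracked).encode())


hfile = b"x" * 1_000_000  # stand-in for a flushed or compacted store file

renaming = ToyObjectStore()
commit_with_rename(renaming, "hfile-0001", hfile)

tracking = ToyObjectStore()
tracked_files = []
commit_with_tracking(tracking, "hfile-0001", hfile, tracked_files)

print(renaming.bytes_written)  # 2000000: the file is written, then copied in full
print(tracking.bytes_written)  # 1000010: the file plus a 10-byte file list
```

In this model the rename-based commit writes the file twice, while the tracked commit writes it once plus a small amount of metadata, which mirrors the doubling of I/O described above.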