HBase Store File Tracking

Cloudera Operational Database (COD) supports Store File Tracking (SFT) as a separate, plugable layer to handle storefile life cycle, and includes the FILE based built-in implementation that avoids internal file rename or move operations while managing the storefiles.

Cloudera has worked with the Apache HBase project to deliver the first version of this feature through HBASE-26067, and has delivered this feature as a part of CDP.

When using S3 for HBase data, COD can dynamically scale the number of workers based on the compute resources required, rather than the workers required to host the data in HDFS. To deliver this ability to you in a reasonable timeframe, Cloudera built HBOSS. This feature is the next evolution of HBase using S3 which no longer requires the HBOSS solution. The SFT feature for HBase with S3 prevents unwanted I/O due to renames on S3. With HDFS, a rename is a constant-time operation, but on S3 a rename requires a full copy of the file. Because of this, using S3 doubles the I/O costs for HBase operations like compactions, flushes, and snapshot-based operations. The SFT feature removes the reliance on renames from HBase internal functions that handle user data, which benefits file systems that lack atomic directory rename operation, such as S3.

You can set SFT at the HBase service level using the hbase.store.file-tracker.impl property in hbase-site.xml file or at the table or column family level by configuring TableDescriptor.