The physical size of data on disk is affected by the following factors:
Factor Affecting Size of Physical Data |
Description |
---|---|
HBase Overhead |
The default amount of disk space required for a single HBase table cell.
Smaller table cells require less overhead. The minimum cell size is 24 bytes and
the default maximum is 10485760 bytes. Administrators can customize the maximum
cell size with the |
Compression |
Choose a data compression tool that makes sense for your data to reduce the physical size of data on disk. Unfortunately, HBase cannot ship with LZO due to licensing issues. However, HBase administrators may install LZO after installing HBase. GZIP provides better compression than LZO but is slower. HBase also supports Snappy. |
HDFS Replication |
HBase uses HDFS for storage, so replicating HBase data stored in HDFS affects the total physical size of data. A typical replication factor of 3 for all HBase tables in a cluster would triple the physical size of the stored data. |
RegionServer Write Ahead Log (WAL) |
The size of the Write Ahead Log, or WAL, for each RegionServer has minimal impact on the physical size of data. The size of the WAL is typically fixed at less than half of the memory for the region server. Although this factor is included here for completeness, the impact of the WAL on data size is negligible and its size is usually not configured. |