Medium Object (MOB) Storage Support in Apache HBase
An HBase table becomes less efficient once any cell in the table exceeds 100 KB of data. Objects exceeding 100 KB are common when you store images and large documents, such as email attachments, in HBase tables. However, you can configure Hortonworks Data Platform (HDP) HBase to support tables whose cells hold medium-size objects, also known as medium objects or, more commonly, MOBs, to minimize the performance impact of objects over 100 KB. MOB support works by storing a reference to the object data within the main table. The reference points to external HFiles that contain the actual data, which can be on disk or in HDFS.
To enable MOB storage support for a table column family, you can choose one of two methods. One way is to run the table create command or the table alter command with MOB options in the HBase shell. Alternatively, you can set MOB parameters in a Java API.
Enabling MOB Storage Support
You can enable MOB storage support and configure the MOB threshold by using one of two different methods. If you do not specify a MOB size threshold, the default value of 100 KB is used.
Tip: While HBase enforces no maximum-size limit for a MOB column, the best practice for optimal performance is generally to limit the data size of each cell to 10 MB.
Prerequisites:
- hbase superuser privileges
- HFile version 3, which is the default format of HBase 0.98+.
Method 1: Configure options in the command line
Run the table create command or the table alter command and do the following:
- Set the IS_MOB option to true.
- Set the MOB_THRESHOLD option to the number of bytes for the threshold size above which an object is treated as a medium-size object.
Following are two HBase shell command examples:
hbase> create 't1', {NAME => 'f1', IS_MOB => true, MOB_THRESHOLD => 102400}
hbase> alter 't1', {NAME => 'f1', IS_MOB => true, MOB_THRESHOLD => 102400}
Method 2: Invoke MOB support parameters in a Java API
You can use the following parameters in a Java API to enable and configure MOB storage support.
The second parameter (hcd.setMobThreshold) is optional. If you invoke the MOB threshold parameter, substitute bytes with the number of bytes for the threshold size at which an object is treated as a medium-size object. If you omit the parameter when you enable MOB storage, the threshold value defaults to 102400 (100 KB).
hcd.setMobEnabled(true);
hcd.setMobThreshold(bytes);
Following is a Java API example:
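import org.apache.hadoop.hbase.HColumnDescriptor;

// Enable MOB storage on column family "f" with a 100 KB threshold.
HColumnDescriptor hcd = new HColumnDescriptor("f");
hcd.setMobEnabled(true);
hcd.setMobThreshold(102400L);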
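To make the role of the threshold concrete, the following is a minimal, self-contained sketch of the size check described above. It is not HBase source code: the class and method names are hypothetical, and it assumes that a value is stored as a MOB only when it is strictly larger than the threshold ("above" the threshold, as Method 1 words it).

```java
// Simplified model of the MOB threshold decision; not HBase source code.
final class MobThresholdSketch {
    // Default threshold stated in this section: 102400 bytes (100 KB).
    static final long DEFAULT_MOB_THRESHOLD = 102_400L;

    // Assumption: only values strictly larger than the threshold are treated as MOBs.
    static boolean isStoredAsMob(long valueLengthBytes, long thresholdBytes) {
        return valueLengthBytes > thresholdBytes;
    }

    public static void main(String[] args) {
        long[] sizes = {512L, 102_400L, 102_401L, 5L * 1024 * 1024};
        for (long size : sizes) {
            String where = isStoredAsMob(size, DEFAULT_MOB_THRESHOLD)
                    ? "MOB file (reference kept in the table)"
                    : "regular HFile";
            System.out.println(size + " bytes -> " + where);
        }
    }
}
```

Under this assumption, a value of exactly 102400 bytes stays in the regular store, and a value of 102401 bytes is written as a MOB.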
Testing the MOB Storage Support Configuration
Run the org.apache.hadoop.hbase.IntegrationTestIngestWithMOB utility to test the MOB storage configuration. Values in the command options are expressed in bytes.
Following is an example that uses the default values, in bytes. The default for -maxMobDataSize is five times the threshold (5 * 1024 = 5120):
$ sudo -u hbase hbase org.apache.hadoop.hbase.IntegrationTestIngestWithMOB \
    -threshold 1024 \
    -minMobDataSize 512 \
    -maxMobDataSize 5120
Tuning MOB Storage Cache Properties
Opening a MOB file places corresponding HFile-formatted data in active memory. Too many open MOB files can cause a RegionServer to exceed the memory capacity and cause performance degradation. To minimize the possibility of this issue arising on a RegionServer, you might need to tune the MOB file reader cache to an appropriate size so that HBase scales appropriately.
The MOB file reader cache is a least recently used (LRU) cache that keeps only the most recently used MOB files open. Refer to the MOB Cache Properties table for variables that can be tuned in the cache. MOB file reader cache configuration is specific to each RegionServer, so assess and change, if needed, each RegionServer individually. You can use either one of the two following methods.
Method 1: Enter property settings using Ambari
1. In Ambari, select Advanced tab > Custom HBase-Site > Add Property.
2. Enter a MOB cache property in the Type field.
3. Complete the Value field with your configuration setting.
Method 2: Enter property settings directly in the hbase-site.xml file
1. Open the RegionServer’s hbase-site.xml file. The file is usually located under /etc/hbase/conf.
2. Add the MOB cache properties to the RegionServer’s hbase-site.xml file.
3. Adjust the parameters or use the default settings.
4. Initiate a restart or rolling restart of the RegionServer. For more information about rolling restarts, see the Rolling Restart section of the online Apache HBase Reference Guide.
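As an illustration of what the added entries might look like, the following sketch prints the three MOB cache properties in the standard Hadoop `<property>` layout used by hbase-site.xml. The property names and default values are taken from the MOB Cache Properties table. The class and method names are hypothetical, and the output is meant to be reviewed and pasted by hand, not written to a live configuration file.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that renders MOB cache settings as hbase-site.xml entries.
final class MobCachePropertiesSketch {
    // Defaults as listed in the MOB Cache Properties table.
    static Map<String, String> defaults() {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("hbase.mob.file.cache.size", "1000");
        props.put("hbase.mob.cache.evict.period", "3600");
        props.put("hbase.mob.cache.evict.remain.ratio", "0.5f");
        return props;
    }

    // Renders a single name/value pair in the Hadoop configuration format.
    static String toPropertyXml(String name, String value) {
        return "<property>\n"
                + "  <name>" + name + "</name>\n"
                + "  <value>" + value + "</value>\n"
                + "</property>\n";
    }

    // Renders all properties in insertion order.
    static String render(Map<String, String> props) {
        StringBuilder xml = new StringBuilder();
        for (Map.Entry<String, String> entry : props.entrySet()) {
            xml.append(toPropertyXml(entry.getKey(), entry.getValue()));
        }
        return xml.toString();
    }

    public static void main(String[] args) {
        System.out.print(render(defaults()));
    }
}
```

The printed entries belong inside the `<configuration>` element of the file. Change a value in the map before rendering to tune a setting for a particular RegionServer.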
Table 5.2. MOB Cache Properties
Property and Default Value | Description |
---|---|
hbase.mob.file.cache.size Default Value: 1000 | Number of opened file handlers to cache. A larger value enhances read operations by providing more file handlers per MOB file cache and reducing frequent file opening and closing. However, if the value is set too high, a "too many opened file handlers" condition can occur. |
hbase.mob.cache.evict.period Default Value: 3600 | The amount of time (in seconds) after which an unused file is evicted from the MOB cache. |
hbase.mob.cache.evict.remain.ratio Default Value: 0.5f | A multiplier (between 0.0 and 1.0) that determines how many files remain cached after the hbase.mob.file.cache.size property threshold is reached. The default value is 0.5f, which indicates that half the files (the least-recently used ones) are evicted. |
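The interaction between the cache size and the remain ratio can be sketched with a small, self-contained LRU model. This is a simplified illustration, not the HBase implementation. The class name is hypothetical. It assumes that an eviction pass does nothing until the number of open files exceeds the cache size, and that it then closes the least recently used files until cache size × remain ratio files are left. The timing controlled by hbase.mob.cache.evict.period is not modeled; call evict() to trigger a pass.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;

// Simplified model of the MOB file reader cache; not HBase source code.
final class MobFileCacheSketch {
    private final int maxCachedFiles;     // models hbase.mob.file.cache.size
    private final float evictRemainRatio; // models hbase.mob.cache.evict.remain.ratio
    // accessOrder = true keeps keys ordered from least to most recently used.
    private final LinkedHashMap<String, Boolean> openFiles =
            new LinkedHashMap<>(16, 0.75f, true);

    MobFileCacheSketch(int maxCachedFiles, float evictRemainRatio) {
        if (maxCachedFiles < 1 || evictRemainRatio < 0.0f || evictRemainRatio > 1.0f) {
            throw new IllegalArgumentException("invalid cache settings");
        }
        this.maxCachedFiles = maxCachedFiles;
        this.evictRemainRatio = evictRemainRatio;
    }

    // Opens a file or reuses an open one; either way it becomes most recently used.
    void open(String fileName) {
        openFiles.put(fileName, Boolean.TRUE);
    }

    // One eviction pass: returns the names of the files that were closed.
    List<String> evict() {
        List<String> evicted = new ArrayList<>();
        if (openFiles.size() <= maxCachedFiles) {
            return evicted; // limit not exceeded, nothing to do
        }
        int remain = (int) (maxCachedFiles * evictRemainRatio);
        Iterator<String> lruFirst = openFiles.keySet().iterator();
        while (openFiles.size() > remain && lruFirst.hasNext()) {
            evicted.add(lruFirst.next());
            lruFirst.remove();
        }
        return evicted;
    }

    // Open files, ordered from least to most recently used.
    List<String> openFileNames() {
        return new ArrayList<>(openFiles.keySet());
    }

    public static void main(String[] args) {
        MobFileCacheSketch cache = new MobFileCacheSketch(4, 0.5f);
        for (String name : new String[] {"a", "b", "c", "d", "e"}) {
            cache.open(name);
        }
        cache.open("a"); // reading "a" again makes it the most recently used file
        System.out.println("evicted: " + cache.evict());
        System.out.println("still open: " + cache.openFileNames());
    }
}
```

In this model a lower remain ratio frees more file handlers per pass, at the cost of reopening files more often. A higher ratio keeps more files open between passes.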