Understanding Bulk Loading
A common pattern for achieving high write throughput in HBase is “bulk loading”. Instead of shipping edits to HBase RegionServers, this approach generates files in HBase's own storage format (HFiles). The Hive integration can generate HFiles; enable this by setting the property “hive.hbase.generatehfiles” to true, for example, `set hive.hbase.generatehfiles=true`. You must also provide the path to a directory in which to write the HFiles, for example, `set hfile.family.path=/tmp/hfiles`.
After the Hive query finishes, you must execute the “completebulkload” action in HBase to bring the files “online” in your HBase table. For example, to finish the bulk load for files in “/tmp/hfiles” for the table “hive_data”, you might run the following on the command line:
$ hbase completebulkload /tmp/hfiles hive_data
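The two steps can be tied together in a small wrapper script. The following is a minimal sketch, not a definitive implementation: the Hive table name `hbase_hive_data`, the source table `staging_data`, the `SELECT` statement, and the `DRY_RUN` switch are all hypothetical and introduced only for illustration. It assumes the `hive` and `hbase` commands are on the PATH and that `hbase_hive_data` is a Hive table already mapped to the HBase table “hive_data”.

```shell
#!/usr/bin/env bash
# Sketch only: HIVE_TABLE, SOURCE_TABLE, the SELECT, and DRY_RUN are
# hypothetical placeholders, not part of Hive or HBase.

HFILE_DIR="/tmp/hfiles"        # HFile directory used in the text above
HBASE_TABLE="hive_data"        # HBase table used in the text above
HIVE_TABLE="hbase_hive_data"   # hypothetical Hive table mapped to HBASE_TABLE
SOURCE_TABLE="staging_data"    # hypothetical table holding the rows to load

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Step 1: ask Hive to write HFiles instead of sending edits to RegionServers.
generate_hfiles() {
  run hive -e "set hive.hbase.generatehfiles=true; set hfile.family.path=${HFILE_DIR}; INSERT OVERWRITE TABLE ${HIVE_TABLE} SELECT * FROM ${SOURCE_TABLE};"
}

# Step 2: bring the generated HFiles online in the HBase table.
complete_bulk_load() {
  run hbase completebulkload "${HFILE_DIR}" "${HBASE_TABLE}"
}
```

After sourcing the script, running `generate_hfiles && complete_bulk_load` performs both steps and skips the second if the Hive query fails. Setting `DRY_RUN=1` first prints the commands instead of executing them, which is a convenient way to check the paths and table names.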