HBase includes several methods of loading data into tables, including methods for loading data from a relational format into a non-relational format.
The most straightforward method is to either use the TableOutputFormat class from a MapReduce job, or use the normal client APIs; however, these are not always the most efficient methods because these APIs cannot handle bulk loading.
Bulk importing bypasses the HBase API and writes content, properly formatted as HBase data files (HFiles), directly to the file system. Analyzing HBase data with MapReduce requires custom coding.
Bulk loading uses less CPU and fewer network resources than writing the same data through the HBase API.
ImportTsv is a custom MapReduce application that loads data in tab-separated value (TSV) format into HBase.
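To make the TSV-to-HBase mapping concrete, the following is a minimal conceptual sketch in Python of how one TSV line can be turned into a row key plus a set of column values. It is not ImportTsv's actual implementation. `HBASE_ROW_KEY` is the marker ImportTsv uses in its column specification to identify the row key field; the function name `tsv_line_to_cells`, the column family `cf`, and the sample data are hypothetical.

```python
ROW_KEY = "HBASE_ROW_KEY"  # marker ImportTsv uses for the row key field


def tsv_line_to_cells(line, columns, separator="\t"):
    """Map one TSV line to (row_key, {"family:qualifier": value}).

    Illustration only: `columns` lists one name per TSV field, and
    exactly one of them must be the row key marker.
    """
    if columns.count(ROW_KEY) != 1:
        raise ValueError("column spec must contain HBASE_ROW_KEY exactly once")

    fields = line.rstrip("\n").split(separator)
    if len(fields) != len(columns):
        raise ValueError(
            f"expected {len(columns)} fields, got {len(fields)}"
        )

    row_key = None
    cells = {}
    for name, value in zip(columns, fields):
        if name == ROW_KEY:
            row_key = value
        else:
            cells[name] = value  # name has the form "family:qualifier"
    return row_key, cells


# Hypothetical column spec and input line.
columns = ["HBASE_ROW_KEY", "cf:name", "cf:city"]
row_key, cells = tsv_line_to_cells("u001\tAda\tLondon\n", columns)
print(row_key, cells)
```

Each relational-style record becomes one HBase row: the designated field supplies the row key, and every other field becomes a cell under its `family:qualifier` name.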
The following are typical use cases for bulk loading data into HBase:
HBase can act as an ETL data sink
HBase can be used as a data source
Bulk load workflows generate HFiles offline and have two distinct stages:
Use either the ImportTsv or import utilities, or write a custom application to generate HFiles from Hive/Pig.
Use completebulkload to load the generated HFiles from HDFS into the HBase table.
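The two stages can be sketched as a pair of command lines. The Python helper below only assembles and prints the commands; it does not run them. It assumes the `hbase` launcher is on the PATH and uses the `importtsv.columns` and `importtsv.bulk.output` options of ImportTsv. The loader class name shown is the one used by HBase 1.x and differs in later releases, so treat it as an assumption to check against your version. The table name, paths, and column family are hypothetical.

```python
import shlex


def import_tsv_command(table, input_dir, hfile_dir, columns):
    """Stage 1: have ImportTsv write HFiles instead of issuing Puts."""
    return [
        "hbase",
        "org.apache.hadoop.hbase.mapreduce.ImportTsv",
        "-Dimporttsv.columns=" + ",".join(columns),
        # With bulk.output set, ImportTsv generates HFiles in this directory.
        "-Dimporttsv.bulk.output=" + hfile_dir,
        table,
        input_dir,
    ]


def complete_bulk_load_command(
    table,
    hfile_dir,
    # Assumption: HBase 1.x class name; later versions moved this tool.
    loader_class="org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles",
):
    """Stage 2: hand the generated HFiles over to the target table."""
    return ["hbase", loader_class, hfile_dir, table]


# Hypothetical table, paths, and columns.
stage1 = import_tsv_command(
    table="users",
    input_dir="/tmp/users_tsv",
    hfile_dir="/tmp/hfiles",
    columns=["HBASE_ROW_KEY", "cf:name", "cf:city"],
)
stage2 = complete_bulk_load_command(table="users", hfile_dir="/tmp/hfiles")

for cmd in (stage1, stage2):
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Keeping each command as an argument list rather than a single string avoids quoting mistakes if the commands are later passed to `subprocess.run`.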
Note | |
---|---|
By default, the bulk loader class |