Use the following instructions to bulk load data into HBase using Pig:
Prepare the input file.
For example, consider the sample data.tsv file shown below:

row1	c1	c2
row2	c1	c2
row3	c1	c2
row4	c1	c2
row5	c1	c2
row6	c1	c2
row7	c1	c2
row8	c1	c2
row9	c1	c2
row10	c1	c2
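If you need to generate this sample file yourself, a minimal shell sketch such as the following would produce it (the loop is illustrative, not part of the original procedure; note that the fields must be tab-separated):

# Illustrative helper: write ten tab-separated rows to data.tsv
for i in $(seq 1 10); do
    printf 'row%s\tc1\tc2\n' "$i"
done > data.tsv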
Make the data available on the cluster. Execute the following command on your HBase Server machine:
hadoop fs -put $filename /tmp/
Using the previous example:
hadoop fs -put data.tsv /tmp/
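Before proceeding, you can confirm that the file landed in HDFS using the standard hadoop fs subcommands (the path assumes the /tmp/ target used above):

hadoop fs -ls /tmp/data.tsv
hadoop fs -cat /tmp/data.tsv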
Create or register the HBase table in HCatalog. Execute the following command on your HBase Server machine:
hcat -f $HBase_Table_Name
For example, consider a simple.ddl file that creates the sample table shown below:

CREATE TABLE simple_hcat_load_table (id STRING, c1 STRING, c2 STRING)
STORED BY 'org.apache.hcatalog.hbase.HBaseHCatStorageHandler'
TBLPROPERTIES (
    'hbase.table.name' = 'simple_hcat_load_table',
    'hbase.columns.mapping' = 'd:c1,d:c2',
    'hcat.hbase.output.bulkMode' = 'true'
);
Execute the following command:
hcat -f simple.ddl
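As an optional check, you can verify from the HBase shell that the backing table now exists; describe is a standard HBase shell command, though its output varies by version:

echo "describe 'simple_hcat_load_table'" | hbase shell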
Create the import file. For example, create a file named simple.bulkload.pig with the following contents:

Note: This import file uses the data.tsv file and the table created by simple.ddl in the previous steps. Ensure that you modify the contents of this file according to your environment.

A = LOAD 'hdfs:///tmp/data.tsv'
    USING PigStorage('\t')
    AS (id:chararray, c1:chararray, c2:chararray);
-- DUMP A;
STORE A INTO 'simple_hcat_load_table' USING org.apache.hcatalog.pig.HCatStorer();
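The commented-out DUMP A statement is a debugging aid. If you want to preview the parsed records before storing them, one option is to run the LOAD and DUMP interactively in Pig's Grunt shell (the grunt> prompt below marks the interactive session; this preview step is a suggestion, not part of the original procedure):

pig -useHCatalog
grunt> A = LOAD 'hdfs:///tmp/data.tsv' USING PigStorage('\t') AS (id:chararray, c1:chararray, c2:chararray);
grunt> DUMP A;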
Use Pig to populate the HBase table via the HCatalog bulk load.
Continuing with the previous example, execute the following command on your HBase Server machine:
pig -useHCatalog simple.bulkload.pig
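To confirm that the load succeeded, you can scan the table from the HBase shell; given the sample data and the d:c1,d:c2 column mapping above, you should see rows row1 through row10:

echo "scan 'simple_hcat_load_table'" | hbase shell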