2. Using Pig to Bulk Load Data Into HBase

Use the following instructions to bulk load data into HBase using Pig:

Prepare the input file.

For example, consider the sample data.tsv file as shown below:

row1	c1	c2
row2	c1	c2
row3	c1	c2
row4	c1	c2
row5	c1	c2
row6	c1	c2
row7	c1	c2
row8	c1	c2
row9	c1	c2
row10	c1	c2
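
If you do not already have such a file, the following is a minimal sketch for generating it and checking that every line holds exactly three tab-separated fields. The `out` and `bad` variables are illustrative only. The check matters because the Pig script later in this section splits each line on the tab character.

```shell
#!/bin/sh
# Write ten rows of three tab-separated fields (row key, c1, c2).
out="data.tsv"   # file name taken from the example above
: > "$out"       # create or truncate the file
i=1
while [ "$i" -le 10 ]; do
    printf 'row%d\tc1\tc2\n' "$i" >> "$out"
    i=$((i + 1))
done

# Count the lines that do not have exactly three tab-separated fields.
bad=$(awk -F '\t' 'NF != 3 { n++ } END { print n + 0 }' "$out")
echo "malformed lines: $bad"
```

A non-zero count usually means that a tab was replaced by spaces when the file was edited or copied.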

Make the data available on the cluster. Execute the following command on your HBase Server machine:
```
hadoop fs -put $filename /tmp/
```
Using the previous example:
```
hadoop fs -put data.tsv /tmp/
```
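
To confirm that the file reached HDFS before you continue, you can use a check along the following lines. This is a sketch only: `verify_upload` is a hypothetical helper name, not a Hadoop command, and it assumes that the `hadoop` client is on your PATH.

```shell
#!/bin/sh
# Hypothetical helper: confirm that a file exists in HDFS and preview it.
verify_upload() {
    target="$1"
    if ! command -v hadoop >/dev/null 2>&1; then
        echo "hadoop client not found on PATH" >&2
        return 2
    fi
    # 'hadoop fs -test -e' exits with status 0 when the path exists.
    if hadoop fs -test -e "$target"; then
        hadoop fs -cat "$target" | head -n 3
    else
        echo "missing in HDFS: $target" >&2
        return 1
    fi
}
# Usage: verify_upload /tmp/data.tsv
```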

Create or register the HBase table in HCatalog. Execute the following command on your HBase Server machine, where $HBase_Table_Name is the DDL file that defines the table:

hcat -f $HBase_Table_Name

For example, consider a sample simple.ddl file that defines the table as shown below:

CREATE TABLE
simple_hcat_load_table (id STRING, c1 STRING, c2 STRING)
STORED BY 'org.apache.hcatalog.hbase.HBaseHCatStorageHandler'
TBLPROPERTIES (
  'hbase.table.name' = 'simple_hcat_load_table',
  'hbase.columns.mapping' = 'd:c1,d:c2',
  'hcat.hbase.output.bulkMode' = 'true'
);

Execute the following command:

hcat -f simple.ddl

Create the import file. For example, create a file named simple.bulkload.pig with the following contents:
Note
This import file uses the data.tsv file and simple.ddl table created previously. Ensure that you modify the contents of this file according to your environment.
```
A = LOAD 'hdfs:///tmp/data.tsv' USING PigStorage('\t') AS (id:chararray, c1:chararray, c2:chararray);
-- DUMP A;
STORE A INTO 'simple_hcat_load_table' USING org.apache.hcatalog.pig.HCatStorer();
```
Use Pig to populate the HBase table via HCatalog bulkload.
Continuing with the previous example, execute the following command on your HBase Server machine:
```
pig -useHCatalog simple.bulkload.pig
```
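
After the Pig job finishes, you may want to check that the rows arrived in HBase. The following sketch pipes two standard HBase shell commands, `count` and `scan`, into `hbase shell`. `verify_load` is a hypothetical helper name, and the sketch assumes that the `hbase` client is on your PATH.

```shell
#!/bin/sh
# Hypothetical helper (not part of HBase or HCatalog): count the rows
# in a table and show the first few of them.
verify_load() {
    table="$1"
    if ! command -v hbase >/dev/null 2>&1; then
        echo "hbase client not found on PATH" >&2
        return 2
    fi
    # Pipe the commands into the HBase shell so that it exits when input ends.
    printf "count '%s'\nscan '%s', {LIMIT => 3}\n" "$table" "$table" | hbase shell
}
# Usage: verify_load simple_hcat_load_table
```

For the sample data.tsv file, the row count should match the ten input lines if the load succeeded.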

