Use Sqoop
Sqoop can import records into a table in HBase. It has out-of-the-box support for HBase. Use the following two options with the sqoop import command to import data into HBase:
- --hbase-table: Specifies the name of the table in HBase to which you want to import your data.
- --column-family: Specifies into which column family Sqoop imports the data of your tables.
For example, you can import the table cities into an already existing HBase table with the same name and use the column family name world:
sqoop import --connect jdbc:mysql://mysql.example.com/sqoop --username sqoop --password sqoop --table cities --hbase-table cities --column-family world
If the target table and column family do not exist, the Sqoop job exits with an error, so you must create them before running an import. Alternatively, if you specify --hbase-create-table, Sqoop creates the target table and column family if they do not exist, using the default parameters from your HBase configuration.
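As a sketch of the second approach, the Bash script below reuses the cities example and adds --hbase-create-table, so the import no longer depends on the table already existing. The connection values are the placeholders from the example; the one-argument-per-line helper and the RUN switch are illustrative conventions of this sketch, not Sqoop features. If you prefer to create the table yourself, the equivalent HBase shell statement is create 'cities', 'world'.

```shell
#!/usr/bin/env bash
# Sketch: the cities import from above, plus --hbase-create-table so that
# Sqoop creates the HBase table and column family if they are missing.
# Connection values are the placeholders used in the original example.

# Print one argument per line so values with spaces survive intact.
cities_import_args() {
  printf '%s\n' import \
    --connect jdbc:mysql://mysql.example.com/sqoop \
    --username sqoop --password sqoop \
    --table cities \
    --hbase-table cities \
    --column-family world \
    --hbase-create-table
}

mapfile -t args < <(cities_import_args)

# Dry run by default; set RUN=1 to execute on a node where Sqoop is installed.
if [[ "${RUN:-0}" == 1 ]]; then
  sqoop "${args[@]}"
else
  printf '%q ' sqoop "${args[@]}"; echo
fi
```

Printing the command by default lets you review it before running it against a cluster.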
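Sqoop takes the row key of each HBase row from a column of the source table, chosen as follows:
- By default, with the column specified in the --split-by option
- If --split-by is not specified, with the primary key of the table, if it is available
- With the --hbase-row-key parameter, which overrides both the --split-by option and the primary key of the table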
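To set the row key explicitly, add --hbase-row-key to the import. The Bash sketch below extends the cities example; the column name id is a hypothetical placeholder, so substitute a column whose values are unique in your source table, because source rows that share a row key end up in the same HBase row.

```shell
#!/usr/bin/env bash
# Sketch: the cities import with an explicit row key. "id" is a hypothetical
# column name; pick a column whose values are unique in your source table.

# Print one sqoop import argument per line; the optional first argument
# becomes the --hbase-row-key value.
cities_import_args() {
  local row_key="${1:-}"
  local args=(import
    --connect jdbc:mysql://mysql.example.com/sqoop
    --username sqoop --password sqoop
    --table cities
    --hbase-table cities
    --column-family world)
  # Without --hbase-row-key, Sqoop falls back to the --split-by column
  # or the table's primary key.
  if [[ -n "$row_key" ]]; then
    args+=(--hbase-row-key "$row_key")
  fi
  printf '%s\n' "${args[@]}"
}

mapfile -t args < <(cities_import_args id)

# Dry run by default; set RUN=1 to execute on a node where Sqoop is installed.
if [[ "${RUN:-0}" == 1 ]]; then
  sqoop "${args[@]}"
else
  printf '%q ' sqoop "${args[@]}"; echo
fi
```

Calling cities_import_args without an argument omits --hbase-row-key and leaves the choice to the default rules listed above.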
For more information on data insertion into HBase, see the Sqoop User Guide.
Import NULL Column Updates into HBase
You can specify how Sqoop handles RDBMS table columns that are updated to NULL during an incremental import. There are two modes for this, ignore and delete, which you select with the --hbase-null-incremental-mode option:
- ignore: This is the default value. If the source table's column is updated to NULL, the target HBase table will still show the previous value for that column.
- delete: If the source table's column is updated to NULL, all previous versions of the column will be deleted from HBase. When checking the column in HBase using the Java API, a null value will be displayed.
Examples:
Run an incremental import to an HBase table and ignore the columns which were updated to NULL in the relational database:
sqoop import --connect $CONN --username $USER --password $PASS --table "hbase_test" --hbase-table hbase_test --column-family data -m 1 --incremental lastmodified --check-column date_modified --last-value "2017-12-15 10:58:44.0" --merge-key id --hbase-null-incremental-mode ignore
Run an incremental import to an HBase table and delete all the versions of the columns which were updated to NULL in the relational database:
sqoop import --connect $CONN --username $USER --password $PASS --table "hbase_test" --hbase-table hbase_test --column-family data -m 1 --incremental lastmodified --check-column date_modified --last-value "2017-12-15 10:58:44.0" --merge-key id --hbase-null-incremental-mode delete
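The two commands above differ only in the value passed to --hbase-null-incremental-mode. The Bash sketch below builds the same argument list from a mode and a last value, and rejects any mode other than ignore or delete before Sqoop runs. CONN, DB_USER, and DB_PASS are hypothetical environment variables standing in for $CONN, $USER, and $PASS (the sketch avoids USER because it is normally already set to your login name); their defaults are placeholders. The table, column family, and column names are taken from the examples.

```shell
#!/usr/bin/env bash
# Sketch: parameterize the incremental HBase import shown above.
# CONN, DB_USER and DB_PASS are hypothetical variables; defaults are placeholders.
CONN="${CONN:-jdbc:mysql://db.example.com/test}"
DB_USER="${DB_USER:-sqoop}"
DB_PASS="${DB_PASS:-sqoop}"

# Print one sqoop import argument per line for the given NULL-handling mode
# and last value; fail for any mode other than ignore or delete.
incremental_args() {
  local mode="$1" last_value="$2"
  case "$mode" in
    ignore|delete) ;;
    *) echo "unsupported mode: $mode" >&2; return 1 ;;
  esac
  printf '%s\n' import \
    --connect "$CONN" --username "$DB_USER" --password "$DB_PASS" \
    --table hbase_test --hbase-table hbase_test --column-family data -m 1 \
    --incremental lastmodified --check-column date_modified \
    --last-value "$last_value" --merge-key id \
    --hbase-null-incremental-mode "$mode"
}

# One line per argument keeps the space inside the timestamp in one element.
mapfile -t args < <(incremental_args delete "2017-12-15 10:58:44.0")

# Dry run by default; set RUN=1 to execute on a node where Sqoop is installed.
if [[ "${RUN:-0}" == 1 ]]; then
  sqoop "${args[@]}"
else
  printf '%q ' sqoop "${args[@]}"; echo
fi
```

The dry run uses printf '%q' so that the printed --last-value stays correctly quoted if you copy the command into a shell.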