Use Sqoop

Sqoop can import records into a table in HBase. It has an out-of-the-box support for HBase

There are two mandatory options you must specify when using the sqoop import command to import data into HBase using Sqoop:
  • --hbase-table: Specifies the name of the table in HBase to which you want to import your data.
  • --column-family: Specifies into which column family Sqoop imports the data of your tables.

For example, you can import the table cities into an already existing HBase table with the same name and use the column family name world:

sqoop import --connect jdbc:mysql://mysql.example.com/sqoop --username sqoop --password sqoop --table cities --hbase-table cities --column-family world

If the target table and column family do not exist, the Sqoop job will exit with an error. You must create the target table and column family before running an import. If you specify --hbase-create-table, Sqoop creates the target table and column family if they do not exist, using the default parameters from your HBase configuration.

Sqoop needs to identify which RDBMS column is used as row key column in the HBase table. There are three ways to do this:
  • By default, with the column name specified in the --split-by option
  • With the primary key of the table, if it is available
  • With the --hbase-row-key parameter, which overrides both the --split-by option and the primary key of the table

For more information on data insertion into HBase, see Sqoop User Guide.

Import NULL Column Updates into HBase

You can specify how Sqoop handles RDBMS table column updated to NULL during incremental import.

There are two modes for this, ignore and delete. You can specify the mode using the --hbase-null-incrementel-mode option:

  • -ignore: This is the default value. If the source table's column is updated to NULL, the target HBase table will still show the previous value for that column.
  • -delete: If the source table's column is updated to NULL, all previous versions of the column will be deleted from HBase. When checking the column in HBase using the Java API, a null value will be displayed.

Examples:

Run an incremental import to an HBase table and ignore the columns which were updated to NULL in the relational database:

sqoop import --connect $CONN --username $USER --password $PASS --table "hbase_test" --hbase-table hbase_test --column-family data -m 1 --incremental lastmodified --check-column date_modified --last-value "2017-12-15 10:58:44.0" --merge-key id --hbase-null-incremental-mode ignore

Run an incremental import to an HBase table and delete all the versions of the columns which were updated to NULL in the relational database:


                sqoop import --connect $CONN --username $USER --password $PASS --table "hbase_test" --hbase-table hbase_test --column-family data -m 1 --incremental lastmodified --check-column date_modified --last-value "2017-12-15 10:58:44.0" --merge-key id --hbase-null-incremental-mode delete