Load Data into HBase Table

This section demonstrates how to load CSV data into an HBase table, which can then be accessed from Python with the HappyBase library.

For this example, we import data from a CSV file into HBase using the ImportTsv utility.
  1. Log in to Cloudera Data Science Workbench and launch a Python 3 session in a new or existing project.
  2. Create the following sample file, employees.csv, in your project.
    employees.csv
    1,Lucy,Engineering
    2,Milton,Engineering
    3,Edith,Support
  3. In the workbench, click Terminal access. Perform the following steps in the Terminal:
    1. Start the HBase shell and create a new, empty table called employees with two column families, name and department.
      hbase shell
      create 'employees', 'name', 'department'
      exit
    2. Load employees.csv into HDFS.
      hdfs dfs -put employees.csv /tmp
    3. Use ImportTsv to load data from HDFS (/tmp/employees.csv) into the HBase table created in the previous step.
      hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator=',' -Dimporttsv.columns=HBASE_ROW_KEY,name,department employees /tmp/employees.csv
    4. Return to the HBase shell and scan the table to verify that the data was loaded.
      hbase shell
      scan 'employees'
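
The column mapping that ImportTsv applies in step 3 (the field listed as HBASE_ROW_KEY becomes the row key, and every other field is stored under the named column) can be sketched in plain Python. This is illustrative only; the real ImportTsv runs as a MapReduce job inside the cluster.

```python
import csv
import io

def map_rows(csv_text, columns):
    """Mimic ImportTsv's -Dimporttsv.columns mapping: the field in the
    HBASE_ROW_KEY position becomes the row key; every other field is
    stored under its listed column name. Illustrative sketch only."""
    rows = {}
    key_idx = columns.index("HBASE_ROW_KEY")
    for fields in csv.reader(io.StringIO(csv_text)):
        row_key = fields[key_idx]
        rows[row_key] = {
            col: val
            for col, val in zip(columns, fields)
            if col != "HBASE_ROW_KEY"
        }
    return rows

csv_text = "1,Lucy,Engineering\n2,Milton,Engineering\n3,Edith,Support\n"
table = map_rows(csv_text, ["HBASE_ROW_KEY", "name", "department"])
print(table["2"])  # {'name': 'Milton', 'department': 'Engineering'}
```
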
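
Once the data is loaded, the table can also be read from Python with the HappyBase library mentioned above. A minimal sketch follows; the hostname thrift_host is a placeholder for your cluster's HBase Thrift server, which must be running for HappyBase to connect. HappyBase returns rows as bytes-keyed dicts of family:qualifier pairs, so a small decoding helper is included.

```python
def decode_row(raw_row):
    """Turn HappyBase's {b'family:qualifier': b'value'} dict into a
    plain {family: value} dict of strings. ImportTsv with a bare column
    family (e.g. 'name') stores values under an empty qualifier, so we
    keep only the family portion of each key."""
    return {
        key.decode().split(":", 1)[0]: value.decode()
        for key, value in raw_row.items()
    }

if __name__ == "__main__":
    import happybase  # pip install happybase; needs a running Thrift server

    connection = happybase.Connection("thrift_host")  # placeholder hostname
    table = connection.table("employees")
    for row_key, raw_row in table.scan():
        print(row_key.decode(), decode_row(raw_row))
    connection.close()
```
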