Accessing Data from Apache HBase

This section demonstrates how to use the HappyBase Python library to access data from HBase.

Load Data into HBase Table

For this example, we will import data from a CSV file into HBase using the ImportTsv utility.

  1. Log into Cloudera Machine Learning and launch a Python 3 session within a new/existing project.
  2. This example uses a small sample dataset. Create the following employees.csv file in your project.
    employees.csv
    1,Lucy,Engineering
    2,Milton,Engineering
    3,Edith,Support
  3. In the workbench, click Terminal access. Perform the following steps in the Terminal:
    1. Start the HBase shell and create a new blank table called employees.
      hbase shell
      create 'employees', 'name', 'department'
      exit
    2. Load employees.csv into HDFS.
      hdfs dfs -put employees.csv /tmp
    3. Use ImportTsv to load data from HDFS (/tmp/employees.csv) into the HBase table created in the previous step.
      hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator=',' -Dimporttsv.columns=HBASE_ROW_KEY,name,department employees /tmp/employees.csv
    4. Go back to the HBase shell and run the following command to verify that the data was loaded into the HBase table.
      hbase shell
      scan 'employees'
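
    The ImportTsv column specification above maps the first CSV field to the row key (HBASE_ROW_KEY) and the remaining fields to the name and department column families. The mapping can be sketched in plain Python (illustrative only; ImportTsv performs this inside a MapReduce job):

    ```python
    import csv
    import io

    # The sample employees.csv contents from the step above.
    csv_text = """1,Lucy,Engineering
    2,Milton,Engineering
    3,Edith,Support"""

    # Mirror -Dimporttsv.columns=HBASE_ROW_KEY,name,department:
    # the first field becomes the row key, the rest map to column families.
    columns = ["HBASE_ROW_KEY", "name", "department"]
    for fields in csv.reader(io.StringIO(csv_text)):
        mapping = dict(zip(columns, (f.strip() for f in fields)))
        row_key = mapping.pop("HBASE_ROW_KEY")
        print(row_key, mapping)
    ```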

Query Data Using HappyBase

  1. Launch a Python 3 session and use the workbench command prompt to install the happybase package.
    !pip3 install happybase
  2. Use HappyBase to connect to the employees table created in the previous section.

    Python
    import happybase
    # Connect to the HBase Thrift server (port 9090 by default).
    connection = happybase.Connection(host='<hbase_thrift_server_hostname>', port=9090, autoconnect=True)
    table = connection.table('employees')
    # HappyBase expects row keys as bytes in Python 3.
    rows = table.rows([b'1', b'2', b'3'])
    for key, data in rows:
        print(key, data)
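
    Because HBase stores everything as raw bytes, HappyBase returns row keys, column names, and cell values as bytes objects. A minimal sketch of decoding a row for display (decode_row is a hypothetical helper, and the sample row is illustrative; the actual column qualifiers depend on how the data was loaded):

    ```python
    # Decode one HappyBase row into plain strings, assuming UTF-8 data.
    def decode_row(key, data):
        return key.decode("utf-8"), {
            col.decode("utf-8"): val.decode("utf-8") for col, val in data.items()
        }

    # Simulated row in the shape table.rows() returns.
    key, data = b"1", {b"name:": b"Lucy", b"department:": b"Engineering"}
    print(decode_row(key, data))
    ```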