Accessing Data from Apache HBase
This section demonstrates how to use the HappyBase
Python library to access data from HBase.
Load Data into HBase Table
For this example, we're going to import data from a CSV file into
HBase using the ImportTsv utility.
- Log into Cloudera Machine Learning and launch a Python 3 session within a new/existing project.
- For this example, we will be using the following sample CSV file. Create the following `employees.csv` file in your project:

  ```
  1,Lucy,Engineering
  2,Milton,Engineering
  3,Edith,Support
  ```
- In the workbench, click Terminal access.
Perform the following steps in the Terminal:
- Start the HBase shell and create a new blank table called `employees`:

  ```
  hbase shell
  create 'employees', 'name', 'department'
  exit
  ```
- Load `employees.csv` into HDFS:

  ```
  hdfs dfs -put employees.csv /tmp
  ```
- Use ImportTsv to load the data from HDFS (`/tmp/employees.csv`) into the HBase table created in the previous step:

  ```
  hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator=',' -Dimporttsv.columns=HBASE_ROW_KEY,name,department employees /tmp/employees.csv
  ```
- Go back to the HBase shell and run the following command to verify that the data was loaded into the HBase table:

  ```
  hbase shell
  scan 'employees'
  ```
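The `-Dimporttsv.columns` mapping used above assigns the first CSV field to the row key and the remaining fields to the named columns. A rough pure-Python sketch of that mapping, for illustration only (ImportTsv itself runs as a MapReduce job; the sample data below mirrors `employees.csv`):

```python
import csv
import io

# Sample CSV content matching employees.csv.
csv_text = "1,Lucy,Engineering\n2,Milton,Engineering\n3,Edith,Support\n"

# Column spec mirroring -Dimporttsv.columns=HBASE_ROW_KEY,name,department.
columns = ['HBASE_ROW_KEY', 'name', 'department']

rows = {}
for record in csv.reader(io.StringIO(csv_text)):
    # Pair each CSV field with its column name, then split off the row key.
    mapping = dict(zip(columns, record))
    row_key = mapping.pop('HBASE_ROW_KEY')
    rows[row_key] = mapping

print(rows['1'])  # {'name': 'Lucy', 'department': 'Engineering'}
```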
Query Data Using HappyBase
- Launch a Python 3 session and use the workbench command prompt to install the `happybase` package:

  ```
  !pip3 install happybase
  ```
- Use `happybase` to connect to the `employees` table created in the previous step:

  ```python
  import happybase

  connection = happybase.Connection(host='<hbase_thrift_server_hostname>', port=9090, autoconnect=True)
  table = connection.table('employees')
  rows = table.rows(['1', '2', '3'])
  for key, data in rows:
      print(key, data)
  ```
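Note that HappyBase returns row data as dictionaries of byte strings keyed by `family:qualifier`. A minimal sketch of decoding such a row into plain strings (no live cluster required; the sample row below assumes the empty-qualifier columns produced by the ImportTsv mapping above):

```python
# A row as HappyBase would return it: b'family:qualifier' -> b'value'.
# Sample data assumed, matching row key '1' from employees.csv.
row = {b'name:': b'Lucy', b'department:': b'Engineering'}

# Decode keys and values, and drop the trailing ':' of the empty qualifier.
decoded = {k.decode('utf-8').rstrip(':'): v.decode('utf-8') for k, v in row.items()}
print(decoded)  # {'name': 'Lucy', 'department': 'Engineering'}
```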