Load Data into HBase Table
This section demonstrates how to use the HappyBase Python library to access data from HBase. In this example, data from a CSV file is imported into HBase using the importTsv package; short HappyBase sketches for reading and writing the resulting table follow the steps below.
- Log into Cloudera Data Science Workbench and launch a Python 3 session within a new/existing project.
- For this example, we will be using the following sample CSV file. Create the following employees.csv file in your project.

  employees.csv
  1,Lucy,Engineering
  2,Milton,Engineering
  3,Edith,Support
- In the workbench, click Terminal access. Perform the following steps in the Terminal:
  - Start the HBase shell and create a new blank table called employees.

    hbase shell
    create 'employees', 'name', 'department'
    exit
  - Load employees.csv into HDFS.

    hdfs dfs -put employees.csv /tmp
  - Use ImportTsv to load data from HDFS (/tmp/employees.csv) into the HBase table created in the previous step.

    hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator=',' -Dimporttsv.columns=HBASE_ROW_KEY,name,department employees /tmp/employees.csv
  - Go back to the HBase shell and run the following command to make sure data was loaded into the HBase table.

    hbase shell
    scan 'employees'
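
Once ImportTsv finishes, the loaded rows can also be read from the Python 3 session with HappyBase. The following is a minimal sketch rather than part of the procedure above: it assumes the happybase package is installed in the session and that an HBase Thrift server is reachable; the host name hbase-thrift.example.com and the default port 9090 are placeholders to adjust for your cluster.

    import happybase

    # Connect to the HBase Thrift server (placeholder host/port; adjust for your cluster).
    connection = happybase.Connection(host='hbase-thrift.example.com', port=9090)
    table = connection.table('employees')

    # Scan the table, mirroring the HBase shell command: scan 'employees'.
    # HappyBase returns row keys and column values as bytes, so decode them for display.
    for key, data in table.scan():
        row = {col.decode('utf-8'): val.decode('utf-8') for col, val in data.items()}
        print(key.decode('utf-8'), row)

    connection.close()

Because the table was created with 'name' and 'department' as column families and the ImportTsv column mapping did not specify qualifiers, the scan returns columns named name: and department: with empty qualifiers.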
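
Additional rows can be written to the same table directly from Python as an alternative to re-running ImportTsv. Again, this is only an illustrative sketch under the same assumptions as above (happybase installed, Thrift server reachable at a placeholder host); the row key '4' and the values 'Edward' and 'Support' are made-up example data.

    import happybase

    # Same placeholder connection details as in the scan sketch above.
    connection = happybase.Connection(host='hbase-thrift.example.com', port=9090)
    table = connection.table('employees')

    # Insert one extra row. Columns are addressed as 'family:qualifier'; the qualifier is
    # left empty so the new row matches the layout produced by ImportTsv.
    table.put(b'4', {b'name:': b'Edward', b'department:': b'Support'})

    # Read the row back to confirm the write.
    print(table.row(b'4'))

    connection.close()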