Connecting Cloudera Data Engineering to Apache HBase

Apache HBase integrates closely with Cloudera Data Engineering to enable real-time, low-latency, and random read or write operations within cloud-native big data pipelines. Connect to HBase using Cloudera Data Engineering to interact with HBase tables on the base cluster.

You must download the following JAR files to compile the Scala application:
- hbase-shaded-client-[***HBASE-CLOUDERA-RUNTIME-VERSION***].jar
- opentelemetry-api-0.12.0.jar
- opentelemetry-context-0.12.0.jar
You must get the hbase-site.xml file from the base cluster.
1. Open your terminal and log into the HBase Gateway node using your SSH credentials.
2. Change your current working directory to /etc/hbase/conf and list the files in the directory.
```
$ cd /etc/hbase/conf
$ ls
atlas-application.properties  __cloudera_metadata__  hbase-env.sh    hdfs-site.xml  log4j.properties  ssl-client.xml
__cloudera_generation__       core-site.xml         hbase-site.xml  jaas.conf        ozone-site.xml
```
3. Copy the hbase-site.xml file.
You must grant read, write, run, and create permissions on the HBase table to the workload user from the Ranger UI. For more information, see Configure a resource-based policy: HBase.
Important: If the table already exists and the Ranger UI fails to load tables with a "resource lookup failed" error during policy creation, edit the cm_hbase service and set hbase.master.kerberos.principal to the value from the hbase-site.xml file.

Create a new project with the files for the Cloudera Data Engineering jobs for HBase.
Add the downloaded JAR files to the lib directory and build the project.
Note: You must run the sbt compile command on the sample application code.
Example
```
$ ls lib/
hbase-shaded-client-2.4.17.7.1.9.1064-1.jar   opentelemetry-api-0.12.0.jar opentelemetry-context-0.12.0.jar

$ sbt compile && sbt package
```
The Cloudera Data Engineering job JAR files are present in the target/scala-[***SCALA-VERSION***] directory.
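The sample application code is not reproduced in this section. As a rough sketch only, a main class named HbaseTable could look like the following. It assumes the table name hbase_table, the column family column, and the qualifier value that appear in the shell output later in this section. It also assumes that hbase-site.xml is on the classpath and that authentication is handled by the job environment. The helper rowKey is hypothetical, and the real sample application may be structured differently.

```scala
import java.util.ArrayList

import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ColumnFamilyDescriptorBuilder, ConnectionFactory, Put, TableDescriptorBuilder}
import org.apache.hadoop.hbase.util.Bytes

// Illustrative sketch, not the official sample application.
object HbaseTable {
  // Names assumed from the shell output shown later in this section.
  val TableNameStr = "hbase_table"
  val Family = "column"
  val Qualifier = "value"

  // Hypothetical helper: row key for the i-th row.
  def rowKey(i: Int): String = s"row$i"

  def main(args: Array[String]): Unit = {
    // HBaseConfiguration.create() loads hbase-site.xml from the classpath.
    val conf = HBaseConfiguration.create()
    val connection = ConnectionFactory.createConnection(conf)
    try {
      val tableName = TableName.valueOf(TableNameStr)

      // Create the table with a single column family if it does not exist.
      val admin = connection.getAdmin()
      try {
        if (!admin.tableExists(tableName)) {
          val descriptor = TableDescriptorBuilder
            .newBuilder(tableName)
            .setColumnFamily(ColumnFamilyDescriptorBuilder.of(Family))
            .build()
          admin.createTable(descriptor)
        }
      } finally admin.close()

      // Write ten rows, storing each value as a string.
      val table = connection.getTable(tableName)
      try {
        val puts = new ArrayList[Put]()
        for (i <- 0 until 10) {
          val put = new Put(Bytes.toBytes(rowKey(i)))
          put.addColumn(Bytes.toBytes(Family), Bytes.toBytes(Qualifier), Bytes.toBytes(i.toString))
          puts.add(put)
        }
        table.put(puts)
      } finally table.close()
    } finally connection.close()
  }
}
```

The sketch uses only the HBase client API, so it compiles against the hbase-shaded-client JAR in the lib directory without any Spark imports.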
In the Cloudera console, click the Data Engineering tile. The Cloudera Data Engineering Home page is displayed.
Click Resources in the left navigation menu. The Resources page is displayed.
Click the Create Resource button.
In the Name field, enter hbase-resources and click Save.
Go to the Details tab.
Click the Upload Files button and upload the following files:
- The downloaded JAR files
- The hbase-site.xml file
- The Cloudera Data Engineering job JAR files
Create a new job named hbase-table. For instructions about creating jobs, see Creating jobs in Cloudera Data Engineering.
1. Configure the job with the following details, attaching the hbase-site.xml file as a resource:
  - Name: hbase-table
  - Application File: Upload the hbase-table_[***SCALA-VERSION***].jar file.
  - Main Class: HbaseTable
  - Spark Configurations:
    Example
```
spark.driver.extraClassPath=/app/mount/
spark.executor.extraClassPath=/app/mount/
```
  - Files and Resources: Upload the hbase-site.xml file and the downloaded JAR files.
2. Click Create and Run.
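The extraClassPath settings above point at /app/mount/, which suggests that the attached files are expected in that directory so that the HBase client can load hbase-site.xml from the classpath. If a job run cannot reach HBase, a small check like the following may help confirm whether the file is visible. This is a hypothetical diagnostic; the object name ClasspathCheck and the helper findOnClasspath are not part of the sample application.

```scala
// Hypothetical diagnostic, not part of the sample application.
object ClasspathCheck {
  // Returns the URL of a classpath resource, or None when it is not visible.
  def findOnClasspath(name: String): Option[java.net.URL] =
    Option(Thread.currentThread().getContextClassLoader.getResource(name))

  def main(args: Array[String]): Unit =
    findOnClasspath("hbase-site.xml") match {
      case Some(url) => println(s"hbase-site.xml found at $url")
      case None      => println("hbase-site.xml is not on the classpath")
    }
}
```

The same lookup can be called at the start of the job's main method, so the result appears in the driver log for the job run.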
In Job Runs, verify that the hbase-table job succeeded.
Verify that the HBase table was created.

Open your terminal, SSH into the HBase Gateway node of the base cluster, and run the following commands:

$ hbase shell
HBase Shell
Use "help" to get list of supported commands.
Use "exit" to quit this interactive shell.
For Reference, please visit: http://hbase.apache.org/2.0/book.html#shell
Version 2.4.17.7.1.9.1059-4, rba2d63e3c506a96a0e0e61dd1e669b0f8240c629, Wed Sep  3 08:25:31 UTC 2025
Took 0.0026 seconds
hbase:001:0> list
TABLE
ATLAS_ENTITY_AUDIT_EVENTS
SYSTEM.CATALOG
SYSTEM.CHILD_LINK
SYSTEM.FUNCTION
SYSTEM.LOG
SYSTEM.MUTEX
SYSTEM.SEQUENCE
SYSTEM.STATS
SYSTEM.TASK
atlas_janus
hbase_table
test
users
13 row(s)
Took 0.6749 seconds
=> ["ATLAS_ENTITY_AUDIT_EVENTS", "SYSTEM.CATALOG", "SYSTEM.CHILD_LINK", "SYSTEM.FUNCTION", "SYSTEM.LOG", "SYSTEM.MUTEX", "SYSTEM.SEQUENCE", "SYSTEM.STATS", "SYSTEM.TASK", "atlas_janus", "hbase_table", "test", "users"]
hbase:002:0> describe 'hbase_table'
Table hbase_table is ENABLED
hbase_table, {TABLE_ATTRIBUTES => {METADATA => {'hbase.store.file-tracker.impl' => 'DEFAULT'}}}
COLUMN FAMILIES DESCRIPTION
{NAME => 'column', BLOOMFILTER => 'ROW', IN_MEMORY => 'false', VERSIONS => '1', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', COMPRESSION => 'NONE', TTL =>
 'FOREVER', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}

1 row(s)
Quota is disabled
Took 0.2162 seconds
hbase:003:0> scan 'hbase_table'
ROW                                         COLUMN+CELL
 row0                                       column=column:value, timestamp=2026-03-24T04:00:49.526, value=0
 row1                                       column=column:value, timestamp=2026-03-24T04:00:49.526, value=1
 row2                                       column=column:value, timestamp=2026-03-24T04:00:49.526, value=2
 row3                                       column=column:value, timestamp=2026-03-24T04:00:49.526, value=3
 row4                                       column=column:value, timestamp=2026-03-24T04:00:49.526, value=4
 row5                                       column=column:value, timestamp=2026-03-24T04:00:49.526, value=5
 row6                                       column=column:value, timestamp=2026-03-24T04:00:49.526, value=6
 row7                                       column=column:value, timestamp=2026-03-24T04:00:49.526, value=7
 row8                                       column=column:value, timestamp=2026-03-24T04:00:49.526, value=8
 row9                                       column=column:value, timestamp=2026-03-24T04:00:49.526, value=9
10 row(s)
Took 0.1657 seconds
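The same check can be made from Scala instead of the HBase shell. The following is a minimal sketch, assuming hbase-site.xml is on the classpath and the table layout shown in the scan output above (column family column, qualifier value, string values). The object name HbaseTableScan and the helper formatCell are hypothetical.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Scan}
import org.apache.hadoop.hbase.util.Bytes

// Illustrative sketch: reads back the rows written to hbase_table.
object HbaseTableScan {
  // Hypothetical helper that formats one row for printing.
  def formatCell(row: String, value: String): String = s"$row column:value=$value"

  def main(args: Array[String]): Unit = {
    val connection = ConnectionFactory.createConnection(HBaseConfiguration.create())
    try {
      val table = connection.getTable(TableName.valueOf("hbase_table"))
      try {
        // A Scan with no arguments performs a full table scan.
        val scanner = table.getScanner(new Scan())
        try {
          val results = scanner.iterator()
          while (results.hasNext) {
            val result = results.next()
            val row = Bytes.toString(result.getRow)
            val value = Bytes.toString(
              result.getValue(Bytes.toBytes("column"), Bytes.toBytes("value")))
            println(formatCell(row, value))
          }
        } finally scanner.close()
      } finally table.close()
    } finally connection.close()
  }
}
```

A full scan is reasonable for a ten-row table; for larger tables, restricting the Scan to a row range or specific columns keeps the read small.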