Connecting Cloudera Data Engineering to Apache HBase

Apache HBase integrates with Cloudera Data Engineering to provide low-latency, random read and write operations within cloud-native big data pipelines. You can connect to HBase from Cloudera Data Engineering jobs to interact with HBase tables on the base cluster.

  1. You must download the following JAR files, which are required to compile the Scala application:
    • hbase-shaded-client-[***HBASE-CLOUDERA-RUNTIME-VERSION***].jar
    • opentelemetry-api-0.12.0.jar
    • opentelemetry-context-0.12.0.jar
  2. You must get the hbase-site.xml file from the base cluster.
    1. Open your terminal and log into the HBase Gateway node using your SSH credentials.
    2. Change your current working directory to /etc/hbase/conf and list the files in the directory.
      $ cd /etc/hbase/conf
      $ ls
      atlas-application.properties  __cloudera_metadata__  hbase-env.sh    hdfs-site.xml  log4j.properties  ssl-client.xml
      __cloudera_generation__       core-site.xml         hbase-site.xml  jaas.conf        ozone-site.xml
    3. Copy the hbase-site.xml file.
  3. You must grant read, write, run, and create permissions on the HBase table to the workload user from the Ranger UI. For more information, see Configure a resource-based policy: HBase.
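Before uploading the copied hbase-site.xml file, it can help to confirm that it contains the connection settings an HBase client needs. The following is a minimal sketch, not part of the Cloudera procedure: the HBaseSiteCheck object and its helper names are hypothetical, and the choice of hbase.zookeeper.quorum as the required key is an assumption about what your client configuration must define. It uses only the XML parser bundled with the JDK.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}
import javax.xml.parsers.DocumentBuilderFactory

// Hypothetical helper: sanity-checks a Hadoop-style configuration file
// (<configuration><property><name/><value/></property></configuration>).
object HBaseSiteCheck {

  /** Parses the XML text into a map of property name -> value. */
  def parseProperties(xml: String): Map[String, String] = {
    val builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
    val doc = builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
    val props = doc.getElementsByTagName("property")
    (0 until props.getLength).flatMap { i =>
      val el = props.item(i).asInstanceOf[org.w3c.dom.Element]
      val names = el.getElementsByTagName("name")
      val values = el.getElementsByTagName("value")
      // Skip malformed entries that lack a name or a value element.
      if (names.getLength > 0 && values.getLength > 0)
        Some(names.item(0).getTextContent.trim -> values.item(0).getTextContent.trim)
      else None
    }.toMap
  }

  /** Returns the required keys that are absent or have an empty value. */
  def missingKeys(props: Map[String, String], required: Seq[String]): Seq[String] =
    required.filter(key => props.get(key).forall(_.isEmpty))

  def main(args: Array[String]): Unit = {
    // Path to the copied file; defaults to the current directory.
    val path = args.headOption.getOrElse("hbase-site.xml")
    val xml = new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8)
    // Assumption: the client needs at least the ZooKeeper quorum.
    val missing = missingKeys(parseProperties(xml), Seq("hbase.zookeeper.quorum"))
    if (missing.isEmpty) println(s"$path defines the expected properties")
    else println(s"$path is missing: ${missing.mkString(", ")}")
  }
}
```

Run the check against the copied file before you upload it, and extend the list of required keys to match the settings your cluster relies on.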
  1. Create a new project with the files for the Cloudera Data Engineering jobs for HBase.
  2. Add the downloaded JAR files to the lib directory and build the project.
    Example
    $ ls lib/
    hbase-shaded-client-2.4.17.7.1.9.1064-1.jar   opentelemetry-api-0.12.0.jar opentelemetry-context-0.12.0.jar
    
    $ sbt compile && sbt package
    The Cloudera Data Engineering job JAR files are present in the target/scala-[***SCALA-VERSION***] directory.
  3. In the Cloudera console, click the Data Engineering tile. The Cloudera Data Engineering Home page is displayed.
  4. Click Resources in the left navigation menu. The Resources page is displayed.
  5. Click the Create Resource button.
  6. In the Name field, enter hbase-resources and click Save.
  7. Go to the Details tab.
  8. Click the Upload Files button and upload the following files:
    • The downloaded JAR files
    • The hbase-site.xml file
    • The Cloudera Data Engineering job JAR files
  9. Create a new job named hbase-table. For instructions about creating jobs, see Creating jobs in Cloudera Data Engineering.
    1. Enter the following job details and attach the hbase-site.xml file as a resource:
      • Name: hbase-table
      • Application File: Upload the hbase-table_[***SCALA-VERSION***].jar file.
      • Main Class: HbaseTable
      • Spark Configurations:

        Example

        spark.driver.extraClassPath=/app/mount/
        spark.executor.extraClassPath=/app/mount/
      • Files and Resources: Upload the hbase-site.xml file and the downloaded JAR files.
    2. Click Create and Run.
  10. On the Job Runs page, verify that the hbase-table job run succeeds.
  11. Validate that the HBase table is created: open your terminal, SSH into the HBase Gateway node of the base cluster, and run the following commands in the HBase shell:
    $ hbase shell
    HBase Shell
    Use "help" to get list of supported commands.
    Use "exit" to quit this interactive shell.
    For Reference, please visit: http://hbase.apache.org/2.0/book.html#shell
    Version 2.4.17.7.1.9.1059-4, rba2d63e3c506a96a0e0e61dd1e669b0f8240c629, Wed Sep  3 08:25:31 UTC 2025
    Took 0.0026 seconds
    hbase:001:0> list
    TABLE
    ATLAS_ENTITY_AUDIT_EVENTS
    SYSTEM.CATALOG
    SYSTEM.CHILD_LINK
    SYSTEM.FUNCTION
    SYSTEM.LOG
    SYSTEM.MUTEX
    SYSTEM.SEQUENCE
    SYSTEM.STATS
    SYSTEM.TASK
    atlas_janus
    hbase_table
    test
    users
    13 row(s)
    Took 0.6749 seconds
    => ["ATLAS_ENTITY_AUDIT_EVENTS", "SYSTEM.CATALOG", "SYSTEM.CHILD_LINK", "SYSTEM.FUNCTION", "SYSTEM.LOG", "SYSTEM.MUTEX", "SYSTEM.SEQUENCE", "SYSTEM.STATS", "SYSTEM.TASK", "atlas_janus", "hbase_table", "test", "users"]
    hbase:002:0> describe 'hbase_table'
    Table hbase_table is ENABLED
    hbase_table, {TABLE_ATTRIBUTES => {METADATA => {'hbase.store.file-tracker.impl' => 'DEFAULT'}}}
    COLUMN FAMILIES DESCRIPTION
    {NAME => 'column', BLOOMFILTER => 'ROW', IN_MEMORY => 'false', VERSIONS => '1', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', COMPRESSION => 'NONE', TTL =>
     'FOREVER', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
    
    1 row(s)
    Quota is disabled
    Took 0.2162 seconds
    hbase:003:0> scan 'hbase_table'
    ROW                                         COLUMN+CELL
     row0                                       column=column:value, timestamp=2026-03-24T04:00:49.526, value=0
     row1                                       column=column:value, timestamp=2026-03-24T04:00:49.526, value=1
     row2                                       column=column:value, timestamp=2026-03-24T04:00:49.526, value=2
     row3                                       column=column:value, timestamp=2026-03-24T04:00:49.526, value=3
     row4                                       column=column:value, timestamp=2026-03-24T04:00:49.526, value=4
     row5                                       column=column:value, timestamp=2026-03-24T04:00:49.526, value=5
     row6                                       column=column:value, timestamp=2026-03-24T04:00:49.526, value=6
     row7                                       column=column:value, timestamp=2026-03-24T04:00:49.526, value=7
     row8                                       column=column:value, timestamp=2026-03-24T04:00:49.526, value=8
     row9                                       column=column:value, timestamp=2026-03-24T04:00:49.526, value=9
    10 row(s)
    Took 0.1657 seconds
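
The shell output above shows a table named hbase_table with one column family, column, and ten rows (row0 to row9) that each store a value under the value qualifier. The following is a minimal sketch of an HbaseTable main class that would produce that result; it is an illustration under stated assumptions, not the application shipped with the product. It assumes that hbase-site.xml is on the classpath (for example through the /app/mount/ extraClassPath settings), that the HBase client JAR is available at compile time and run time, and that authentication is handled by the job environment.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ColumnFamilyDescriptorBuilder, ConnectionFactory, Put, TableDescriptorBuilder}
import org.apache.hadoop.hbase.util.Bytes

object HbaseTable {
  def main(args: Array[String]): Unit = {
    // Loads hbase-site.xml from the classpath (assumed to include /app/mount/).
    val conf = HBaseConfiguration.create()
    val connection = ConnectionFactory.createConnection(conf)
    try {
      val tableName = TableName.valueOf("hbase_table")

      // Create the table with a single column family if it does not exist yet.
      val admin = connection.getAdmin
      try {
        if (!admin.tableExists(tableName)) {
          val descriptor = TableDescriptorBuilder
            .newBuilder(tableName)
            .setColumnFamily(ColumnFamilyDescriptorBuilder.of("column"))
            .build()
          admin.createTable(descriptor)
        }
      } finally admin.close()

      // Write rows row0..row9, storing each index as a string value.
      val table = connection.getTable(tableName)
      try {
        for (i <- 0 until 10) {
          val put = new Put(Bytes.toBytes(s"row$i"))
          put.addColumn(Bytes.toBytes("column"), Bytes.toBytes("value"), Bytes.toBytes(i.toString))
          table.put(put)
        }
      } finally table.close()
    } finally connection.close()
  }
}
```

The values are written as strings so that the scan output shows readable digits. Creating the table only when it is missing keeps the job safe to rerun, although each rerun overwrites the same ten cells.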