Using Azure Data Lake Store with HBase

CDH 5.12 and higher support using Azure Data Lake Store (ADLS) as a storage layer for HBase.

There are two scenarios in which ADLS can be used with HBase:

  • ADLS-only: In this scenario, both the HFiles (which contain user data) and the write-ahead logs (WALs) are written to ADLS. This configuration is not recommended for use cases that demand high performance.
  • ADLS + HDFS: In this scenario, HFiles are written to ADLS, but WALs are written to HDFS. This configuration provides higher write throughput and lower write latency than the ADLS-only configuration.

Configuring HBase to Use ADLS as a Storage Layer

  1. Set up credentials to enable communication between HBase and ADLS. See Configuring ADLS Connectivity and use one of the configuration methods listed there that HBase supports.
  2. In the Cloudera Manager Admin Console, select the HBase service, click the Configuration tab, and locate the HBase Service Advanced Configuration Snippet (Safety Valve) for hbase-site.xml.
  3. Depending on which scenario you plan to use, add the following Name and Value pairs:

    • ADLS-only:

      • Name: hbase.rootdir

        Value: adl://<adls_account_name><hbase_directory>

    • ADLS + HDFS:

      • Name: hbase.rootdir

        Value: adl://<adls_account_name><hbase_directory>

      • Name: hbase.wal.dir

        Value: hdfs://<name_node>:8020/<hbase_wal_directory>

  4. Still on the Configuration page for the HBase service, locate the HBase Service Advanced Configuration Snippet (Safety Valve) for core-site.xml and add the following Name and Value pairs for both configuration scenarios (ADLS-only and ADLS + HDFS):

    • Name: fs.defaultFS

      Value: adl://<adls_account_name>

    • Name: adl.debug.override.localuserasfileowner

      Value: true
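
The Name and Value pairs from steps 3 and 4 can be collected in one place before they are entered in Cloudera Manager. The following is a minimal sketch, not part of the product: the function names build_properties and to_snippet are hypothetical, the account URI and directory values are placeholders, and the output is assumed to be pasted into the corresponding Safety Valve as <property> elements. Passing wal_dir selects the ADLS + HDFS scenario; omitting it selects ADLS-only.

```python
from xml.etree import ElementTree as ET


def build_properties(adls_account_uri, hbase_directory, wal_dir=None):
    """Return (hbase_site, core_site) dicts of Name/Value pairs.

    adls_account_uri -- placeholder such as "adl://<adls_account_name>"
    hbase_directory  -- placeholder path appended to the account URI
    wal_dir          -- optional HDFS URI for WALs (ADLS + HDFS scenario)
    """
    # hbase.rootdir is the account URI followed by the HBase directory.
    hbase_site = {"hbase.rootdir": adls_account_uri + hbase_directory}
    if wal_dir is not None:
        # Only the ADLS + HDFS scenario sets a separate WAL location.
        hbase_site["hbase.wal.dir"] = wal_dir

    # These two pairs are the same for both scenarios.
    core_site = {
        "fs.defaultFS": adls_account_uri,
        "adl.debug.override.localuserasfileowner": "true",
    }
    return hbase_site, core_site


def to_snippet(props):
    """Render Name/Value pairs as <property> elements, one per line."""
    lines = []
    for name, value in props.items():
        prop = ET.Element("property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
        lines.append(ET.tostring(prop, encoding="unicode"))
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical values; substitute your own account, directory, and NameNode.
    hbase_site, core_site = build_properties(
        "adl://example.azuredatalakestore.net",
        "/hbase",
        wal_dir="hdfs://namenode.example.com:8020/hbase-wals",
    )
    print(to_snippet(hbase_site))
    print(to_snippet(core_site))
```

Keeping both scenarios behind a single optional argument makes it harder to set hbase.wal.dir by accident when the ADLS-only configuration is intended.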