Fixed Issues in HBase

Review the list of HBase issues that are resolved in Cloudera Runtime 7.3.1.

CDPD-67520: JWT authentication expects [sub] claim in the payload

A JWT payload can have a custom claim for Subject/Principal instead of the standard sub claim.

You can set the hbase.security.oauth.jwt.token.principal.claim configuration property in Cloudera Manager under HBase Service Advanced Configuration Snippet (Safety Valve) for hbase-site.xml to define the custom Subject/Principal claim.
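As a rough illustration of what a configurable principal claim means, the following Python sketch reads the principal from a JWT payload using a claim name supplied by the caller. It is not the HBase implementation: the function names, the `preferred_username` claim, and the unsigned test token are assumptions made for the example, and signature verification is deliberately left out.

```python
import base64
import json


def b64url_decode(segment: str) -> bytes:
    # JWT segments are base64url-encoded without padding; restore it first.
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def extract_principal(token: str, principal_claim: str = "sub") -> str:
    """Return the principal from the JWT payload (signature is NOT verified)."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected header.payload.signature")
    payload = json.loads(b64url_decode(parts[1]))
    if principal_claim not in payload:
        raise KeyError(f"claim {principal_claim!r} missing from JWT payload")
    return str(payload[principal_claim])


def make_unsigned_token(payload: dict) -> str:
    # Hypothetical helper for the example: builds a token with a dummy signature.
    def enc(obj: dict) -> str:
        raw = json.dumps(obj).encode()
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()

    return f"{enc({'alg': 'none'})}.{enc(payload)}.sig"


# A token that carries the principal in a custom claim instead of "sub".
token = make_unsigned_token({"preferred_username": "alice"})
principal = extract_principal(token, principal_claim="preferred_username")
```

With the default claim name `sub`, the same token is rejected because the payload has no such claim. That is the situation the configuration property is meant to address.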

CDPD-66387: RegionServer should be aborted when WAL.sync throws TimeoutIOException
This fix adds logic to WAL.sync. If WAL.sync gets a timeout exception, HBase wraps the TimeoutIOException in a special WALSyncTimeoutIOException. When an upper layer, such as HRegion.doMiniBatchMutate (called by HRegion.batchMutation), catches this special exception, HBase aborts the region server.

Apache Jira: HBASE-27230
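The wrap-then-abort flow described for CDPD-66387 can be sketched in Python as follows. This is a simplified model, not HBase code: the `Wal` and `RegionServer` classes and the `do_mini_batch_mutate` function are hypothetical stand-ins that borrow only the exception names from the description.

```python
class TimeoutIOException(IOError):
    """Stand-in for a generic I/O timeout."""


class WALSyncTimeoutIOException(IOError):
    """Stand-in for the special exception that marks a WAL sync timeout."""


class Wal:
    def __init__(self, sync_impl):
        self._sync_impl = sync_impl  # callable that performs the actual sync

    def sync(self):
        try:
            self._sync_impl()
        except TimeoutIOException as exc:
            # Wrap the generic timeout so callers can tell it came from WAL sync.
            raise WALSyncTimeoutIOException("WAL sync timed out") from exc


class RegionServer:
    def __init__(self):
        self.aborted = False
        self.abort_reason = None

    def abort(self, reason):
        self.aborted = True
        self.abort_reason = reason


def do_mini_batch_mutate(wal, server):
    """Upper layer: abort the server only for the special WAL sync timeout."""
    try:
        wal.sync()
    except WALSyncTimeoutIOException as exc:
        server.abort(f"aborting region server: {exc}")
        raise
```

Using a dedicated exception type lets the upper layer separate a WAL sync timeout from other I/O timeouts, which in this sketch do not trigger an abort.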

CDPD-65373: Make delay prefetch property dynamically configurable
This change allows you to dynamically configure the hbase.hfile.prefetch.delay property using Cloudera Manager. Update the value and refresh the HBase service. The new value is then applied to the HBase service automatically.

Apache Jira: HBASE-28292

CDPD-74494: JVM crashes intermittently on ARM64 machines
JVM crashes were noticed in the HBase service running on the ARM64 architecture with JDK 17. The issue was observed in a specific module that had a very large member function. The fix refactors the module and splits the large implementation function into multiple smaller functions.

Apache Jira: HBASE-28206

CDPD-73117: Bucket cache utilization is dropped after a rolling restart
For a persistent bucket cache larger than 1.3 TB, the corresponding backing map information (information related to the persistent cache) grows beyond 2 GB. However, 2 GB is the size limit for the protobuf messages that are used to persist the backing map information. If a message grew beyond 2 GB, the backing map was only partially persisted, and after a restart the size of the cache appeared to be reduced.

With this fix, the backing map information is split into chunks that are each smaller than 2 GB. All of the information, even when it exceeds 2 GB in total, is now persisted and can be retrieved after a rolling restart.
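The chunking idea can be illustrated with a short Python sketch that splits a sequence of entries into groups whose combined size stays within a limit. It is only a model of the approach under simple assumptions: the `chunk_entries` function is hypothetical, entry size is approximated as key length plus value length, and the actual protobuf serialization is not shown.

```python
def chunk_entries(entries, max_chunk_bytes):
    """Yield lists of (key, value) pairs whose total size is <= max_chunk_bytes."""
    chunk, size = [], 0
    for key, value in entries:
        entry_size = len(key) + len(value)  # simplified size estimate
        if entry_size > max_chunk_bytes:
            raise ValueError("a single entry exceeds the chunk size limit")
        if chunk and size + entry_size > max_chunk_bytes:
            # The current chunk is full: emit it and start a new one.
            yield chunk
            chunk, size = [], 0
        chunk.append((key, value))
        size += entry_size
    if chunk:
        yield chunk  # emit the final, possibly smaller, chunk


# Ten 12-byte entries with a 30-byte limit are grouped two per chunk.
entries = [(b"k%d" % i, b"v" * 10) for i in range(10)]
chunks = list(chunk_entries(entries, max_chunk_bytes=30))
```

Because each chunk is persisted as its own message, no single message has to hold the whole backing map, and reading the chunks back in order reconstructs the full set of entries.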

OPSAPS-70946: The hbase-site.xml file does not contain xinclude for the refreshable files
HBase supports generating hbase-site.xml with xinclude, which is needed for the hbase-site-refreshable.xml file.
OPSAPS-70908: Refresh cluster command fails during ephemeral cache zero downtime upgrade
When Kerberos was enabled, configurations from refreshable files encountered an authentication failure during the refresh command.
hbase/hbase.sh
["refresh-regionserver","hbase.hfile.prefetch.delay","hbase.rs.cacheblocksonwrite",
"hbase.block.data.cacheonread","hbase.rs.evictblocksonclose"]

To fix this, RegionServerRefreshCommand now sets SCM_KERBEROS_PRINCIPAL as the Kerberos principal in the environment of the region server refresh process.

OPSAPS-70866: Invalid HBase prefetch configurations during rolling runtime upgrade
The default values of hbase_hfile_prefetch_delay and hbase.block.data.cacheonread are reverted to 1000 ms and true, respectively.
OPSAPS-70294: HBase must use load balancing for the WEBHBASE Knox service
For CDPD 7.3.0 and later, the WEBHBASE service is configured for sticky load balancing instead of high availability in Knox.
OPSAPS-70035: HBase ZooKeeper client TLS toggle should also control the daemon roles
This issue is fixed. HBase ZooKeeper secure client mode now affects all roles.
OPSAPS-69983: Set ZooKeeper store types to HBase service configuration
HBase now automatically sets the ZooKeeper truststore type based on ScmParams.
OPSAPS-69805: HBase client configuration does not use a secure port if Client TLS is enabled
HBase uses a secure ZooKeeper port in client connections only if Client TLS is explicitly enabled.
OPSAPS-69757: Make HBase TLS connection to ZooKeeper disabled by default
The HBase TLS connection to ZooKeeper must be disabled by default because it breaks some use cases. Instead, HBase introduces a new property to enable or disable the connection in client roles. The default value is disabled.
OPSAPS-57937: No alerts are generated when the HBase process is in a hung state
HBase master monitoring (canary) showed a green status even if the master had not initialized yet. The fix adds extra checks that query HBase to verify that it is up and running.
OPSAPS-53851: ZooKeeper SSL/TLS support for HBase
Cloudera Manager configures HBase for a secure ZooKeeper connection if ZooKeeper TLS is enabled.
CDPD-74725: HBase throws org.apache.hbase.thirdparty.io.netty.util.ResourceLeakDetector exception
HBase direct memory buffer leak issues, which could lead to heap issues in the long run, are fixed.

Apache Jiras: HBASE-28890 and HBASE-28893

CDPD-72120: Allow specifying a filter for the REST multiget endpoint (addendum: add back SCAN_FILTER constant)
HBase now allows specifying a filter for the REST multiget endpoint. The addendum adds back the SCAN_FILTER constant.

Apache Jira: HBASE-28518

CDPD-71008: REST Java client library assumes stateless servers
This issue is fixed.

Apache Jira: HBASE-28500

CDPD-71007: hbase-rest client shading conflicts with hbase-shaded-client in HBase 2.x
This issue is fixed.

Apache Jira: HBASE-28526

CDPD-71006: Support non-SPNEGO authentication methods and implement session handling in the REST Java client library
This issue is fixed.

Apache Jira: HBASE-28501

CDPD-70493: MultiRowRangeFilter deserialization fails in org.apache.hadoop.hbase.rest.model.ScannerModel
This issue is fixed.

Apache Jira: HBASE-28626

CDPD-69335: Use a single GET call in the REST multiget endpoint
This issue is fixed.

Apache Jira: HBASE-28523

CDPD-68900: HBase properties need to be dynamically configured
The following configurations can be dynamically configured.
  • hbase.rs.evictblocksonclose
  • hbase.rs.cacheblocksonwrite
  • hbase.block.data.cacheonread

After you change the values of these configurations, restarting the region servers is no longer required. These configurations help in getting better throughput.

HBase reads the newly changed values from hbase-site.xml and updates the values in the appropriate classes.
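One way to picture this kind of dynamic reconfiguration is a component that keeps its own copy of the relevant settings and refreshes them when notified of a new configuration. The Python sketch below is a minimal model under that assumption. The `CacheSettings` class and its `on_configuration_change` method are hypothetical and only reuse the three property names listed above.

```python
DYNAMIC_KEYS = (
    "hbase.rs.evictblocksonclose",
    "hbase.rs.cacheblocksonwrite",
    "hbase.block.data.cacheonread",
)


class CacheSettings:
    """Hypothetical holder for cache flags that can change at runtime."""

    def __init__(self, conf):
        # Keep only the dynamically configurable keys; missing keys map to None.
        self.values = {key: conf.get(key) for key in DYNAMIC_KEYS}

    def on_configuration_change(self, new_conf):
        """Apply changed dynamic keys and return the ones that were updated."""
        updated = {}
        for key in DYNAMIC_KEYS:
            if key in new_conf and new_conf[key] != self.values[key]:
                self.values[key] = new_conf[key]
                updated[key] = new_conf[key]
        return updated


settings = CacheSettings({"hbase.rs.evictblocksonclose": "false"})
changed = settings.on_configuration_change(
    {"hbase.rs.evictblocksonclose": "true", "hbase.other.setting": "x"}
)
```

In this model only the listed keys are picked up at runtime, and any other property in the new configuration is ignored by the component.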

CDPD-68550: BucketCache.notifyFileCachingCompleted might incorrectly consider a file fully cached
This issue is fixed.

Apache Jira: HBASE-28458

CDPD-68154: BucketCache.evictBlocksByHfileName does not work after a cache recovery from a file
This issue is fixed.

Apache Jira: HBASE-28450

CDPD-64046: BucketCache.blocksByHFile might leak on allocationFailure or on input/output errors, leading to cache leaks and extra heap usage
This issue is fixed.

Apache Jira: HBASE-28211

CDPD-63765: Move the NavigableSet add operation to the writer thread in BucketCache
This fix addresses potential cache leaks and extra memory usage.

Apache Jira: HBASE-26305

CDPD-62737: PrefetchExecutor must not run for files from the CF levels that have disabled BLOCKCACHE
This fix allows disabling the caching or pre-caching of individual tables.

Apache Jira: HBASE-28217

CDPD-45890: Fix the miss count in one of the CombinedBlockCache getBlock implementations
This fix impacts the hit ratio chart's accuracy in Cloudera Manager.

Apache Jira: HBASE-28189