Fixed Issues in HBase

Review the list of HBase issues that are resolved in Cloudera Runtime 7.3.1, its service packs and cumulative hotfixes.

Cloudera Runtime 7.3.1.400 SP2

CDPD-84435: The upgrade operation fails with a message “Failed to decommission RegionServer.”
7.3.1.400
This issue is fixed. Now, the HBase Master aborts when it detects a WALSyncTimeoutIOException while making edits to the MasterRegion.

Apache Jira: HBASE-28803

CDPD-83544: Performance degradation may occur when using the persistent cache, causing a restart involving full cache recovery.
7.3.1.400
This issue is fixed.

Apache Jira: HBASE-29326

CDPD-81524: Add configurable throttling of region moves in CacheAwareLoadBalancer
7.3.1.400
This fix introduces throttling of region moves for LoadBalancer implementations. The throttling time is configured with the hbase.master.balancer.move.throttlingMillis property, which defaults to 60000 milliseconds.

In this change, the only balancer implementation that applies throttling is the CacheAwareLoadBalancer. All other balancers inherit the no-op default provided by the LoadBalancer interface.

The CacheAwareLoadBalancer throttles only region moves whose target server has a cached ratio for that region below a threshold. The threshold is configured with hbase.master.balancer.stochastic.throttling.cacheRatio (80% by default).
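The throttling rule can be illustrated with a small Python model. This is a sketch, not HBase code: the function name and its parameters are hypothetical, and only the two property names and their default values are taken from the description of this fix.

```python
# Simplified model of the CacheAwareLoadBalancer throttling rule (not HBase code).
# The defaults mirror the properties named in this release note:
#   hbase.master.balancer.move.throttlingMillis            -> 60000
#   hbase.master.balancer.stochastic.throttling.cacheRatio -> 0.8

def move_delay_ms(cached_ratio_on_target,
                  cache_ratio_threshold=0.8,
                  throttling_millis=60000):
    """Return the delay (in ms) to apply to a region move.

    cached_ratio_on_target is the fraction (0.0 to 1.0) of the region
    that is already cached on the server the region is moving to.
    """
    if not 0.0 <= cached_ratio_on_target <= 1.0:
        raise ValueError("cached ratio must be between 0.0 and 1.0")
    # Throttle only when the target server has too little of the region cached.
    if cached_ratio_on_target < cache_ratio_threshold:
        return throttling_millis
    return 0
```

In this model, a move to a server that has 50% of the region cached is delayed by the full throttling time, while a move to a server with 90% cached is not delayed.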

Apache Jira: HBASE-29168

CDPD-81524: The ENCODED_DATA block type is not being considered within BucketCache.notifyFileCachingComplete
7.3.1.400
This fix addresses a defect in BucketCache.notifyFileCachingComplete, where only blocks of the DATA type were registered. When an encoding such as FASTDIFF was used, the data block type became ENCODED_DATA, so these blocks were not accounted for in the internal cache metrics. With the persistent cache enabled, this affected the cache-aware balancer after cache recovery following a crash or restart, because the cached percentage of each region was not calculated accurately.
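The effect of the defect can be shown with a toy Python model. This is an illustrative sketch only: the enum and the function are hypothetical stand-ins, not the BucketCache implementation, and the OTHER member stands for any non-data block type.

```python
from enum import Enum

class BlockType(Enum):
    DATA = "DATA"
    ENCODED_DATA = "ENCODED_DATA"
    OTHER = "OTHER"  # placeholder for any non-data block type

def count_registered_blocks(block_types, include_encoded=True):
    """Count the blocks that are registered in the cache metrics.

    include_encoded=False models the behavior before the fix, where
    only DATA blocks were registered.
    """
    registered = {BlockType.DATA}
    if include_encoded:
        registered.add(BlockType.ENCODED_DATA)
    return sum(1 for block_type in block_types if block_type in registered)

# A file written with a data block encoding holds ENCODED_DATA blocks.
encoded_file = [BlockType.ENCODED_DATA] * 3 + [BlockType.OTHER]
before_fix = count_registered_blocks(encoded_file, include_encoded=False)
after_fix = count_registered_blocks(encoded_file)
```

In this model, the file appears to have no cached data blocks before the fix, which is why the cached ratio seen by the balancer was too low.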

Apache Jira: HBASE-29243

CDPD-81524: Enable BlockCache implementations to define dynamic properties
7.3.1.400
This fix makes the following properties, which relate to free space management and block prioritization, dynamically configurable:
  • hbase.bucketcache.acceptfactor
  • hbase.bucketcache.minfactor
  • hbase.bucketcache.extrafreefactor
  • hbase.bucketcache.single.factor
  • hbase.bucketcache.multi.factor
  • hbase.bucketcache.memory.factor
  • hbase.bucketcache.queue.addition.waittime
  • hbase.bucketcache.persist.intervalinmillis
  • hbase.bucketcache.persistence.chunksize

Apache Jira: HBASE-29249

CDPD-81524: Display hit ratio metrics by configurable, granular periods
7.3.1.400
This change introduces two additional properties:
  • hbase.blockcache.stats.periods, which defines the number of periods (windows) to track
  • hbase.blockcache.stats.period.minute, which defines the length of each of these periods (in minutes)

If hbase.blockcache.stats.periods is defined and is greater than one, HBase creates a scheduled executor that rolls the metrics calculation at the hbase.blockcache.stats.period.minute rate. The hit ratio is then calculated for each of the last periods (as many as defined by hbase.blockcache.stats.periods), accounting for only the hits and requests that occurred during the interval of the given period (as defined by hbase.blockcache.stats.period.minute).
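The per-period calculation can be pictured with the following Python sketch. It is a simplified model, not the HBase implementation: the class and method names are hypothetical, and roll() is called by hand here, whereas in HBase the scheduled executor would trigger it once per period.

```python
from collections import deque

class PeriodicHitRatio:
    """Keeps hit and request counts for the last N completed periods."""

    def __init__(self, periods):
        if periods < 1:
            raise ValueError("periods must be at least 1")
        # Completed periods, oldest first; older entries drop off automatically.
        self._windows = deque(maxlen=periods)
        self._hits = 0
        self._requests = 0

    def record(self, hit):
        """Record one cache request in the current period."""
        self._requests += 1
        if hit:
            self._hits += 1

    def roll(self):
        """Close the current period and start a new one."""
        self._windows.append((self._hits, self._requests))
        self._hits = 0
        self._requests = 0

    def hit_ratios(self):
        """Hit ratio of each retained period, counting only that period's requests."""
        return [hits / requests if requests else 0.0
                for hits, requests in self._windows]
```

Because each period keeps its own counters, a burst of misses in one period does not distort the ratio reported for the periods before or after it.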

Apache Jira: HBASE-29276

CDPD-81524: Avoid adding new blocks during prefetch if usage is greater than the accept factor
7.3.1.400
Previously, when cache prefetch was enabled and cache usage reached the configured accept factor, the cache went through a cycle of frequent mass block evictions until the prefetch thread finished reading the entire file. This was both costly and inefficient. An initial attempt to mitigate the issue was made in HBASE-28176; however, that solution interrupted the prefetch thread only after it had already attempted to cache the block being read, which could still trigger a mass eviction.

To avoid evictions triggered solely by prefetch, this change evaluates the impact of adding the current block to the cache before attempting to write it. The check is performed only when caching from prefetch threads; standard client reads and HFile writes still attempt to cache the associated block.
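The check can be sketched in Python as follows. This is a minimal model under stated assumptions, not HBase code: the function and parameter names are hypothetical, and usage is modeled as plain byte counts compared against the capacity multiplied by the accept factor.

```python
def should_cache_block(used_bytes, block_bytes, capacity_bytes,
                       accept_factor, from_prefetch):
    """Decide whether a block should be written to the cache.

    Only prefetch threads are subject to the check; client reads and
    HFile writes always attempt to cache the block.
    """
    if not from_prefetch:
        return True
    # Evaluate the usage that would result from adding the block,
    # before the block is written.
    projected_usage = used_bytes + block_bytes
    return projected_usage <= capacity_bytes * accept_factor
```

For example, with a capacity of 1000 bytes and an accept factor of 0.95, a prefetch thread in this model skips a 20-byte block when 940 bytes are already in use, while a client read of the same block is still cached.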

Apache Jira: HBASE-29288

Cloudera Runtime 7.3.1.300 SP1 CHF 1

There are no fixed issues in this release.

Cloudera Runtime 7.3.1.200 SP1

CDPD-77399: HBase fails to register the servlet metrics and throws ClassNotFoundException: org.apache.hadoop.metrics.MetricsServlet
This issue is fixed now. HBase does not warn about the Hadoop 2-based metric servlet class on a Hadoop 3 deployment.

Apache Jira: HBASE-28315

Cloudera Runtime 7.3.1.100 CHF 1

There are no fixed issues in this release.

Cloudera Runtime 7.3.1

CDPD-67520: JWT authentication expects [sub] claim in the payload

A JWT payload can have a custom claim for Subject/Principal instead of the standard sub claim.

You can set the hbase.security.oauth.jwt.token.principal.claim configuration property in Cloudera Manager under HBase Service Advanced Configuration Snippet (Safety Valve) for hbase-site.xml to define the custom Subject/Principal claim.
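To show what a custom Subject/Principal claim means in practice, the following Python sketch reads the principal from a JWT payload. It is an illustration only, not how HBase processes tokens: the function names and the preferred_username claim are hypothetical, and the token is decoded without any signature verification.

```python
import base64
import json

def decode_payload_unverified(token):
    """Return the JSON payload of a JWT without verifying its signature."""
    _header_b64, payload_b64, _signature = token.split(".")
    # JWT segments are base64url-encoded with the padding stripped.
    padding = "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64 + padding))

def principal_from_payload(payload, principal_claim="sub"):
    """Read the principal from the configured claim (standard claim: sub)."""
    if principal_claim not in payload:
        raise KeyError("claim '%s' is missing from the JWT payload" % principal_claim)
    return payload[principal_claim]
```

A payload such as {"iss": "example-issuer", "preferred_username": "alice"} has no sub claim, so in this model the principal can be read only when principal_claim is set to preferred_username.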

CDPD-66387: RegionServer should be aborted when WAL.sync throws TimeoutIOException
This fix adds logic to WAL.sync. If WAL.sync gets a timeout exception, HBase wraps the TimeoutIOException in a special WALSyncTimeoutIOException. When an upper layer, such as HRegion.doMiniBatchMutate called by HRegion.batchMutation, catches this special exception, HBase aborts the region server.
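The wrap-and-abort flow can be modeled in a few lines of Python. This is a sketch of the control flow only, not HBase code: the exception classes are simple stand-ins that reuse the names from the description, and do_sync and abort are hypothetical callables.

```python
class TimeoutIOException(OSError):
    """Stand-in for a generic I/O timeout."""

class WALSyncTimeoutIOException(OSError):
    """Stand-in for the special exception that marks a WAL sync timeout."""

def wal_sync(do_sync):
    """Run the sync and wrap a timeout in the special exception."""
    try:
        do_sync()
    except TimeoutIOException as e:
        raise WALSyncTimeoutIOException(str(e)) from e

def batch_mutate(do_sync, abort):
    """Upper layer: abort the server when the special exception is seen."""
    try:
        wal_sync(do_sync)
    except WALSyncTimeoutIOException as e:
        abort("WAL sync timed out: %s" % e)
        raise
```

Wrapping the timeout in a dedicated type lets the upper layer tell a WAL sync timeout apart from other I/O timeouts, so only the former leads to an abort.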

Apache Jira: HBASE-27230

CDPD-65373: Make delay prefetch property dynamically configurable
This change allows you to dynamically configure the hbase.hfile.prefetch.delay property using Cloudera Manager. Update the value and refresh the HBase service; the new value is then applied to the HBase service automatically.

Apache Jira: HBASE-28292

CDPD-74494: JVM crashes intermittently on ARM64 machines
JVM crashes were observed in the HBase service running on the arm64 architecture with JDK 17. The issue was traced to a specific module that had a very large member function. This fix refactors the module and splits the large function into multiple smaller functions.

Apache Jira: HBASE-28206

CDPD-73118: Bucket cache validation fails after a rolling restart resulting in an empty bucket cache without running the prefetch operations
While retrieving the bucket cache from persistence, if an exception other than IOException occurred, the exception was not logged and the retrieval thread exited, leaving the bucket cache uninitialized and unusable.

This change enables the retrieval thread to log all types of exceptions. It also reinitializes the bucket cache so that it is usable again.
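The corrected behavior follows a common pattern, sketched here in Python. This is not the BucketCache code: load_from_persistence and reset_cache are hypothetical callables standing in for the retrieval and reinitialization steps.

```python
import logging

log = logging.getLogger("bucketcache.sketch")

def retrieve_or_reset(load_from_persistence, reset_cache):
    """Restore the cache from persistence, or fall back to an empty cache.

    Returns True if the persisted state was restored, False if the cache
    was reinitialized after a failure.
    """
    try:
        load_from_persistence()
        return True
    except Exception:
        # Catch every exception type, not only I/O errors, so that the
        # failure is always logged and the cache never stays uninitialized.
        log.exception("Could not restore the cache from persistence; reinitializing")
        reset_cache()
        return False
```

The key point of the model is that any failure ends in a logged error and a usable, empty cache, rather than a silent thread exit.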