Known Issues in Apache HBase
This topic describes known issues and workarounds for using HBase in this release of Cloudera Runtime.
- IntegrationTestReplication fails if replication does not finish before the verify phase begins
-
During IntegrationTestReplication, if the verify phase starts before the replication phase finishes, the test fails because the target cluster does not contain all of the data. If the HBase services in the target cluster do not have enough memory, long garbage-collection pauses might occur.
- HDFS encryption with HBase
-
Cloudera has tested the performance impact of using HDFS encryption with HBase. The overall overhead of HDFS encryption on HBase performance is in the range of 3 to 4% for both read and update workloads. Scan performance has not been thoroughly tested.
- AccessController postOperation problems in asynchronous operations
-
When security and Access Control are enabled, the following problems occur:
- If a Delete Table operation fails for a reason other than missing permissions, the access rights are removed but the table may still exist and may be used again.
- If hbaseAdmin.modifyTable() is used to delete column families, the rights are not removed from the Access Control List (ACL) table. The postOperation is implemented only for postDeleteColumn().
- If a Create Table operation fails, full rights for that table persist for the user who attempted to create it. If another user later succeeds in creating the table, the user who made the failed attempt still has the full rights.
- Bulk loading into HBase configured to use cloud storage (S3) is currently not supported
- When HBase is configured to use cloud storage (S3), bulk loading data into HBase fails with the following error: 401 Authentication required.
- Storing Medium Objects (MOBs) in HBase is currently not supported
- Storing MOBs in HBase relies on bulk loading files, and this is not currently supported when HBase is configured to use cloud storage (S3).
Technical Service Bulletins
- TSB 2021-453: Snapshot and cloned table corruption when original table is deleted
- HBASE-25206 can cause data loss, either by corrupting an existing HBase snapshot or by destroying data that backs a clone of a previous snapshot.
- Upstream JIRA
- HBASE-25206
- Knowledge article
- For the latest update on this issue, see the corresponding Knowledge article: TSB 2021-453: HBASE-25206 "snapshot and cloned table corruption when original table is deleted".
- TSB 2021-463: HBase Performance Issue
-
The HDFS short-circuit setting dfs.client.read.shortcircuit is overridden to disabled by hbase-default.xml. HDFS short-circuit reads access data in HDFS through a domain socket (file) instead of a network socket. This avoids the TCP overhead of reading data from HDFS, which can meaningfully improve HBase performance (by as much as 30-40%).
Users can restore short-circuit reads by explicitly setting dfs.client.read.shortcircuit in the HBase configuration through the configuration management tool for their product (for example, Cloudera Manager or Ambari).
- Knowledge article
- For the latest update on this issue, see the corresponding Knowledge article: TSB 2021-463: HBase Performance Issue.
- TSB 2021-494: Accumulated WAL Files Cannot be Cleaned up When Using Phoenix Secondary Global Indexes
- The write-ahead log (WAL) files for Phoenix tables that have secondary global indexes defined on them cannot be automatically cleaned up by HBase. This leads to excess storage usage and possible errors when the storage fills up. Accumulated WAL files can also lead to lengthy restart times, because they must all be replayed to ensure that no data loss occurs on restart. This can have a follow-on impact on HDFS if the number of WAL files overwhelms the HDFS NameNode.
- Upstream JIRA
- Knowledge article
- For the latest update on this issue see the corresponding Knowledge article: TSB 2021-494: Accumulated WAL Files Cannot be Cleaned up When Using Phoenix Secondary Global Indexes
- TSB 2022-569: HBase normalizer can cause table inconsistencies by merging non-adjacent regions
- The normalizer in HBase is a background job responsible for splitting or merging HBase regions to optimize the number of regions and the distribution of the size of the regions in HBase tables. Due to the bug described in HBASE-24376, the normalizer can cause region inconsistencies (region overlaps/holes) by merging non-adjacent regions.
- Upstream JIRA
- HBASE-24376
- Knowledge article
- For the latest update on this issue, see the corresponding Knowledge article: TSB 2022-569: HBase normalizer can cause table inconsistencies by merging non-adjacent regions
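For TSB 2021-463, one way to confirm that the explicit setting has taken effect is to inspect the hbase-site.xml that HBase actually reads. The following is a minimal sketch, not a Cloudera-provided tool. It assumes the file content is available as text in the standard Hadoop configuration XML layout (a configuration root element containing property entries). The helper names get_property and short_circuit_enabled, and the sample content, are hypothetical and used only for illustration.

```python
import xml.etree.ElementTree as ET

# The property named in TSB 2021-463.
SHORT_CIRCUIT_KEY = "dfs.client.read.shortcircuit"


def get_property(xml_text, name):
    """Return the value of a property from Hadoop-style configuration XML.

    Returns None if the property is not present in the given text.
    """
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name", default="").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


def short_circuit_enabled(xml_text):
    """Return True only if the property is explicitly set to true."""
    value = get_property(xml_text, SHORT_CIRCUIT_KEY)
    return value is not None and value.lower() == "true"


# Hypothetical sample content, for illustration only.
SAMPLE_SITE_XML = """<configuration>
  <property>
    <name>dfs.client.read.shortcircuit</name>
    <value>true</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    print(short_circuit_enabled(SAMPLE_SITE_XML))
```

In a managed cluster, the effective file is generated by the configuration management tool, so its location varies. Pass in the contents of whichever hbase-site.xml the HBase process uses. A missing property is reported as not enabled, which matches the behavior described in the bulletin, where the default from hbase-default.xml disables short-circuit reads.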