Known Issues in Apache Hive

Learn about the known issues in Hive, their impact or changes to functionality, and any available workarounds.

CDPD-14361: Hive compaction does not work in a CDP Azure environment in 7.2.10.
None.
CDPD-15518: ACID tables you write using the Hive Warehouse Connector cannot be read from an Impala virtual warehouse.
Read the tables from a Hive virtual warehouse or by using Impala queries in Data Hub.
CDPD-13636: Hive job fails with OutOfMemory exception in the Azure DE cluster
Set the parameter hive.optimize.sort.dynamic.partition.threshold=0. Add this parameter in Cloudera Manager under Hive Service Advanced Configuration Snippet (Safety Valve) for hive-site.xml.
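A minimal sketch of the safety-valve entry, assuming you use the XML view of the snippet in Cloudera Manager:
  <property>
    <name>hive.optimize.sort.dynamic.partition.threshold</name>
    <value>0</value>
  </property>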
ENGESC-2214: HiveServer2 and HMS service logs are not deleted
Update the Hive log4j configurations in Cloudera Manager:
  • Hive > Configuration > HiveServer2 Logging Advanced Configuration Snippet (Safety Valve)
  • Hive Metastore > Configuration > Hive Metastore Server Logging Advanced Configuration Snippet (Safety Valve)
Add the following to both configurations:
appender.DRFA.strategy.action.type=DELETE
appender.DRFA.strategy.action.basepath=${log.dir}
appender.DRFA.strategy.action.maxdepth=1
appender.DRFA.strategy.action.PathConditions.glob=${log.file}.*
appender.DRFA.strategy.action.PathConditions.type=IfFileName
appender.DRFA.strategy.action.PathConditions.nestedConditions.type=IfAccumulatedFileCount
appender.DRFA.strategy.action.PathConditions.nestedConditions.exceeds=<same value as appender.DRFA.strategy.max>
HiveServer Web UI displays incorrect data
If you enabled auto-TLS for TLS encryption, the HiveServer2 Web UI does not display the correct data in the following tables: Active Sessions, Open Queries, Last Max n Closed Queries.
CDPD-11890: Hive on Tez cannot run certain queries on tables stored in encryption zones
This problem occurs when the Hadoop Key Management Server (KMS) connection is SSL-encrypted and a self-signed certificate is used. An SSLHandshakeException might appear in the Hive logs.
Use one of the workarounds:
  • Install the self-signed SSL certificate into the cacerts file on all hosts.
  • Copy ssl-client.xml to a directory that is available on all hosts. Then, in Cloudera Manager, go to Clusters > Hive on Tez > Configuration. In Hive Service Advanced Configuration Snippet for hive-site.xml, click +, and add the name tez.aux.uris and the value path-to-ssl-client.xml, as in the sketch below.
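A minimal sketch of the resulting property in hive-site.xml form, assuming a hypothetical location /etc/hive/conf/ssl-client.xml for the copied file:
  <property>
    <name>tez.aux.uris</name>
    <value>/etc/hive/conf/ssl-client.xml</value>
  </property>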

Technical Service Bulletins

TSB 2021-501: JOIN queries return wrong result for join keys with large size in Hive
JOIN queries return wrong results when performing joins on large keys (larger than 255 bytes). This happens when the fast hash table join algorithm is enabled, which it is by default.
Impact
Incorrect results
Action required
  • Hotfix request
    Request a hotfix from Cloudera Support.
  • Workaround
    Set hive.vectorized.execution.mapjoin.native.fast.hashtable.enabled to false, as in the sketch below. This might cause performance degradation depending on the type of query and the system it is running on.
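A minimal sketch of the workaround as a session-level setting in Beeline (the property can also be set for the whole service through a Cloudera Manager safety valve):
  -- disable the fast hash table join algorithm for this session
  SET hive.vectorized.execution.mapjoin.native.fast.hashtable.enabled=false;
  -- then rerun the affected JOIN query in the same session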
Knowledge article
For the latest update on this issue, see the corresponding Knowledge article: TSB 2021-501: JOIN queries return wrong result for join keys with large size in Hive
TSB 2021-518: Incorrect results returned when joining two tables with different bucketing versions
Incorrect results are returned when joining two tables with different bucketing versions, and with the following Hive configurations: hive.auto.convert.join=false and mapreduce.job.reduces set to any custom value.
Impact
Incorrect results
Action required
  • Modify the bucketing version of the table that uses version 1 to version 2, using the ALTER TABLE command. For example: alter table test set tblproperties ('bucketing_version'='2')
  • Also, reload the data into the table so that the rows are rewritten with the new bucketing function, as in the sketch after this list.
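A minimal sketch of the full sequence, assuming a hypothetical table named test and a staging copy for the reload:
  -- switch the table to bucketing version 2
  ALTER TABLE test SET TBLPROPERTIES ('bucketing_version'='2');
  -- stage the existing rows, then rewrite them so the new bucketing function is applied
  CREATE TABLE test_staging AS SELECT * FROM test;
  INSERT OVERWRITE TABLE test SELECT * FROM test_staging;
  DROP TABLE test_staging;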
Knowledge article
For the latest update on this issue, see the corresponding Knowledge article: TSB 2021-518: Incorrect results returned when joining two tables with different bucketing versions
TSB 2021-520: Cleaner causes data loss when processing an aborted dynamic partitioning transaction
Data loss may occur when an operation that involves dynamic partitioning is aborted in Hive. The Cleaner does not know which partitions contain the aborted deltas, so it goes over all partitions and removes aborted and "obsolete" deltas below the HighWatermark (the highest writeid that can be cleaned up). Some of those "obsolete" deltas may in fact be active ones. There is no easy way to identify obsolete deltas that are active, because the HighWatermark is defined at the table level.
Upstream JIRA
HIVE-25502
Impact
Continuous Data Loss on the affected table(s)
Workaround
Turn off the Cleaner by setting the following configuration on all HMS instances (for example, Data Lake, CDW, Data Hub) where the Initiator is turned on:
hive.compactor.cleaner.retention.time.seconds=3650d
hive.compactor.delayed.cleanup.enabled=true
Note that this prevents any cleaning from being executed for recent transactions, and consumes more storage space, until the fix is applied. A sketch of the equivalent safety-valve form follows.
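A minimal sketch of these settings in hive-site.xml form, for example in the Hive Metastore safety valve (the key=value lines above are equivalent):
  <property>
    <name>hive.compactor.cleaner.retention.time.seconds</name>
    <value>3650d</value>
  </property>
  <property>
    <name>hive.compactor.delayed.cleanup.enabled</name>
    <value>true</value>
  </property>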
Hotfix request
Request a hotfix from Cloudera Support.
Knowledge article
For the latest update on this issue, see the corresponding Knowledge article: TSB 2021-520: Cleaner causes data loss when processing an aborted dynamic partitioning transaction
TSB 2021-532: HWC fails to write empty DataFrame to orc files
HWC writes fail when a write of an empty DataFrame is attempted, because the writer does not create an ORC file if the DataFrame contains no records. This causes the HWC write commit validation to fail.
Impact
Users will not be able to create Hive tables with zero rows using HWC
Action required
Upgrade to CDP Public Cloud 7.2.12 or higher
Knowledge article
For the latest update on this issue, see the corresponding Knowledge article: TSB 2021-532: HWC fails to write empty DataFrame to orc files