Known Issues in Apache Hive

Learn about the known issues in Hive, their impact on or changes to functionality, and the available workarounds.

OPSAPS-54299: Installing Hive on Tez and HMS in the incorrect order causes HiveServer failure: You must install Hive on Tez and HMS in the correct order; otherwise, HiveServer fails. You must also add any additional HiveServer roles to the Hive on Tez service, not to the Hive service; otherwise, HiveServer fails. Workaround: Follow the instructions in Installing Hive on Tez.

CDPD-15518: ACID tables you write using the Hive Warehouse Connector cannot be read from an Impala virtual warehouse: Read the tables from a Hive virtual warehouse, or use Impala queries in Data Hub.

CDPD-13636: Hive job fails with an OutOfMemory exception in the Azure DE cluster: Set the parameter hive.optimize.sort.dynamic.partition.threshold=0. Add this parameter in Cloudera Manager, in Hive Service Advanced Configuration Snippet (Safety Valve) for hive-site.xml.
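The workaround above is a cluster-wide setting. As a sketch for checking whether the setting helps a single failing job before you change the cluster configuration, you can set the same parameter for one Beeline session. This assumes your session is allowed to modify the parameter; some deployments restrict which parameters users can set.

```sql
-- Session-level override of the workaround parameter (applies to this session only).
SET hive.optimize.sort.dynamic.partition.threshold=0;

-- Print the current value to confirm the override before rerunning the failing job.
SET hive.optimize.sort.dynamic.partition.threshold;
```

The session-level value is lost when the session ends, so the safety valve setting remains the persistent workaround.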

ENGESC-2214: HiveServer2 and HMS service logs are not deleted: Update the Hive log4j configurations in the following safety valves:

Hive > Configuration > HiveServer2 Logging Advanced Configuration Snippet (Safety Valve)

Hive Metastore > Configuration > Hive Metastore Server Logging Advanced Configuration Snippet (Safety Valve)

Add the following properties to both configurations, replacing the placeholder in the last line with the same value as appender.DRFA.strategy.max:

    appender.DRFA.strategy.action.type=DELETE
    appender.DRFA.strategy.action.basepath=${log.dir}
    appender.DRFA.strategy.action.maxdepth=1
    appender.DRFA.strategy.action.PathConditions.glob=${log.file}.*
    appender.DRFA.strategy.action.PathConditions.type=IfFileName
    appender.DRFA.strategy.action.PathConditions.nestedConditions.type=IfAccumulatedFileCount
    appender.DRFA.strategy.action.PathConditions.nestedConditions.exceeds=<same value as appender.DRFA.strategy.max>

HiveServer2 Web UI displays incorrect data: If you enabled auto-TLS for TLS encryption, the HiveServer2 Web UI does not display the correct data in the following tables: Active Sessions, Open Queries, and Last Max n Closed Queries.

CDPD-11890: Hive on Tez cannot run certain queries on tables stored in encryption zones: This problem occurs when the Hadoop Key Management Server (KMS) connection is SSL-encrypted and a self-signed certificate is used. SSLHandshakeException might appear in Hive logs. Use one of the following workarounds:

Install the self-signed SSL certificate into the cacerts file on all hosts.

Copy ssl-client.xml to a directory that is available on all hosts. In Cloudera Manager, go to Clusters > Hive on Tez > Configuration. In Hive Service Advanced Configuration Snippet (Safety Valve) for hive-site.xml, click +, and add the name tez.aux.uris and the value path-to-ssl-client.xml.

Technical Service Bulletins

TSB 2021-459: Renaming managed (ACID) table shows empty records: Renaming an ACID (managed) table using ALTER TABLE <table name> RENAME TO <new table name> causes the table to show empty records. In addition, after the rename, the location of the renamed table still points to the location of the original table. This can cause correctness issues, for example:

    CREATE TABLE abc (id INT);
    INSERT INTO abc VALUES (1);
    ALTER TABLE abc RENAME TO def;
    CREATE TABLE abc (id INT);   -- should be empty
    INSERT INTO abc VALUES (2);
    SELECT * FROM abc;           -- returns 1 and 2, the new and the old results
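To check whether a renamed table is affected, you can compare storage locations. A minimal sketch, reusing the table names abc and def from the example: DESCRIBE FORMATTED prints a Location field for each table. If the renamed table and the re-created table report the same Location, the renamed table still points at the original directory, which is how the new table ends up reading the old rows.

```sql
-- Show table metadata, including the Location field, for both tables.
DESCRIBE FORMATTED def;
DESCRIBE FORMATTED abc;
-- If both report the same Location, the two tables share one directory
-- and therefore read each other's data files.
```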
Knowledge article: For the latest update on this issue, see the corresponding Knowledge article: TSB 2021-459: Renaming managed (ACID) table shows empty records

TSB 2021-480/1: Hive produces incorrect query results when skipping a header in a binary file: In CDP, setting the table property skip.header.line.count to greater than 0 in a table stored in a binary format, such as Parquet, can cause incorrect query results. The skip header property is intended for text files and is typically used with CSV files. The issue does not occur when you run the query on a text file that sets the skip header property to 1 or greater.
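The affected combination is a binary-format table that also sets skip.header.line.count above 0. A sketch, using a hypothetical table named sales_parquet: the CREATE TABLE statement shows the kind of definition the bulletin describes, and SHOW TBLPROPERTIES lets you check an existing table for the property.

```sql
-- Hypothetical example of an affected definition: a Parquet table that also
-- sets skip.header.line.count, a property intended for text (CSV) files.
CREATE TABLE sales_parquet (id INT, amount DOUBLE)
STORED AS PARQUET
TBLPROPERTIES ('skip.header.line.count'='1');

-- Check whether an existing table sets the property.
SHOW TBLPROPERTIES sales_parquet ('skip.header.line.count');
```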
Upstream JIRA: HIVE-24827
Knowledge article: For the latest update on this issue, see the corresponding Knowledge article: TSB 2021-480.1: Hive produces incorrect query results when skipping a header in a binary file

TSB 2021-480/2: Hive ignores the property to skip a header or footer in a compressed file: In CDP, setting the table properties skip.header.line.count and skip.footer.line.count to greater than 0 in a table stored in a compressed format, such as bzip2, can cause incorrect results from SELECT * or SELECT COUNT(*) queries.
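A sketch of the affected pattern, assuming a hypothetical text table named events_csv whose data files are bzip2-compressed (for example, .bz2 files placed in the table location). The table sets both skip properties, and the two queries are the shapes the bulletin names as able to return incorrect results.

```sql
-- Hypothetical text table that skips one header line and one footer line.
CREATE TABLE events_csv (id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
TBLPROPERTIES ('skip.header.line.count'='1', 'skip.footer.line.count'='1');

-- If the data files in the table location are compressed (for example, bzip2),
-- these queries can return incorrect results because the skip properties are ignored.
SELECT * FROM events_csv;
SELECT COUNT(*) FROM events_csv;
```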
Upstream JIRA: HIVE-24224
Knowledge article: For the latest update on this issue, see the corresponding Knowledge article: TSB 2021-480.2: Hive ignores the property to skip a header or footer in a compressed file

TSB 2021-482: Race condition in subdirectory delete/rename causes Hive jobs to fail: Multiple threads try to perform a rename operation on S3, and one of the threads fails to perform the rename, causing an error. Hive logs report "HiveException: Error moving ..." and contain an error line starting with "Exception when loading partition", with all paths listed with s3a:// prefixes.
Knowledge article: For the latest update on this issue, see the corresponding Knowledge article: TSB 2021-482: Race condition in subdirectory delete/rename causes Hive jobs to fail

TSB 2021-501: JOIN queries return wrong result for join keys with large size in Hive

JOIN queries return wrong results when joining on large keys (larger than 255 bytes). This happens when the fast hash table join algorithm is enabled, which it is by default.

Impact

Incorrect results

Action required

Hotfix request

Request a hotfix from Cloudera Support.

Workaround

Set hive.vectorized.execution.mapjoin.native.fast.hashtable.enabled to false. This might degrade performance, depending on the type of query and the system it runs on.
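For a single session, you can apply the workaround with SET before running the affected JOIN query. To apply it to all queries, set the property in the cluster configuration instead, for example in a hive-site.xml safety valve. A minimal session-level sketch:

```sql
-- Disable the native fast hash table for vectorized map joins in this session.
SET hive.vectorized.execution.mapjoin.native.fast.hashtable.enabled=false;

-- Print the current value to confirm the change.
SET hive.vectorized.execution.mapjoin.native.fast.hashtable.enabled;
```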

Knowledge article

For the latest update on this issue, see the corresponding Knowledge article: TSB 2021-501: JOIN queries return wrong result for join keys with large size in Hive

TSB 2021-518: Incorrect results returned when joining two tables with different bucketing versions: Incorrect results are returned when you join two tables that have different bucketing versions and the following Hive configurations are set: hive.auto.convert.join=false and mapreduce.job.reduces=<any custom value>.
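Before running such a join, you can check both conditions. A sketch, assuming hypothetical tables t1 and t2: the bucketing version is read from the bucketing_version table property (an assumption based on Hive 3 table metadata; confirm for your release), and SET with only a key name prints the current session value of each setting named in the bulletin. The issue applies when the versions differ and hive.auto.convert.join is false with a custom mapreduce.job.reduces value.

```sql
-- Compare the bucketing versions of the two tables you plan to join.
SHOW TBLPROPERTIES t1 ('bucketing_version');
SHOW TBLPROPERTIES t2 ('bucketing_version');

-- Print the current values of the two settings named in this bulletin.
SET hive.auto.convert.join;
SET mapreduce.job.reduces;
```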
Knowledge article: For the latest update on this issue, see the corresponding Knowledge article: TSB 2021-518: Incorrect results returned when joining two tables with different bucketing versions

TSB 2021-524: Intermittent data duplication if direct insert enabled

If direct insert is enabled, data is written directly to the final location with an attemptId. At the end of the insert operation, all data written before the final attempt should be deleted. However, due to a bug in HIVE-21164, this does not happen.

Example: Data is written to the final location with attemptId=0, but this task fails. Hive tries the task again and writes data to the final location with attemptId=1. At the end of the insert, Hive should remove all the files with attemptId=0, but it does not.
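Because files left behind by a failed attempt are read together with the files from the successful attempt, duplicated rows are one symptom to look for. A minimal sketch of a duplicate check, assuming a hypothetical table named orders whose order_id column is expected to be unique:

```sql
-- List key values that appear more than once in a table whose key
-- is expected to be unique (orders and order_id are hypothetical names).
SELECT order_id, COUNT(*) AS copies
FROM orders
GROUP BY order_id
HAVING COUNT(*) > 1;
```

An empty result does not prove the table is unaffected if the table has no natural unique key, so adapt the grouping columns to your data.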

Upstream JIRA

Knowledge article

For the latest update on this issue, see the corresponding Knowledge article: TSB 2021-524: Intermittent data duplication if direct insert enabled

TSB 2023-627: IN/OR predicate on binary column returns wrong result: An IN or OR predicate involving a binary data type column may produce wrong results. The OR predicate is converted to an IN because of the setting hive.optimize.point.lookup, which is true by default. Only binary data types are affected by this issue. See https://issues.apache.org/jira/browse/HIVE-26235 for example queries that may be affected.
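The queries below illustrate the predicate shapes the bulletin describes, assuming a hypothetical table named blobs with a BINARY column named payload. They are a sketch only; the upstream HIVE-26235 issue has the reference example queries.

```sql
-- IN predicate on a binary column.
SELECT * FROM blobs
WHERE payload IN (CAST('abc' AS BINARY), CAST('xyz' AS BINARY));

-- OR over equality tests on the same binary column; Hive may rewrite this
-- to an IN predicate when hive.optimize.point.lookup is true (the default).
SELECT * FROM blobs
WHERE payload = CAST('abc' AS BINARY) OR payload = CAST('xyz' AS BINARY);

-- Print the current value of the setting that drives the OR-to-IN rewrite.
SET hive.optimize.point.lookup;
```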
Upstream JIRA: HIVE-26235
Knowledge article: For the latest update on this issue, see the corresponding Knowledge article: TSB 2023-627: IN/OR predicate on binary column returns wrong result
