Known Issues in Apache Hive

Learn about the known issues in Hive, their impact or changes to the functionality, and the available workarounds.

CDPD-23110: HS2 / HMS service becomes unresponsive as the LeaseRenewer thread waits to get the Kerberos ticket via System.in.
An example of the workaround for this issue is available in this KB article.
OPSAPS-58664: Hive on Tez LDAP configurations are not pushed to hive-site.xml by Cloudera Manager
After setting up LDAP properties in the Hive on Tez service, the settings are not pushed into hive-site.xml for the Hive on Tez service even after a restart. The issue is due to HiveOnTezServiceHandler reusing definitions from HiveConfigFileDefinitions. The definitions do not include any role types other than HiveServiceHandler's role types.
OPSAPS-59928: INSERT INTO from SELECT using hive (hbase) table returns an error under certain conditions.
Users who upgraded to a Kerberized CDP cluster from HDP and enabled AutoTLS have reported this problem. For more information, see Cloudera Community article: ERROR: "FAILED: Execution Error, return code 2" when the user is unable to issue INSERT INTO from SELECT using hive (hbase) table.
In Cloudera Manager > TEZ > Configurations, find the tez.cluster.additional.classpath.prefix Safety Valve, and set the value to /etc/hbase/conf.
CDPD-21365: Performing a drop catalog operation drops the catalog from the CTLGS table. The DBS table has a foreign key reference on CTLGS for CTLG_NAME. Because of this, the DBS table is locked, which creates a deadlock.
You must create an index in the DBS table on CTLG_NAME: CREATE INDEX CTLG_NAME_DBS ON DBS(CTLG_NAME);.
OPSAPS-60546: When upgrading from CDH to Cloudera Runtime 7, the Hive Java Heap Size does not propagate and defaults to 2 GB.
Manually reconfigure Hive Java Heap Size after upgrade.
OPSAPS-54299: Installing Hive on Tez and HMS in the incorrect order causes HiveServer failure
Install Hive on Tez and HMS in the correct order; otherwise, HiveServer fails. Likewise, install any additional HiveServer roles on the Hive on Tez service, not the Hive service; otherwise, HiveServer fails.
Workaround: Follow instructions on Installing Hive on Tez.
CDPD-23041: DROP TABLE on a table having an index does not work
If you migrate a Hive table that has an index to CDP, DROP TABLE does not drop the table. Hive no longer supports indexes (HIVE-18448). A foreign key constraint on the indexed table prevents dropping the table. Attempting to drop such a table results in the following error:
java.sql.BatchUpdateException: Cannot delete or update a parent row: a foreign key constraint fails ("hive"."IDXS", CONSTRAINT "IDXS_FK1" FOREIGN KEY ("ORIG_TBL_ID") REFERENCES "TBLS" ("TBL_ID"))
There are two workarounds:
  • Drop the foreign key "IDXS_FK1" on the "IDXS" table within the metastore, as shown in the sketch after this list. You can also manually drop indexes, but do not cascade any drops because the IDXS table includes references to "TBLS".
  • Launch an older version of Hive, such as Hive 2.3 that includes IDXS in the DDL, and then drop the indexes as described in Language Manual Indexing.
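As an illustration only, here is a minimal sketch of the first workaround, assuming a MySQL-backed metastore database (adjust the syntax for other backends):
-- run against the backend database that holds the Hive metastore schema
-- (the "hive" database shown in the error message above)
ALTER TABLE IDXS DROP FOREIGN KEY IDXS_FK1;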
Apache Issue: HIVE-24815
CDPD-20636 and DWX-6163: SHOW TABLES command does not produce a list of tables that are owned by the current user
When you run the SHOW TABLES command against a Hive Virtual Warehouse, tables are only returned if you have explicit read or read/write access to the table, or if you belong to a group that has read or read/write access. If you only have access to the tables because you are the owner of the objects, you can query the table content, but the table names do not appear in the SHOW TABLES command output.
Add the owner of the database or the tables as a user with read or read/write access to the tables directly.
CDPD-17766: Queries fail when using spark.sql.hive.hiveserver2.jdbc.url.principal in the JDBC URL to invoke Hive.
Do not specify spark.sql.hive.hiveserver2.jdbc.url.principal in the JDBC URL to invoke Hive remotely.
Workaround: specify principal=hive.server2.authentication.kerberos.principal as shown in the following syntax:
jdbc:hive2://<host>:<port>/<dbName>;principal=hive.server2.authentication.kerberos.principal;<otherSessionConfs>?<hiveConfs>#<hiveVars>
HIVE-24271: Problem creating an ACID table in legacy table mode
In site-level, legacy CREATE TABLE mode, the CREATE MANAGED TABLE command might not work as expected to override the legacy behavior and create a managed ACID table. The command works only at the session level.
Workaround: Include table properties in a CREATE TABLE that specify a transactional table. For example:
CREATE TABLE T2(a int, b int)
 STORED AS ORC
 TBLPROPERTIES ('transactional'='true');       
CDPD-10352: Hive on Tez cannot run certain queries on tables stored in encryption zones. This occurs when the KMS connection is SSL-encrypted and a self-signed certificate is used. You may see SSLHandshakeException in Hive logs in this case.
There are two workarounds:
  • Install the self-signed SSL certificate into the cacerts file on all hosts.
  • Copy ssl-client.xml to a directory that is available on all hosts, and then set the tez.aux.uris=path-to-ssl-client.xml property in the Hive on Tez advanced configuration.
CDPD-13636: Hive job fails with OutOfMemory exception in the Azure DE cluster
Set the parameter hive.optimize.sort.dynamic.partition.threshold=0. Add this parameter in Cloudera Manager (Hive Service Advanced Configuration Snippet (Safety Valve) for hive-site.xml).
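As a sketch only, and assuming the parameter is not restricted in your environment, the same setting can also be applied at the session level before running the failing query:
-- session-level equivalent of the safety-valve setting above
SET hive.optimize.sort.dynamic.partition.threshold=0;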
CDPD-16802: Autotranslate assertion failure.
The exception is not triggered when the query is executed from spark-shell. The failure originates in Hive, in the getJdoFilterPushdownParam method of ExpressionTree.java, which handles partition columns only as String and not as any other type.
This can be disabled by setting hive.metastore.integral.jdo.pushdown to true.
ENGESC-2214: Hiveserver2 and HMS service logs are not deleted
Update the Hive log4j configurations:
Hive > Configuration > HiveServer2 Logging Advanced Configuration Snippet (Safety Valve)
Hive Metastore > Configuration > Hive Metastore Server Logging Advanced Configuration Snippet (Safety Valve)
Add the following to the configurations:
appender.DRFA.strategy.action.type=DELETE
appender.DRFA.strategy.action.basepath=${log.dir}
appender.DRFA.strategy.action.maxdepth=1
appender.DRFA.strategy.action.PathConditions.glob=${log.file}.*
appender.DRFA.strategy.action.PathConditions.type=IfFileName
appender.DRFA.strategy.action.PathConditions.nestedConditions.type=IfAccumulatedFileCount
appender.DRFA.strategy.action.PathConditions.nestedConditions.exceeds=same value as appender.DRFA.strategy.max
HiveServer Web UI displays incorrect data
If you enabled auto-TLS for TLS encryption, the HiveServer2 Web UI does not display the correct data in the following tables: Active Sessions, Open Queries, and Last Max n Closed Queries.
CDPD-11890: Hive on Tez cannot run certain queries on tables stored in encryption zones
This problem occurs when the Hadoop Key Management Server (KMS) connection is SSL-encrypted and a self-signed certificate is used. SSLHandshakeException might appear in Hive logs.
Use one of the workarounds:
  • Install the self-signed SSL certificate into the cacerts file on all hosts.
  • Copy ssl-client.xml to a directory that is available on all hosts. In Cloudera Manager, go to Clusters > Hive on Tez > Configuration. In the Hive Service Advanced Configuration Snippet for hive-site.xml, click +, and add the name tez.aux.uris with the value path-to-ssl-client.xml.

Technical Service Bulletins

TSB 2021-459: Renaming managed (ACID) table shows empty records
Renaming an ACID (managed) table using ALTER TABLE <table name> RENAME causes empty records in the table. Also, the location of the new table after renaming points to the location of the old table before renaming. This can cause correctness issues, for example:
create table abc (id int);
insert into abc values (1);
alter table abc rename to def;
create table abc (id int); // should be empty
insert into abc values (2);
select * from abc; // returns 1 and 2, the new and the old results
Knowledge article
For the latest update on this issue see the corresponding Knowledge article: TSB 2021-459: Renaming managed (ACID) table shows empty records
TSB 2021-480.1: Hive produces incorrect query results when skipping a header in a binary file
In CDP, setting the table property skip.header.line.count to greater than 0 in a table stored in a binary format, such as Parquet, can cause incorrect query results. The skip header property is intended for use with text files and is typically used with CSV files. The issue is not present when you run the query on a text file with the skip header property set to 1 or greater.
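As an illustration only (the table and column names below are hypothetical), the skip header property behaves as intended on a TEXTFILE table such as this one:
-- intended usage: a CSV-style TEXTFILE table with one header line
CREATE TABLE sales_csv (id INT, amount DECIMAL(10,2))
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
TBLPROPERTIES ('skip.header.line.count'='1');
-- the same property on a binary format such as Parquet can produce incorrect results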
Upstream JIRA
Apache Jira: HIVE-24827
Knowledge article
For the latest update on this issue see the corresponding Knowledge article: TSB 2021-480.1: Hive produces incorrect query results when skipping a header in a binary file
TSB 2021-480.2: Hive ignores the property to skip a header or footer in a compressed file
In CDP, setting the table properties skip.header.line.count and skip.footer.line.count to greater than 0 in a table stored in a compressed format, such as bzip2, can cause incorrect results from SELECT * or SELECT COUNT(*) queries.
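As a hypothetical illustration (the table and column names are invented), the properties below may be ignored when the underlying data files are compressed, for example with bzip2:
-- TEXTFILE table whose data files are bzip2-compressed (.bz2)
CREATE TABLE sales_bz2 (id INT, amount DECIMAL(10,2))
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
TBLPROPERTIES ('skip.header.line.count'='1', 'skip.footer.line.count'='1');
-- SELECT * or SELECT COUNT(*) on this table may not skip the header and footer lines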
Upstream JIRA
Apache Jira: HIVE-24224
Knowledge article
For the latest update on this issue see the corresponding Knowledge article: TSB 2021-480.2: Hive ignores the property to skip a header or footer in a compressed file
TSB 2021-482: Race condition in subdirectory delete/rename causes hive jobs to fail
Multiple threads try to perform a rename operation on S3, and one of them fails to complete the rename, causing an error. Hive logs report "HiveException: Error moving ..." and contain an error line starting with "Exception when loading partition", with all paths listed with s3a:// prefixes.
Knowledge article
For the latest update on this issue see the corresponding Knowledge article: TSB 2021-482: Race condition in subdirectory delete/rename causes Hive jobs to fail
TSB 2021-501: JOIN queries return wrong result for join keys with large size in Hive
JOIN queries return wrong results when performing joins on large keys (larger than 255 bytes). This happens when the fast hash table join algorithm is enabled, which is the default.
Knowledge article
For the latest update on this issue see the corresponding Knowledge article: TSB 2021-501: JOIN queries return wrong result for join keys with large size in Hive
TSB 2021-518: Incorrect results returned when joining two tables with different bucketing versions
Incorrect results are returned when joining two tables with different bucketing versions and with the following Hive configurations: hive.auto.convert.join set to false and mapreduce.job.reduces set to any custom value.
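As an illustration only (the table names and reducer count are hypothetical), the issue can surface in a session configured like this:
-- settings under which the incorrect results were reported
set hive.auto.convert.join=false;
set mapreduce.job.reduces=8; -- any custom value
-- the joined tables were created with different bucketing versions
SELECT a.id, b.val
FROM table_bucketing_v1 a
JOIN table_bucketing_v2 b ON a.id = b.id;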
Knowledge article
For the latest update on this issue see the corresponding Knowledge article: TSB 2021-518: Incorrect results returned when joining two tables with different bucketing versions
TSB 2021-524: Intermittent data duplication if direct insert enabled
If direct insert is enabled, data is written directly to the final location with an attemptId. At the end of the insert operation, all data written before the final attempt should be deleted. However, due to a bug in HIVE-21164, this does not happen.

Example: Data is written to the final location with attemptId=0, but this task fails. Hive tries the task again and writes data to the final location with attemptId=1. At the end of the insert, Hive should remove all the files with attemptId=0, but it does not.

Upstream JIRA
Knowledge article
For the latest update on this issue see the corresponding Knowledge article: TSB 2021-524: Intermittent data duplication if direct insert enabled
TSB 2021-529: Ranger RMS leads to HMS Connection leak and increased heap memory usage in NameNode process
After enabling Ranger Resource Mapping Service (RMS), RMS connects to the Hive Metastore (HMS) every 30 seconds to fetch notification events. However, for each request, RMS creates two HMS connections and closes only one of them.
Knowledge article
For the latest update on this issue see the corresponding Knowledge article: TSB 2021-529: Ranger RMS leads to HMS Connection leak and increased heap memory usage in NameNode process
TSB 2022-526: A Hive query may produce wrong results for some vectorized built-in functions with compound expression in PARTITION BY or ORDER BY clause
Vectorized functions with PARTITION BY and/or ORDER BY clauses may be broken when the partition or order by expression is a compound expression (for example, a cast from string to integer) rather than a simple column reference.
The query may fail or output wrong results, depending on the compound expression (a hypothetical query shape is sketched after this list). For example:
  • Cast integer to string results in query failure with a NullPointerException
  • Cast string to integer outputs wrong results
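As a hypothetical sketch (the table, column, and windowing function are chosen only for illustration), an affected query shape looks like this:
-- the PARTITION BY expression is a compound cast, not a simple column reference
SELECT id,
       ROW_NUMBER() OVER (PARTITION BY CAST(code AS INT) ORDER BY id) AS rn
FROM events;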
Knowledge article
For the latest update on this issue see the corresponding Knowledge article: TSB 2022-526: A Hive query may produce wrong results for some vectorized built-in functions with compound expression in PARTITION BY or ORDER BY clause
TSB 2023-627: IN/OR predicate on binary column returns wrong result
An IN or an OR predicate involving a binary datatype column may produce wrong results. The OR predicate is converted to an IN clause because of the setting hive.optimize.point.lookup, which is true by default. Only binary data types are affected by this issue. See https://issues.apache.org/jira/browse/HIVE-26235 for example queries that may be affected.
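As a hypothetical illustration (table and column names are invented; see HIVE-26235 for the actual example queries), an affected query shape might look like this:
-- IN predicate on a BINARY column; an equivalent OR predicate can be rewritten
-- to IN by hive.optimize.point.lookup and hit the same issue
SELECT * FROM t_bin
WHERE bin_col IN (unhex('AB'), unhex('CD'));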
Upstream JIRA
HIVE-26235
Knowledge article
For the latest update on this issue, see the corresponding Knowledge article: TSB 2023-627: IN/OR predicate on binary column returns wrong result