Known Issues in Apache Hive

Learn about the known issues in Hive, the impact or changes to the functionality, and the workaround.

CDPD-15518: ACID tables you write using the Hive Warehouse Connector cannot be read from an Impala virtual warehouse.
Read the tables from a Hive virtual warehouse or using Impala queries in Data Hub.
CDPD-13636: Hive job fails with OutOfMemory exception in the Azure DE cluster
Set the parameter hive.optimize.sort.dynamic.partition.threshold=0. Add this parameter in Cloudera Manager under Hive Service Advanced Configuration Snippet (Safety Valve) for hive-site.xml.
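As an illustration of the CDPD-13636 workaround, the following minimal sketch renders the property as a hive-site.xml style snippet that could be pasted into the safety valve's XML view. The helper name hive_site_property is hypothetical and is not part of Hive or Cloudera Manager; only the property name and value come from the text above.

```python
from xml.sax.saxutils import escape


def hive_site_property(name: str, value: str) -> str:
    """Render one <property> element in hive-site.xml form (hypothetical helper)."""
    return (
        "<property>\n"
        f"  <name>{escape(name)}</name>\n"
        f"  <value>{escape(value)}</value>\n"
        "</property>"
    )


# Property name and value are taken from the workaround text.
snippet = hive_site_property("hive.optimize.sort.dynamic.partition.threshold", "0")
print(snippet)
```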
ENGESC-2214: HiveServer2 and HMS service logs are not deleted
Update the Hive log4j configurations:
  • Hive -> Configuration -> HiveServer2 Logging Advanced Configuration Snippet (Safety Valve)
  • Hive Metastore -> Configuration -> Hive Metastore Server Logging Advanced Configuration Snippet (Safety Valve)
Add the following to the configurations:
    appender.DRFA.strategy.action.type=DELETE
    appender.DRFA.strategy.action.basepath=${log.dir}
    appender.DRFA.strategy.action.maxdepth=1
    appender.DRFA.strategy.action.PathConditions.glob=${log.file}.*
    appender.DRFA.strategy.action.PathConditions.type=IfFileName
    appender.DRFA.strategy.action.PathConditions.nestedConditions.type=IfAccumulatedFileCount
    appender.DRFA.strategy.action.PathConditions.nestedConditions.exceeds=same value as appender.DRFA.strategy.max
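The last property for ENGESC-2214 is a placeholder: its value must match whatever appender.DRFA.strategy.max is set to in your deployment. The sketch below shows one way to fill it in consistently. The function name drfa_delete_action and the value 30 are assumptions for illustration only; substitute the value your configuration actually uses.

```python
def drfa_delete_action(strategy_max: int) -> str:
    """Build the DRFA delete-action properties (hypothetical helper).

    strategy_max must equal the existing appender.DRFA.strategy.max value.
    """
    prefix = "appender.DRFA.strategy.action"
    lines = [
        prefix + ".type=DELETE",
        prefix + ".basepath=${log.dir}",
        prefix + ".maxdepth=1",
        prefix + ".PathConditions.glob=${log.file}.*",
        prefix + ".PathConditions.type=IfFileName",
        prefix + ".PathConditions.nestedConditions.type=IfAccumulatedFileCount",
        # Keep this in step with appender.DRFA.strategy.max.
        prefix + ".PathConditions.nestedConditions.exceeds=" + str(strategy_max),
    ]
    return "\n".join(lines)


# 30 is an assumed example value, not a recommendation.
properties = drfa_delete_action(30)
print(properties)
```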
HiveServer2 Web UI displays incorrect data
If you enabled auto-TLS for TLS encryption, the HiveServer2 Web UI does not display the correct data in the following tables: Active Sessions, Open Queries, and Last Max n Closed Queries.
CDPD-11890: Hive on Tez cannot run certain queries on tables stored in encryption zones
This problem occurs when the Hadoop Key Management Server (KMS) connection is SSL-encrypted and a self-signed certificate is used. An SSLHandshakeException might appear in the Hive logs.
Use one of the following workarounds:
  • Install the self-signed SSL certificate into the cacerts file on all hosts.
  • Copy ssl-client.xml to a directory that is available on all hosts. In Cloudera Manager, go to Clusters > Hive on Tez > Configuration. In Hive Service Advanced Configuration Snippet for hive-site.xml, click +, and add the name tez.aux.uris and the value path-to-ssl-client.xml.

Technical Service Bulletins

TSB 2022-567: Potential Data Loss due to CTLT HBaseStorageHandler failure dropping underlying HBase table while rollback
If the CREATE TABLE target_table LIKE source_table command (CTLT) fails and the source table is an HBaseStorageHandler-based table, the HBaseMetaHook rollback logic deletes the underlying HBase table, resulting in potential data loss.
Upstream JIRA
HIVE-25989
Knowledge article
For the latest update on this issue, see the corresponding Knowledge article: TSB 2022-567: Potential Data Loss due to CTLT HBaseStorageHandler failure dropping underlying HBase table while rollback
TSB 2022-600: Renaming translated external partition table shows empty records in Apache Hive
If an Apache Hive partitioned table is renamed, it can cause data loss due to the location being incorrectly translated at the Hive Metastore (HMS) translation layer in the legacy config mode.

Scenario:

  • The following configurations are set:
    • hive.create.as.external.legacy=true
    • hive.create.as.acid=true
  • The following processes are executed:
    • A new partitioned table is created
    • Data is loaded into the new table
    • The table is renamed
    • A scan or view of the renamed table returns empty records

Example:

  • The following kind of query is affected:
    CREATE TABLE foo (i1 int) PARTITIONED BY (i2 string);
    INSERT INTO foo VALUES (1,'foo');
    ALTER TABLE foo RENAME TO foo_renamed;
    SELECT * FROM foo_renamed; -- returns empty records
  • The following kind of query is not affected:
    CREATE EXTERNAL TABLE foo (i1 int) PARTITIONED BY (i2 string);
    INSERT INTO foo VALUES (1,'foo');
    ALTER TABLE foo RENAME TO foo_renamed;
    SELECT * FROM foo_renamed; -- returns 1 record
Upstream JIRA
HIVE-26158
Knowledge article
For the latest update on this issue, see the corresponding Knowledge article: TSB 2022-600: Renaming translated external partition table shows empty records in Apache Hive
TSB 2023-627: IN/OR predicate on binary column returns wrong result
An IN or an OR predicate involving a binary data type column may produce wrong results. The OR predicate is converted to an IN predicate because of the setting hive.optimize.point.lookup, which is true by default. Only binary data types are affected by this issue. See https://issues.apache.org/jira/browse/HIVE-26235 for example queries that may be affected.
Upstream JIRA
HIVE-26235
Knowledge article
For the latest update on this issue, see the corresponding Knowledge article: TSB 2023-627: IN/OR predicate on binary column returns wrong result
TSB 2023-653: Cleaner causes data loss when processing an aborted dynamic partitioning transaction
If the compaction cleaner is enabled, data loss may occur when an operation that involves dynamic partitioning is aborted in Hive. The cleaner does not know which partition contains the aborted deltas, so it goes over all partitions and removes aborted and "obsolete" deltas below the HighWatermark (the highest write ID that could be cleaned up). Some of those "obsolete" deltas may in fact be active. There is no easy way to identify which deltas marked obsolete are still active, because the HighWatermark is defined at the table level.
Upstream JIRA
HIVE-25502
Knowledge article
For the latest update on this issue, see the corresponding Knowledge article: TSB 2023-653: Cleaner causes data loss when processing an aborted dynamic partitioning transaction
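The reasoning behind TSB 2023-653 can be illustrated with a deliberately simplified toy model. This is not Hive's cleaner implementation: the Delta class, both functions, the partition names, and the write IDs are all hypothetical, and "below the HighWatermark" is modeled as "at or below". The sketch only shows why a single table-level watermark cannot tell an obsolete delta in one partition from an active delta in another.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Delta:
    partition: str
    write_id: int
    aborted: bool = False
    superseded: bool = False  # already folded into newer data by compaction


def table_level_clean(deltas, high_watermark):
    """Toy cleaner: treats every delta at or below the table-level watermark as removable."""
    return [d for d in deltas if d.write_id <= high_watermark]


def partition_aware_clean(deltas, high_watermark):
    """Toy reference: removes only deltas that are really aborted or superseded."""
    return [
        d for d in deltas
        if d.write_id <= high_watermark and (d.aborted or d.superseded)
    ]


deltas = [
    Delta("p=1", 1, superseded=True),  # obsolete: p=1 was compacted
    Delta("p=2", 2),                   # active: p=2 was never compacted
    Delta("p=1", 3, aborted=True),     # aborted dynamic-partition write
]
high_watermark = 3  # one value for the whole table

removed = table_level_clean(deltas, high_watermark)
should_remove = partition_aware_clean(deltas, high_watermark)
lost = [d for d in removed if d not in should_remove]
print(lost)  # the active deltas the toy table-level cleaner wrongly removed
```

In this model the delta in p=2 is removed only because its write ID falls under the table-wide watermark, which mirrors the data-loss scenario described above.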