Fixed Issues in Hive
Review the list of Hive issues that are resolved in Cloudera Runtime 7.3.1.
- CDPD-13406: Disable TopN in ReduceSinkOp when TopNKey is introduced
- When both the ReduceSink and TopNKey operators are used together in a query, they both perform Top-N key filtering. This results in the same filtering logic being applied twice, causing slower query execution.
- CDPD-28339: Skip extra work in Cleaner when queue is empty
- The Cleaner previously made unnecessary database calls and logged activities even when there were no candidates for cleaning.
- CDPD-28174: Compaction task reattempt fails due to FileAlreadyExistsException
- The issue arises when a compaction task is relaunched after the first attempt fails, leaving behind temporary directories. The second attempt encounters a FileAlreadyExistsException because the _tmp directory created during the first attempt was not cleared.
- CDPD-45285: Incorrect results for IN UDF on Parquet columns of CHAR/VARCHAR type
- Queries with case statements and multiple conditions return incorrect results for tables in Parquet format, particularly with CHAR/VARCHAR types. The issue is not observed with ORC or TextFile formats and can be bypassed by setting hive.optimize.point.lookup to false.
- CDPD-24412: Compaction queue entries stuck in 'ready for cleaning' state
- When multiple compaction tasks run simultaneously on the same table, only one task removes obsolete files while others remain in the ready for cleaning state, leading to an accumulation of queue entries.
- CDPD-27291: getCrossReference fails when retrieving constraints from the primary key side
- When retrieving constraints from the primary key side, the foreign key is passed as null, causing the operation to fail with a Db name cannot be null exception, especially when the metadata cache is enabled by default.
- CDPD-15269: Add caching support for frequently called constraint APIs in catalogd's HMS interface
- The get_unique_constraints, get_primary_keys, get_foreign_keys, and get_not_null_constraints APIs are called frequently during query compilation, particularly with TPCDS queries. Without caching, this leads to performance overhead.
- DWX-8663: ShuffleScheduler should report the original exception when shuffle becomes unhealthy
- The ShuffleScheduler does not report the original exception when the shuffle becomes unhealthy, making it harder to diagnose the underlying issue.
- CDPD-43837: MSSQL upgrade scripts fail when adding TYPE column to DBS table
- Schema upgrade for MSSQL fails with an error when trying to add the TYPE column to the DBS table due to the incorrect usage of the keyword NATIVE in the default value.
- CDPD-43890: Drop data connector if not exists should not throw an exception
- The DROP DATA CONNECTOR IF NOT EXISTS command incorrectly throws a NoSuchObjectException when the connector does not exist.
- CDPD-43838: Filter out results for show connectors in Hive Metastore client side
- The SHOW CONNECTORS command does not filter results based on authorization, such as Ranger policies, on the client side.
- CDPD-43952: HMS get_all_tables method does not retrieve tables from remote database
- The get_all_tables method in the Hive Metastore handler only retrieves tables from the native database, unlike the get_tables method, which can retrieve tables from both native and remote databases.
- CDPD-55914: Select query on table with remote database returns NULL values with PostgreSQL and Redshift data connectors
- Some data types were not mapped from PostgreSQL or Redshift to Hive data types in the connector, which resulted in null values being displayed for columns of those data types.
- CDPD-31726: Prevent NullPointerException by Checking Collations Return Value
- A NullPointerException occurs during execution of an EXPLAIN CBO on a subquery when using Tez as the execution engine, leading to empty explain output.
- CDPD-27418: Incorrect row order after query-based MINOR compaction
- The query-based MINOR compaction used an incorrect sorting order, which led to duplicated rows after multiple merge statements.
- CDPD-27419: Incorrect row order validation for query-based major compaction
- The row order validation for query-based MAJOR compaction incorrectly checked the order as bucketProperty, leading to failures with multiple bucketProperties.
- Enable proper handling of non-default schemas in Hive for JDBC databases
- Hive fails to create an external table for a JDBC database when the table is in a non-default schema, causing a PSQLException error stating that the table does not exist.
- CBO failure when using JDBC table with password through dbcp.password.uri
- When a table is created using JDBCStorageHandler and the JDBC_PASSWORD_URI is specified for the password, the Cost-Based Optimizer (CBO) fails. This causes all the data to be fetched directly from the database and processed in Hive, impacting performance.
- CDPD-28904: Intermittent Hive JDBC SSO failures in virtual environments
- Browser-based SSO with the Hive JDBC driver fails in virtual environments (like Windows VMs). The driver sometimes misses POST requests with the SAML token due to a race condition, causing authentication failures.
- CDPD-43672: Remove unnecessary optimizations in canHandleQbForCbo
- The canHandleQbForCbo() method includes an optimization where it returns an empty string if INFO logging is disabled, which complicates the logic without significantly improving performance.
- DWX-7648: Infinite loop during CBO parsing causes OOM in HiveServer2
- HiveServer2 became unstable due to an infinite loop during query parsing with UNION operations, causing an out-of-memory error during the cost-based and logical optimization phase. The issue occurred because Hive's custom metadata provider was not initialized.
- CDPD-31200: Reader not closed after check in AcidUtils, leading to resource exhaustion
- The Reader in AcidUtils.isRawFormatFile is not being closed after the check is finished. This causes issues when resources on the DFSClient are limited, leading to connection pool timeouts such as Timeout waiting for connection from pool.
- DWX-10336: SSL certificate import error in HiveServer2 with JWT authentication
- When JWT support is enabled for HiveServer2, the SSL certificate import fails because the JVM does not accept self-signed certificates in some environments. The error occurs during the initialization of the HTTP server.
- CDPD-42686: Query-based compaction fails for tables with complex data types and reserved keywords
- Query-based compaction fails on tables with complex data types and columns containing reserved keywords due to incorrect quoting of column names when creating a temporary table.
- CDPD-54605: HiveSchemaTool to honor metastore-site.xml for initializing metastore schema
- The HiveSchemaTool fails to recognize the
metastore-site.xml
configuration when initializing the metastore schema. It defaults to using an embedded database instead of the specified MySQL database. - CDPD-55135: JDBC data connector queries to avoid exceptions at CBO stage
- JDBC data connector queries throw exceptions at the CBO stage due to incorrect handling of database, schema, and table names. When querying, the database name is improperly swapped with the schema name, leading to the error: schema dev does not exist
- DWX-8296: Hive query vertex failure due to Kerberos authentication error
- Hive queries fail during LLAP execution with vertex failures. The query-executor fails to communicate with the query-coordinator due to an authentication error in Kerberos.
- CDPD-45607: Atomic schema upgrades for HMS to prevent partial commits
- SchemaTool may leave the metastore in an invalid state during schema upgrades because each change is autocommitted. If an upgrade fails mid-process, the schema is left partially updated, causing issues on reruns.
- CDPD-49232: Auto-reconnect data connectors after timeout
- When data connectors remain idle for a long time, the JDBC connection times out. This requires a restart to re-establish the connection, rendering the connector unusable until then.
- CDPD-49494: Allow JWT and LDAP authentication to co-exist in HiveServer2 configuration
- Setting hive.server2.authentication=JWT,LDAP fails with a validation error, preventing HiveServer2 from starting due to conflicts between authentication types.
- CDPD-40732: Timestamp shifts when reading Parquet files with the vectorized reader
- Timestamp shifts occur when Parquet files created in older Hive versions are read with vectorized execution enabled. The vectorized reader cannot use the metadata inside the Parquet file to apply the correct conversion. For instance, a timestamp written as 1880-01-01 00:00:00 may be read as 1879-12-31 23:52:58; the exact shift depends on the JVM timezone. The non-vectorized reader is not affected.
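
The size of the shift in the CDPD-40732 example can be checked with simple arithmetic. The sketch below, in Python, only subtracts the two timestamps quoted in that description. It does not reproduce Hive's Parquet reader or its conversion logic, and the variable names are illustrative.

```python
from datetime import datetime

# The two values quoted in the CDPD-40732 description.
written = datetime(1880, 1, 1, 0, 0, 0)
read_back = datetime(1879, 12, 31, 23, 52, 58)

# Difference between the value written and the value read back.
shift = written - read_back
shift_seconds = int(shift.total_seconds())
minutes, seconds = divmod(shift_seconds, 60)

print(f"shift: {shift_seconds} s ({minutes} min {seconds} s)")
```

For this pair of values the shift is a few minutes rather than a whole number of hours. As the description notes, the exact amount depends on the JVM timezone, so other environments may show a different shift.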