Review the list of Hive issues that are resolved in Cloudera Runtime
7.3.1.
Cloudera Runtime 7.3.1
- CDPD-13406: Disable TopN in ReduceSinkOp when TopNKey is
introduced
- 7.3.1
- When both the ReduceSink and TopNKey operators are used together in a query,
they both perform Top-N key filtering. The same filtering logic is therefore
applied twice, which slows down query execution.
- The Top-N key filtering logic within the ReduceSink operator is now disabled
when the TopNKey operator is introduced. The patch ensures that only the TopNKey
operator handles the Top-N filtering, while the other functionalities of the
ReduceSink operator remain unaffected.
Apache Jira:
HIVE-23736
- CDPD-28339: Skip extra work in Cleaner when queue is empty
- 7.3.1
- The Cleaner previously made unnecessary database calls and
logged activities even when there were no candidates for cleaning.
- This was optimized by skipping the extra DB calls and logging
when the cleaning queue is empty, improving performance.
Apache Jira:
HIVE-24754
- CDPD-28174: Compaction task reattempt fails due to
FileAlreadyExistsException
- 7.3.1
- The issue arises when a compaction task is relaunched after the
first attempt fails, leaving behind temporary directories. The second attempt encounters
a FileAlreadyExistsException because the _tmp directory created during the first
attempt was not cleared.
- The solution ensures that compaction reattempts clear the old
files from previous attempts before starting, preventing the failure caused by stale
directories.
Apache Jira:
HIVE-24882, HIVE-23058
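The failure mode and the fix can be illustrated with a small local-filesystem sketch. This is not Hive's compaction code: the `run_attempt` helper and the directory layout are hypothetical, and Python's `FileExistsError` stands in for Hadoop's `FileAlreadyExistsException`.

```python
import os
import shutil
import tempfile


def run_attempt(base_dir, clear_stale=False, fail=False):
    """Hypothetical compaction attempt that writes into base_dir/_tmp."""
    tmp_dir = os.path.join(base_dir, "_tmp")
    if clear_stale and os.path.isdir(tmp_dir):
        # Remove leftovers from an earlier, failed attempt.
        shutil.rmtree(tmp_dir)
    os.mkdir(tmp_dir)  # raises FileExistsError if a stale _tmp is present
    if fail:
        raise RuntimeError("simulated task failure; _tmp is left behind")
    shutil.rmtree(tmp_dir)  # normal completion cleans up after itself
    return "done"


def demo():
    """Fail once, retry without cleanup, then retry with cleanup."""
    base_dir = tempfile.mkdtemp()
    try:
        try:
            run_attempt(base_dir, fail=True)
        except RuntimeError:
            pass  # first attempt failed and left _tmp behind
        try:
            run_attempt(base_dir)
            naive_retry = "done"
        except FileExistsError:
            naive_retry = "FileExistsError"
        fixed_retry = run_attempt(base_dir, clear_stale=True)
        return naive_retry, fixed_retry
    finally:
        shutil.rmtree(base_dir, ignore_errors=True)
```

In this sketch the retry without cleanup fails on the stale directory, while the retry that clears it first completes.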
- CDPD-45285: Incorrect results for IN UDF on Parquet columns of
CHAR/VARCHAR type
- 7.3.1
- Queries with CASE statements and multiple conditions return
incorrect results for tables in Parquet format, particularly with CHAR/VARCHAR
types. The issue is not observed with ORC or TextFile formats and can be bypassed
by setting hive.optimize.point.lookup to false.
- The issue was addressed by adding the necessary CASTs during IN clause
conversion.
Apache Jira:
HIVE-26320
- CDPD-24412: Compaction queue entries stuck in 'ready for
cleaning' state
- 7.3.1
- When multiple compaction tasks run simultaneously on the same
table, only one task removes obsolete files while the others remain in the
ready for cleaning state, leading to an accumulation of queue entries.
- The fix adds a mechanism to automatically clear or re-evaluate entries
stuck in the ready for cleaning state, improving compaction task efficiency.
Apache Jira:
HIVE-25115
- CDPD-27291: getCrossReference fails when retrieving constraints
from the primary key side
- 7.3.1
- When retrieving constraints from the primary key side, the
foreign key is passed as null, causing the operation to fail with a
Db name cannot be null exception, especially when the metadata cache is enabled
by default.
- This has been resolved by ensuring that the foreign key
constraint is correctly handled even when passed as null during constraint retrieval
from the primary key side.
- CDPD-15269: Add caching support for frequently called constraint
APIs in catalogd's HMS interface
- 7.3.1
- The get_unique_constraints, get_primary_keys, get_foreign_keys, and
get_not_null_constraints APIs are called frequently during query
compilation, particularly with TPC-DS queries. Without caching, this leads to performance
overhead.
- Introduced caching for the above APIs in the Catalogd’s HMS
interface by adding ValidWriteIdList and tableId to the API requests. This ensures that
the cache or backing DB is appropriately used to serve responses.
Apache Jira:
HIVE-23931
- DWX-8663: ShuffleScheduler should report the original exception
when shuffle becomes unhealthy
- 7.3.1
- The
ShuffleScheduler
does not report the
original exception when the shuffle becomes unhealthy, making it harder to diagnose the
underlying issue.
- This issue is now fixed.
Apache Jira:
TEZ-4342
- CDPD-43837: MSSQL upgrade scripts fail when adding TYPE column
to DBS table
- 7.3.1
- Schema upgrade for MSSQL fails with an error when trying to add
the TYPE column to the DBS table due to the incorrect usage of the keyword
NATIVE in the default value.
- The issue was addressed by modifying the schema upgrade script
to use a valid constant expression for the default value in MSSQL.
Apache Jira:
HIVE-25551
- CDPD-43890: Drop data connector if not exists should not throw
an exception
- 7.3.1
- The DROP DATA CONNECTOR IF NOT EXISTS command incorrectly throws a
NoSuchObjectException when the connector does not exist.
- The issue was addressed by ensuring that no exception is thrown
if the ifNotExists flag is true during the drop operation.
Apache Jira:
HIVE-26299
- CDPD-43838: Filter out results for show connectors in Hive
Metastore client side
- 7.3.1
- The SHOW CONNECTORS command does not filter
results based on authorization, such as Ranger policies, on the client side.
- The issue was addressed by implementing client-side filtering in
HMS to ensure that only connectors authorized by policies like Ranger are
displayed.
Apache Jira:
HIVE-26246
- CDPD-43952: HMS get_all_tables method does not retrieve tables
from remote database
- 7.3.1
- The get_all_tables method in the Hive Metastore handler only retrieves
tables from the native database, unlike the get_tables method, which can
retrieve tables from both native and remote databases.
- The issue was addressed by updating the get_all_tables method to retrieve
tables from both native and remote databases, ensuring consistency with the
get_tables method.
Apache Jira:
HIVE-26171
- CDPD-55914: Select query on table with remote database returns
NULL values with PostgreSQL and Redshift data connectors
- 7.3.1
- A few data types were not mapped from PostgreSQL or Redshift to Hive
data types in the connector, so columns of those data types displayed NULL
values.
- This issue is fixed.
Apache Jira:
HIVE-27316
- CDPD-31726: Prevent NullPointerException by Checking Collations
Return Value
- 7.3.1
- A NullPointerException occurs during execution
of an EXPLAIN CBO on a subquery when using Tez as the execution engine, leading to empty
explain output.
- Added a check for null return values from
RelMetadataQuery.collations() to prevent NullPointerExceptions in
RelFieldTrimmer and HiveJoin, ensuring stability during query execution.
Apache Jira:
HIVE-25749
- CDPD-27418: Incorrect row order after query-based MINOR
compaction
- 7.3.1
- The query-based MINOR compaction used an
incorrect sorting order, which led to duplicated rows after multiple merge
statements.
- The sorting order was corrected, ensuring proper row
handling.
Apache Jira:
HIVE-25258
- CDPD-27419: Incorrect row order validation for query-based major
compaction
- 7.3.1
- The row order validation for query-based MAJOR
compaction incorrectly checked the order as bucketProperty, leading to failures with
multiple bucketProperties.
- The validation was updated to correctly check the order as
originalTransactionId, bucketProperty, and rowId, and an improved error message was
implemented.
Error: org.apache.hadoop.hive.ql.metadata.HiveException: Wrong sort order of Acid rows detected for the rows
Apache Jira:
HIVE-25257
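The difference between the two checks can be sketched as follows. This is an illustration only, not Hive's validation code: the function names are hypothetical, the bucketProperty-only check is a simplified model of the earlier behavior, and the sample values are invented for the example.

```python
def find_sort_violation(rows):
    """Return the index of the first out-of-order row, or None if sorted.

    Each row is a tuple (original_transaction_id, bucket_property, row_id).
    Tuples compare field by field, which gives the composite ordering.
    """
    for i in range(1, len(rows)):
        if rows[i] < rows[i - 1]:
            return i
    return None


def find_bucket_only_violation(rows):
    """Simplified model of a check that looks at bucket_property alone."""
    for i in range(1, len(rows)):
        if rows[i][1] < rows[i - 1][1]:
            return i
    return None


# Illustrative rows spanning two transactions and two bucketProperty values.
sample_rows = [
    (1, 536870912, 0),
    (1, 536870912, 1),
    (1, 536936448, 0),
    (2, 536870912, 0),
]

composite_result = find_sort_violation(sample_rows)
bucket_only_result = find_bucket_only_violation(sample_rows)
```

The sample rows are correctly ordered by the composite key, yet the bucketProperty-only model flags the last row because its bucketProperty is lower than that of the row before it.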
- Enable proper handling of non-default schemas in Hive for JDBC
databases
- 7.3.1
- Hive fails to create an external table for a JDBC database when
the table is in a non-default schema, causing a PSQLException error stating that
the table does not exist.
- Improved handling of tables in non-default schemas by correctly
using the hive.sql.schema property. This ensures the table is
found, preventing the error.
Apache Jira:
HIVE-25591
- CBO failure when using JDBC table with password through
dbcp.password.uri
- 7.3.1
- When a table is created using JDBCStorageHandler and the JDBC_PASSWORD_URI is
specified for the password, the Cost-Based Optimizer (CBO) fails. This causes all the
data to be fetched directly from the database and processed in Hive, impacting
performance.
- Adjustments were made to ensure CBO functions correctly when
JDBC_PASSWORD_URI is used, allowing for proper optimization and
preventing unnecessary data fetch from the database.
Apache Jira:
HIVE-25626
- CDPD-28904: Intermittent Hive JDBC SSO failures in virtual
environments
- 7.3.1
- Browser-based SSO with the Hive JDBC driver fails in virtual
environments (like Windows VMs). The driver sometimes misses POST requests with the SAML
token due to a race condition, causing authentication failures.
- Resolved a race condition in the JDBC driver to ensure it
properly handles SSO authentication in virtual environments, preventing POST request
failures.
Apache Jira:
HIVE-25479
- CDPD-43672: Remove unnecessary optimizations in
canHandleQbForCbo
- 7.3.1
- The canHandleQbForCbo() method includes an
optimization where it returns an empty string if INFO logging is
disabled, which complicates the logic and doesn't significantly impact performance.
- The issue was addressed by simplifying the code in
canHandleQbForCbo() and removing the unnecessary optimization related
to logging.
Apache Jira:
HIVE-26438
- DWX-7648: Infinite loop during CBO parsing causes OOM in
HiveServer2
- 7.3.1
- HiveServer2 became unstable due to an infinite loop during query
parsing with UNION operations, causing an out-of-memory error during the
cost-based and logical optimization phase. The issue occurred because Hive's
custom metadata provider was not initialized.
- The initialization has now been moved before CBO requires
it.
Apache Jira:
HIVE-25220
- CDPD-31200: Reader not closed after check in AcidUtils, leading
to resource exhaustion
- 7.3.1
- The Reader in AcidUtils.isRawFormatFile is not
being closed after the check is finished. This causes issues when resources on the
DFSClient are limited, leading to connection pool timeouts such as Timeout
waiting for connection from pool.
- The fix includes automatically closing the Reader in
AcidUtils.isRawFormatFile, which ensures that resources are freed up
and prevents connection pool timeout issues.
Apache Jira:
HIVE-25683
- DWX-10336: SSL certificate import error in HiveServer2 with JWT
authentication
- 7.3.1
- With JWT support for HiveServer2, SSL certificate import fails because
self-signed certificates are not accepted by the JVM in the affected environments.
The error occurs during the initialization of the HTTP server.
- The fix introduces a property to disable SSL
certificate verification for downloading JWKS (JSON Web Key Set) in such
environments. This helps users bypass certificate validation.
Apache Jira:
HIVE-26425
- CDPD-42686: Query-based compaction fails for tables with complex
data types and reserved keywords
- 7.3.1
- Query-based compaction fails on tables with complex data types
and columns containing reserved keywords due to incorrect quoting of column names when
creating a temporary table.
- The issue was addressed by ensuring that columns with reserved
keywords are correctly quoted during the creation of temporary tables.
Apache Jira:
HIVE-26374
- CDPD-54605: HiveSchemaTool to honor metastore-site.xml for
initializing metastore schema
- 7.3.1
- The HiveSchemaTool fails to recognize the
metastore-site.xml configuration when initializing the metastore
schema. It defaults to using an embedded database instead of the specified MySQL
database.
- The issue was addressed by updating the HiveSchemaTool to ensure
it properly reads the metastore-site.xml file, allowing for correct
initialization of the metastore schema with the intended database
configuration.
Apache Jira:
HIVE-26402
- CDPD-55135: JDBC data connector queries to avoid exceptions at
CBO stage
- 7.3.1
- JDBC data connector queries throw exceptions at the CBO stage
due to incorrect handling of database, schema, and table names. When querying, the
database name is improperly swapped with the schema name, leading to the error:
schema dev does not exist
- The issue was addressed by changing the
hive.sql.table property value from databasename.tablename to
tablename and adding the hive.sql.schema property with databasename.
This adjustment ensures that the CBO stage retrieves JDBC table information correctly,
eliminating the errors related to schema and table name resolution.
Apache Jira:
HIVE-26192
- DWX-8296: Hive query vertex failure due to Kerberos
authentication error
- 7.3.1
- Hive queries fail during LLAP execution with vertex failures.
The query-executor fails to communicate with the query-coordinator due to an
authentication error in Kerberos.
- The fix addresses the Kerberos authentication failure between
query-executors and the query-coordinator to ensure proper task execution and prevent
vertex failures during LLAP execution.
- CDPD-45607: Atomic schema upgrades for HMS to prevent partial
commits
- 7.3.1
- SchemaTool may leave the metastore in an invalid state during
schema upgrades because each change is autocommitted. If an upgrade fails mid-process,
the schema is left partially updated, causing issues on reruns.
- The issue was addressed by ensuring schema changes are committed
only after the entire upgrade process completes successfully. If any step fails, no
changes are applied, preventing partial updates and keeping the schema
intact.
Apache Jira:
HIVE-25707
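The all-or-nothing behavior can be sketched with Python's built-in sqlite3 module. This is only an illustration of the idea, not SchemaTool itself: the `run_upgrade` helper and the statements are hypothetical, SQLite is used because it ships with Python, and whether schema (DDL) changes can be rolled back at all depends on the backing database.

```python
import sqlite3


def run_upgrade(conn, statements):
    """Apply all statements in one transaction; undo everything on failure."""
    conn.execute("BEGIN")
    try:
        for stmt in statements:
            conn.execute(stmt)
    except sqlite3.Error:
        conn.execute("ROLLBACK")  # discard every change made so far
        return False
    conn.execute("COMMIT")  # commit only after all steps succeeded
    return True


def column_names(conn, table):
    """List the column names of a table."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


# isolation_level=None disables implicit transactions, so BEGIN/COMMIT
# above are the only transaction boundaries.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE DBS (DB_ID INTEGER PRIMARY KEY, NAME TEXT)")

failing_upgrade = [
    "ALTER TABLE DBS ADD COLUMN TYPE TEXT DEFAULT 'NATIVE'",
    "ALTER TABLE NO_SUCH_TABLE ADD COLUMN X TEXT",  # fails mid-upgrade
]
failed_ok = run_upgrade(conn, failing_upgrade)
columns_after_failure = column_names(conn, "DBS")

rerun_ok = run_upgrade(conn, failing_upgrade[:1])
columns_after_rerun = column_names(conn, "DBS")
```

Because the failed run leaves no partial changes behind, the rerun starts from the original schema and does not collide with a half-applied column.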
- CDPD-49232: Auto-reconnect data connectors after timeout
- 7.3.1
- When data connectors remain idle for a long time, the JDBC connection
times out. A restart is then required to re-establish the connection, and the
connector is unusable until then.
- The issue was addressed by automatically checking if a
connection is closed and re-establishing it when necessary. This ensures the connectors
stay functional without needing a restart, and includes setting connection timeout and
retry properties for more reliable reconnections.
Apache Jira:
HIVE-26045
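The check-and-reconnect pattern described above can be sketched as follows. The class names, the `max_retries` and `retry_delay_s` parameters, and the fake connection are all hypothetical stand-ins chosen for the example; they do not correspond to Hive's connector classes or property names.

```python
import time


class FakeConnection:
    """Stand-in for a JDBC connection; only tracks whether it is closed."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class ReconnectingConnector:
    """Hands out a live connection, reopening it if it was closed."""

    def __init__(self, connect, max_retries=3, retry_delay_s=0.0):
        self._connect = connect
        self._max_retries = max_retries
        self._retry_delay_s = retry_delay_s
        self._conn = None

    def get_connection(self):
        # Reuse the current connection while it is still open.
        if self._conn is not None and not self._conn.closed:
            return self._conn
        last_error = None
        for _ in range(self._max_retries):
            try:
                self._conn = self._connect()
                return self._conn
            except ConnectionError as err:
                last_error = err
                time.sleep(self._retry_delay_s)
        raise ConnectionError("could not re-establish connection") from last_error


connector = ReconnectingConnector(FakeConnection)
first = connector.get_connection()
first.close()  # simulate an idle timeout closing the connection
second = connector.get_connection()
```

Callers always go through `get_connection()`, so a closed connection is replaced on the next use instead of requiring a restart.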
- CDPD-49494: Allow JWT and LDAP authentication to co-exist in
HiveServer2 configuration
- 7.3.1
- Setting hive.server2.authentication=JWT,LDAP fails with a validation
error, preventing HiveServer2 from starting due to conflicts between authentication
types.
- The issue was addressed by updating the validation logic to
support JWT authentication alongside LDAP, ensuring HiveServer2 can start with both auth
mechanisms enabled.
Apache Jira:
HIVE-26045
- CDPD-40732: Timestamp shifts when reading Parquet files with the
vectorized reader
- 7.3.1
- Timestamp shifts occur when reading Parquet files that were
created in older Hive versions and vectorized execution is enabled. The vectorized
reader is not able to exploit the metadata inside the Parquet file to apply the correct
conversion. For instance, a timestamp written as 1880-01-01 00:00:00 may be read as
1879-12-31 23:52:58; the exact shift depends on the JVM timezone. The non-vectorized
reader is not affected.
- The fix ensures both vectorized and non-vectorized readers use
the same logic to determine the correct timestamp conversion based on metadata and
configuration.
Apache Jira:
HIVE-26270