Fixed issues
Review the fixed issues in this release of the Cloudera Data Warehouse service on cloud.
- CDPD-89414: Incorrect results for window functions with IGNORE NULLS
- When you used the FIRST_VALUE and LAST_VALUE window functions with the IGNORE NULLS clause while vectorization was enabled, the results were incorrect. This occurred because the vectorized execution engine did not properly handle the IGNORE NULLS setting for these functions.
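As a hedged illustration, a query of roughly the following shape was affected (the table and columns are hypothetical, and the exact placement of the IGNORE NULLS clause can vary by engine version):
```sql
-- Hypothetical example: window functions skipping NULLs.
-- With vectorization enabled, queries of this shape previously
-- returned incorrect first/last values.
SELECT id,
       FIRST_VALUE(status IGNORE NULLS)
         OVER (PARTITION BY dept ORDER BY id) AS first_status,
       LAST_VALUE(status IGNORE NULLS)
         OVER (PARTITION BY dept ORDER BY id) AS last_status
FROM employees;
```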
- CDPD-60770: Passwords with special characters fail to connect with Beeline
- When you used a password containing special characters like #, ^, or ; in a JDBC URL for a Beeline connection, the connection failed with a 401 error. This happened because Beeline did not correctly interpret these special characters in the password.
- CDPD-85600: Select queries with ORDER BY fail due to compression error
- When you ran a Hive SELECT query with an ORDER BY clause, it failed with a java.io.IOException and java.lang.UnsatisfiedLinkError related to the zlib decompressor.
- CDPD-90301: Stack overflow error from queries with OR and MIN filters
- Queries caused a stack overflow error when they contained multiple OR conditions on the same expression, such as MINUTE(date_) = 2 OR MINUTE(date_) = 10.
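For illustration, the affected query shape looked roughly like this (the events table is hypothetical):
```sql
-- Hypothetical query shape: many OR branches over the same expression
-- previously caused a stack overflow during planning.
SELECT *
FROM events
WHERE MINUTE(date_) = 2
   OR MINUTE(date_) = 10
   OR MINUTE(date_) = 17
   OR MINUTE(date_) = 25;
```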
- CDPD-90303: Incorrect results from a CASE expression
- A query that used a CASE expression to conditionally return values produced an incorrect result. The query plan incorrectly folded the CASE statement into a COALESCE function, which led to a logic error that filtered out some of the expected results.
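To illustrate the rewrite involved, a CASE expression of the following shape is logically equivalent to COALESCE; the bug applied this kind of folding incorrectly (the table and columns are hypothetical):
```sql
-- CASE WHEN x IS NOT NULL THEN x ELSE y END is equivalent to COALESCE(x, y).
-- The planner folded CASE expressions like this into COALESCE incorrectly,
-- so some expected rows were missing from the result.
SELECT id,
       CASE WHEN preferred_name IS NOT NULL
            THEN preferred_name
            ELSE legal_name
       END AS display_name
FROM customers;
```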
- CDPD-80655: Compile error with ambiguous column reference
- A Hive query using CREATE TABLE AS SELECT with a GROUP BY clause and a window function failed with an "Ambiguous column reference" error. This happened because the query plan couldn't correctly handle redundant keys in the GROUP BY clause.
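A minimal sketch of the affected pattern, with hypothetical tables and columns (the duplicated key is shown as one possible form of redundancy):
```sql
-- Hypothetical CREATE TABLE AS SELECT combining GROUP BY (with a
-- redundant key) and a window function; this previously failed with
-- "Ambiguous column reference".
CREATE TABLE dept_summary AS
SELECT dept,
       SUM(salary) AS total_salary,
       RANK() OVER (ORDER BY SUM(salary) DESC) AS salary_rank
FROM employees
GROUP BY dept, dept;  -- redundant GROUP BY key (hypothetical form)
```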
- DWX-20754: Invalid column reference in lateral view queries
- The virtual column BLOCK__OFFSET__INSIDE__FILE fails to be correctly referenced in queries using lateral views, resulting in the error: FAILED: SemanticException Line 0:-1 Invalid column reference 'BLOCK__OFFSET__INSIDE__FILE'. This issue is now resolved.
Apache Jira: HIVE-28938
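A minimal sketch of the previously failing pattern, using a hypothetical table with an array column:
```sql
-- Hypothetical query: referencing the BLOCK__OFFSET__INSIDE__FILE virtual
-- column together with a lateral view previously raised the
-- "Invalid column reference" SemanticException.
SELECT t.BLOCK__OFFSET__INSIDE__FILE, x.item
FROM src_table t
LATERAL VIEW explode(t.items) x AS item;
```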
- DWX-21855: Impala Executors fail to gracefully shutdown
- During a graceful shutdown, Impala executors wait for running queries to finish, up to the graceful shutdown deadline (--shutdown_deadline_s). However, the istio-proxy container on the Impala executor pod was terminated immediately, so the executors became unreachable and were removed from the Impala cluster membership, resulting in the cancellation of running queries.
- IMPALA-14263: Enhanced join strategy for large clusters
- The query planner's cost model for broadcast joins could be skewed by the number of nodes in a cluster. This could lead to suboptimal join strategy choices, especially in large clusters with skewed data, where a partitioned join was chosen over a more efficient broadcast join.
- IMPALA-11402: Fetching metadata for tables with huge numbers of files no longer fails with OutOfMemoryError
- Previously, when the Impala Coordinator tried to fetch file metadata for extremely large tables (those with millions of files or partitions), the Impala Catalog service would attempt to return all the file details at once. This often exceeded the Java memory limits, causing the service to crash with an OutOfMemoryError.
- CDPD-83031: Client connections are now more stable thanks to enabled keepalive
- Previously, TCP keepalive was not active by default for client connections. This caused connection stability problems, especially in environments that use a load balancer.
- CDPD-77261: Impala can now read Parquet integer data as DECIMAL after schema changes
- Previously, if you changed a column type from an integer (INT or BIGINT) to a DECIMAL using ALTER TABLE, Impala could fail to read the original Parquet data files. This happened because the files lacked the specific metadata (logical types) Impala expected for decimals, resulting in an error.
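For illustration, a schema change of roughly this shape triggered the issue (the table and column are hypothetical):
```sql
-- Hypothetical schema change: widen an integer column to DECIMAL.
ALTER TABLE sales CHANGE amount amount DECIMAL(18,2);

-- Reading rows written before the change previously failed, because the
-- old Parquet files carry no DECIMAL logical-type annotation.
SELECT amount FROM sales;
```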
- IMPALA-12927: Impala can now correctly read BINARY columns in JSON tables
- Previously, Impala couldn't correctly read BINARY columns in JSON tables, often resulting in errors or incorrect data. This happened because Impala assumed the data was always Base64 encoded, which wasn't true for files written by older Hive versions.
- IMPALA-13631: Impala cluster responsiveness during table renames
- Performing ALTER TABLE RENAME operations caused Impala to hold a critical internal lock for too long, which blocked other DDL and DML operations.
- Catalogd and Event Processor Improvements
- Faster Inserts for Partitioned Tables (IMPALA-14051): Inserting data into very large partitioned tables is now much faster. Previously, Impala communicated with the Hive Metastore (HMS) one partition at a time, which was a major slowdown. Impala now uses the batch insert API to send all insert information to the HMS in one highly efficient call, significantly boosting the performance of your INSERT statements into transactional tables.
- Quicker Table Administration (IMPALA-13599): Administrative tasks, such as running DROP STATS or changing the CACHED status of a table, are now much faster on tables with many partitions. Impala previously made thousands of individual calls to the HMS for these operations. The system now batches these updates, making far fewer calls to the HMS and speeding up these essential administrative commands.
- Reliable Table Renames (IMPALA-13989): The ALTER TABLE RENAME command no longer fails when an INVALIDATE METADATA command runs at the same time. Previously, this caused the rename to succeed in the Hive Metastore but fail in Impala's Catalog Server. Impala now includes automatic error handling that instantly runs an internal metadata refresh if the rename is interrupted, ensuring the rename completes successfully without requiring any manual user steps.
- Efficient Partition Refreshes (IMPALA-13453): Running REFRESH <table> PARTITION <partition> is now much more efficient. Previously, this command always fully reloaded the partition's metadata and column statistics, even if the partition was unchanged. Impala now checks if the partition data has changed before reloading, avoiding the unnecessary drop-add sequence and significantly improving the efficiency of partition metadata updates.
- Reduced Partition API Calls (IMPALA-13599): Impala has reduced unnecessary API interactions with the HMS during table-level operations. Commands like ALTER TABLE ... SET CACHED/UNCACHED or DROP STATS on large tables previously generated thousands of single alter_partition() calls. Impala now utilizes the HMS's bulk-update functionality, batching these partition updates to drastically reduce the total number of required API calls.
- REFRESH on multiple partitions (IMPALA-14089): Impala now supports using the REFRESH statement on multiple partitions within a single command, which significantly speeds up metadata updates by processing partitions in parallel, reduces lock contention in the Catalog service, and avoids unnecessary increases to the table version, as sketched after this list. See Impala REFRESH Statement.
Apache Jira: IMPALA-14051, IMPALA-13599, IMPALA-13989, IMPALA-13453, IMPALA-14089
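A sketch of the multi-partition REFRESH usage mentioned above (the table and partition values are hypothetical; see the Impala REFRESH Statement documentation for the exact syntax):
```sql
-- Refresh several partitions of a hypothetical table in a single
-- statement instead of issuing one REFRESH per partition.
REFRESH sales_data PARTITION (year=2024, month=11)
                   PARTITION (year=2024, month=12);
```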
- CDPD-81076: LEFT ANTI JOIN fails on Iceberg V2 tables with delete files
- Queries using a LEFT ANTI JOIN fail with an AnalysisException if the right-side table is an Iceberg V2 table containing delete files. For example, consider the following query: SELECT * FROM table_a a LEFT ANTI JOIN iceberg_v2_table b ON a.id = b.id; The error Illegal column/field reference 'b.input_file_name' of semi-/anti-joined table 'b' is displayed because semi-joined tuples need to be explicitly made visible for paths pointing inside them to be resolvable. The fix updates the IcebergScanPlanner to ensure that the tuple containing the virtual fields is made visible when it is semi-joined.
Apache Jira: IMPALA-13888
- CDPD-81053: Enable MERGE statement for Iceberg tables with equality deletes
- This patch fixes an issue that caused MERGE statements to fail on Iceberg tables that use equality deletes. The failure occurred because the delete expression calculation was missing the data sequence number, even though the underlying data description included it. This mismatch caused row evaluation to fail.
The fix ensures the data sequence number is correctly included in the result expressions, allowing MERGE operations to complete successfully on these tables.
Apache Jira: IMPALA-13674
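As a hedged sketch, a MERGE of roughly the following shape against an Iceberg table with equality deletes previously failed (the table and column names are hypothetical):
```sql
-- Hypothetical MERGE against an Iceberg table that uses equality deletes.
MERGE INTO iceberg_target t
USING staging_updates s
  ON t.id = s.id
WHEN MATCHED THEN
  UPDATE SET amount = s.amount
WHEN NOT MATCHED THEN
  INSERT (id, amount) VALUES (s.id, s.amount);
```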
- CDPD-77773: Tolerate missing data files during Iceberg table loading
- This fix addresses an issue where an Iceberg table would fail to load completely if any of its data files were missing from the file system. This TableLoadingException left the table in an incomplete state, blocking all operations on it. Impala now tolerates missing data files during the table loading process. An exception will only be thrown if a query subsequently attempts to read one of the specific files that is missing.
This change allows other operations that do not depend on the missing data, such as ROLLBACK, DROP PARTITION, or SELECT statements on valid partitions, to execute successfully.
Apache Jira: IMPALA-13654
- CDPD-78508: Skip reloading Iceberg tables when metadata JSON file is the same
- This patch optimizes metadata handling for Iceberg tables,
particularly those that are updated frequently.
Previously, if an event processor was lagging, Impala might receive numerous update events for the same table (for example, 100 events). Impala would attempt to reload the table 100 times, even if the table's state was already up-to-date after processing the first event.
With this fix, Impala now compares the path of the incoming metadata JSON file with the one that is currently loaded. If the metadata file location is the same, Impala skips the reload, correctly treating the table as unchanged. This significantly reduces unnecessary metadata processing.
Apache Jira: IMPALA-13718
Fixed Common Vulnerabilities and Exposures
Common Vulnerabilities and Exposures (CVE) that are fixed in this release:
| CVE | Description |
|---|---|
| CVE-2025-30065 | Code execution vulnerability in schema parsing of Apache Parquet-avro module in versions lower than 1.15.1. |
| CVE-2020-20703 | Buffer overflow vulnerability in VIM v.8.1.2135 allows a remote attacker to execute arbitrary code using the operand parameter. |
| CVE-2024-53990 | Cookie handling vulnerability in AsyncHttpClient (AHC) library leading to cross-user cookie misuse. |
| CVE-2024-52533 | Buffer overflow vulnerability in GNOME GLib SOCKS4 proxy handling (gio/gsocks4aproxy.c). |
| CVE-2024-52046 | Apache MINA ObjectSerializationDecoder vulnerability leading to Remote Code Execution (RCE). |
| CVE-2017-6519 | Avahi-daemon IPv6 unicast query handling vulnerability leading to DoS and information leakage. |
