Fixed Issues in Impala

This section lists the Impala issues fixed in Cloudera Runtime 7.3.2, its service packs, and cumulative hotfixes.

Cloudera Runtime 7.3.2

Cloudera Runtime 7.3.2 resolves Impala issues and incorporates fixes from the service packs and cumulative hotfixes from 7.3.1.100 through 7.3.1.706. For a comprehensive record of all fixes in Cloudera Runtime 7.3.1.x, see Fixed Issues.

CDPD-98207: Impala crashing on the Web UI for failed queries
7.3.2
Previously, Impala crashed when you accessed the query summary or JSON plan through the Web UI for queries that failed before execution. This occurred during scenarios such as a Create Table As Select (CTAS) failure or when admission control rejected a query.
This issue is now fixed: the system correctly handles queries that have no execution summary.

Apache Jira: IMPALA-14791

CDPD-97786: Excessive partition events during table-level operations
7.3.2
Previously, certain table-level operations, such as dropping incremental statistics or setting or unsetting the cached property, triggered a separate ALTER PARTITION event for every partition in the table.
This issue is addressed by updating the affected partitions in bulk.

Apache Jira: IMPALA-13599

CDPD-97187: Deprecation warnings in impala-shell with Python 3.11
7.3.2
Previously, when you ran impala-shell with Python 3.11 or newer, the command output displayed DeprecationWarning messages related to ssl.PROTOCOL_TLS and ssl.match_hostname(). These warnings were triggered by underlying library dependencies.
This issue is now resolved by updating the SSL protocol handling and hostname validation logic to be compatible with newer Python versions, which eliminates these warnings from the shell output.

Apache Jira: IMPALA-12219
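
A minimal sketch of the non-deprecated pattern, assuming a client that previously combined ssl.PROTOCOL_TLS with an explicit ssl.match_hostname() call. This is not the actual impala-shell patch, and make_client_context is a hypothetical helper name. ssl.PROTOCOL_TLS_CLIENT enables certificate verification and hostname checking by default, so a separate match_hostname() call is no longer needed.

```python
import ssl
import warnings


def make_client_context(cafile=None):
    """Build a client-side TLS context without deprecated APIs (hypothetical helper)."""
    # PROTOCOL_TLS_CLIENT replaces the deprecated PROTOCOL_TLS constant and
    # turns on check_hostname and CERT_REQUIRED by default, so the hostname is
    # validated during the handshake instead of by ssl.match_hostname().
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    if cafile:
        ctx.load_verify_locations(cafile)
    else:
        ctx.load_default_certs()
    return ctx


if __name__ == "__main__":
    with warnings.catch_warnings():
        # Promote DeprecationWarning to an error to show that none is raised.
        warnings.simplefilter("error", DeprecationWarning)
        ctx = make_client_context()
    print(ctx.check_hostname, ctx.verify_mode == ssl.CERT_REQUIRED)
```

The server hostname is then supplied through ctx.wrap_socket(sock, server_hostname=host), and the handshake itself rejects a certificate that does not match it.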

CDPD-91994: Stale query IDs in catalog logs
7.3.2
Previously, catalog logs for certain metadata requests, such as getPartialCatalogObject, displayed incorrect query IDs.
This issue is addressed by associating each request with its correct query ID. The query ID is now cleared automatically after the request finishes, so stale IDs no longer appear in later logs.

Apache Jira: IMPALA-14494

CDPD-82673: RSASSA-PSS certificate signature scheme is now supported for server certificates
7.3.2
Previously, if you used a certificate with the RSASSA-PSS signature algorithm for kRPC communication, the connection failed.
This issue is resolved by using an updated OpenSSL function that correctly identifies the hash algorithm for RSASSA-PSS certificates.

Apache Jira: IMPALA-14038

CDPD-89852: Crash when casting timestamp strings with timezone offsets to DATE
7.3.2
Previously, casting a timestamp string that included a timezone offset (such as "+08:00" in "2025-08-31 06:23:24.9392129 +08:00") to the DATE data type caused a crash.
This issue is addressed by adding a check that the timestamp string length does not exceed the maximum length of the default date-time format. Longer strings now use a lazily created format, which prevents the crash.

Apache Jira: IMPALA-14383

CDPD-89730: Impala daemon crashed during scans with high logging levels
7.3.2
Previously, the Impala daemon experienced a null pointer dereference in the BaseSequenceScanner component when the logging level was set to 2 or higher, leading to crashes in release builds.
This issue is resolved by correcting the pointer handling in the sequence scanner to ensure safe memory access when verbose logging is enabled.

Apache Jira: IMPALA-14382

CDPD-89346: Enhanced join strategy selection for large clusters
7.3.2
Previously, the query planner's cost model for broadcast joins could be skewed by the number of nodes in a cluster. This could lead to suboptimal join strategy choices, especially in large clusters with skewed data, where a partitioned join was chosen over a more efficient broadcast join.
This issue is now resolved by introducing the broadcast_cost_scale_factor query option, an additional tuning option alongside query hints for overriding the query planner's decision.

Apache Jira: IMPALA-14263

CDPD-89132: Tables incorrectly dropped by stale HMS events after global metadata invalidation
7.3.2
Previously, a stale HMS event, such as DropTable or AlterTableRename, received after a global INVALIDATE METADATA command could cause tables to be dropped unintentionally.
This issue is resolved by setting the createEventId of every table to the current HMS event ID during a global reset, so that older events can be identified as stale.

Apache Jira: IMPALA-14330
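
The following Python sketch models the idea under stated assumptions: the class and method names are hypothetical, only DropTable handling is shown, and an event whose ID is at or below a table's create_event_id is assumed to be stale. Each table reloaded by a global reset records the HMS event ID current at reset time, so a DropTable event generated before the reset no longer removes the freshly loaded table.

```python
from dataclasses import dataclass


@dataclass
class CachedTable:
    name: str
    # HMS event ID at which this cached entry became valid; hypothetical field
    # modeled on the createEventId described above.
    create_event_id: int


class CatalogCache:
    def __init__(self):
        self.tables = {}

    def global_reset(self, current_hms_event_id, table_names):
        """Reload all tables, stamping each with the current HMS event ID."""
        self.tables = {
            name: CachedTable(name, current_hms_event_id) for name in table_names
        }

    def apply_drop_table(self, event_id, name):
        """Apply a DropTable event; return True if the table was dropped."""
        table = self.tables.get(name)
        if table is None:
            return False
        if event_id <= table.create_event_id:
            # The event predates the reset that loaded this table: stale, skip it.
            return False
        del self.tables[name]
        return True


if __name__ == "__main__":
    cache = CatalogCache()
    cache.global_reset(current_hms_event_id=100, table_names=["sales"])
    print(cache.apply_drop_table(95, "sales"))   # False: stale event is ignored
    print(cache.apply_drop_table(101, "sales"))  # True: newer event is applied
```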

CDPD-79111: Authentication failure in impala-shell with 76-character LDAP passwords
7.3.2
Previously, when you used impala-shell with the HS2-HTTP protocol and a 76-character LDAP password, the connection failed with a value error.
This issue is resolved by using an encoding method that does not insert line breaks into long password strings, so the authorization header remains valid for the server.

Apache Jira: IMPALA-13746
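
The failure mode can be reproduced in plain Python, assuming the older code path used a MIME-style encoder such as base64.encodebytes, which inserts a newline after every 76 output characters, while base64.b64encode emits a single line. The helper name and credentials below are hypothetical. An embedded newline makes the Basic authorization header invalid; Python's http.client, for example, rejects such a header value with ValueError, which is consistent with the value error described above.

```python
import base64


def basic_auth_value(user, password, wrap_lines):
    """Build an HTTP Basic authorization header value (hypothetical helper)."""
    raw = f"{user}:{password}".encode("utf-8")
    if wrap_lines:
        # MIME-style encoding: a newline after every 76 output characters,
        # plus a trailing newline that strip() removes (inner ones remain).
        token = base64.encodebytes(raw).decode("ascii").strip()
    else:
        # Single-line encoding, as required for an HTTP header value.
        token = base64.b64encode(raw).decode("ascii")
    return "Basic " + token


if __name__ == "__main__":
    password = "x" * 76  # hypothetical 76-character LDAP password
    print("\n" in basic_auth_value("alice", password, wrap_lines=True))   # True
    print("\n" in basic_auth_value("alice", password, wrap_lines=False))  # False
```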

CDPD-92001: Metadata loading performed sequentially in local catalog mode
7.3.2
Previously, when a query accessed multiple unloaded tables in local catalog mode, Impala triggered metadata loading for those tables sequentially.
This issue is resolved by parallelizing table loading during query compilation. A new startup flag, max_stmt_metadata_loader_threads, is introduced to control the number of threads used for loading metadata, with a default value of 8 threads per query. If only one table requires loading or if the thread pool is unavailable, the system automatically falls back to sequential loading.

Apache Jira: IMPALA-14447
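
A minimal Python sketch of per-statement parallel loading with the sequential fallback described above. It is an illustration only: max_threads stands in for max_stmt_metadata_loader_threads (default 8), while load_table and pool_available are hypothetical stand-ins for the real loader and thread-pool availability.

```python
from concurrent.futures import ThreadPoolExecutor


def load_stmt_tables(table_names, load_table, max_threads=8, pool_available=True):
    """Load metadata for the unloaded tables referenced by one statement.

    max_threads stands in for max_stmt_metadata_loader_threads; load_table and
    pool_available are hypothetical stand-ins.
    """
    names = list(table_names)
    if len(names) <= 1 or not pool_available:
        # Fallback described above: load sequentially.
        return {name: load_table(name) for name in names}
    workers = max(1, min(max_threads, len(names)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so they line up with names.
        return dict(zip(names, pool.map(load_table, names)))
```

Capping the pool at the number of tables avoids starting idle threads for statements that reference only a few tables.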

CDPD-79241: Incorrect query results for Iceberg V2 tables
7.3.2
Previously, when you ran complex queries involving multiple subqueries on Iceberg V2 tables, the system sometimes returned incorrect results.
This issue is now resolved. The fix includes a new internal mechanism to track and apply count optimizations.

Cookie-based authentication support for JWT tokens
7.3.2
When JWT tokens are used for authentication, every HTTP request within a session requires token verification. If these tokens have a short lifespan, it can lead to authentication failures and disrupt session continuity.
This issue is now resolved by using authentication cookies, which generally have a longer lifespan (configured through the max_cookie_lifetime_s flagfile option) and can remain valid for the duration of the session. Subsequent requests in the session can then authenticate with the cookie instead of repeatedly verifying the JWT token.

Apache Jira: IMPALA-13813
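
The flow can be sketched in Python as follows. This is an illustration only and not Impala's cookie format: the HMAC signing scheme, the user&expiry&signature layout, and all function names are assumptions, and MAX_COOKIE_LIFETIME_S merely stands in for max_cookie_lifetime_s. The JWT is verified once, a signed cookie is issued, and later requests that present an unexpired cookie skip JWT verification.

```python
import hashlib
import hmac
import secrets

SIGNING_KEY = secrets.token_bytes(32)  # hypothetical per-server secret
MAX_COOKIE_LIFETIME_S = 24 * 3600      # stands in for max_cookie_lifetime_s


def _sign(payload):
    return hmac.new(SIGNING_KEY, payload.encode(), hashlib.sha256).hexdigest()


def issue_cookie(user, now):
    """Create a signed cookie value that expires MAX_COOKIE_LIFETIME_S from now."""
    payload = f"{user}&{int(now) + MAX_COOKIE_LIFETIME_S}"
    return f"{payload}&{_sign(payload)}"


def check_cookie(cookie, now):
    """Return the user if the cookie is authentic and unexpired, else None."""
    parts = cookie.rsplit("&", 2)
    if len(parts) != 3:
        return None
    user, expires, sig = parts
    # Compare bytes so that arbitrary client input cannot raise TypeError.
    if not hmac.compare_digest(sig.encode(), _sign(f"{user}&{expires}").encode()):
        return None
    return user if now < int(expires) else None


def authenticate(cookie, jwt, now, verify_jwt):
    """Prefer a valid cookie; verify the JWT only when there is none."""
    if cookie:
        user = check_cookie(cookie, now)
        if user is not None:
            return user, None  # no JWT verification, no new cookie
    user = verify_jwt(jwt)  # hypothetical verifier: returns a user name or None
    if user is None:
        raise PermissionError("authentication failed")
    return user, issue_cookie(user, now)
```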

CDPD-80798: Stable Catalogd initialization in HA mode
7.3.2
Previously, Catalogd initialization could time out in high availability (HA) mode. This happened because metadata operations started prematurely and blocked Catalogd from becoming active.
This issue is resolved by ensuring that Catalogd determines its HA state before starting metadata operations. This prevents the blocking and ensures a stable startup.

Apache Jira: IMPALA-13850

CDPD-83059: Optimized Impala Catalog cache warmup
7.3.2
Impala's Catalogd previously started with an empty cache. This led to slow query startup for important tables and affected high availability failovers.
This issue is resolved by adding new settings to pre-load specific tables into the Catalogd cache in the background. This ensures faster query startup and smoother high availability failovers.

Apache Jira: IMPALA-14074

CDPD-87222: Consistent TRUNCATE operations for external tables
7.3.2
Impala's TRUNCATE operations on external tables previously did not consistently delete files in subdirectories, even when recursive listing was enabled.
This issue is resolved by ensuring Impala uses the HMS API for TRUNCATE operations by default.

Apache Jira: IMPALA-14189, IMPALA-14224

DWX-21855: Impala Executors fail to gracefully shutdown
7.3.2
During a graceful shutdown, Impala executors wait for running queries to finish, up to the graceful shutdown deadline (--shutdown_deadline_s). Previously, the istio-proxy container in the Impala executor pod was terminated immediately at the start of the shutdown. As a result, the executors became unreachable and were removed from the Impala cluster membership, which cancelled the running queries.
This issue is now resolved by ensuring that the istio-proxy container's lifecycle does not affect the executor's cluster membership.

IMPALA-14263: Enhanced join strategy for large clusters
7.3.2
The query planner's cost model for broadcast joins can be skewed by the number of nodes in a cluster. This could lead to suboptimal join strategy choices, especially in large clusters with skewed data where a partitioned join was chosen over a more efficient broadcast join.
This issue is now resolved by introducing the broadcast_cost_scale_factor query option, an additional tuning option alongside query hints for overriding the query planner's decision. To set it cluster-wide for all queries, add the following key-value pair to the default_query_options startup option:
broadcast_cost_scale_factor=<value less than 1.0>

Apache Jira: IMPALA-14263
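
To see why a factor below 1.0 favors broadcast joins on large clusters, consider a deliberately simplified network-cost model, sketched in Python. It is an assumption for illustration and not Impala's actual cost formula: broadcasting is charged the build side times the number of nodes, scaled by the factor, while a partitioned join is charged one shuffle of both inputs. The function name and sizes are hypothetical, and ties are assumed to go to the broadcast join.

```python
def choose_join_strategy(build_bytes, probe_bytes, num_nodes,
                         broadcast_cost_scale_factor=1.0):
    """Pick BROADCAST or PARTITIONED under a simplified network-cost model."""
    # Broadcasting sends the whole build side to every node, so its estimated
    # cost grows linearly with cluster size; the factor scales that estimate.
    broadcast_cost = build_bytes * num_nodes * broadcast_cost_scale_factor
    # Partitioning shuffles each input across the network once.
    partitioned_cost = build_bytes + probe_bytes
    return "BROADCAST" if broadcast_cost <= partitioned_cost else "PARTITIONED"


if __name__ == "__main__":
    # Hypothetical sizes in GB on a 100-node cluster.
    print(choose_join_strategy(1, 50, 100))       # PARTITIONED (100 > 51)
    print(choose_join_strategy(1, 50, 100, 0.5))  # BROADCAST (50 <= 51)
```

In this model the node count alone can tip the decision toward a partitioned join, and scaling the broadcast estimate down counteracts that without forcing a strategy through a per-query hint.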

IMPALA-11402: Fetching metadata for tables with huge numbers of files no longer fails with OutOfMemoryError
7.3.2
Previously, when the Impala coordinator tried to fetch file metadata for extremely large tables (those with millions of files or partitions), the Impala Catalog service attempted to return all the file details at once. This often exceeded the Java memory limits, causing the service to crash with an OutOfMemoryError.
This issue is addressed by limiting the number of file descriptors that the Catalog service includes in a single getPartialCatalogObject response. A new configuration flag, catalog_partial_fetch_max_files, defines the maximum number of file descriptors allowed per response (default: 1,000,000).
If a request exceeds this limit, the Catalog service truncates the response and returns metadata for only a subset of the requested partitions. The coordinator detects the truncated response and automatically sends new batch requests for the remaining partitions until all required metadata is retrieved. As a result, the coordinator can fetch and process the metadata for extremely large tables without crashing due to memory limits.

Apache Jira: IMPALA-11402
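
The truncate-and-retry protocol can be modeled in a few lines of Python. Both functions are hypothetical stand-ins for the catalog and coordinator sides, max_files plays the role of catalog_partial_fetch_max_files, and the rule that a response always contains at least one partition (so that a single oversized partition still makes progress) is an assumption not stated above.

```python
def catalog_partial_fetch(partitions, requested, max_files):
    """Hypothetical catalog side: return whole partitions until the file budget is spent."""
    response, files = {}, 0
    for part_id in requested:
        n = len(partitions[part_id])
        # Truncate once the next partition would exceed the budget. Always
        # returning at least one partition (an assumption) guarantees progress.
        if response and files + n > max_files:
            break
        response[part_id] = partitions[part_id]
        files += n
    return response


def coordinator_fetch_all(partitions, requested, max_files):
    """Hypothetical coordinator side: re-request whatever a truncated response omitted."""
    fetched, remaining = {}, list(requested)
    while remaining:
        batch = catalog_partial_fetch(partitions, remaining, max_files)
        fetched.update(batch)
        remaining = [p for p in remaining if p not in batch]
    return fetched
```

With ten partitions of three files each and a budget of seven files, the first response carries two partitions, and the coordinator needs five round trips to collect all ten.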

CDPD-77261: Impala can now read Parquet integer data as DECIMAL after schema changes
7.3.2
Previously, if you changed a column type from an integer (INT or BIGINT) to a DECIMAL using ALTER TABLE, Impala could fail to read the original Parquet data files. This happened because the files lacked the specific metadata (logical types) Impala expected for decimals, resulting in an error.
Impala is now more flexible when reading Parquet files following schema evolution. If Impala encounters an integer type but the schema expects a DECIMAL, it automatically assumes a suitable decimal precision and scale, allowing you to successfully query the updated table:
  • INT32 is read as DECIMAL(9, 0).
  • INT64 is read as DECIMAL(18, 0).
This change supports common schema evolution practices by allowing you to update column types without manually rewriting old data files.

Apache Jira: IMPALA-13625
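
The mapping can be written down as a small lookup, sketched in Python. The precisions 9 and 18 coincide with the largest decimal precisions that the Parquet format allows for decimals stored in INT32 and INT64, respectively. The function names are hypothetical, and the sketch does not model how Impala handles stored values that exceed the implied precision.

```python
from decimal import Decimal

# Implied (precision, scale) for Parquet integer columns that carry no
# DECIMAL logical-type annotation, as listed above.
IMPLIED_DECIMAL = {"INT32": (9, 0), "INT64": (18, 0)}


def implied_decimal_type(physical_type):
    """Return the DECIMAL type assumed for a physical integer column."""
    precision, scale = IMPLIED_DECIMAL[physical_type]
    return f"DECIMAL({precision}, {scale})"


def read_as_decimal(physical_type, unscaled):
    """Interpret a stored integer as a decimal using the implied scale."""
    _, scale = IMPLIED_DECIMAL[physical_type]
    # With scale 0 the stored integer is already the decimal value.
    return Decimal(unscaled).scaleb(-scale)
```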

IMPALA-12927: Impala can now correctly read BINARY columns in JSON tables
7.3.2
Previously, Impala couldn't correctly read BINARY columns in JSON tables, often resulting in errors or incorrect data. This happened because Impala assumed the data was always Base64 encoded, which wasn't true for files written by older Hive versions.
Impala now supports a new table property, 'json.binary.format' (BASE64 or RAWSTRING), and a query option, JSON_BINARY_FORMAT, to explicitly define the binary encoding. This ensures Impala reads the data correctly. If no format is specified, Impala will now return an error instead of risking silent data corruption.

Apache Jira: IMPALA-12927
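
A minimal Python sketch of format-driven decoding. The single binary_format argument stands in for the 'json.binary.format' table property or the JSON_BINARY_FORMAT query option; how the two are combined is not modeled. The function name is hypothetical, and treating RAWSTRING values as their UTF-8 bytes is an assumption.

```python
import base64


def decode_json_binary(text, binary_format):
    """Decode a JSON string field into bytes for a BINARY column.

    binary_format stands in for 'json.binary.format' or JSON_BINARY_FORMAT
    (BASE64 or RAWSTRING).
    """
    if binary_format is None:
        # Mirror the behavior described above: refuse to guess.
        raise ValueError("binary format not specified")
    fmt = binary_format.upper()
    if fmt == "BASE64":
        return base64.b64decode(text, validate=True)
    if fmt == "RAWSTRING":
        # Assumption: raw strings are taken as their UTF-8 bytes.
        return text.encode("utf-8")
    raise ValueError(f"unsupported binary format: {binary_format}")
```

Guessing is risky because many raw strings are also valid Base64: the four-character value "data", for example, decodes as Base64 to three unrelated bytes instead of failing, which is the kind of silent corruption an explicit format avoids.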