Behavioral Changes in Impala

A behavioral change is a marked difference in how Apache Impala behaves in this version compared with the previously released version.

Behavioral Changes in Cloudera Runtime 7.3.1.500 SP3

Summary:
Simplified Impala INSERT performance analysis
Previous behavior:
Query profiles for INSERT operations displayed a long and misleading duration for 'Got Metastore client' right before 'Fired Metastore events'.
New behavior:
The catalog timeline in the 'Catalog Server Operation' section of Impala query profiles now includes a dedicated 'Prepared InsertEvent data' item. This item shows the time spent collecting file checksums during INSERT operations, which gives more precise performance insight.

Apache Jira: IMPALA-13960

Summary:
Optimized ephemeral storage for Impala's Catalogd
Previous behavior:
The ephemeral storage limit of 512MB often caused container evictions due to insufficient space, especially during JVM heap dumps.
New behavior:
The ephemeral storage limit for Catalogd is now set dynamically based on the JVM heap size (Xmx), calculated as:
2 * Xmx + Buffer space
This approach ensures adequate storage for heap dumps and minimizes the risk of container eviction, resulting in improved stability for Catalogd and other services sharing node resources.
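
As a worked illustration of the formula above, the following minimal sketch computes the limit for a given heap size. The function name and the buffer value are hypothetical; the buffer size that Cloudera actually applies is not stated in this note.

```python
def catalogd_ephemeral_storage_mib(xmx_mib: int, buffer_mib: int) -> int:
    """Return 2 * Xmx + buffer space, with every value expressed in MiB."""
    if xmx_mib <= 0 or buffer_mib < 0:
        raise ValueError("xmx_mib must be positive and buffer_mib non-negative")
    return 2 * xmx_mib + buffer_mib


# Hypothetical inputs: an 8 GiB heap and an assumed 1 GiB buffer.
limit = catalogd_ephemeral_storage_mib(xmx_mib=8192, buffer_mib=1024)
print(f"{limit} MiB")  # 2 * 8192 + 1024 = 17408 MiB
```

With these assumed inputs the limit is well above the former fixed 512MB, which is why a heap dump of roughly Xmx in size no longer exhausts the space.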

Summary:
Clean up subdirectories in TRUNCATE and INSERT OVERWRITE if recursive listing is enabled
Previous behavior:
Impala did not consistently delete files located in subdirectories of external tables during TRUNCATE and INSERT OVERWRITE operations, even when recursive listing was enabled. This led to leftover data in subdirectories after these operations, resulting in data corruption.
New behavior:
Directories are now deleted along with non-hidden data files; hidden and ignored directories are left in place. In addition, setting DELETE_STATS_IN_TRUNCATE=false is no longer supported by default when truncating non-transactional tables; attempting it results in an exception. If you absolutely need the old behavior, set the --truncate_external_tables_with_hms flag to false, but be aware that this also reintroduces the bug fixed by this change.
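
The selection rule described above can be sketched as follows. This is an illustration only, not Impala's implementation: the function names are hypothetical, the rule that a leading "." or "_" marks an entry as hidden is an assumption, and the ignored-directory prefixes are example values.

```python
from typing import Iterable, List, Tuple


def is_hidden(name: str) -> bool:
    # Assumption: entries whose names start with "." or "_" count as hidden.
    return name.startswith((".", "_"))


def entries_to_delete(
    entries: Iterable[Tuple[str, bool]],
    ignored_dir_prefixes: Tuple[str, ...],
) -> List[str]:
    """Pick the entries that the cleanup would remove.

    entries: (name, is_dir) pairs found directly under the table or
    partition location.
    """
    selected = []
    for name, is_dir in entries:
        if is_hidden(name):
            continue  # hidden files and hidden directories are kept
        if is_dir and name.startswith(ignored_dir_prefixes):
            continue  # ignored directories are kept
        selected.append(name)  # data files and ordinary subdirectories go
    return selected


# Hypothetical directory listing and ignored prefix.
listing = [
    ("000000_0", False),
    ("sub1", True),
    (".hive-staging_x", True),
    ("_SUCCESS", False),
    ("tmp_load", True),
]
print(entries_to_delete(listing, ("tmp_",)))  # ['000000_0', 'sub1']
```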

Apache Jira: IMPALA-14189, IMPALA-14224

Behavioral Changes in Cloudera Runtime 7.3.1.400 SP2

There are no behavioral changes in this release.

Behavioral Changes in Cloudera Runtime 7.3.1.300 SP1 CHF 1

Summary:
Impala query analysis behavior with Ranger
Previous behavior:
In legacy catalog mode, Impala previously verified that the service user had WRITE access to the HDFS tables and partitions during query analysis of INSERT and LOAD DATA statements. Permissions were computed from HDFS settings, including ACLs, when tables and partitions were instantiated.
New behavior:
To address performance concerns, HDFS permission checks are now skipped during query analysis. When Ranger is enabled, the service user is assumed to have READ_WRITE access to all HDFS paths associated with the target table. Ranger policies remain enforced during query execution for INSERT and LOAD DATA statements, ensuring security compliance.

Apache Jira: IMPALA-11871

Summary:
Expression rewrite behavior for Hive views with auto-generated column aliases
Previous behavior:
After the SimplifyCastExprRule optimization was introduced in IMPALA-10836, Impala attempted to simplify CAST expressions for all columns, including those with Hive auto-generated aliases (such as _c0). For views created in Hive without explicit column aliases, this could cause queries to fail with an AnalysisException. For example, a view using CAST on a column labeled _c0 might fail with:
AnalysisException: Could not resolve column/field reference: 'failing_view._c0'
New behavior:
Impala now skips rewriting expressions that are associated with Hive auto-generated column aliases (for example, _c0 and _c1). This preserves the correct column mapping across nested views and avoids errors during query analysis, so queries succeed without requiring you to explicitly rename columns in Hive views.
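
The alias check this behavior relies on can be pictured with a minimal sketch. The function names are hypothetical, and the pattern assumes that auto-generated aliases always take the form _c followed by digits, as in the examples above; Impala's actual check may differ.

```python
import re

# Assumption: Hive auto-generated aliases look like _c0, _c1, _c12, ...
_AUTO_ALIAS = re.compile(r"_c\d+")


def is_hive_auto_alias(label: str) -> bool:
    """Return True if the whole label matches the assumed alias pattern."""
    return _AUTO_ALIAS.fullmatch(label) is not None


def should_rewrite(label: str) -> bool:
    # Expressions tied to an auto-generated alias are left untouched.
    return not is_hive_auto_alias(label)


for label in ["_c0", "_c12", "total", "_cx"]:
    print(label, should_rewrite(label))
```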

Apache Jira: IMPALA-11871

Behavioral Changes in Cloudera Runtime 7.3.1.200 SP1

There are no behavioral changes in this release.

Behavioral Changes in Cloudera Runtime 7.3.1.100 CHF 1

There are no behavioral changes in this release.

Behavioral Changes in Cloudera Runtime 7.3.1

Summary:
Impala now unregisters timed-out queries promptly to free memory, retaining error messages for clients that return later.
Previous behavior:
Timed-out queries remained registered until the session closed, keeping memory occupied and sometimes leaving failed queries in an active state if not explicitly closed.
New behavior:
Timed-out queries are unregistered immediately to free memory, while error messages are kept in a new structure so clients can still receive an error message if they return later.
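
The idea of freeing a query's state on timeout while keeping its error available can be sketched as below. This is a conceptual model only: the class, method names, and status strings are hypothetical and do not reflect Impala's internal data structures.

```python
from typing import Dict


class QueryRegistry:
    """Toy model: active queries plus a separate store of retained errors."""

    def __init__(self) -> None:
        self._active: Dict[str, str] = {}           # query id -> state
        self._retained_errors: Dict[str, str] = {}  # query id -> error message

    def register(self, query_id: str) -> None:
        self._active[query_id] = "RUNNING"

    def on_timeout(self, query_id: str, message: str) -> None:
        # Unregister immediately so the query's resources can be released,
        # but remember the error for clients that come back later.
        self._active.pop(query_id, None)
        self._retained_errors[query_id] = message

    def status(self, query_id: str) -> str:
        if query_id in self._active:
            return self._active[query_id]
        if query_id in self._retained_errors:
            return "ERROR: " + self._retained_errors[query_id]
        return "UNKNOWN"


registry = QueryRegistry()
registry.register("q1")
registry.on_timeout("q1", "query timed out")
print(registry.status("q1"))  # ERROR: query timed out
```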

Apache Jira: IMPALA-12602