Behavioral changes denote a marked change in behavior from the previously released
version to this version of Apache Impala.
Behavioral Changes in Cloudera Runtime 7.3.1.500 SP3
- Summary:
- Simplified Impala INSERT performance analysis
- Previous behavior:
- Query profiles for INSERT operations displayed a
long and misleading duration for 'Got Metastore client' right before 'Fired Metastore
events'.
- New behavior:
- The catalog timeline now includes a dedicated item for
'Prepared InsertEvent data' within the 'Catalog Server Operation' section of Impala query
profiles. This allows users to clearly see the time spent collecting file checksums during
INSERT operations, providing more precise performance insights.
Apache Jira: IMPALA-13960
- Summary:
- Optimized ephemeral storage for Impala's Catalogd
- Previous behavior:
- The ephemeral storage limit of 512MB often caused container evictions
due to insufficient space, especially during JVM heap dumps.
- New behavior:
- The ephemeral storage limit for Catalogd is now set dynamically based
on the JVM heap size (Xmx), calculated as 2 * Xmx + buffer space. This approach ensures
adequate storage for heap dumps and minimizes the risk of container eviction, resulting in
improved stability for Catalogd and other services sharing node resources.
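The sizing rule above can be illustrated with a minimal sketch. The release note does not state how large the buffer is, so `HYPOTHETICAL_BUFFER_BYTES` below is an assumed value chosen only for the example, and the helper names are hypothetical rather than part of Impala or any Cloudera tooling.

```python
import re

# Multipliers for JVM-style size suffixes (as used in -Xmx8g, -Xmx512m).
_UNITS = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3}

# Assumption: the actual buffer size is not given in the release note.
HYPOTHETICAL_BUFFER_BYTES = 1 * 1024**3


def parse_xmx(value):
    """Parse a JVM heap size such as '8g' or '512m' into bytes."""
    match = re.fullmatch(r"(\d+)([kKmMgG]?)", value.strip())
    if match is None:
        raise ValueError(f"unrecognized heap size: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit.lower()]


def ephemeral_storage_limit(xmx_bytes, buffer_bytes):
    """Apply the documented rule: 2 * Xmx + buffer space."""
    return 2 * xmx_bytes + buffer_bytes


limit = ephemeral_storage_limit(parse_xmx("8g"), HYPOTHETICAL_BUFFER_BYTES)
print(limit // 1024**2, "MiB")  # 17408 MiB for an 8 GiB heap and a 1 GiB buffer
```

With an 8 GiB heap, the old fixed 512MB limit would be far too small to hold even one heap dump, which is the eviction scenario described under the previous behavior.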
- Summary:
- Clean up subdirectories in TRUNCATE and INSERT OVERWRITE
if recursive listing is enabled
- Previous behavior:
- Impala did not consistently delete files located in subdirectories of
external tables during TRUNCATE and INSERT OVERWRITE
operations, even when recursive listing was enabled. This led to leftover data in
subdirectories after these operations, resulting in data corruption.
- New behavior:
- Directories are now deleted in addition to (non-hidden) data
files, with the exception of hidden and ignored directories. Also, setting
DELETE_STATS_IN_TRUNCATE=false is no longer supported by default when
truncating non-transactional tables; attempting this results in an exception. If the old
behavior is absolutely required, you can set the
--truncate_external_tables_with_hms flag to false, but be aware that this
also reintroduces the bug that was fixed by this change.
Apache Jira: IMPALA-14189, IMPALA-14224
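The cleanup rule described above can be sketched on a local directory tree. This is an illustration only, not Impala's implementation: the hidden-name prefixes (`.` and `_`) and the `ignored_dir_prefixes` parameter are assumptions made for the example, and the sketch removes a non-hidden subdirectory together with everything inside it.

```python
import shutil
import tempfile
from pathlib import Path

# Assumption: names starting with '.' or '_' are treated as hidden.
HIDDEN_PREFIXES = (".", "_")


def clean_table_dir(table_dir, ignored_dir_prefixes=()):
    """Delete non-hidden files and directories; return the removed names."""
    removed = []
    for entry in sorted(table_dir.iterdir()):
        if entry.name.startswith(HIDDEN_PREFIXES):
            continue  # hidden files and directories are left in place
        if entry.is_dir():
            if entry.name.startswith(tuple(ignored_dir_prefixes)):
                continue  # ignored directories are left in place
            shutil.rmtree(entry)  # remove the subdirectory and its contents
        else:
            entry.unlink()
        removed.append(entry.name)
    return removed


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "part-0.parq").write_text("data")
    (root / "sub").mkdir()
    (root / "sub" / "part-1.parq").write_text("data")
    (root / ".hidden_dir").mkdir()
    (root / "staging_area").mkdir()
    removed = clean_table_dir(root, ignored_dir_prefixes=("staging",))
    remaining = sorted(p.name for p in root.iterdir())

print(removed)    # ['part-0.parq', 'sub']
print(remaining)  # ['.hidden_dir', 'staging_area']
```

Before this change, the equivalent of the `sub` directory could survive a TRUNCATE or INSERT OVERWRITE, leaving stale files that recursive listing would still read.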
Behavioral Changes in Cloudera Runtime 7.3.1.400 SP2
There are no behavioral changes in this release.
Behavioral Changes in Cloudera Runtime 7.3.1.300 SP1 CHF 1
- Summary:
- Impala Query Analysis Behavior with Ranger.
- Previous behavior:
- Impala previously verified WRITE access for the service user on HDFS
table/partition(s) during query analysis of INSERT and LOAD DATA statements in
legacy catalog mode. Permissions were computed based on HDFS
settings, including ACLs, when tables and partitions were instantiated.
- New behavior:
- To address performance concerns, HDFS permission checks are now skipped
during query analysis. The service user is assumed to have READ_WRITE access
to all HDFS paths associated with the target table when Ranger is enabled. Ranger policies
remain enforced during query execution for INSERT and LOAD DATA statements,
ensuring security compliance.
Apache Jira: IMPALA-11871
- Summary:
- Expression rewrite behavior for Hive views with auto-generated column
aliases.
- Previous behavior:
- Impala attempted to simplify CAST expressions for
all columns, including those with Hive auto-generated aliases (such as _c0). This
simplification was introduced by the SimplifyCastExprRule optimization in IMPALA-10836.
In views created in Hive without explicit column aliases, this could lead to
AnalysisException errors during query execution. For example, a view using CAST on a
column labeled as _c0 might fail with:
AnalysisException: Could not resolve column/field reference: 'failing_view._c0'
- New behavior:
- Impala now skips rewriting expressions that are associated with Hive
auto-generated column aliases (for example, _c0 and _c1). This preserves the correct column
mapping across nested views and avoids errors during query analysis. This change allows
queries to succeed without requiring you to explicitly rename columns in Hive views.
Apache Jira: IMPALA-11871
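The skip rule for auto-generated aliases can be sketched as a simple name check. This is a hedged illustration, not Impala's code: it assumes the aliases follow the `_c<index>` pattern shown in the examples above, and the function names are hypothetical.

```python
import re

# Assumption: Hive auto-generated aliases look like _c0, _c1, _c2, ...
_AUTO_ALIAS = re.compile(r"_c\d+")


def is_hive_auto_alias(alias):
    """Return True if the whole alias matches the _c<index> pattern."""
    return _AUTO_ALIAS.fullmatch(alias) is not None


def should_rewrite(alias):
    """Rewrite rules are applied only to explicitly named columns."""
    return not is_hive_auto_alias(alias)


for name in ["_c0", "_c12", "total", "my_c0", "_col"]:
    print(name, should_rewrite(name))
```

Only `_c0` and `_c12` are excluded from rewriting here; names that merely contain `_c` keep the normal rewrite path.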
Behavioral Changes in Cloudera Runtime 7.3.1.200 SP1
There are no behavioral changes in this release.
Behavioral Changes in Cloudera Runtime 7.3.1.100 CHF 1
There are no behavioral changes in this release.
Behavioral Changes in Cloudera Runtime 7.3.1
- Summary:
- Skips scheduling runtime filters for PK-FK joins when the build scan
has no predicate filter, the join is a full table scan, and the bloom filter has a high false
positive probability.
- Previous behavior:
- Runtime filters were scheduled for all PK-FK joins, regardless of
effectiveness.
- New behavior:
- Filters are skipped for PK-FK joins when the build scan is a full
table scan without filters and the bloom filter has a high false positive probability,
which improves performance. For more details, see Skip Scheduling Bloom Filter.
Apache Jira: IMPALA-12357
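The three conditions above combine into a single decision, sketched below. The `JoinInfo` type, its field names, and the `HYPOTHETICAL_MAX_FPP` threshold are assumptions made for illustration; the release note does not state what counts as a "high" false positive probability or how Impala's planner represents these properties.

```python
from dataclasses import dataclass

# Assumption: the real threshold is not given in the release note.
HYPOTHETICAL_MAX_FPP = 0.75


@dataclass
class JoinInfo:
    is_pk_fk: bool               # join matches a primary key to a foreign key
    build_has_predicates: bool   # build-side scan has a predicate filter
    build_is_full_scan: bool     # build-side scan reads the whole table
    estimated_fpp: float         # estimated bloom filter false positive probability


def should_skip_runtime_filter(join, max_fpp=HYPOTHETICAL_MAX_FPP):
    """Skip the filter only when every documented condition holds."""
    return (
        join.is_pk_fk
        and not join.build_has_predicates
        and join.build_is_full_scan
        and join.estimated_fpp > max_fpp
    )


unfiltered = JoinInfo(True, False, True, 0.9)
filtered = JoinInfo(True, True, False, 0.9)
print(should_skip_runtime_filter(unfiltered))  # True
print(should_skip_runtime_filter(filtered))    # False
```

The intuition is that an unfiltered build side of a PK-FK join contains every key, so a filter built from it rejects almost nothing on the probe side while still costing time to build and apply.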
- Summary:
- Skips LZ4 compression when sending row batches within the same
process to improve efficiency.
- Previous behavior:
- Row batches were serialized, compressed, sent through KRPC, and then
decompressed, even when the sender and receiver were in the same process.
- New behavior:
- LZ4 compression is skipped for row batches sent within the same
process, reducing unnecessary work and improving performance.
Apache Jira: IMPALA-12430