What's New in Apache Impala
New features and functional updates for Impala are introduced in Cloudera Runtime 7.3.2, its service packs, and cumulative hotfixes.
Cloudera Runtime 7.3.2
Cloudera Runtime 7.3.2 introduces new features of Impala and includes all service packs and cumulative hotfixes from 7.3.1.100 through 7.3.1.706. For a comprehensive record of all updates in Cloudera Runtime 7.3.1.x, see New Features.
- Hierarchical metastore event processing (Preview)
- Impala now supports a multi-layered, hierarchical approach to metastore event
processing to improve synchronization speed and handle event dependencies more
efficiently. By enabling this feature, you can segregate events based on their
dependencies and process them independently through a system of database and table event
executors. This method reduces synchronization time for Hive Metastore (HMS) events by
allowing parallel processing while maintaining linearizability for specific tables.
For more information, see Hierarchical metastore event processing
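As a rough illustration of the idea (not Impala's actual implementation), the sketch below groups events by the table they touch, applies each table's events in arrival order, and lets different tables proceed in parallel. The function `dispatch_events` and the `(event_id, table_name)` tuples are hypothetical names chosen for this example.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def dispatch_events(events, max_workers=4):
    """Apply events in parallel across tables, in order within each table.

    `events` is a list of (event_id, table_name) tuples in metastore order.
    Returns a dict mapping table_name to the event ids applied, in order.
    """
    # Segregate events by the table they depend on, preserving arrival order.
    per_table = defaultdict(list)
    for event_id, table in events:
        per_table[table].append(event_id)

    def process_table(table):
        applied = []
        for event_id in per_table[table]:
            # A real executor would update catalog metadata here.
            applied.append(event_id)
        return table, applied

    # Each table's queue is handled by one task, so per-table order holds
    # while separate tables are processed concurrently.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(process_table, list(per_table)))


events = [(1, "db1.t1"), (2, "db1.t2"), (3, "db1.t1"), (4, "db2.t3"), (5, "db1.t2")]
result = dispatch_events(events)
```

Because one task owns each table's queue, events 1 and 3 for `db1.t1` can never be reordered, while `db2.t3` does not wait behind them.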
- Impala AES encryption and decryption support
- Impala now supports AES (Advanced Encryption Standard) encryption and decryption to
work better with other systems. AES-GCM is the default mode for strong security, but you
can also use other modes like CTR, CFB, and ECB for different needs. This feature works
with both 128-bit and 256-bit keys and includes checks to keep your data safe and
confidential.
For more information, see AES encryption and decryption support
Apache Jira: IMPALA-13039
- Ubuntu 24.04 support
- You can now build and run Impala on Ubuntu 24.04.
- Dual-stack networking support
- You can now configure Impala to support dual-stack networking, allowing the service to
handle both IPv4 and IPv6 traffic simultaneously. This update includes new configuration
properties in Cloudera Manager and support for dual-stack load
balancing with HAProxy.
For more information, see Impala dual stack IPv6 and Impala dual stack HA Proxy
- Default support for Java 17
- Cloudera now provides Impala using Java 17 for builds and runtime environments.
- OpenTelemetry integration for Impala
- Cloudera now provides OpenTelemetry (OTel) support to help you monitor query performance and troubleshoot issues. This feature collects and exports query telemetry data as OpenTelemetry traces to a central OpenTelemetry-compatible collector. The integration is designed to have a minimal impact on performance because it uses data already being collected and handles the export in a separate process.
For more information, see OpenTelemetry support for Impala
Apache Jira: IMPALA-13234
- Caching intermediate query results
- Cloudera now supports caching intermediate results to improve query performance and resource efficiency for repetitive workloads. By storing results at various locations within the SQL plan tree, the system can reuse computation for similar queries even when they are not identical, provided the underlying data and settings remain unchanged. For more information, see Caching intermediate results
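One way to picture this kind of reuse is a cache keyed on everything that affects a plan subtree's output. The sketch below is a conceptual model only, assuming a key built from the subtree description, the data versions, and the query options; `cache_key`, `IntermediateResultCache`, and the sample values are hypothetical and do not reflect Impala's internal design.

```python
import hashlib
import json


def cache_key(plan_subtree, data_versions, query_options):
    """Build a deterministic key from everything that affects a subtree's result."""
    payload = json.dumps(
        {"plan": plan_subtree, "data": data_versions, "options": query_options},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class IntermediateResultCache:
    """Hypothetical in-memory cache that counts hits and misses."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, key, compute):
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[key] = compute()
        return self._store[key]


cache = IntermediateResultCache()
subtree = {"op": "scan", "table": "sales", "filter": "year = 2024"}
options = {"example_option": 4}  # illustrative setting, not a real option name

# First query computes the subtree result.
k1 = cache_key(subtree, {"sales": "v1"}, options)
cache.get_or_compute(k1, lambda: [10, 20, 30])

# A different query that shares the same subtree reuses the stored result.
k2 = cache_key(subtree, {"sales": "v1"}, options)
cache.get_or_compute(k2, lambda: [10, 20, 30])

# Once the underlying data changes, the key changes and the result is recomputed.
k3 = cache_key(subtree, {"sales": "v2"}, options)
cache.get_or_compute(k3, lambda: [10, 20, 30, 40])
```

Keying on the subtree rather than the whole query text is what allows similar but non-identical queries to share work.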
- Impala-shell and Impyla now support Python 3.12
- Cloudera now provides support for Python 3.12 in
impala-shell and Impyla.
Apache Jira: IMPALA-14452
- Query cancellation supported during analysis and planning
- This new feature allows you to cancel Impala queries even while they are in the
Frontend stage, which includes analysis and planning. Previously, you could not cancel a
query while it was waiting for operations like loading metadata from the Catalog Server.
With this update, Impala now registers the planning process and can interrupt it to
cancel the query.
Apache Jira: IMPALA-915
- Improved memory estimation and control for large queries
- Impala now uses a more realistic approach to memory estimation for large operations like SORT, AGGREGATION, and HASH JOIN.
- Expose query cancellation status to UDF interface
- Impala now exposes the query cancellation status to the User-Defined Function (UDF) interface. This new feature allows complex or time-consuming UDFs to periodically check if the query has been cancelled by the user. If cancellation is detected, the UDF can stop its work and fail fast.
- Expanded compression levels for ZSTD and ZLIB
- Impala has extended the configurable range of compression levels for the ZSTD and ZLIB (GZIP/DEFLATE) codecs. This enhancement lets you tune the trade-off between compression ratio and write throughput more precisely.
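To get a feel for the ratio-versus-effort trade-off, the following sketch compresses the same sample data at several levels using Python's standard `zlib` module. It is a stand-in for the general behavior of level-based codecs and does not exercise Impala itself; the sample data and the chosen levels are arbitrary assumptions.

```python
import zlib

# Repetitive sample data stands in for a column chunk being written.
data = b"".join(b"row-%06d,status=OK,region=emea\n" % i for i in range(5000))

sizes = {}
for level in (1, 6, 9):
    compressed = zlib.compress(data, level)
    # Decompression recovers the input regardless of the level used.
    assert zlib.decompress(compressed) == data
    sizes[level] = len(compressed)

for level, size in sizes.items():
    print(f"level {level}: {size} bytes ({size / len(data):.1%} of original)")
```

Higher levels generally spend more CPU time to produce smaller output, so the right level depends on whether write throughput or storage footprint matters more for the workload.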
- Constant folding is now supported for non-ASCII and binary strings
- Previously, the query planner could not apply constant folding (evaluating constant expressions once at planning time) if the resulting value contained non-ASCII characters or was a non-UTF-8 binary string. As a result, important query filters could not be simplified, which prevented key performance optimizations such as predicate pushdown to the storage engine (for example, Iceberg or Parquet statistics filtering). Impala now folds such expressions, so these optimizations apply to them as well.
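The effect of constant folding on a filter can be shown with a toy expression tree. This is a minimal sketch of the general technique, not Impala's planner code; the classes `Literal`, `Column`, `Concat`, and `Eq` and the function `fold` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    value: object  # str or bytes


@dataclass(frozen=True)
class Column:
    name: str


@dataclass(frozen=True)
class Concat:
    left: object
    right: object


@dataclass(frozen=True)
class Eq:
    left: object
    right: object


def fold(expr):
    """Recursively replace constant sub-expressions with a single literal."""
    if isinstance(expr, Concat):
        left, right = fold(expr.left), fold(expr.right)
        if isinstance(left, Literal) and isinstance(right, Literal):
            # Folding works the same for non-ASCII text and raw bytes.
            return Literal(left.value + right.value)
        return Concat(left, right)
    if isinstance(expr, Eq):
        return Eq(fold(expr.left), fold(expr.right))
    return expr


# WHERE city = concat('São', ' Paulo') folds to WHERE city = 'São Paulo',
# a plain column-to-literal comparison that a scanner can check
# against file statistics.
predicate = Eq(Column("city"), Concat(Literal("São"), Literal(" Paulo")))
folded = fold(predicate)

# Binary operands that are not valid UTF-8 fold the same way.
binary = fold(Concat(Literal(b"\xff\xfe"), Literal(b"\x00")))
```

Once the right-hand side is a single literal, the filter has the simple shape that pushdown mechanisms typically require.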
- Catalogd and Event Processor Improvements
-
- Faster Inserts for Partitioned Tables (IMPALA-14051): Inserting data into very large partitioned tables is now much faster. Previously, Impala communicated with the Hive Metastore (HMS) one partition at a time, which was a major slowdown. Impala now uses the batch insert API to send all insert information to the HMS in one highly efficient call, significantly boosting the performance of your INSERT statements into transactional tables.
- Quicker Table Administration (IMPALA-13599): Administrative tasks, such as running DROP STATS or changing the CACHED status of a table, are now much faster on tables with many partitions. Impala previously made thousands of individual calls to the HMS for these operations. The system now batches these updates, making far fewer calls to the HMS and speeding up these essential administrative commands.
- Reliable Table Renames (IMPALA-13989): The ALTER TABLE RENAME command no longer fails when an INVALIDATE METADATA command runs at the same time. Previously, this caused the rename to succeed in the Hive Metastore but fail in Impala's Catalog Server. Impala now includes automatic error handling that instantly runs an internal metadata refresh if the rename is interrupted, ensuring the rename completes successfully without requiring any manual user steps.
- Efficient Partition Refreshes (IMPALA-13453): Running REFRESH <table> PARTITION <partition> is now much more efficient. Previously, this command always fully reloaded the partition's metadata and column statistics, even if the partition was unchanged. Impala now checks if the partition data has changed before reloading, avoiding the unnecessary drop-add sequence and significantly improving the efficiency of partition metadata updates.
- Reduced Partition API Calls (IMPALA-13599): Impala has reduced unnecessary API interactions with the HMS during table-level operations. Commands like ALTER TABLE ... SET CACHED/UNCACHED or DROP STATS on large tables previously generated thousands of single alter_partition() calls. Impala now utilizes the HMS's bulk-update functionality, batching these partition updates to drastically reduce the total number of required API calls.
- REFRESH on multiple partitions (IMPALA-14089): Impala now supports using the REFRESH statement on multiple partitions within a single command, which significantly speeds up metadata updates by processing partitions in parallel, reduces lock contention in the Catalog service, and avoids unnecessary increases to the table version. See Impala refresh
- Impala cluster responsiveness during table renames (IMPALA-13631): Impala no longer holds a critical internal lock during long-running external calls initiated by ALTER TABLE RENAME operations. This prevents the entire Impala cluster from being blocked, allowing other queries and catalog operations to proceed without interruption.
Apache Jira: IMPALA-14051, IMPALA-13599, IMPALA-13989, IMPALA-13453, IMPALA-14089, IMPALA-13631
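The gain from batching partition updates comes down to the number of round trips. The sketch below counts calls against a stand-in metastore object; `FakeMetastore`, its methods, the `numRows` property, and the batch size of 1000 are all assumptions made for illustration and are not the real HMS client API.

```python
class FakeMetastore:
    """Hypothetical stand-in that only counts round trips."""

    def __init__(self):
        self.calls = 0
        self.partitions = {}

    def alter_partition(self, name, props):
        # One round trip per partition.
        self.calls += 1
        self.partitions[name] = props

    def alter_partitions(self, updates):
        # One round trip carries every partition update in the batch.
        self.calls += 1
        self.partitions.update(updates)


def drop_stats_one_by_one(hms, names):
    for name in names:
        hms.alter_partition(name, {"numRows": None})


def drop_stats_batched(hms, names, batch_size=1000):
    for start in range(0, len(names), batch_size):
        batch = names[start:start + batch_size]
        hms.alter_partitions({name: {"numRows": None} for name in batch})


names = [f"day={d}" for d in range(5000)]
slow, fast = FakeMetastore(), FakeMetastore()
drop_stats_one_by_one(slow, names)
drop_stats_batched(fast, names)
print(slow.calls, fast.calls)  # prints "5000 5"
```

Both approaches leave the metastore in the same final state; only the number of calls differs.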
- Enable global admission controller
- The global admission controller is now a standalone service that maintains a consistent view of cluster resource usage and can make admission decisions without risking over-admission. For more information, see Impala components
- Auto-optimized parquet collection queries
- Impala now automatically boosts query performance for tables with collection data types by setting parquet_late_materialization_threshold to 1 when data can be skipped during filtering. This ensures maximum efficiency by reading only the data needed.
For more information, see Impala lazy materialization
Apache Jira: IMPALA-3841
- Impala-shell now shows row count and elapsed time for most statements in HiveServer2 mode
- Previously, some commands run over the HiveServer2 protocol (like REFRESH or INVALIDATE) did not show the Fetched X row(s) in Ys output in impala-shell, even though the Beeswax protocol showed it. Impala-shell now shows the row count and elapsed time for most statements in HiveServer2 mode.
- Support for arbitrary encodings in text and sequence files
- Impala now supports reading from and writing to Text and Sequence files that use arbitrary character encodings, such as GBK, beyond the default UTF-8.
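The reason a reader needs to know a file's encoding can be seen with a short, Impala-independent example using Python's built-in codecs. The sample string is an arbitrary choice for illustration.

```python
text = "中文"

gbk_bytes = text.encode("gbk")
utf8_bytes = text.encode("utf-8")

# The same characters have different byte representations per encoding.
print(gbk_bytes.hex(), utf8_bytes.hex())

# Decoding with the matching encoding recovers the text.
assert gbk_bytes.decode("gbk") == text

# Decoding GBK bytes as UTF-8 fails, which is why a reader must know
# the file's encoding up front.
try:
    gbk_bytes.decode("utf-8")
    decoded_as_utf8 = True
except UnicodeDecodeError:
    decoded_as_utf8 = False
```

Here the two-character string occupies four bytes in GBK and six in UTF-8, and the GBK bytes are not valid UTF-8 at all.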
- New query options for reliable metadata synchronization
- Impala now offers new query options that give you a reliable way to ensure your queries run against the latest table data after the relevant HMS modifications are complete.
- Impala now supports Hive’s legacy timestamp conversion to ensure consistent interpretation of historical timestamps
- Previously, when reading Parquet or Avro files written by Hive using legacy timestamp conversion, Impala's timezone calculation for UTC timestamps could be incorrect, particularly for historical dates and timezones like Asia/Kuala_Lumpur or Singapore before 1982. This meant the timestamps displayed in Impala were different from those in Hive. With support for Hive's legacy conversion, Impala can now interpret these timestamps consistently with Hive.
- Supporting one-dimensional arrays in Kudu tables
- Impala now supports one-dimensional arrays in Kudu tables. You can create Kudu tables with array columns and perform selection queries on these complex collection types.
- Impala now supports tables created with the Hive JDBC Storage handler
- Impala now translates and adapts the Hive JDBC table properties upon loading, making
them compatible with Impala's internal requirements. Previously, Impala had difficulty
reading tables created using the Hive JDBC Storage handler due to differences in how
table properties, such as JDBC driver and DBCP configurations, were defined compared to
Impala-created tables. See Impala External JDBC Tables (Preview)
Apache Jira: IMPALA-12992
- New catalogd flag to disable HMS sync by default
- You can now use the disable_hms_sync_by_default
catalogd startup flag to set a global default for the
impala.disableHmsSync property. This feature allows you to skip
event processing for all databases and tables by default while opting in specific
elements as needed.
For more information, see: disable_hms_sync_by_default Options for catalogd Daemon
Apache Jira: IMPALA-14131
- Parallel metadata loading in local catalog mode
- Previously, when a query accessed multiple unloaded tables in local catalog mode, Impala loaded the metadata for those tables one after another. This sequential process caused significant latency and performance regressions compared to the legacy catalog mode. Impala now loads the metadata for these tables in parallel.
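The difference between loading tables one after another and loading them concurrently can be sketched with a thread pool. This is a generic illustration under stated assumptions: `load_table_metadata` is a hypothetical placeholder for a remote metadata fetch, and the worker count is arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor


def load_table_metadata(table_name):
    """Hypothetical loader; a real one would fetch metadata from the catalog."""
    return {"table": table_name, "loaded": True}


def load_all(table_names, max_workers=8):
    # Submit every unloaded table at once so the waits overlap
    # instead of adding up.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(load_table_metadata, table_names)
        return {meta["table"]: meta for meta in results}


metadata = load_all(["sales", "customers", "regions"])
```

With sequential loading, total latency is roughly the sum of the individual fetches; with overlapping requests it approaches the slowest single fetch.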
- Specifying compression levels for LZ4, ZLIB, and ZSTD
- You can now specify compression levels for the LZ4, ZLIB, GZIP, and ZSTD codecs to
achieve higher compression ratios. This includes support for high compression modes in
LZ4 (levels 3–12) and negative compression levels for ZSTD. These levels are supported
by using the compression_codec query option.
For more information, see compression_codec query option
Apache Jira: IMPALA-10630, IMPALA-14082
- Impala now supports ARM architecture
