What's New in Apache Impala

Added read support for PageHeaderV2 to the Parquet scanner

This update moves page reading logic to new classes, ParquetColumnChunkReader and ParquetPageReader, simplifying V2 data page reading and decompression. It enhances code manageability for both V1 and V2 formats.

Apache Jira: IMPALA-6433

Collections of fixed length types as non-passthrough children of unions

This update enables collections of fixed-length types to be used as non-passthrough children in UNION ALL operations. It achieves this by allowing the materialization of these collections.

Apache Jira: IMPALA-12147

Display query execution progress in Impala Web UI

This update adds a query progress indicator to the /queries page in Impala's Web UI, showing the completion status of fragment instances. The indicator supplements the scan progress bar and makes computation-intensive queries easier to track.
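As a rough illustration of what such an indicator reports, the sketch below derives a progress string from the number of completed fragment instances. The function name and the exact display format are assumptions made for this example, not Impala's actual implementation.

```python
def query_progress(completed_instances: int, total_instances: int) -> str:
    """Format progress as 'completed / total (percent)'.

    Hypothetical helper: the name and output format are illustrative only.
    """
    if total_instances <= 0:
        # Nothing has been scheduled yet, so no ratio can be computed.
        return "N/A"
    if not 0 <= completed_instances <= total_instances:
        raise ValueError("completed_instances must be between 0 and total_instances")
    percent = 100.0 * completed_instances / total_instances
    return f"{completed_instances} / {total_instances} ({percent:.2f}%)"


if __name__ == "__main__":
    print(query_progress(3, 12))
```

Unlike scan progress, a ratio of this kind keeps advancing while a query is busy in joins or aggregations after all scans have finished.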

Apache Jira: IMPALA-12048

Allow implicit casts between numeric and string types when inserting into table

By default, inserting a numeric or string-based literal into a column of the other type requires an explicit cast. The allow_unsafe_casts query option, which is turned off by default, allows implicit casting between some numeric types and string types.

Apache Jira: https://issues.apache.org/jira/browse/IMPALA-10173

Optimize query planning by reducing getLocation() and getFileSystem() calls

The fix reduces planning time by calling HdfsPartition.getLocation() once per partition and caching the FileSystem object based on the URI scheme and authority. This minimizes expensive decompression and redundant getFileSystem() calls, improving performance for queries with many partitions.
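The caching idea can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the planner's Java code: FileSystemCache, FakePartition, and plan_partitions are hypothetical names, and the factory callback stands in for the real getFileSystem() call.

```python
from urllib.parse import urlsplit


class FileSystemCache:
    """Caches one file system handle per (scheme, authority) pair."""

    def __init__(self, factory):
        self._factory = factory   # stand-in for the expensive getFileSystem() call
        self._cache = {}
        self.factory_calls = 0    # counts how often the expensive call is made

    def get(self, location: str):
        parts = urlsplit(location)
        key = (parts.scheme, parts.netloc)  # e.g. ("hdfs", "nn1:8020")
        if key not in self._cache:
            self.factory_calls += 1
            self._cache[key] = self._factory(parts.scheme, parts.netloc)
        return self._cache[key]


class FakePartition:
    """Hypothetical partition whose location lookup is assumed to be costly."""

    def __init__(self, location: str):
        self._location = location
        self.location_calls = 0

    def get_location(self) -> str:
        self.location_calls += 1
        return self._location


def plan_partitions(partitions, fs_cache):
    """Resolve each partition's location once and reuse cached file systems."""
    resolved = []
    for partition in partitions:
        location = partition.get_location()  # called once, then reused below
        resolved.append((location, fs_cache.get(location)))
    return resolved


if __name__ == "__main__":
    partitions = [
        FakePartition("hdfs://nn1:8020/warehouse/t/p=1"),
        FakePartition("hdfs://nn1:8020/warehouse/t/p=2"),
        FakePartition("s3a://bucket/warehouse/t/p=3"),
    ]
    cache = FileSystemCache(lambda scheme, authority: object())
    plan_partitions(partitions, cache)
    print(cache.factory_calls)  # one call per distinct (scheme, authority)
```

With thousands of partitions on the same cluster, the number of expensive lookups drops from one per partition to one per distinct scheme and authority.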

Apache Jira: IMPALA-12408

JSON File Reader Prototype

This prototype enables reading JSON files using the rapidjson library with Arrow support. It includes components such as HdfsJsonScanner, callback functions, and a startup flag.

Apache Jira: IMPALA-10798

CREATE TABLE LIKE for Kudu tables

Impala now supports the CREATE TABLE LIKE statement for Kudu tables.

Apache Jira: IMPALA-4052

Dedicated SPNEGO keytab for Impala web console authentication

Impala now supports a dedicated keytab for HTTP SPNEGO authentication, enabling easier management of Kerberos keytabs. A new --spnego_keytab_file flag lets you specify a separate keytab for the web console when --webserver_require_spnego is enabled. If this flag is set, the web server uses the SPNEGO keytab for HTTP authentication, while the main service keytab remains unchanged. If the flag is not set, the web server defaults to using the primary service keytab for SPNEGO.
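The fallback rule described above can be summarized in a short sketch. The function and its parameter names are hypothetical and only mirror the two flags named in the text. The primary_keytab_file argument stands in for however the main service keytab is configured.

```python
from typing import Optional


def select_spnego_keytab(
    webserver_require_spnego: bool,
    spnego_keytab_file: Optional[str],
    primary_keytab_file: str,
) -> Optional[str]:
    """Return the keytab the web server would use for SPNEGO, if any.

    Illustrative model of the documented fallback, not Impala's actual code.
    """
    if not webserver_require_spnego:
        # SPNEGO is not enabled for the web console, so no keytab is needed.
        return None
    if spnego_keytab_file:
        # A dedicated SPNEGO keytab takes precedence when it is specified.
        return spnego_keytab_file
    # Otherwise fall back to the primary service keytab.
    return primary_keytab_file


if __name__ == "__main__":
    print(select_spnego_keytab(True, None, "/etc/impala/impala.keytab"))
```

The paths used here are placeholders. In either case the primary service keytab itself is left unchanged.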

Apache Jira: IMPALA-12318

Non-unique primary keys in Kudu

Kudu now supports non-unique primary keys by automatically adding an auto_increment_id column to form a unique composite primary key. This column, a system-generated big integer, ensures uniqueness within each tablet server region and is hidden unless specified in SELECT statements. ALTER TABLE modifications and UPSERT operations for this column are currently unsupported.
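To make the composite-key idea concrete, here is a toy in-memory model of a single partition. It is only a sketch under stated assumptions: ToyPartition is a hypothetical class, the counter starting at 1 is an arbitrary choice, and real Kudu storage works very differently.

```python
import itertools


class ToyPartition:
    """Toy model: rows are keyed by (user_key, auto_increment_id)."""

    def __init__(self):
        # Counter scoped to this partition; the starting value is an assumption.
        self._next_id = itertools.count(1)
        self.rows = {}

    def insert(self, key, value):
        """Store a row under a composite key that is unique in this partition."""
        composite_key = (key, next(self._next_id))
        self.rows[composite_key] = value
        return composite_key

    def select(self, include_hidden=False):
        """Return rows, hiding the generated column unless it is requested."""
        ordered = sorted(self.rows.items())
        if include_hidden:
            return [(key, auto_id, value) for (key, auto_id), value in ordered]
        return [(key, value) for (key, _), value in ordered]


if __name__ == "__main__":
    partition = ToyPartition()
    partition.insert("sensor-a", 20.5)
    partition.insert("sensor-a", 21.0)  # same user key, still a distinct row
    print(partition.select())
    print(partition.select(include_hidden=True))
```

Two rows with the same user-visible key coexist because the generated column differs, which is the property the non-unique primary key feature relies on.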

Apache Jira: IMPALA-11809

Hive's ESRI geospatial functions as built-ins

This change adds Hive's ESRI geospatial functions as built-in UDFs in Impala.

Apache Jira: IMPALA-11745

Unicode column name support in Impala

Impala now supports Unicode characters in column names, aligning with Hive's support for non-ASCII characters. This enhancement leverages Hive's validateColumnName() function, which removes restrictions on column names at the metadata level. With this update, Impala allows greater flexibility for column naming while remaining consistent with Hive's metadata validation standards.

Apache Jira: IMPALA-12465

Support custom hash partitions at range level in Kudu tables

Impala now supports specifying custom hash partitions at the range level in Kudu tables. You can define hash schemas within specific partitions using the updated CREATE TABLE and ALTER TABLE syntax, and view them with the new SHOW HASH SCHEMA statement. This update aligns hash partitioning more closely with range partitioning, enhancing flexibility while maintaining backward compatibility.

Apache Jira: IMPALA-11430