What's New in Apache Impala
Learn about the new features of Apache Impala in Cloudera Runtime 7.3.1.
Added read support for PageHeaderV2 to the Parquet scanner
This update moves page reading logic to two new classes, ParquetColumnChunkReader and
ParquetPageReader, simplifying V2 data page reading and decompression. It also makes the code
for both the V1 and V2 formats easier to maintain.
Apache Jira: IMPALA-6433
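The following is a minimal sketch of what reading a V2 data page involves, based on the Parquet format rather than on Impala's C++ scanner. In a V2 page, the repetition and definition levels are stored uncompressed ahead of the values, and only the values section may be compressed. The DataPageHeaderV2 class here is a hypothetical stand-in that carries only three of the header's fields, and zlib stands in for the column's actual codec.

```python
import zlib
from dataclasses import dataclass


@dataclass
class DataPageHeaderV2:
    """Hypothetical subset of the Parquet DataPageHeaderV2 fields."""
    repetition_levels_byte_length: int
    definition_levels_byte_length: int
    is_compressed: bool = True


def split_v2_page(header: DataPageHeaderV2, page: bytes):
    """Return (repetition_levels, definition_levels, values) as raw bytes."""
    rep_end = header.repetition_levels_byte_length
    def_end = rep_end + header.definition_levels_byte_length
    rep_levels = page[:rep_end]         # levels are never compressed in V2
    def_levels = page[rep_end:def_end]  # levels are never compressed in V2
    values = page[def_end:]
    if header.is_compressed:
        # Assumption: zlib stands in for the real codec (Snappy, ZSTD, ...).
        values = zlib.decompress(values)
    return rep_levels, def_levels, values
```

Because the level lengths come from the page header, a reader can slice out the levels without decompressing anything, which is the main structural difference from V1 pages.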
Collections of fixed length types as non-passthrough children of unions
This update enables collections of fixed-length types to be used as non-passthrough children
in UNION ALL operations. It achieves this by allowing the materialization of these
collections.
Apache Jira: IMPALA-12147
Display query execution progress in Impala Web UI
Adds a query progress indicator to the /queries page in Impala's Web UI, showing the completion status of fragment instances. This feature provides better tracking for computation-intensive queries, supplementing the scan progress bar.
Apache Jira: IMPALA-12048
Allow implicit casts between numeric and string types when inserting into table
By default, Impala requires explicit casts between numeric and string-based literals when
inserting into a table. The allow_unsafe_casts query option, which is turned off by default,
allows implicit casting between some numeric types and string types. See implicit casting.
Apache Jira: IMPALA-10173
Optimize query planning by reducing getLocation() and getFileSystem() calls
This change reduces planning time by calling HdfsPartition.getLocation() once per partition
and caching the FileSystem object based on the URI scheme and authority. This minimizes
expensive decompression and redundant getFileSystem() calls, improving performance for
queries with many partitions.
Apache Jira: IMPALA-12408
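The caching idea can be illustrated with a short sketch. This is not Impala's planner code (which is Java); the FileSystemCache class and the factory callback are hypothetical names, with the factory standing in for the expensive getFileSystem() call. The point is that partitions sharing a scheme and authority resolve to a single cached object.

```python
from urllib.parse import urlparse


class FileSystemCache:
    """Hypothetical cache keyed by (scheme, authority) of a location URI."""

    def __init__(self, factory):
        self._factory = factory  # stand-in for the expensive getFileSystem()
        self._cache = {}
        self.factory_calls = 0

    def get(self, location: str):
        parsed = urlparse(location)
        key = (parsed.scheme, parsed.netloc)  # netloc is the URI authority
        if key not in self._cache:
            self.factory_calls += 1
            self._cache[key] = self._factory(parsed.scheme, parsed.netloc)
        return self._cache[key]


# 100 partitions on one HDFS namespace plus one on S3: two factory calls total.
cache = FileSystemCache(lambda scheme, authority: object())
locations = [f"hdfs://nn1:8020/warehouse/t/p={i}" for i in range(100)]
locations.append("s3a://bucket/t/p=0")
filesystems = [cache.get(loc) for loc in locations]
```

With this layout, the cost of resolving a file system grows with the number of distinct storage endpoints rather than with the number of partitions.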
JSON File Reader Prototype
This prototype enables reading JSON files using the rapidjson library with Arrow support, and includes components such as HdfsJsonScanner, callback functions, and a startup flag.
Apache Jira: IMPALA-10798
CREATE TABLE LIKE for Kudu tables
Impala now supports the CREATE TABLE LIKE statement for Kudu tables, which creates a new
table based on the definition of an existing one.
Apache Jira: IMPALA-4052
Dedicated SPNEGO keytab for Impala web console authentication
Impala now supports a dedicated keytab for HTTP SPNEGO authentication, enabling easier
management of Kerberos keytabs. A new --spnego_keytab_file flag lets you specify a separate
keytab for the web console when --webserver_require_spnego is enabled. If this flag is set,
the web server uses the SPNEGO keytab for HTTP authentication, while the main service keytab
remains unchanged. If it is not specified, the web server defaults to using the primary
service keytab for SPNEGO.
Apache Jira: IMPALA-12318
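The fallback rule described above can be summarized in a few lines. This is only a sketch of the selection logic, not Impala's implementation; the function name and the service_keytab parameter are hypothetical, while the other two parameters mirror the flags named in the text.

```python
from typing import Optional


def web_spnego_keytab(
    webserver_require_spnego: bool,
    service_keytab: str,
    spnego_keytab_file: Optional[str] = None,
) -> Optional[str]:
    """Pick the keytab the web server would use for SPNEGO (hypothetical helper)."""
    if not webserver_require_spnego:
        return None  # SPNEGO is off, so no keytab is needed for HTTP auth
    # Prefer the dedicated keytab; otherwise fall back to the service keytab.
    return spnego_keytab_file or service_keytab
```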
Non-unique primary keys in Kudu
Kudu now supports non-unique primary keys by automatically adding an auto_increment_id column
to form a unique composite primary key. This column, a system-generated big integer, ensures
uniqueness within each tablet server region and is hidden unless specified in SELECT
statements. ALTER TABLE modifications and UPSERT operations for this column are currently
unsupported.
Apache Jira: IMPALA-11809
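As a rough mental model of the composite key, the sketch below pairs each user-supplied key with a system-generated counter so that duplicate user keys can coexist. The Tablet class is hypothetical and does not reflect Kudu's storage engine; starting the counter at 1 is also an assumption.

```python
import itertools


class Tablet:
    """Hypothetical in-memory model of rows keyed by (user_key, auto_increment_id)."""

    def __init__(self):
        self._next_id = itertools.count(1)  # assumption: counter starts at 1
        self.rows = {}

    def insert(self, key, value):
        # The hidden counter makes the composite key unique even when
        # the user-supplied key repeats.
        full_key = (key, next(self._next_id))
        self.rows[full_key] = value
        return full_key

    def select(self, key):
        """Return values for a user key, leaving the hidden column out."""
        return [self.rows[fk] for fk in sorted(self.rows) if fk[0] == key]


tablet = Tablet()
first = tablet.insert("x", "first")
second = tablet.insert("x", "second")  # same user key, different composite key
tablet.insert("y", "other")
```

Here both inserts for "x" succeed, and select("x") returns both values without exposing the counter, mirroring how the column stays hidden unless it is selected explicitly.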
Hive's ESRI geospatial functions as built-ins
This change adds Hive's ESRI geospatial functions as built-in UDFs in Impala.
Apache Jira: IMPALA-11745
Unicode column name support in Impala
Impala now supports Unicode characters in column names, aligning with Hive's support for
non-ASCII characters. This enhancement leverages Hive's validateColumnName() function, which
removes restrictions on column names at the metadata level. With this update, Impala allows
greater flexibility for column naming while remaining consistent with Hive's metadata
validation standards.
Apache Jira: IMPALA-12465
Support custom hash partitions at range level in Kudu tables
Impala now supports specifying custom hash partitions at the range level in Kudu tables.
You can define hash schemas within specific partitions using the updated CREATE TABLE and
ALTER TABLE syntax, and view them with the new SHOW HASH SCHEMA statement. This update aligns
hash partitioning more closely with range partitioning, enhancing flexibility while
maintaining backward compatibility.
Apache Jira: IMPALA-11430
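To make the idea of a per-range hash schema concrete, here is a small sketch of how a key could be routed first to a range and then to a hash bucket, with some ranges overriding the table-wide bucket count. The RangePartition class and locate function are hypothetical, and zlib.crc32 is only a stand-in for Kudu's actual hash function.

```python
import zlib
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class RangePartition:
    lower: int                          # inclusive lower bound
    upper: int                          # exclusive upper bound
    hash_buckets: Optional[int] = None  # None means "use the table-wide schema"


def locate(key: int, ranges: List[RangePartition],
           table_hash_buckets: int) -> Tuple[int, int]:
    """Return (range index, hash bucket) for a key."""
    for idx, part in enumerate(ranges):
        if part.lower <= key < part.upper:
            buckets = part.hash_buckets or table_hash_buckets
            # Assumption: crc32 stands in for the real hash function.
            return idx, zlib.crc32(str(key).encode()) % buckets
    raise ValueError(f"key {key} is not covered by any range partition")


ranges = [
    RangePartition(0, 100),                    # table-wide schema: 2 buckets
    RangePartition(100, 200, hash_buckets=8),  # custom schema for this range
]
```

Ranges without a custom schema keep using the table-wide bucket count, which corresponds to the backward-compatible behavior mentioned above.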