What's New in Apache Impala

Learn about the new features of Impala in Cloudera Runtime 7.2.16.

UTF-8 mode support

Some operations on Impala STRING values now support UTF-8 aware behavior, so that strings containing non-ASCII characters produce consistent results in both Hive and Impala.
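The difference between byte-oriented and UTF-8 aware string handling can be illustrated outside Impala. The following is a minimal sketch in plain Python, not Impala code; the sample string and variable names are chosen purely for illustration.

```python
# Illustration only: plain Python, not Impala code.
# "Gr\u00fc\u00dfe" is a 5-character word; U+00FC and U+00DF each
# occupy 2 bytes in UTF-8, so the encoded form is 7 bytes long.
text = "Gr\u00fc\u00dfe"
encoded = text.encode("utf-8")

byte_length = len(encoded)   # byte-oriented view of the string
char_length = len(text)      # UTF-8 aware (character) view

# Taking the "first 3" units differs between the two views:
first_3_chars = text[:3]     # three whole characters
first_3_bytes = encoded[:3]  # cuts the third character in half

print(byte_length, char_length)
print(repr(first_3_chars), first_3_bytes)
```

A byte-oriented engine and a character-oriented engine would disagree on both the length and the substring here, which is the kind of inconsistency UTF-8 aware behavior is meant to remove.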

Asynchronous model for some DDL statements

This release adds a new query option, ENABLE_ASYNC_DDL_EXECUTION, that you can configure so that the Impala server executes DDL requests from an Impala client asynchronously in a separate thread, without blocking the RPC. With this asynchronous model, the client gets a query handle and polls for state and results, which prevents Impala clients from hanging indefinitely.
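The submit-then-poll pattern described above can be sketched in a few lines. This is a toy model in plain Python, not the Impala client API: ToyServer, run_ddl, and poll_until_finished are hypothetical names, and the state strings are assumptions made for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_ddl(statement):
    """Hypothetical stand-in for slow server-side DDL work."""
    time.sleep(0.2)
    return f"done: {statement}"


class ToyServer:
    """Toy model: submit() returns a handle at once; work runs in another thread."""

    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=2)
        self._queries = {}
        self._next_id = 0

    def submit(self, statement):
        # Hand the work to a worker thread and return immediately.
        handle = self._next_id
        self._next_id += 1
        self._queries[handle] = self._pool.submit(run_ddl, statement)
        return handle

    def state(self, handle):
        return "FINISHED" if self._queries[handle].done() else "RUNNING"

    def result(self, handle):
        return self._queries[handle].result()

    def close(self):
        self._pool.shutdown(wait=True)


def poll_until_finished(server, handle, interval=0.05, timeout=5.0):
    """Poll the query state instead of blocking on a single long call."""
    deadline = time.monotonic() + timeout
    while server.state(handle) != "FINISHED":
        if time.monotonic() > deadline:
            raise TimeoutError(f"query {handle} still running after {timeout}s")
        time.sleep(interval)
    return server.result(handle)


server = ToyServer()
handle = server.submit("ALTER TABLE t RECOVER PARTITIONS")
outcome = poll_until_finished(server, handle)
server.close()
print(outcome)
```

Because the client holds only a handle, it can apply its own timeout or cancel logic between polls rather than waiting on one blocking call.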

Impala-shell with Python 3

Because Python 2.7 has reached end of life, impala-shell now supports Python 3. To use it, install the latest release from PyPI: https://pypi.org/project/impala-shell/4.2.0a1/

BYTES function support

Impala now supports the BYTES() function. This function returns the number of bytes contained in a byte string.

Consolidating the Ranger audit logs for the same table

Impala now consolidates the Ranger audit log entries of column accesses granted by the same policy for columns in the same table, after all the requests for accessing an object are processed.

Resolving ORC columns by name

Before this release, Impala resolved ORC columns by index only. This release adds the query option ORC_SCHEMA_RESOLUTION, which lets Impala resolve ORC columns by name instead.
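To see why the resolution strategy matters, consider a file whose column order differs from the table schema. The sketch below is plain Python, not Impala code; the column names, the sample row, and the choice to return None for a column missing from the file are all assumptions made for illustration.

```python
# Illustration only: a data file written with one column order...
file_columns = ["id", "name", "city"]
file_row = [7, "Ada", "London"]

# ...read through a table schema whose columns were later reordered.
table_columns = ["id", "city", "name"]


def resolve_by_position(table_cols, row):
    """Match the i-th table column to the i-th file column."""
    return {col: row[i] for i, col in enumerate(table_cols)}


def resolve_by_name(table_cols, file_cols, row):
    """Match each table column to the file column with the same name."""
    index_of = {name: i for i, name in enumerate(file_cols)}
    # Assumption for this sketch: a column absent from the file reads as None.
    return {
        col: (row[index_of[col]] if col in index_of else None)
        for col in table_cols
    }


by_position = resolve_by_position(table_columns, file_row)
by_name = resolve_by_name(table_columns, file_columns, file_row)

print(by_position)  # "city" and "name" are swapped
print(by_name)      # values line up with their column names
```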

Retrieving the data file name

Impala now supports the virtual column INPUT__FILE__NAME in a standard SELECT statement. For example, SELECT INPUT__FILE__NAME FROM <tablename> retrieves the name of the data file that stores each row of the table.

Zipping unnest on arrays from Views

In this release, you can use the zipping unnest functionality on arrays from views. Before this release, zipping unnest worked only on arrays in tables and did not support views as a source. For more information, see Zipping unnest on arrays from Views.
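As a rough picture of what "zipping" means, the sketch below pairs the elements of two arrays position by position instead of producing every combination. It is plain Python, not Impala SQL, and it assumes that the shorter array is padded with NULL (None here); the array contents are invented for the example.

```python
from itertools import product, zip_longest

# Two array values taken from the same row (illustrative data).
arr1 = [1, 2, 3]
arr2 = ["a", "b"]

# Zipping: pair elements by position, padding the shorter array.
# Assumption for this sketch: missing positions become None (NULL).
zipped = list(zip_longest(arr1, arr2, fillvalue=None))

# For contrast: joining the arrays independently yields every combination.
crossed = list(product(arr1, arr2))

print(zipped)
print(len(crossed))
```

The zipped form yields one output row per array position, whereas the cross product grows with the product of the array lengths.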

Min/Max filtering in Impala

Using Parquet format, you can query to find the minimum or maximum value for a column within a partition, row group, page, or row.
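One common use of per-chunk minimum and maximum values is to skip data that cannot match a predicate. The following is a minimal sketch of that idea in plain Python, not Impala's implementation; the row-group layout, the function name scan_range, and the sample values are assumptions made for illustration.

```python
# Illustration only: each "row group" carries min/max statistics
# for one column alongside its values.
row_groups = [
    {"min": 1, "max": 100, "values": [1, 50, 100]},
    {"min": 101, "max": 200, "values": [101, 150, 200]},
    {"min": 201, "max": 300, "values": [201, 250, 300]},
]


def scan_range(groups, low, high):
    """Return values in [low, high], skipping groups whose range cannot overlap."""
    matches = []
    groups_read = 0
    for group in groups:
        # The statistics alone prove that nothing in this group can match.
        if group["max"] < low or group["min"] > high:
            continue
        groups_read += 1
        matches.extend(v for v in group["values"] if low <= v <= high)
    return matches, groups_read


matches, groups_read = scan_range(row_groups, 120, 160)
print(matches, groups_read)
```

The same overlap test applies at any granularity that stores statistics, whether that is a partition, a row group, or a page.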

Reading and writing Parquet bloom filters

Impala can now read and write Parquet Bloom filters, a performance optimization feature. A Bloom filter tells you, quickly and memory-efficiently, whether the value you are looking for is definitely absent from a file or may be present in it.
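The general Bloom filter idea can be sketched briefly. This is a generic textbook-style filter in plain Python, not the filter layout that Parquet uses; the class name, bit count, hash count, and use of SHA-256 are all assumptions chosen to keep the example short.

```python
import hashlib


class BloomFilter:
    """Generic Bloom filter sketch: a bit array plus several hash positions."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, value):
        # Derive one bit position per seed by hashing "seed:value".
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, value):
        # False means definitely absent; True means possibly present.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(value)
        )


names = ["alice", "bob", "carol"]
bloom = BloomFilter()
for name in names:
    bloom.add(name)

present = [bloom.might_contain(name) for name in names]
print(present)
```

A value that was added always reports as possibly present. A value that was never added usually reports as absent, but it can occasionally produce a false positive, so a reader still has to check the data when the filter answers "maybe".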

Added support for thrift-0.16.0