What's New in Apache Impala
Learn about the new features of Impala in Cloudera Runtime 7.2.16.
UTF-8 mode support
Some Impala STRING types now support UTF-8 aware behavior, which ensures that Hive and Impala return consistent results for strings that contain non-ASCII characters.
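The difference between byte-wise and UTF-8 aware string handling can be illustrated outside Impala. The following is a minimal Python sketch, not Impala code: the sample string and variable names are made up for illustration, and it only shows why the two views of a string disagree once a character needs more than one byte.

```python
# Illustration only: Python stand-ins for byte-wise versus UTF-8 aware
# string handling. This is not Impala code.
text = "héllo"  # 'é' occupies two bytes in UTF-8

raw = text.encode("utf-8")

# Byte-wise view: every byte counts as one position.
bytewise_length = len(raw)   # counts bytes
bytewise_prefix = raw[:2]    # cuts the two-byte 'é' in half

# UTF-8 aware view: every character counts as one position.
utf8_length = len(text)      # counts characters
utf8_prefix = text[:2]       # keeps 'é' intact

print(bytewise_length, utf8_length)
print(bytewise_prefix, utf8_prefix)
```

In the byte-wise view, the two-position prefix ends in the middle of a character and is no longer valid UTF-8, which is the kind of inconsistency that UTF-8 aware behavior avoids.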
Asynchronous model for some DDL statements
This release adds a new query option, ENABLE_ASYNC_DDL_EXECUTION, that you can configure to execute DDL requests from an Impala client asynchronously, in separate threads on the Impala server, without blocking the RPC. With this asynchronous model, the client gets a query handle and polls for state and results, which prevents Impala clients from hanging indefinitely.
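The submit-then-poll pattern described above can be sketched in a few lines of Python. This is a generic illustration built on the standard library, not the Impala client API: run_ddl, submit_async, and poll are hypothetical names, and the sleep call stands in for a slow server-side operation.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_ddl(statement):
    """Hypothetical stand-in for a slow DDL operation on the server."""
    time.sleep(0.2)
    return f"done: {statement}"


def submit_async(executor, statement):
    """Return a handle immediately instead of blocking until completion."""
    return executor.submit(run_ddl, statement)


def poll(handle, interval=0.05, timeout=5.0):
    """Check the handle's state until it finishes or the timeout expires."""
    deadline = time.monotonic() + timeout
    while not handle.done():
        if time.monotonic() > deadline:
            raise TimeoutError("statement is still running")
        time.sleep(interval)
    return handle.result()


with ThreadPoolExecutor(max_workers=1) as executor:
    handle = submit_async(executor, "CREATE TABLE t (id INT)")
    result = poll(handle)

print(result)
```

Because the caller holds only a handle, it can apply its own timeout or cancel logic instead of waiting on a single blocking call.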
Impala-shell with Python 3
Because Python 2.7 has reached end of life, you can now use impala-shell with Python 3 by installing the latest release from PyPI: https://pypi.org/project/impala-shell/4.2.0a1/
BYTES function support
Impala now supports the BYTES() function. This function returns the number of bytes contained in a byte string.
Consolidating the Ranger audit logs for the same table
Impala now consolidates the Ranger audit log entries of column accesses granted by the same policy for columns in the same table, after all the requests for accessing an object are processed.
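The effect of this consolidation can be pictured as a grouping step. The sketch below is a conceptual Python illustration only: the entry fields (policy_id, table, column) and the consolidate function are hypothetical and do not reflect the actual Ranger audit record format.

```python
def consolidate(entries):
    """Merge column-access entries that share a policy and a table."""
    merged = {}
    for entry in entries:
        key = (entry["policy_id"], entry["table"])
        merged.setdefault(key, []).append(entry["column"])
    return [
        {"policy_id": policy_id, "table": table, "columns": ",".join(columns)}
        for (policy_id, table), columns in merged.items()
    ]


# Hypothetical per-column entries collected while a request is processed.
entries = [
    {"policy_id": 7, "table": "db.sales", "column": "id"},
    {"policy_id": 7, "table": "db.sales", "column": "amount"},
    {"policy_id": 9, "table": "db.customers", "column": "name"},
]

consolidated = consolidate(entries)
for record in consolidated:
    print(record)
```

Here the two column accesses granted by the same policy on the same table collapse into one record, while the access under a different policy stays separate.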
Resolving ORC columns by names
Before this release, Impala resolved ORC columns by index. This release adds the query option ORC_SCHEMA_RESOLUTION, which supports resolving ORC columns by name.
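The practical difference between the two resolution strategies shows up when a file stores its columns in a different order than the table schema. The following Python sketch is a conceptual illustration with made-up column names and data; it does not use Impala or an ORC library.

```python
# Table schema as the query engine sees it.
table_schema = ["id", "name", "city"]

# A hypothetical file written with a different column order.
file_columns = ["name", "id", "city"]
file_row = ["alice", 1, "Paris"]


def resolve_by_position(schema, row):
    """Match the i-th table column to the i-th file column."""
    return {
        column: row[i] if i < len(row) else None
        for i, column in enumerate(schema)
    }


def resolve_by_name(schema, columns, row):
    """Match each table column to the file column with the same name."""
    by_name = dict(zip(columns, row))
    return {column: by_name.get(column) for column in schema}


positional = resolve_by_position(table_schema, file_row)
named = resolve_by_name(table_schema, file_columns, file_row)

print(positional)
print(named)
```

Position-based matching silently pairs id with the file's name value, whereas name-based matching returns each value under the intended column.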
Retrieving the data file name
Impala now supports the virtual column INPUT__FILE__NAME in a standard SELECT statement. For example, SELECT INPUT__FILE__NAME FROM <tablename> retrieves the name of the data file that stores each row of the table.
Zipping unnest on arrays from Views
In this release, you can use the zipping unnest functionality on arrays from views. Before this release, zipping unnest worked only on arrays in tables and did not support views as a source. For more information about using this zipping unnest functionality, see Zipping unnest on arrays from Views.
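As a rough mental model, zipping unnest pairs array elements by position rather than producing every combination. The Python sketch below illustrates that idea under the assumption that shorter arrays are padded with NULL (None here); the function name zip_unnest and the sample arrays are hypothetical.

```python
from itertools import product, zip_longest


def zip_unnest(*arrays):
    """Pair elements by position; pad shorter arrays with None (NULL)."""
    return list(zip_longest(*arrays, fillvalue=None))


numbers = [1, 2, 3]
letters = ["a", "b"]

zipped = zip_unnest(numbers, letters)

# For contrast: unnesting the arrays independently behaves like a
# cross product, which yields every combination of elements.
crossed = list(product(numbers, letters))

print(zipped)
print(len(crossed))
```

The zipped form yields one row per array position, while the cross product grows with the product of the array lengths.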
Min/Max filtering in Impala
With tables in Parquet format, you can query for the minimum or maximum value of a column within a partition, row group, page, or row.
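One common use of per-unit minimum and maximum values is to skip data that cannot match a range predicate. The sketch below illustrates that general idea in Python, assuming each row group records the min and max of a column; the row group names, statistics, and the groups_to_scan function are invented for illustration and do not describe Impala's internal implementation.

```python
# Hypothetical per-row-group statistics for one column.
row_groups = [
    {"name": "rg0", "min": 1, "max": 100},
    {"name": "rg1", "min": 101, "max": 200},
    {"name": "rg2", "min": 201, "max": 300},
]


def groups_to_scan(groups, low, high):
    """Keep only groups whose [min, max] range overlaps [low, high]."""
    return [
        group["name"]
        for group in groups
        if group["max"] >= low and group["min"] <= high
    ]


# Predicate: column BETWEEN 150 AND 220
selected = groups_to_scan(row_groups, 150, 220)
print(selected)
```

Only the row groups whose value range overlaps the predicate need to be read; the first row group can be skipped without touching its data.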
Reading and writing Parquet bloom filters
Bloom filters are a performance optimization feature now available in Impala. A Bloom filter tells you, rapidly and memory-efficiently, whether the data you are looking for is definitely absent from a file or might be present in it.
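To make the idea concrete, here is a minimal sketch of a classic Bloom filter in Python. It is a teaching example only: the class name, bit count, and SHA-256 based hashing are assumptions chosen for brevity, and the layout does not match the Bloom filter format that Parquet files actually store.

```python
import hashlib


class BloomFilter:
    """Classic Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, value):
        # Derive several bit positions by hashing the value with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, value):
        # False means definitely absent; True means possibly present.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(value)
        )


empty_filter = BloomFilter()
file_filter = BloomFilter()
for key in ["alice", "bob", "carol"]:
    file_filter.add(key)

print(file_filter.might_contain("alice"))
print(empty_filter.might_contain("alice"))
```

A reader that gets False from might_contain can skip the file for that lookup. A True result still requires reading the data, because Bloom filters can report false positives.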