What's New in Apache Impala

This topic lists new features for Apache Impala in this release of Cloudera Runtime.

Optimized performance for multi-threaded query execution

You can now enable multi-threaded query execution manually, on a per-query basis, for all SELECT queries by setting the mt_dop query option. Previously, queries with joins were not supported.

For details, see MT_DOP query option.
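
As a rough sketch of the per-query usage, the helper below issues a SET MT_DOP statement before running a query. It assumes a Python DB-API 2.0 cursor that is already connected to Impala (for example, one obtained through a client library); the function name run_with_mt_dop, the degree of parallelism, and the tables sales and customers are hypothetical and chosen only for illustration.

```python
def run_with_mt_dop(cursor, query, mt_dop=4):
    """Run `query` after setting the mt_dop query option on the session.

    `cursor` is assumed to be a DB-API 2.0 cursor connected to Impala.
    """
    # int() keeps anything other than a whole number out of the SET statement.
    cursor.execute(f"SET MT_DOP={int(mt_dop)}")
    cursor.execute(query)
    return cursor.fetchall()


# Hypothetical join query. Queries with joins could not use mt_dop before this release.
JOIN_QUERY = (
    "SELECT c.region, SUM(s.amount) "
    "FROM sales s JOIN customers c ON s.customer_id = c.id "
    "GROUP BY c.region"
)
```

Calling run_with_mt_dop(cursor, JOIN_QUERY) sends the SET statement first and then the query. A query option set this way stays in effect for the rest of the session, so set it back if later queries should not use it.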

Improved read performance for ORC tables with nested type columns

Deprecated support for LZO

In CDP, we deprecated support for LZO in Impala, so the impala-lzo plugin is not shipped as part of the GPL Extras parcel.

Improved performance of automatic metadata updates

When Impala inserts into a table, it refreshes the underlying table or partition. When the enable_insert_events configuration is set to True, Impala generates INSERT events which, when received by other Impala clusters, automatically refresh the affected tables or partitions. Event processing must be enabled for this property to take effect.

Started generating Ranger audit logs when a column masking policy is applied

Reduced the Impala runtime image size and used a UBI base image

Increased scratch capacity

To help reduce spilling to disk:

  • Added a startup parameter that enables spill-to-disk compression, which increases effective scratch capacity by 2.5x.
  • Added a startup parameter to reclaim space in scratch files.
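
To put the 2.5x figure in concrete terms, here is a back-of-the-envelope calculation. It takes the 2.5x ratio from the note above at face value; the ratio achieved in practice depends on how compressible the spilled data is. The function name and the 100 GiB scratch directory are hypothetical.

```python
def effective_scratch_gib(physical_gib, compression_ratio=2.5):
    """Estimate how much uncompressed spill data fits in a scratch directory."""
    return physical_gib * compression_ratio


# Hypothetical 100 GiB scratch directory with spill-to-disk compression enabled.
print(effective_scratch_gib(100))  # 250.0
```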

Improved data cache performance

  • Improved the efficiency of the data cache by providing an option to use a different cache eviction algorithm (LIRS).

Support for Kudu Date and Varchar column types

Support for reading ZSTD-compressed text files

For details, see Using Text Data Files.

Improved read performance of ORC tables

Improved Impala resiliency

This release adds client retry support in the impala-shell. For details about installing the impala-shell, see Using Impala shell.

broadcast_bytes_limit query option

In this release, you can set a limit for the size of a broadcast input. For details, see Impala Query Options.

ORC stability and performance improvements

ORC reads enabled by default

Impala stability and performance have been improved. Consequently, ORC reads are now enabled in Impala by default. To disable ORC reads, set --enable_orc_scanner to false when starting the cluster.

Constraints

This release adds support for primary and foreign key constraints. The constraints are advisory only: they are intended for estimating cardinality during query planning in a future release, and Impala makes no attempt to enforce them. For details, see the “Constraints” section of Create Table Statement.

Enhanced external Kudu table

By default, HMS implicitly translates internal Kudu tables to external Kudu tables with the 'external.table.purge' property set to true. These tables behave similarly to internal tables. You can also create such external Kudu tables explicitly. For details, see the “External Kudu Tables” section of Create Table Statement.

Ranger column masking

This release supports Ranger column masking, which hides sensitive columnar data in Impala query output. For example, you can define a policy that reveals only the first or last four characters of column data. Column masking is enabled by default. For details, see the “Ranger Column Masking” section in Impala Authorization.
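
The effect of a "first four" or "last four" policy can be illustrated with a small sketch. This is a simplified model of what a user sees in query output, not Ranger's implementation: masking policies are defined in Ranger rather than in client code, and the replacement characters Ranger's built-in masks produce may differ from the single 'x' used here. The function names and sample value are hypothetical.

```python
def show_last_4(value, mask_char="x"):
    """Hide everything except the last four characters of `value`."""
    hidden = max(len(value) - 4, 0)
    return mask_char * hidden + value[hidden:]


def show_first_4(value, mask_char="x"):
    """Hide everything except the first four characters of `value`."""
    return value[:4] + mask_char * max(len(value) - 4, 0)


# Hypothetical sensitive value as it might appear in a column.
print(show_last_4("123-45-6789"))   # xxxxxxx6789
print(show_first_4("123-45-6789"))  # 123-xxxxxxx
```

Values of four characters or fewer pass through unchanged in this sketch, which is one reason a real policy might choose a stricter mask for short columns.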