What's New in Apache Impala

Learn about the new features of Impala in Cloudera Runtime 7.2.6.

Transparent retry in Impala

If a SELECT query fails because of a cluster membership change, the Coordinator cancels and unregisters the running query and then launches a retry of the query. For example, if one of the executor nodes fails during query execution, the query fails but is transparently re-executed, either with the executor node immediately replaced or with a temporarily reduced number of executor nodes. This feature retries the entire query, not individual query fragments. INSERT and DDL queries are not retried.

Transparent retry is controlled by the RETRY_FAILED_QUERIES query option. Starting with this release, the option is enabled by default.

Note that a query is not retried if it has already returned any results to the client. To keep such queries eligible for retry, enable Result Spooling and spool_all_results_for_retries.

See Query Options for more information.
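
The retry conditions described above can be summarized in a small decision function. This is only an illustrative sketch in Python, not Impala's implementation: the QueryState class, its field names, and should_retry are hypothetical names introduced here, and the rules encoded are just the ones stated in this section.

```python
from dataclasses import dataclass


@dataclass
class QueryState:
    """Hypothetical snapshot of a failed query (names are assumptions)."""
    statement_type: str         # e.g. "SELECT", "INSERT", "DDL"
    retry_failed_queries: bool  # mirrors the RETRY_FAILED_QUERIES option
    rows_sent_to_client: int    # rows already returned before the failure


def should_retry(query: QueryState, membership_change: bool) -> bool:
    """Return True if the whole query would be re-executed under the rules above."""
    if not query.retry_failed_queries:
        return False  # the option is switched off
    if not membership_change:
        return False  # only cluster membership changes trigger a retry
    if query.statement_type.upper() != "SELECT":
        return False  # INSERT and DDL queries are not retried
    # A query that has already returned results to the client is skipped.
    return query.rows_sent_to_client == 0
```

With result spooling of all results enabled, rows_sent_to_client would stay at zero until the query has finished, which is why that setting keeps a query eligible for retry in this model.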

Datasketches in Impala

This release introduces two Datasketch algorithms, HLL and KLL:

  • Datasketch algorithm HLL, for queries that take too long to calculate exact results on very large data sets (for example, counting the number of distinct values).
  • Streaming algorithm KLL, which uses the percentile/quantile functions to statistically analyze the approximate distribution of comparable data from a very large stream.

See Using HLL Datasketch Algorithms in Impala and Using KLL Datasketch Algorithms in Impala for more information.
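
To give a feel for how an HLL sketch trades exactness for speed and memory, here is a minimal, textbook-style HyperLogLog in Python. It is a sketch under stated assumptions, not the DataSketches code that Impala uses: the TinyHLL class, the SHA-1 hash, and the default precision p=12 are illustrative choices made here.

```python
import hashlib
import math


class TinyHLL:
    """Toy HyperLogLog distinct-count estimator (illustrative only)."""

    def __init__(self, p: int = 12):
        self.p = p                    # number of index bits
        self.m = 1 << p               # number of registers
        self.registers = [0] * self.m

    def add(self, value) -> None:
        # Derive a 64-bit hash from the value's string form.
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        bits = 64 - self.p
        idx = h >> bits               # top p bits select a register
        rest = h & ((1 << bits) - 1)  # remaining bits feed the rank
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = bits - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def estimate(self) -> float:
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)  # standard constant for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            # Small-range correction: fall back to linear counting.
            return m * math.log(m / zeros)
        return raw
```

Adding the same value more than once does not change the registers, so the estimate reflects distinct values only. The sketch keeps a fixed 4096 small counters regardless of how many rows it sees, which is the property that makes this kind of algorithm attractive for very large data sets.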

Hive UDF support in Impala

In Hive, UDFs implemented through the newer GenericUDF class support many more features. Although Impala users can call Hive UDFs, Impala does not yet support the new Hive UDFs that are based on the GenericUDF class.

For more information, see Using Hive UDFs with Impala.

ROLE-related statements in Impala

Impala does not currently support ROLE-related DDL statements for Ranger. However, if you are migrating your workload or upgrading from CDH to CDP, you can migrate the role-based authentication rules and manage them using the Ranger admin UI.

Blacklisting faulty executors

An Impala cluster typically consists of one or more executors, one or more coordinators, an instance of Catalog, and an instance of Statestore. Any of these components can unexpectedly hang or exit. Until this release, if an executor failed to run a query, an error status was returned to the client that issued the query. This could render the entire cluster unusable, because queries kept failing whenever Impala tried to run a query on the faulty node.

To avoid query failures caused by the transient unavailability of executors, a new feature that blacklists faulty executors has been added for Private Cloud Base and DataHub Cloud users. It handles the scenario where an executor still responds to heartbeats from the statestore but fails to run queries successfully: any executor that fails a query is temporarily blacklisted. This triggers the coordinator to reschedule a safe-to-retry query on the remaining healthy executors instead of leaving the entire cluster unusable. The feature helps Impala recover from transient issues with executors.
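
The blacklisting behavior can be pictured with a short simulation. The following Python sketch is an assumption-laden model, not Impala's scheduler: ExecutorBlacklist, run_with_reschedule, the run_fragment callback, and the fixed timeout are all hypothetical names and simplifications introduced for illustration.

```python
import time


class ExecutorBlacklist:
    """Hypothetical coordinator-side blacklist with a fixed expiry time."""

    def __init__(self, timeout_s: float = 12.0, clock=time.monotonic):
        self.timeout_s = timeout_s    # assumed duration of a blacklist entry
        self.clock = clock            # injectable clock, handy for testing
        self._blacklisted_at = {}     # executor name -> time of blacklisting

    def report_failure(self, executor: str) -> None:
        self._blacklisted_at[executor] = self.clock()

    def healthy(self, executors):
        now = self.clock()
        # Entries are temporary: drop the ones whose timeout has passed.
        for name, since in list(self._blacklisted_at.items()):
            if now - since >= self.timeout_s:
                del self._blacklisted_at[name]
        return [e for e in executors if e not in self._blacklisted_at]


def run_with_reschedule(executors, blacklist, run_fragment):
    """Run on healthy executors; on failure, blacklist and retry once."""
    for _attempt in range(2):
        targets = blacklist.healthy(executors)
        if not targets:
            raise RuntimeError("no healthy executors available")
        # run_fragment(executor) returns True on success, False on failure.
        failed = [e for e in targets if not run_fragment(e)]
        if not failed:
            return targets            # executors that ran the query
        for e in failed:
            blacklist.report_failure(e)
    raise RuntimeError("query failed again after rescheduling")
```

In this model, an executor that fails a query is skipped for later scheduling decisions until its entry expires, after which it is considered for scheduling again. That mirrors the idea of recovering from transient executor problems without taking the node out of the cluster permanently.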