What's New in Apache Impala

New Query Options

REFRESH_UPDATED_HMS_PARTITIONS

Added a new query option REFRESH_UPDATED_HMS_PARTITIONS to refresh any updated HMS partitions. This option is disabled by default so that table refresh performance is not compromised. However, you can enable it for corner cases in which the REFRESH table command does not detect changed partitions.

USE_DOP_FOR_COSTING

Added a new query option USE_DOP_FOR_COSTING for the planner to consider partition distribution when needed. When this query option is enabled, the planner factors the join operator's degree of parallelism (DOP) and the broadcast-to-partition factor into the cost of the build side of a join when it compares broadcast and partition distribution. This increases the estimated cost of the broadcast join's build side.

OPTIMIZE_SIMPLE_LIMIT

Added a new planner query option OPTIMIZE_SIMPLE_LIMIT to reduce the planning time for simple limit queries by considering only a minimal set of partitions. This query option also applies to subqueries and views.

JOIN_ROWS_PRODUCED_LIMIT

Added a new advanced query option JOIN_ROWS_PRODUCED_LIMIT to prevent runaway join queries by limiting the number of rows that a join node can produce.
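
The effect of such a limit can be illustrated with a toy model in Python. This is only a sketch of the idea and not Impala's implementation: the function nested_loop_join, the exception JoinRowsLimitExceeded, and the convention that a limit of 0 means "no limit" are assumptions made for this example.

```python
class JoinRowsLimitExceeded(RuntimeError):
    """Raised when a join produces more rows than the configured limit."""


def nested_loop_join(left, right, key, join_rows_produced_limit=0):
    """Join two lists of dicts on `key`, aborting once the row limit is exceeded.

    A limit of 0 is treated as "no limit" (an assumption of this sketch).
    """
    produced = []
    for l_row in left:
        for r_row in right:
            if l_row[key] != r_row[key]:
                continue
            produced.append({**l_row, **r_row})
            # Stop as soon as the join has produced more rows than allowed.
            if join_rows_produced_limit and len(produced) > join_rows_produced_limit:
                raise JoinRowsLimitExceeded(
                    f"join produced more than {join_rows_produced_limit} rows"
                )
    return produced
```

In this model, a join whose key is heavily duplicated on both sides fails fast instead of materializing the full cross product of matching rows.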

See Query options for more information.

Missing Overloads of Mask Functions used in Ranger Default Masking Policies

The mask functions in Hive are implemented through GenericUDFs, which can accept an unlimited number of function signatures. Impala currently does not support GenericUDFs. However, Impala has built-in mask functions that are implemented through overloads.

This release adds some missing overloads that could be used by Ranger default masking policies, such as MASK_HASH, MASK_SHOW_LAST_4, and MASK_DATE_SHOW_YEAR.

See Limitations on Mask Functions for more information.

Access via Roles in CDP Impala

Impala with Sentry revolved around granting privileges to ROLES, and ROLES to GROUPS. Until this release, Impala’s integration with Ranger did not support ROLE-related DDL statements. As a workaround, you had to migrate the ROLE-based authorization policies and manage them through Ranger's web UI for Impala to handle them correctly. As of CDP 7.2.8, you no longer need Ranger’s web UI to manage ROLEs, because Impala now supports ROLE management through ROLE-related statements.

See ROLE statements in Impala integrated with Ranger for more information.

New Impala Shell Configuration Option

Added a new option profile_format that can be set in Impala shell to control the formatting of the output. Specify one of text, json, or prettyjson.

See Impala Shell Configuration Options for more information.

New Table Level Hint

Added a new table-level hint, convert_limit_to_sample, that can be attached to a table either in the main query block or within a view or subquery. When the simple limit optimization conditions are satisfied, the limit is converted to a table sample.

See Optimizer hints in Impala for more information.

RLE_DICTIONARY Support

Impala now supports decoding RLE_DICTIONARY encoded pages. This encoding is identical to the already-supported PLAIN_DICTIONARY encoding, except that the PLAIN enum value is used for the dictionary pages and the RLE_DICTIONARY enum value is used for the data pages. When creating files outside of Impala for use by Impala, make sure to use one of the supported encodings.
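
The difference between the two dictionary schemes comes down to which enum values label the pages. As a rough illustration, the following Python sketch classifies a column chunk from its page encodings. The names DICTIONARY_SCHEMES and dictionary_scheme are hypothetical and only serve this example; the PLAIN_DICTIONARY pairing reflects the Parquet format's older convention of using the same enum value for both page types.

```python
# Maps (dictionary page encoding, data page encoding) to the dictionary scheme.
# Both schemes store the same bytes; only the enum labels on the pages differ.
DICTIONARY_SCHEMES = {
    ("PLAIN_DICTIONARY", "PLAIN_DICTIONARY"): "PLAIN_DICTIONARY",
    ("PLAIN", "RLE_DICTIONARY"): "RLE_DICTIONARY",
}


def dictionary_scheme(dict_page_encoding, data_page_encoding):
    """Return the dictionary scheme for a column chunk, or None if unrecognized."""
    return DICTIONARY_SCHEMES.get((dict_page_encoding, data_page_encoding))
```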

See Using Parquet Data Files for more information.

New Startup Flag

Introduced a new startup flag --ping_expose_webserver_url (true by default) to control whether the PingImpalaService and PingImpalaHS2Service RPC calls expose the debug web URL to the client.

See Configuring Client Access to Impala for more information.

Cookie Authentication Support for impala-shell

Modified the HTTP HS2 server to accept cookies for authentication, which avoids having to authenticate every request through LDAP or Kerberos. This support is controlled by the flag --max_cookie_lifetime_s, which determines how long the generated cookies remain valid. Setting the flag to 0 disables cookie support.
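
The interaction between cookie validity and the lifetime setting can be sketched in Python as follows. This is a simplified model and not Impala's actual cookie format or code: the functions issue_cookie and is_cookie_valid, the SECRET_KEY constant, and the "user&timestamp&signature" layout are all assumptions made for illustration.

```python
import hashlib
import hmac

SECRET_KEY = b"example-only-secret"  # hypothetical; a real server generates its own key


def _sign(payload: str) -> str:
    """Return a hex HMAC-SHA256 signature for the payload."""
    return hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()


def issue_cookie(username: str, now: float, max_cookie_lifetime_s: int):
    """Return a signed cookie value, or None when cookie support is disabled (0)."""
    if max_cookie_lifetime_s == 0:
        return None
    payload = f"{username}&{int(now)}"
    return f"{payload}&{_sign(payload)}"


def is_cookie_valid(cookie: str, now: float, max_cookie_lifetime_s: int) -> bool:
    """Accept the cookie only if its signature matches and it has not expired."""
    if max_cookie_lifetime_s == 0:
        return False  # cookies disabled: always fall back to LDAP or Kerberos
    try:
        username, issued_at, signature = cookie.rsplit("&", 2)
        issued = int(issued_at)
    except ValueError:
        return False  # malformed cookie
    expected = _sign(f"{username}&{issued_at}")
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return False  # tampered or forged cookie
    return 0 <= now - issued <= max_cookie_lifetime_s
```

In this model, a client authenticates once through LDAP or Kerberos, receives a cookie, and presents it on later requests until the lifetime elapses, at which point full authentication is required again.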

See Configuring Client Access to Impala for more information on --max_cookie_lifetime_s.

New Feature Flag for Incremental Metadata Update

Added a feature flag enable_incremental_metadata_updates to control how catalogd should propagate metadata updates to the catalog topic. By default this is ON.

If enable_incremental_metadata_updates is true, catalogd sends metadata updates at partition granularity in both full and minimal topic mode. A table with only one changed partition therefore produces an update for that partition only. This reduces the size of the metadata that catalogd needs to send.

If enable_incremental_metadata_updates is false, catalogd sends metadata updates at table granularity. A table with only one changed partition still produces an update for the whole table object, because catalogd sends the whole table thrift object to the catalog topic. This is the legacy behavior.
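
The two granularities can be contrasted with a small Python model. It is a deliberately simplified sketch and not catalogd's code: the function build_topic_updates, the "table:partition" key format, and the use of plain dicts for partition metadata are assumptions made for this example.

```python
def build_topic_updates(table_name, partitions, changed,
                        enable_incremental_metadata_updates=True):
    """Return the (key, payload) items published for one table in this toy model.

    partitions: dict mapping partition name to its metadata
    changed:    set of partition names modified since the last update
    """
    if enable_incremental_metadata_updates:
        # Partition granularity: publish only the partitions that changed.
        return [(f"{table_name}:{p}", partitions[p]) for p in sorted(changed)]
    # Table granularity (legacy): publish the whole table object.
    return [(table_name, dict(partitions))]
```

For a table with thousands of partitions and a single change, the first branch publishes one small item while the second republishes every partition's metadata.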

Feature Enhancement

A query with analytic functions no longer materializes the predicates pushed down to Kudu. This optimization reduces the amount of data to exchange and sort.