What's New in Apache Impala
Learn about the new features of Impala in Cloudera Runtime 7.2.8.
New Query Options
Added a new query option, REFRESH_UPDATED_HMS_PARTITIONS, to refresh any updated HMS partitions. This option is disabled by default so that table refresh performance is not compromised. However, you can enable it for corner cases in which the REFRESH command does not detect changed partitions.
Added a new query option, USE_DOP_FOR_COSTING, for the planner to consider partition distribution when needed. When this query option is enabled, the planner incorporates the join operator's degree of parallelism (DOP) and the broadcast-to-partition factor in the costing of the build side of a join when comparing broadcast vs partition distribution, thereby increasing the cost of the broadcast join's
Added a new planner query option, OPTIMIZE_SIMPLE_LIMIT, to reduce the planning time for simple limit queries by considering only a minimal set of partitions. This query option also applies to subqueries and views.
Added an advanced query option, JOIN_ROWS_PRODUCED_LIMIT, to prevent runaway join queries by limiting the number of rows that a join node can produce.
See Query options for more information.
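As a rough illustration of how these options might be applied to a session, the sketch below renders one SET statement for each of the four options named above. The helper name render_set_statements and the example values (in particular the limit of 1000000 rows) are assumptions made for illustration, not recommended settings.

```python
def render_set_statements(options):
    """Return one Impala `SET name=value;` statement per query option."""
    statements = []
    for name, value in options.items():
        # Render booleans as lowercase true/false; render other values as-is.
        if isinstance(value, bool):
            rendered = "true" if value else "false"
        else:
            rendered = str(value)
        statements.append(f"SET {name}={rendered};")
    return statements


# Hypothetical session settings; the values are illustrative only.
session_options = {
    "REFRESH_UPDATED_HMS_PARTITIONS": True,
    "USE_DOP_FOR_COSTING": True,
    "OPTIMIZE_SIMPLE_LIMIT": True,
    "JOIN_ROWS_PRODUCED_LIMIT": 1000000,
}

for statement in render_set_statements(session_options):
    print(statement)
```

The generated statements can be pasted into an Impala session before the query they should affect.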
Missing Overloads of Mask Functions used in Ranger Default Masking Policies
The mask functions in Hive are implemented through GenericUDFs, which can accept an unlimited number of function signatures. Impala currently does not support GenericUDFs. However, Impala has built-in mask functions that are implemented through overloads.
This release adds some missing overloads that could be used by Ranger default masking policies, e.g. MASK_HASH, MASK_SHOW_LAST_4, MASK_DATE_SHOW_YEAR, etc.
See Limitations on Mask Functions for more information.
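To make the masking behavior concrete, here is a minimal Python model of what a "show last 4" mask does to a string. It assumes the Hive-style default mask characters (X for uppercase letters, x for lowercase letters, n for digits, other characters unchanged). The function name mask_show_last_n and its optional parameters are illustrative; in Impala, each supported combination of argument types would have to exist as a separate overload.

```python
def mask_show_last_n(value, n=4, upper="X", lower="x", digit="n"):
    """Mask every character except the last n, leaving punctuation untouched."""
    keep_from = max(len(value) - n, 0)  # index of the first unmasked character
    masked = []
    for i, ch in enumerate(value):
        if i >= keep_from:
            masked.append(ch)        # inside the visible tail
        elif ch.isupper():
            masked.append(upper)
        elif ch.islower():
            masked.append(lower)
        elif ch.isdigit():
            masked.append(digit)
        else:
            masked.append(ch)        # separators and symbols stay as they are
    return "".join(masked)


print(mask_show_last_n("1234-5678-8765-4321"))
print(mask_show_last_n("AbCd-9876"))
```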
Access via Roles in CDP Impala
Authorization in Impala with Sentry revolved around granting privileges to ROLES and ROLES to GROUPS. Until this release, Impala's integration with Ranger did not support ROLE-related DDL statements. As a workaround, you had to migrate the ROLE-based authorization policies and manage them through Ranger's web UI for Impala to handle them correctly. In CDP 7.2.8 you no longer need Ranger's web UI to manage ROLEs, because Impala now supports ROLE management through ROLE-related statements.
See ROLE statements in Impala integrated with Ranger for more information.
New Impala Shell Configuration Option
Added a new option, profile_format, that can be set in Impala shell to control the formatting of the profile output. You can specify text, json, or prettyjson.
See Impala Shell Configuration Options for more information.
New Table Level Hint
Added a new table-level hint, convert_limit_to_sample, that can be attached to a table either in the main query block or within a view or subquery. When the simple limit optimization conditions are satisfied, the limit is converted to a table sample.
See Optimizer hints in Impala for more information.
Impala now supports decoding RLE_DICTIONARY encoded pages. This encoding is identical to the already-supported PLAIN_DICTIONARY encoding, but the PLAIN enum value is used for the dictionary pages and the RLE_DICTIONARY enum value is used for the data pages.
When creating files outside of Impala for use by Impala, make sure to use one of the
See Using Parquet Data Files for more information.
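The relationship between dictionary pages and data pages can be sketched with a simplified model: the dictionary page holds the distinct values, and the data page holds runs of indices into that dictionary. The sketch below models only run-length runs as (run_length, index) pairs. The real Parquet layout is binary and also allows bit-packed runs, so the function name and the data shapes here are assumptions for illustration.

```python
def decode_dictionary_page_values(dictionary, runs):
    """Expand (run_length, dictionary_index) runs into the column values."""
    values = []
    for run_length, index in runs:
        # Each run repeats a single dictionary entry run_length times.
        values.extend([dictionary[index]] * run_length)
    return values


# Hypothetical pages: three distinct values and six encoded rows.
dictionary_page = ["red", "green", "blue"]
data_page_runs = [(3, 0), (1, 2), (2, 1)]

print(decode_dictionary_page_values(dictionary_page, data_page_runs))
```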
New Startup Flag
Introduced a new startup flag, --ping_expose_webserver_url (true by default), to control whether the PingImpalaService and PingImpalaHS2Service RPC calls expose the debug web URL to the client.
See Configuring Client Access to Impala for more information.
Cookie Authentication Support to impala-shell
Modified the HTTP HS2 server to accept cookies for authentication, which avoids having to authenticate every request through LDAP or Kerberos. This support is controlled by the --max_cookie_lifetime_s flag, which determines how long the generated cookies remain valid. Setting the flag to 0 disables cookie support.
See Configuring Client Access to Impala for more information on
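The lifetime rule described in this section can be modeled in a few lines. This is a toy sketch, not the server's implementation: the function name cookie_is_accepted, the use of plain second timestamps, and the choice to treat a cookie as expired exactly at the lifetime boundary are all assumptions.

```python
def cookie_is_accepted(issued_at_s, now_s, max_cookie_lifetime_s):
    """Return True if a cookie issued at issued_at_s is still usable at now_s."""
    if max_cookie_lifetime_s == 0:
        # A lifetime of 0 disables cookie support entirely.
        return False
    age_s = now_s - issued_at_s
    # Reject cookies from the future and cookies at or past the lifetime.
    return 0 <= age_s < max_cookie_lifetime_s


print(cookie_is_accepted(issued_at_s=100, now_s=150, max_cookie_lifetime_s=3600))
print(cookie_is_accepted(issued_at_s=100, now_s=150, max_cookie_lifetime_s=0))
```

When the check fails, the client would fall back to authenticating the request through LDAP or Kerberos again.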
New Feature Flag for Incremental Metadata Update
Added a feature flag, enable_incremental_metadata_updates, to control how catalogd propagates metadata updates to the catalog topic. By default this flag is ON.
When enable_incremental_metadata_updates is true, catalogd sends metadata updates at partition granularity in both full and minimal topic mode. A table that has only one changed partition therefore gets an update for that partition only, which reduces the size of the metadata that catalogd needs to send.
When enable_incremental_metadata_updates is false, catalogd sends metadata updates at table granularity. A table that has only one changed partition still gets an update for the whole table object, because catalogd sends the whole table thrift object to the catalog topic. This is the legacy behavior.
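The difference between the two modes can be illustrated with a toy model that lists which partitions end up in an update. It ignores table-level metadata and Thrift serialization, and the function name partitions_in_update and the partition names are hypothetical.

```python
def partitions_in_update(all_partitions, changed_partitions, incremental):
    """Return the partitions whose metadata would be included in one update."""
    if incremental:
        # Partition granularity: only the changed partitions are sent.
        return sorted(p for p in all_partitions if p in changed_partitions)
    # Table granularity: the whole table object, so every partition, is sent.
    return sorted(all_partitions)


# Hypothetical table with three partitions, one of which changed.
partitions = {"year=2019", "year=2020", "year=2021"}
changed = {"year=2021"}

print(partitions_in_update(partitions, changed, incremental=True))
print(partitions_in_update(partitions, changed, incremental=False))
```

With one changed partition out of three, the incremental update carries a single entry, while the table-level update carries all three.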
A query with analytic functions no longer materializes the predicates that are pushed down to Kudu. This optimization reduces the amount of data to exchange and sort.