Query options

Specify query options in the SET statement to apply the settings to the subsequently issued queries.

Some query options are useful in day-to-day operations for improving usability, performance, or flexibility.

Other query options control special-purpose aspects of Impala operation and are intended primarily for advanced debugging or troubleshooting.

Options with Boolean parameters can be set to 1 or true to enable, or 0 or false to turn off.

You can set query options directly through the JDBC and ODBC interfaces by using the SET statement. Formerly, SET was only available as a command within the impala-shell interpreter.

You can set query options for an impala-shell session by specifying one or more command-line arguments of the form --query_option=option=value.

Impala supports the following query options:

ABORT_ON_ERROR
ALLOW_ERASURE_CODED_FILES
APPX_COUNT_DISTINCT
BATCH_SIZE
BROADCAST_BYTES_LIMIT
BUFFER_POOL_LIMIT
COMPRESSION_CODEC
COMPUTE_STATS_MIN_SAMPLE_SIZE
CONVERT_LEGACY_HIVE_PARQUET_UTC_TIMESTAMPS
DEBUG_ACTION
DECIMAL_V2
DEFAULT_FILE_FORMAT
DEFAULT_HINTS_INSERT_STATEMENT
DEFAULT_JOIN_DISTRIBUTION_MODE
DEFAULT_NDV_SCALE
DEFAULT_SPILLABLE_BUFFER_SIZE
DEFAULT_TRANSACTIONAL_TYPE
DELETE_STATS_IN_TRUNCATE
DISABLE_CODEGEN
DISABLE_CODEGEN_ROWS_THRESHOLD
DISABLE_HBASE_NUM_ROWS_ESTIMATE
DISABLE_ROW_RUNTIME_FILTERING
DISABLE_STREAMING_PREAGGREGATIONS
DISABLE_UNSAFE_SPILLS
ENABLE_DISTINCT_SEMI_JOIN_OPTIMIZATION
ENABLE_EXPR_REWRITES
EXEC_SINGLE_NODE_ROWS_THRESHOLD
EXEC_TIME_LIMIT_S
EXPAND_COMPLEX_TYPES
EXPLAIN_LEVEL
FETCH_ROWS_TIMEOUT_MS
HBASE_CACHE_BLOCKS
HBASE_CACHING
IDLE_SESSION_TIMEOUT
JOIN_ROWS_PRODUCED_LIMIT
KUDU_READ_MODE
KUDU_REPLICA_SELECTION
LIVE_PROGRESS
LIVE_SUMMARY
MAX_CNF_EXPRS
MAX_ERRORS
MAX_MEM_ESTIMATE_FOR_ADMISSION
MAX_NUM_RUNTIME_FILTERS
MAX_RESULT_SPOOLING_MEM
MAX_ROW_SIZE
MAX_SCAN_RANGE_LENGTH
MAX_SPILLED_RESULT_SPOOLING_MEM
MEM_LIMIT
MIN_SPILLABLE_BUFFER_SIZE
MT_DOP
NUM_NODES
NUM_ROWS_PRODUCED_LIMIT
NUM_SCANNER_THREADS
OPTIMIZE_SIMPLE_LIMIT
OPTIMIZE_PARTITION_KEY_SCANS
ORC_SCHEMA_RESOLUTION
PARQUET_COMPRESSION_CODEC
PARQUET_ANNOTATE_STRINGS_UTF8
PARQUET_ARRAY_RESOLUTION
PARQUET_DICTIONARY_FILTERING
PARQUET_FALLBACK_SCHEMA_RESOLUTION
PARQUET_FILE_SIZE
PARQUET_OBJECT_STORE_SPLIT_SIZE
PARQUET_PAGE_ROW_COUNT_LIMIT
PARQUET_READ_PAGE_INDEX
PARQUET_READ_STATISTICS
PARQUET_WRITE_PAGE_INDEX
PREFETCH_MODE
QUERY_TIMEOUT_S
REFRESH_UPDATED_HMS_PARTITIONS
REPLICA_PREFERENCE
REQUEST_POOL
RESOURCE_TRACE_RATIO
RETRY_FAILED_QUERIES
RUNTIME_BLOOM_FILTER_SIZE
RUNTIME_FILTER_MAX_SIZE
RUNTIME_FILTER_MIN_SIZE
RUNTIME_FILTER_MODE
RUNTIME_FILTER_WAIT_TIME_MS
S3_SKIP_INSERT_STAGING
SCAN_BYTES_LIMIT
SCHEDULE_RANDOM_REPLICA
SCRATCH_LIMIT
SHUFFLE_DISTINCT_EXPRS
SPOOL_QUERY_RESULTS
SPOOL_ALL_RESULTS_FOR_RETRIES
SUPPORT_START_OVER
SYNC_DDL
THREAD_RESERVATION_AGGREGATE_LIMIT
THREAD_RESERVATION_LIMIT
TIMEZONE
TOPN_BYTES_LIMIT
USE_DOP_FOR_COSTING
USE_LOCAL_TZ_FOR_UNIX_TIMESTAMP_CONVERSIONS

ABORT_ON_ERROR query option

When this option is enabled, Impala cancels a query immediately when any of the nodes encounters an error, rather than continuing and possibly returning incomplete results. This option is disabled by default, to help gather maximum diagnostic information when an error occurs, for example, whether the same problem occurred on all nodes or only a single node. Currently, the errors that Impala can skip over involve data corruption, such as a column that contains a string value when expected to contain an integer value.

To control how much logging Impala does for non-fatal errors when ABORT_ON_ERROR is turned off, use the MAX_ERRORS option.

Type: BOOLEAN

Default: FALSE (0)

ALLOW_ERASURE_CODED_FILES query option

Use the ALLOW_ERASURE_CODED_FILES query option to enable or disable the support of erasure coded files in Impala. Until Impala is fully tested and certified with erasure coded files, this query option is set to FALSE by default.

When the ALLOW_ERASURE_CODED_FILES query option is set to FALSE, Impala returns an error when a query requires scanning an erasure coded file.

Type: BOOLEAN

Default: FALSE (0)

Added in: Impala 3.1

APPX_COUNT_DISTINCT query option

When the APPX_COUNT_DISTINCT query option is set to TRUE, Impala implicitly converts COUNT(DISTINCT) operations to the NDV() function calls. The resulting count is approximate rather than precise. Enable the query option when a tolerable amount of error is acceptable in order to obtain faster query results than with a COUNT (DISTINCT) queries.

Type: BOOLEAN

Default: FALSE (0)

BATCH_SIZE query option

Number of rows evaluated at a time by SQL operators. Unspecified or a size of 0 uses a predefined default size. Using a large number improves responsiveness, especially for scan operations, at the cost of a higher memory footprint.

This option is primarily for testing during Impala development, or for use under the direction of Cloudera support.

Type: Number

Default: 0 (meaning the predefined default of 1024)

BROADCAST_BYTES_LIMIT query option

Sets the limit for the size of the broadcast input based on estimated size. The Impala planner may in rare cases make a bad choice to broadcast a large table or intermediate result and encounter performance problems due to high memory pressure. Setting this limit will make the planner pick a partition based hash join instead of broadcast and avoid such performance problems.

Type: INT

Default: The default value is 34359738368 (32 GB). A value of 0 causes the option to be ignored.

Usage notes:

Set this option in bytes. For example:
-- Change the limit to 16GB.
set broadcast_bytes_limit=17179869184;

-- Disable the BROADCAST_BYTES_LIMIT.
set broadcast_bytes_limit=0;

BUFFER_POOL_LIMIT query option

Defines a limit on the amount of memory that a query can allocate from the internal buffer pool. The value for this limit applies to the memory on each host, not the aggregate memory across the cluster.

Typically not changed by users, except during diagnosis of out-of-memory errors during queries.

You can set it to an absolute value, e.g. 8GB, or a relative value, e.g. 80%, based on the MEM_LIMIT setting.

Type: INT

Default: The default setting for this option is the lower of 80% of the MEM_LIMIT setting, or the MEM_LIMIT setting minus 100 MB.

Usage notes:

If queries encounter out-of-memory errors, consider decreasing the BUFFER_POOL_LIMIT setting to less than 80% of the MEM_LIMIT setting.

COMPRESSION_CODEC query option

When Impala writes Parquet data files and other supported data files using the INSERT statement, the underlying compression is controlled by the COMPRESSION_CODEC query option.

For the ZSTD compression, the optional compression level can be specified as shown in the syntax below.

Syntax:

SET COMPRESSION_CODEC=codec_name; // Supported for all codecs.
SET COMPRESSION_CODEC=codec_name:compression_level; // Only supported for ZSTD.

The allowed values for this query option are SNAPPY (the default), GZIP, ZSTD, LZ4, and NONE.

ZSTD also supports setting a compression level. The lower the level, the faster the speed at the cost of compression ratio. Compression levels from 1 up to 22 are supported for ZSTD. The default compression level 3 is used, if one is not passed using the compression_codec query option.

The option value is not case-sensitive.

If the option is set to an unrecognized value, all kinds of queries will fail due to the invalid option setting, not just queries involving Parquet tables. (The value BZIP2 is also recognized, but is not compatible with Parquet tables.)

Type: STRING

Default: SNAPPY

COMPUTE_STATS_MIN_SAMPLE_SIZE query option

The COMPUTE_STATS_MIN_SAMPLE_SIZE query option specifies the minimum number of bytes that will be scanned in COMPUTE STATS TABLESAMPLE, regardless of the user-supplied sampling percent. This query option prevents sampling for very small tables where accurate stats can be obtained cheaply without sampling because the minimum sample size is required to get meaningful stats.

Type: INT

Default: 1GB

CONVERT_LEGACY_HIVE_PARQUET_UTC_TIMESTAMPS query option

When Hive writes to Parquet data files, the TIMESTAMP values are normalized to UTC from the local time zone of the host where the data was written. On the other hand, Impala does not make any time zone adjustment when it writes or reads INT96 TIMESTAMP values to Parquet files. This difference in time zone handling can cause potentially inconsistent results when Impala processes TIMESTAMP values in the Parquet files written by Hive.

To avoid incompatibility problems or having to code workarounds, you can specify this CONVERT_LEGACY_HIVE_PARQUET_UTC_TIMESTAMPS query option.

Type: BOOLEAN

DEBUG_ACTION query option

Introduces artificial problem conditions within queries. For internal Cloudera debugging and troubleshooting.

Type: STRING

Default: An empty string

DECIMAL_V2 query option

A query option that changes behavior related to the DECIMAL data type. Set this option to FALSE for backward compatibility to Impala 2.x.

Type: Boolean

Default: TRUE

DEFAULT_FILE_FORMAT query option

Use the DEFAULT_FILE_FORMAT query option to set the default table file format. The following values are supported:

TEXT (0)
RC_FILE (1)
SEQUENCE_FILE (2)
AVRO (3)
PARQUET (4)
KUDU (5)
ORC (6)

In impala-shell or Hue, the SET DEFAULT_FILE_FORMAT statement will not return an error when the option was set to an unsupported value. Impala validates the value and returns an error when you submitted the next query.

Type: Enum as specified above

Default: TEXT (0)

Added in: Impala 3.3

DEFAULT_HINTS_INSERT_STATEMENT query option

The DEFAULT_HINTS_INSERT_STATEMENT query option sets the default hints for the INSERT statements with no optimizer hint specified.

When there are hints specified in an INSERT statement, these default hints are ignored.

You can specify multiple hints separated by ':'.

For example:

SET DEFAULT_HINTS_INSERT_STATEMENT=CLUSTERED;
SET DEFAULT_HINTS_INSERT_STATEMENT=SHUFFLE;
SET DEFAULT_HINTS_INSERT_STATEMENT=NOCLUSTERED:NOSHUFFLE;

The default hints apply to the HDFS and Kudu table formats and are ignored for the HBase table format.

Type: STRING

Default: No default hint

Added in: Impala 3.3

DEFAULT_JOIN_DISTRIBUTION_MODE query option

Impala optimizes join queries based on the presence of table statistics, which are produced by the Impala COMPUTE STATS statement. By default, when a table involved in the join query does not have statistics, Impala uses the broadcast technique that transmits the entire contents of the table to all executor nodes participating in the query. If one table involved in a join has statistics and the other does not, the table without statistics is broadcast. If both tables are missing statistics, the table on the right-hand side of the join is broadcast. This behavior is appropriate when the table involved is relatively small, but can lead to excessive network, memory, and CPU overhead if the table being broadcast is large.

Because Impala queries frequently involve very large tables, and suboptimal joins for such tables could result in spilling or out-of-memory errors, the setting DEFAULT_JOIN_DISTRIBUTION_MODE=SHUFFLE lets you override the default behavior. The shuffle join mechanism divides the corresponding rows of each table involved in a join query using a hashing algorithm, and transmits subsets of the rows to other nodes for processing. Typically, this kind of join is more efficient for joins between large tables of similar size.

The setting DEFAULT_JOIN_DISTRIBUTION_MODE=SHUFFLE is recommended when setting up and deploying new clusters, because it is less likely to result in serious consequences such as spilling or out-of-memory errors if the query plan is based on incomplete information. This setting is not the default, to avoid changing the performance characteristics of join queries for clusters that are already tuned for their existing workloads.

The allowed values are BROADCAST (0) or SHUFFLE (1).

Type: INT

Examples:

The following examples demonstrate appropriate scenarios for each setting of this query option.


-- Create a billion-row table.
create table big_table stored as parquet
  as select * from huge_table limit 1e9;

-- For a big table with no statistics, the
-- shuffle join mechanism is appropriate.
set default_join_distribution_mode=shuffle;

...join queries involving the big table...


-- Create a hundred-row table.
create table tiny_table stored as parquet
  as select * from huge_table limit 100;

-- For a tiny table with no statistics, the
-- broadcast join mechanism is appropriate.
set default_join_distribution_mode=broadcast;

...join queries involving the tiny table...


compute stats tiny_table;
compute stats big_table;

-- Once the stats are computed, the query option has
-- no effect on join queries involving these tables.
-- Impala can determine the absolute and relative sizes
-- of each side of the join query by examining the
-- row size, cardinality, and so on of each table.

...join queries involving both of these tables...

DEFAULT_NDV_SCALE query option

Improving the precision of NDV()

As a cluster admin, you can improve the precision of NDV() using this new query option DEFAULT_NDV_SCALE. You may adjust the value in this query option to change the default precision setting for NDV() so that the SQL writers do not have to rewrite the SQLs to adjust NDV’s precision. This new option allows the cluster admin to adjust precision for each Admission Control queue according to cluster's resource usage.

Controlling the NDV scale

Until this release, the scale used in NDV(<expr>) functions was fixed at 2. Now the scale can be provided using the newly added query option. This option will not influence the NDV scale for SQL function NDV(<expr>, <scale>) in which the NDV scale is provided by the 2nd argument <scale>

Type: Number

Allowed value: Range [ 1 .. 10]

Default: 2

Example:

The following is an example of a typical workload with different NDV scales and with a cardinality of 40,090,650.


     SET DEFAULT_NDV_SCALE = 10;
     SET DEFAULT_NDV_SCALE = 5;
     SET DEFAULT_NDV_SCALE = 2;


Metric	Count Distinct	NDV2	NDV5	NDV10
Memory(GB)	3.83	1.84	1.85	1.89
Duration(s)	182.89	30.22	29.72	29.24
ErrorRate	0%	1.8%	1.17%	0.06%

DEFAULT_SPILLABLE_BUFFER_SIZE query option

Specifies the default size for a memory buffer used when the spill-to-disk mechanism is activated, for example for queries against a large table with no statistics, or large join operations.

Type: INT

Default: 2097152 (2 MB)

Unit: A numeric argument represents a size in bytes; you can also use a suffix of m or mb for megabytes, or g or gb for gigabytes. If you specify a value with unrecognized formats, subsequent queries fail with an error.

Usage notes:

This query option sets an upper bound on the size of the internal buffer size that can be used during spill-to-disk operations. The actual size of the buffer is chosen by the query planner.

If overall query performance is limited by the time needed for spilling, consider increasing the DEFAULT_SPILLABLE_BUFFER_SIZE setting. Larger buffer sizes result in Impala issuing larger I/O requests to storage devices, which might result in higher throughput, particularly on rotational disks.

The tradeoff with a large value for this setting is increased memory usage during spill-to-disk operations. Reducing this value may reduce memory consumption.

To determine if the value for this setting is having an effect by capping the spillable buffer size, you can see the buffer size chosen by the query planner for a particular query. EXPLAIN the query while the setting EXPLAIN_LEVEL=2 is in effect.

DEFAULT_TRANSACTIONAL_TYPE query option

Use the DEFAULT_TRANSACTIONAL_TYPE query option to create insert-only transactional tables by default.

The supported values are:

NONE: The table will not be created as transactional.
INSERT_ONLY: The table will be created as transactional.

If either table properties, transactional or transactional_properties, are set, this query option is ignored.

Type: Enum as specified above

Default: NONE

Added in: Impala 3.3

DELETE_STATS_IN_TRUNCATE query option

This query option DELETE_STATS_IN_TRUNCATE can be used to delete or retain table statistics. The default value of this option is 1 or true which means table statistics will be deleted as part of truncate operation. However the default may not be an ideal solution in case of non-transactional tables when the table and column statistics for the table are also deleted by default. This can be an expensive operation especially when many truncate table commands are running concurrently. In cases where truncate operation is used to remove the existing data and then reload new data, it is likely that you will compute stats again as soon as the new data is reloaded. This would overwrite the existing statistics and hence the additional time spent by the truncate operation to delete column and table statistics becomes obsolete. To avoid this expensive operation and to improve performance, you can set this query option to false or 0. When set to false, a truncate operation will not delete the table and column statistics for the table.

Type:BOOLEAN; recognized values are 1 and 0, or true and false;

Default:TRUE (shown as 1 in output of SET statement).

DISABLE_CODEGEN query option

The DISABLE_CODEGEN is a debug option, and it's used to work around any issues with Impala's runtime code generation. If a query fails with an illegal instruction or other hardware-specific message, try setting DISABLE_CODEGEN=true and running the query again. If the query succeeds only when the DISABLE_CODEGEN option is turned on, submit the problem to Cloudera support and include that detail in the problem report.

Most queries will run significantly slower with DISABLE_CODEGEN=true.

In Impala 2.10 and higher, the DISABLE_CODEGEN_ROWS_THRESHOLD optimisation automatically disables codegen for small queries because short-running queries may run faster without the overhead of codegen.

The following values are supported:

TRUE or 1: Disables codegen.
FALSE or 0: Enables codegen.

Type: BOOLEAN

Default: FALSE (0)

DISABLE_CODEGEN_ROWS_THRESHOLD query option

Specify the number of rows processed per Impala daemon which is the cutoff point below which Impala disables native code generation for the whole query. Native code generation is beneficial for queries that process many rows because it reduces the time taken to process of each row. However, generating the native code adds latency to query startup. Therefore, automatically disabling codegen for queries that process relatively small amounts of data can improve query response time.

This setting applies to queries where the number of rows processed can be accurately determined, either through table and column statistics, or by the presence of a LIMIT clause. If Impala cannot accurately estimate the number of rows, then this setting does not apply.

If a query uses the complex data types STRUCT, ARRAY, or MAP, then codegen is never automatically disabled regardless of the DISABLE_CODEGEN_ROWS_THRESHOLD setting.

Type: Number

Default: 50000

Usage notes: Typically, you increase the default value to make this optimization apply to more queries. If incorrect or corrupted table and column statistics cause Impala to apply this optimization incorrectly to queries that actually involve substantial work, you might see the queries being slower as a result of codegen being disabled. In that case, recompute statistics with the COMPUTE STATS or COMPUTE INCREMENTAL STATS statement. If there is a problem collecting accurate statistics, you can turn this feature off by setting the value to 0.

DISABLE_HBASE_NUM_ROWS_ESTIMATE

Use the DISABLE_HBASE_NUM_ROWS_ESTIMATE query option to disable key sampling of HBase tables in row count and row size estimation.

While generating a plan for an HBase query, the planner samples the underlying HBase tables to estimate their row count and row size, and the sampling can negatively impact the planning time. When the HBase table stats do not change much in a short time, disable the sampling by setting the DISABLE_HBASE_NUM_ROWS_ESTIMATE query option to TRUE. And Impala planner will fall back to using Hive Metastore (HMS) table stats instead.

When DISABLE_HBASE_NUM_ROWS_ESTIMATE query option is set to TRUE, you need to update the HMS table stats by running COMPUTE STATS. Alternatively, you can manually set table statistics by running ALTER TABLE.

The following values are supported:

TRUE or 1: Disables the normal key sampling of HBase tables and uses HMS table stats for estimation.
FALSE or 0: Enables the normal sampling of HBase tables.

Type: BOOLEAN

Default: FALSE

DISABLE_ROW_RUNTIME_FILTERING query

The DISABLE_ROW_RUNTIME_FILTERING query option reduces the scope of the runtime filtering feature. Queries still dynamically prune partitions, but do not apply the filtering logic to individual rows within partitions.

Only applies to queries against HDFS-based tables using the Parquet file format. For other file formats, Impala only prunes at the level of partitions, not individual rows.

This query option only applies to queries against

Type: BOOLEAN

Default: FALSE (0)

Usage notes:

Impala automatically evaluates whether the per-row filters are being effective at reducing the amount of intermediate data. Therefore, this option is typically only needed for the rare case where Impala cannot accurately determine how effective the per-row filtering is for a query.

Because the runtime filtering feature applies mainly to resource-intensive and long-running queries, only adjust this query option when tuning long-running queries involving some combination of large partitioned tables and joins involving large tables.

Kudu consideration:

When applied to a query involving a Kudu table, this option turns off all runtime filtering for the Kudu table.

DISABLE_STREAMING_PREAGGREGATIONS query option

Turns off the streaming preaggregation optimization that is available in Impala 2.5 and higher. This optimization reduces unnecessary work performed by queries that perform aggregation operations on columns with few or no duplicate values, for example DISTINCT id_column or GROUP BY unique_column. If the optimization causes regressions in existing queries that use aggregation functions, you can turn it off as needed by setting this query option.

Type: BOOLEAN

Default: FALSE (0)

Usage notes:

Typically, queries that would require enabling this option involve very large numbers of aggregated values, such as a billion or more distinct keys being processed on each worker node.

DISABLE_UNSAFE_SPILLS query option

Enable this option if you prefer to have queries fail when they exceed the Impala memory limit, rather than write temporary data to disk.

Queries that spill to disk typically complete successfully, when in earlier Impala releases they would have failed. However, queries with exorbitant memory requirements due to missing statistics or inefficient join clauses could become so slow as a result that you would rather have them cancelled automatically and reduce the memory usage through standard Impala tuning techniques.

This option prevents only unsafe spill operations, meaning that one or more tables are missing statistics or the query does not include a hint to set the most efficient mechanism for a join or INSERT ... SELECT into a partitioned table. These are the tables most likely to result in suboptimal execution plans that could cause unnecessary spilling. Therefore, leaving this option enabled is a good way to find tables on which to run the COMPUTE STATS statement.

Type: BOOLEAN

Default: FALSE (0)

ENABLE_ASYNC_DDL_EXECUTION query option

When an Impala client connects to an Impala server through a Network Load Balancer, it tracks the state of that connection. The connection gets closed if the time taken in a blocking call to the server is longer than the idle timeout. If an impala client sends a TCP request after the idle timeout period elapses, it receives a TCP RST packet to indicate that the connection is no longer valid. However, if the Impala client is waiting for a response from the server, it will get stuck because it never notices the closed connection.

This limitation applies to the following clients:

impala-shell
Beeswax
Impyla
HUE

An asynchronous model has been applied to the execution of certain DDL statements that require a long execution time. So any request from an impala client to the Impala server will execute asynchronously in different threads without blocking the RPC so clients can get a query handle and poll for it for state and results to avoid impala clients hanging indefinitely.

A new query option 'enable_async_ddl_execution' has been added for this asynchronous mode and is set to true by default. You can set it to false to turn off this capability.

ENABLE_DISTINCT_SEMI_JOIN_OPTIMIZATION query option

When this query option is enabled the query planner considers inserting a distinct aggregation on the inner side of the join based on whether that aggregation would reduce the number of rows by more than 75%. This optimization reduces the number of potential plan regressions.

ENABLE_EXPR_REWRITES query option

The ENABLE_EXPR_REWRITES query option controls whether to enable or disable the query compile time optimizations. These optimizations rewrite the expression trees to a more compact and optimized form that helps avoid redundant expression evaluation at run time. Performance optimizations controlled by this query option include:

Constant folding
Extracting common conjuncts from disjunctions
Simplify conditionals with constant conditions

Set the option to FALSE or 0 to disable the performance optimizations.

Type: BOOLEAN

Default: TRUE (1)

EXEC_SINGLE_NODE_ROWS_THRESHOLD query option

Specify the number of rows scanned as the cutoff point below which Impala treats a query as a small query, turning off optimizations such as parallel execution and native code generation. The overhead for these optimizations is applicable for queries involving substantial amounts of data, but it makes sense to skip them for queries involving tiny amounts of data. Reducing the overhead for small queries allows Impala to complete them more quickly, keeping admission control slots, CPU, memory, and so on available for resource-intensive queries.

In Impala 2.3 and higher, where Impala supports the complex data types STRUCT, ARRAY, and MAP, if a query refers to any column of those types, the small-query optimization is turned off for that query regardless of the EXEC_SINGLE_NODE_ROWS_THRESHOLD setting.

For a query that is determined to be small, all work is performed on the coordinator node. This might result in some I/O being performed by remote reads. The savings from not distributing the query work and not generating native code are expected to outweigh any overhead from the remote reads.

Type: Number

Default: 100

Usage notes: Typically, you increase the default value to make this optimization apply to more queries. If incorrect or corrupted table and column statistics cause Impala to apply this optimization incorrectly to queries that actually involve substantial work, you might see the queries being slower as a result of remote reads. In that case, recompute statistics with the COMPUTE STATS or COMPUTE INCREMENTAL STATS statement. If there is a problem collecting accurate statistics, you can turn this feature off by setting the value to -1.

EXEC_TIME_LIMIT_S query option

The EXEC_TIME_LIMIT_S query option sets a time limit on query execution in seconds. If a query is still executing when time limit expires, it is automatically canceled. The option is intended to prevent runaway queries that run for much longer than intended.

For example, an Impala administrator could set a default value of EXEC_TIME_LIMIT_S=3600 for a resource pool to automatically stop queries that run for longer than one hour. Then, if a user accidentally runs a large query that executes for more than one hour, it will be automatically killed after the time limit expires to free up resources. Users can override the default value per query or per session if they do not want the default EXEC_TIME_LIMIT_S value to apply to a specific query or a session.

Type: Number

Default: 0 (no time limit )

EXPAND_COMPLEX_TYPES Query Option

Older versions of Impala did not support complex types in the select list, and therefore a SELECT * statement only expanded to primitive types even when the table contained complex-typed columns. Support for complex types in the select list has since been added, but for backward compatibility the behaviour that * expressions skip complex types has been kept as the default. If the EXPAND_COMPLEX_TYPES query option is set to true, * expressions will include complex types as well.

Type:BOOLEAN

Default:FALSE

Added in:Impala 4.2

EXPLAIN_LEVEL query option

Controls the amount of detail provided in the output of the EXPLAIN statement. The basic output can help you identify high-level performance issues such as scanning a higher volume of data or more partitions than you expect. The higher levels of detail show how intermediate results flow between nodes and how different SQL operations such as ORDER BY, GROUP BY, joins, and WHERE clauses are implemented within a distributed query.

Type: STRING or INT

Default: 1

Arguments:

The allowed range of numeric values for this option is 0 to 3:

0 or MINIMAL: A barebones list, one line per operation. Primarily useful for checking the join order in very long queries where the regular EXPLAIN output is too long to read easily.
1 or STANDARD: The default level of detail, showing the logical way that work is split up for the distributed query.
2 or EXTENDED: Includes additional detail about how the query planner uses statistics in its decision-making process, to understand how a query could be tuned by gathering statistics, using query hints, adding or removing predicates, and so on. In Impala 3.2 and higher, the output also includes the analyzed query with the cast information in the output header, and the implicit cast info in the Predicate section.
3 or VERBOSE: The maximum level of detail, showing how work is split up within each node into query fragments that are connected in a pipeline. This extra detail is primarily useful for low-level performance testing and tuning within Impala itself, rather than for rewriting the SQL code at the user level.

Changing the value of this option controls the amount of detail in the output of the EXPLAIN statement. The extended information from level 2 or 3 is especially useful during performance tuning, when you need to confirm whether the work for the query is distributed the way you expect, particularly for the most resource-intensive operations such as join queries against large tables, queries against tables with large numbers of partitions, and insert operations for Parquet tables. The extended information also helps to check estimated resource usage when you use the admission control or resource management features.

Usage notes:

Read the EXPLAIN output from bottom to top. The lowest lines represent the initial work of the query (scanning data files), the lines in the middle represent calculations done on each node and how intermediate results are transmitted from one node to another, and the topmost lines represent the final results being sent back to the coordinator node.

The numbers in the left column are generated internally during the initial planning phase and do not represent the actual order of operations, so it is not significant if they appear out of order in the EXPLAIN output.

At all EXPLAIN levels, the plan contains a warning if any tables in the query are missing statistics. Use the COMPUTE STATS statement to gather statistics for each table and suppress this warning. See Tables and column statistics for details about how the statistics help query performance.

The PROFILE command in impala-shell always starts with an explain plan showing full detail, the same as with EXPLAIN_LEVEL=3. After the explain plan comes the executive summary, the same output as produced by the SUMMARY command in impala-shell.

FETCH_ROWS_TIMEOUT_MS query option

Use the FETCH_ROWS_TIMEOUT_MS query option to control how long Impala waits for query results when clients fetch rows.

When this query option is set to 0, fetch requests wait indefinitely.

The timeout applies both when query result spooling is enabled and disabled:

When result spooling is disabled (SPOOL_QUERY_RESULTS = FALSE), the timeout controls how long a client waits for a single row batch to be produced by the coordinator.
When result spooling is enabled ( (SPOOL_QUERY_RESULTS = TRUE), a client can fetch multiple row batches at a time, so this timeout controls the total time a client waits for row batches to be produced.

The timeout also applies to fetch requests issued against queries in the 'RUNNING' state. A 'RUNNING' query has no rows available, so any fetch request will wait until the query transitions to the 'FINISHED' state and for it to fetch all requested rows. A query in the 'FINISHED' state means that the rows are available to be fetched.

Type: INT

Default: 10000 (10 seconds)

Added in: Impala 3.4

HBASE_CACHE_BLOCKS query option

Setting this option is equivalent to calling the setCacheBlocks method of the class org.apache.hadoop.hbase.client.Scan in an HBase Java application. Helps to control the memory pressure on the HBase RegionServer, in conjunction with the HBASE_CACHING query option.

Type: BOOLEAN

Default: FALSE (0)

HBASE_CACHING query option

Setting this option is equivalent to calling the setCaching method of the class org.apache.hadoop.hbase.client.Scan in an HBase Java application. Helps to control the memory pressure on the HBase RegionServer, in conjunction with the HBASE_CACHE_BLOCKS query option.

Type: BOOLEAN

Default: FALSE (0)

IDLE_SESSION_TIMEOUT query option

Specify the time in seconds after which an idle session is cancelled. A session is idle when no activity is occurring for any of the queries in that session, and the session has not started any new queries. Once a session is expired, you cannot issue any new query requests to it. The session remains open, but the only operation you can perform is to close it.

The IDLE_SESSION_TIMEOUT query option overrides the ‑‑idle_session_timeout startup option. See Setting timeout periods for daemons, queries, and sessions for the ‑‑idle_session_timeout startup option.

The IDLE_SESSION_TIMEOUT query option allows JDBC/ODBC connections to set the session timeout as a query option with the SET statement.

Type: Number

Default: 0

When this option is set to default:

If ‑‑idle_session_timeout is not set, the session never expires.
If ‑‑idle_session_timeout is set, use that timeout value.

JOIN_ROWS_PRODUCED_LIMIT query option

The JOIN_ROWS_PRODUCED_LIMIT is an advanced query option that limits the number of join rows produced by a join node and is used to prevent runaway join queries. A query is canceled when any one of the joins in the query produces more rows (totaled across all instances of that join) than the specified limit set by the JOIN_ROWS_PRODUCED_LIMIT option.

This limit is checked periodically, so it is possible that the actual number of rows produced may be slightly higher than the limit.

The default value of 0 specifies that there is no limit on the number of join rows produced.

Type: BIGINT

Allowed values: 0 or positive numbers

Default: 0

KUDU_READ_MODE query option

The KUDU_READ_MODE query option allows you to set a desired consistency level for scans of Kudu tables.

The following values are supported for the query option:

"DEFAULT": The value of the startup flag, --kudu_read_mode, is used.
"READ_LATEST": Commonly known as the Read Committed isolation mode, in this mode, Kudu provides no consistency guarantees for this mode, except that all returned rows were committed at some point.
"READ_AT_SNAPSHOT": Kudu will take a snapshot of the current state of the data and perform the scan over the snapshot, possibly after briefly waiting for ongoing writes to complete. This provides "Read Your Writes" consistency within a single Impala session, except in the case of a Kudu leader change. See the Kudu documentation for more details.

Type: String

Default: "DEFAULT"

Added in: Impala 3.1

KUDU_REPLICA_SELECTION query option

With the location-aware placement policy for tablet replicas in Kudu, the replicas are spread across locations so that the failure of tablet servers in one location does not make tablets unavailable. With this advanced placement policy the replicas of a tablet on Kudu tablet servers are assigned with the role as a leader or a follower. By default when Impala attempts to read from the tablet server, it is targeted to the closest tablet server that is to choose "nearby" replicas on the tablet server for scanning. However using this new query option KUDU_REPLICA_SELECTION, the queries can be targeted to the leader-only replica and skip non-leader replicas while scanning. When the new query option KUDU_REPLICA_SELECTION is set as LEADER_ONLY, Impala planner will generate a query plan that enables Impala to scan Kudu tables at the leader-only replicas.

LIVE_PROGRESS query option

When the LIVE_PROGRESS query option is set to TRUE, Impala displays an interactive progress bar showing roughly what percentage of processing has been completed for queries submitted through the impala-shell command. When the query finishes, the progress bar is erased from the impala-shell console output.

You can enable this query option within impala-shell by starting the shell with the --live_progress command-line option. You can still turn this setting off and on again within the shell through the SET command.

Starting in Impala 3.1, the summary output also includes the queuing status consisting of whether the query was queued and what was the latest queuing reason.

Type: BOOLEAN

Default: FALSE (0)

Usage notes:

The output from this query option is printed to standard error. The output is only displayed in interactive mode, that is, not when the -q or -f options are used.

Restrictions:

Because the percentage complete figure is calculated using the number of issued and completed scan ranges, which occur while reading the table data, the progress bar might reach 100% before the query is entirely finished. For example, the query might do work to perform aggregations after all the table data has been read. If many of your queries fall into this category, consider using the LIVE_SUMMARY option instead for more granular progress reporting.

The LIVE_PROGRESS and LIVE_SUMMARY query options currently do not produce any output during COMPUTE STATS operations.

Because the LIVE_PROGRESS query option is available only within the impala-shell interpreter, you cannot change the query option through the SQL SET statement using the JDBC or ODBC interfaces. The SET command in impala-shell recognizes the name as a shell-only option.

LIVE_SUMMARY query option

When the LIVE_SUMMARY query option is set to TRUE, Impala displays the same output as the SUMMARY command for queries submitted through the impala-shell command, with the measurements updated in real time as the query progresses. When the query finishes, the final SUMMARY output remains visible in the impala-shell console output.

You can enable this query option within impala-shell by starting the shell with the --live_summary command-line option. You can still turn this setting off and on again within the shell through the SET command.

Starting in Impala 3.1, the summary output also includes the queuing status consisting of whether the query was queued and what was the latest queuing reason.

Type: Boolean

Default: FALSE (0)

Usage notes:

The output from this query option is printed to standard error. The output is only displayed in interactive mode, that is, not when the -q or -f options are used.

Only a single report is displayed at any one time, with each update overwriting the previous numbers.

The live summary output can be useful for evaluating long-running queries, to evaluate which phase of execution takes up the most time, or if some hosts take much longer than others for certain operations, dragging overall performance down. By making the information available in real time, this feature lets you decide what action to take even before you cancel a query that is taking much longer than normal.

For example, you might see the HDFS scan phase taking a long time, and therefore revisit performance-related aspects of your schema design such as constructing a partitioned table, switching to the Parquet file format, running the COMPUTE STATS statement for the table, and so on. Or you might see a wide variation between the average and maximum times for all hosts to perform some phase of the query, and therefore investigate if one particular host needed more memory or was experiencing a network problem.

Restrictions:

The LIVE_PROGRESS and LIVE_SUMMARY query options currently do not produce any output during COMPUTE STATS operations.

Because the LIVE_SUMMARY query option is available only within the impala-shell interpreter, you cannot change the query option through the SQL SET statement using the JDBC or ODBC interfaces. The SET command in impala-shell recognizes the name as a shell-only option.

MAX_CNF_EXPRS query option

To optimize the performance of queries, the predicates are reorganized so that they can be applied earlier in the data processing. For example, some predicates can be pushed down to the scan node. This helps improve performance for some complex queries. However the CNF rewrite can lead to significant frontend memory usage and eventually OutOfMemory for a complex query that contains many predicates. To avoid this memory problem while maintaining performance improvement in basic circumstances and avoiding performance regressions on complex queries, this release lowered the default value of MAX_CNF_EXPRS from unlimited to 200.

MAX_ERRORS query option

Maximum number of non-fatal errors for any particular query that are recorded in the Impala log file. For example, if a billion-row table had a non-fatal data error in every row, you could diagnose the problem without all billion errors being logged. Unspecified or 0 indicates the built-in default value of 100.

This option only controls how many errors are reported. To specify whether Impala continues or halts when it encounters such errors, use the ABORT_ON_ERROR option.

Type: Number

Default: 0 (meaning 100 errors)

MAX_MEM_ESTIMATE_FOR_ADMISSION query option

Use the MAX_MEM_ESTIMATE_FOR_ADMISSION query option to set an upper limit on the memory estimates of a query as a workaround for over-estimates precluding a query from being admitted.

The query option takes effect when all of the below conditions are met:

Memory-based admission control is enabled for the pool.
The MEM_LIMIT query option is not set at the query, session, resource pool, or global level.

When the above conditions are met, MIN(MAX_MEM_ESTIMATE_FOR_ADMISSION, mem_estimate) is used for admission control.

Setting the MEM_LIMIT query option is usually a better option. Use the MAX_MEM_ESTIMATE_FOR_ADMISSION query option when it is not feasible to set MEM_LIMIT for each individual query.

Type: INT

Added in: Impala 3.1

MAX_NUM_RUNTIME_FILTERS query option

The MAX_NUM_RUNTIME_FILTERS query option sets an upper limit on the number of runtime filters that can be produced for each query.

Type: INT

Default: 10

Usage notes:

Each runtime filter imposes some memory overhead on the query. Depending on the setting of the RUNTIME_BLOOM_FILTER_SIZE query option, each filter might consume between 1 and 16 megabytes per plan fragment. There are typically 5 or fewer filters per plan fragment.

Impala evaluates the effectiveness of each filter, and keeps the ones that eliminate the largest number of partitions or rows. Therefore, this setting can protect against potential problems due to excessive memory overhead for filter production, while still allowing a high level of optimization for suitable queries.

Kudu consideration:

This query option affects only Bloom filters, not the min/max filters that are applied to Kudu tables. Therefore, it does not affect the performance of queries against Kudu tables.

MAX_RESULT_SPOOLING_MEM query option

Use the MAX_RESULT_SPOOLING_MEM query option to set the maximum amount of memory used when spooling query results.

If the amount of memory exceeds this value when spooling query results, all memory will most likely be spilled to disk.

The MAX_RESULT_SPOOLING_MEM query option is applicable only when query result spooling is enabled with the SPOOL_QUERY_RESULTS query option set to TRUE.

Setting the option to 0 or -1 means the memory is unbounded.

You cannot set this query option to values below -1.

Type: INT

Default: 100 * 1024 * 1024 (100 MB)

Added in: Impala 3.4

MAX_ROW_SIZE query option

Ensures that Impala can process rows of at least the specified size. (Larger rows might be successfully processed, but that is not guaranteed.) Applies when constructing intermediate or final rows in the result set. This setting prevents out-of-control memory use when accessing columns containing huge strings.

A numeric argument represents a size in bytes; you can also use a suffix of m or mb for megabytes, or g or gb for gigabytes. If you specify a value with unrecognized formats, subsequent queries fail with an error.

Type: INT

Default: 524288 (512 KB)

Usage notes:

If a query fails because it involves rows with long strings and/or many columns, causing the total row size to exceed MAX_ROW_SIZE bytes, increase the MAX_ROW_SIZE setting to accommodate the total bytes stored in the largest row. Examine the error messages for any failed queries to see the size of the row that caused the problem.

Impala attempts to handle rows that exceed the MAX_ROW_SIZE value where practical, so in many cases, queries succeed despite having rows that are larger than this setting.

Specifying a value that is substantially higher than actually needed can cause Impala to reserve more memory than is necessary to run the query.

In a Hadoop cluster with highly concurrent workloads and queries that process high volumes of data, traditional SQL tuning advice about minimizing wasted memory is worth remembering. For example, if a table has STRING columns where a single value might be multiple megabytes, make sure that the SELECT lists in queries only refer to columns that are actually needed in the result set, instead of using the SELECT * shorthand.

MAX_SCAN_RANGE_LENGTH query option

Maximum length of the scan range. Interacts with the number of HDFS blocks in the table to determine how many CPU cores across the cluster are involved with the processing for a query. (Each core processes one scan range.)

Lowering the value can sometimes increase parallelism if you have unused CPU capacity, but a too-small value can limit query performance because each scan range involves extra overhead.

Only applicable to HDFS tables. Has no effect on Parquet tables. Unspecified or 0 indicates backend default, which is the same as the HDFS block size for each table.

Although the scan range can be arbitrarily long, Impala internally uses an 8 MB read buffer so that it can query tables with huge block sizes without allocating equivalent blocks of memory.

The value can include unit specifiers, such as 100m or 100mb.

Type: Number

Default: 0

MAX_SPILLED_RESULT_SPOOLING_MEM query option

Use the MAX_SPILLED_RESULT_SPOOLING_MEM query option to set the maximum amount of memory that can be spilled when spooling query results.

If the amount of memory exceeds this value when spooling query results, the coordinator fragment will block until the client has consumed enough rows to free up more memory.

The MAX_SPILLED_RESULT_SPOOLING_MEM query option is applicable only when query result spooling is enabled with the SPOOL_QUERY_RESULTS query option set to TRUE.

The value must be greater than or equal to the value of MAX_RESULT_SPOOLING_MEM.

Setting the option to 0 or -1 means the memory is unbounded.

Values below -1 are not allowed for this query option.

Type: INT

Default: 1024 * 1024 * 1024 (1 GB)

Added in: Impala 3.4

MEM_LIMIT query option

The MEM_LIMIT query option defines the maximum amount of memory a query can allocate on each node. The total memory that can be used by a query is the MEM_LIMIT times the number of nodes.

There are two levels of memory limit for Impala. The ‑‑mem_limit startup option sets an overall limit for the impalad process (which handles multiple queries concurrently). That process memory limit can be expressed either as a percentage of RAM available to the process such as -mem_limit=70% or as a fixed amount of memory, such as ‑‑mem_limit=100gb. The memory available to the process is based on the host's physical memory and memory limits from Linux Control Groups. For example, if an impalad process is running in a Docker container on a host with 100 GB of memory, the memory available is 100 GB or the Docker container's memory limit, whichever is less.

The MEM_LIMIT query option, which you set through impala-shell or the SET statement in a JDBC or ODBC application, applies to each individual query. The MEM_LIMIT query option is usually expressed as a fixed size such as 10gb, and must always be less than the impalad memory limit.

If query processing approaches the specified memory limit on any node, either the per-query limit or the impalad limit, then the SQL operations will start to reduce their memory consumption, for example by writing the temporary data to disk (known as spilling to disk). The result is a query that completes successfully, rather than failing with an out-of-memory error. The tradeoff is decreased performance due to the extra disk I/O to write the temporary data and read it back in. The slowdown could potentially be significant. Thus, while this feature improves reliability, you should optimize your queries, system parameters, and hardware configuration to make this spilling a rare occurrence.

Type: Number

Units: A numeric argument represents memory size in bytes; you can also use a suffix of m or mb for megabytes, or more commonly g or gb for gigabytes. If you specify a value with unrecognized formats, subsequent queries fail with an error.

Default: 0 (unlimited)

Usage notes:

The MEM_LIMIT setting is primarily useful for production workloads. Impala's Admission Controller can be configured to automatically assign memory limits to queries and limit memory consumption of resource pools.

Use the output of the SUMMARY command in impala-shell to get a report of memory used for each phase of your most heavyweight queries on each node, and then set a MEM_LIMIT somewhat higher than that. See Understanding Performance using SUMMARY Report for usage information about the SUMMARY command.

The following examples show how to set the MEM_LIMIT query option using a fixed number of bytes, or suffixes representing gigabytes or megabytes.

[localhost:21000] > set mem_limit=3000000000;
MEM_LIMIT set to 3000000000
[localhost:21000] > select 5;
Query: select 5
+---+
| 5 |
+---+
| 5 |
+---+
              
[localhost:21000] > set mem_limit=3g;
MEM_LIMIT set to 3g
[localhost:21000] > select 5;
Query: select 5
+---+
| 5 |
+---+
| 5 |
+---+
              
[localhost:21000] > set mem_limit=3gb;
MEM_LIMIT set to 3gb
[localhost:21000] > select 5;
+---+
| 5 |
+---+
| 5 |
+---+
              
[localhost:21000] > set mem_limit=3m;
MEM_LIMIT set to 3m
[localhost:21000] > select 5;
+---+
| 5 |
+---+
| 5 |
+---+
[localhost:21000] > set mem_limit=3mb;
MEM_LIMIT set to 3mb
[localhost:21000] > select 5;
+---+
| 5 |
+---+

The following examples show how unrecognized MEM_LIMIT values lead to errors for subsequent queries.

[localhost:21000] > set mem_limit=3pb;
MEM_LIMIT set to 3pb
[localhost:21000] > select 5;
ERROR: Failed to parse query memory limit from '3pb'.
              
[localhost:21000] > set mem_limit=xyz;
MEM_LIMIT set to xyz
localhost:21000] > select 5;
Query: select 5
ERROR: Failed to parse query memory limit from 'xyz'.

The following examples shows the automatic query cancellation when the MEM_LIMIT value is exceeded on any host involved in the Impala query. First it runs a successful query and checks the largest amount of memory used on any node for any stage of the query. Then it sets an artificially low MEM_LIMIT setting so that the same query cannot run.

[localhost:21000] > select count(*) from customer;
Query: select count(*) from customer
+----------+
| count(*) |
+----------+
| 150000   |
+----------+
              
[localhost:21000] > select count(distinct c_name) from customer;
Query: select count(distinct c_name) from customer
+------------------------+
| count(distinct c_name) |
+------------------------+
| 150000                 |
+------------------------+
              
[localhost:21000] > summary;
+--------------+--------+--------+----------+----------+---------+------------+----------+---------------+---------------+
| Operator     | #Hosts | #Inst  | Avg Time | Max Time | #Rows   | Est. #Rows | Peak Mem | Est. Peak Mem | Detail        |
+--------------+--------+--------+----------+----------+---------+------------+----------+---------------+---------------+
| 06:AGGREGATE | 1      | 1      | 230.00ms | 230.00ms | 1       | 1          | 16.00 KB | -1 B          | FINALIZE      |
| 05:EXCHANGE  | 1      | 1      | 43.44us  | 43.44us  | 1       | 1          | 0 B      | -1 B          | UNPARTITIONED |
| 02:AGGREGATE | 1      | 1      | 227.14ms | 227.14ms | 1       | 1          | 12.00 KB | 10.00 MB      |               |
| 04:AGGREGATE | 1      | 1      | 126.27ms | 126.27ms | 150.00K | 150.00K    | 15.17 MB | 10.00 MB      |               |
| 03:EXCHANGE  | 1      | 1      | 44.07ms  | 44.07ms  | 150.00K | 150.00K    | 0 B      | 0 B           | HASH(c_name)  |
| 01:AGGREGATE | 1      | 1      | 361.94ms | 361.94ms | 150.00K | 150.00K    | 23.04 MB | 10.00 MB      |               |
| 00:SCAN HDFS | 1      | 1      | 43.64ms  | 43.64ms  | 150.00K | 150.00K    | 24.19 MB | 64.00 MB      | tpch.customer |
+--------------+--------+--------+----------+----------+---------+------------+----------+---------------+---------------+
              
[localhost:21000] > set mem_limit=15mb;
MEM_LIMIT set to 15mb
[localhost:21000] > select count(distinct c_name) from customer;
Query: select count(distinct c_name) from customer
ERROR:
Rejected query from pool default-pool: minimum memory reservation is greater than memory available to the query
for buffer reservations. Memory reservation needed given the current plan: 38.00 MB. Adjust either the mem_limit
or the pool config (max-query-mem-limit, min-query-mem-limit) for the query to allow the query memory limit to be
at least 70.00 MB. Note that changing the mem_limit may also change the plan. See the query profile for more
information about the per-node memory requirements.

MIN_SPILLABLE_BUFFER_SIZE query option

Specifies the minimum size for a memory buffer used when the spill-to-disk mechanism is activated, for example for queries against a large table with no statistics, or large join operations.

Type: INT

Default: 65536 (64 KB)

Usage notes:

This query option sets a lower bound on the size of the internal buffer size that can be used during spill-to-disk operations. The actual size of the buffer is chosen by the query planner.

If overall query performance is limited by the time needed for spilling, consider increasing the MIN_SPILLABLE_BUFFER_SIZE setting. Larger buffer sizes result in Impala issuing larger I/O requests to storage devices, which might result in higher throughput, particularly on rotational disks.

The tradeoff with a large value for this setting is increased memory usage during spill-to-disk operations. Reducing this value may reduce memory consumption.

MT_DOP query option

Sets the degree of parallelism used for certain operations that can benefit from multithreaded execution. You can specify values higher than zero to find the ideal balance of response time, memory usage, and CPU usage during statement processing.

note

The Impala execution engine is being revamped incrementally to add additional parallelism within a single host for certain statements and kinds of operations. The setting MT_DOP=0 uses the old code path with limited intra-node parallelism.

Higher value of MT_DOP is directly proportional to resource requirements.

Currently, MT_DOP support varies by statement type:

COMPUTE [INCREMENTAL] STATS. Impala automatically sets MT_DOP=4 for COMPUTE STATS and COMPUTE INCREMENTAL STATS statements on Parquet tables.
SELECT statements. MT_DOP is 0 by default for SELECT statements but can be set to a value greater than 0 to control intra-node parallelism. This may be useful to tune query performance and in particular to reduce execution time of long-running, CPU-intensive queries.
DML statements. MT_DOP values greater than zero are not currently supported for DML statements. DML statements will produce an error if MT_DOP is set to a non-zero value.
In Impala 3.4 (CDP 1.2) and earlier, not all SELECT statements support setting MT_DOP. Specifically, only scan and aggregation operators, and local joins that do not need data exchanges (such as for nested types) are supported. Other SELECT statements produce an error if MT_DOP is set to a non-zero value.

Type: INT

Default: 0

Because COMPUTE STATS and COMPUTE INCREMENTAL STATS statements for Parquet tables benefit substantially from extra intra-node parallelism, Impala automatically sets MT_DOP=4 when computing stats for Parquet tables.

Range: 0 to 64

Examples:

The following example shows how to run a COMPUTE STATS statement against a Parquet table with or without an explicit MT_DOP setting:


-- Explicitly setting MT_DOP to 0 selects the old code path.
set mt_dop = 0;
MT_DOP set to 0

-- The analysis for the billion rows is distributed among hosts,
-- but uses only a single core on each host.
compute stats billion_rows_parquet;
+-----------------------------------------+
| summary                                 |
+-----------------------------------------+
| Updated 1 partition(s) and 2 column(s). |
+-----------------------------------------+

drop stats billion_rows_parquet;

-- Using 4 logical processors per host is faster.
set mt_dop = 4;
MT_DOP set to 4

compute stats billion_rows_parquet;
+-----------------------------------------+
| summary                                 |
+-----------------------------------------+
| Updated 1 partition(s) and 2 column(s). |
+-----------------------------------------+

drop stats billion_rows_parquet;

-- Unsetting the option reverts back to its default.
-- Which for COMPUTE STATS and a Parquet table is 4,
-- so again it uses the fast path.
unset MT_DOP;
Unsetting option MT_DOP

compute stats billion_rows_parquet;
+-----------------------------------------+
| summary                                 |
+-----------------------------------------+
| Updated 1 partition(s) and 2 column(s). |
+-----------------------------------------+

The following example shows the effects of setting MT_DOP for a query on a Parquet table:


set mt_dop = 0;
MT_DOP set to 0

-- COUNT(DISTINCT) for a unique column is CPU-intensive.
select count(distinct id) from billion_rows_parquet;
+--------------------+
| count(distinct id) |
+--------------------+
| 1000000000         |
+--------------------+
Fetched 1 row(s) in 67.20s

set mt_dop = 16;
MT_DOP set to 16

-- Introducing more intra-node parallelism for the aggregation
-- speeds things up, and potentially reduces memory overhead by
-- reducing the number of scanner threads.
select count(distinct id) from billion_rows_parquet;
+--------------------+
| count(distinct id) |
+--------------------+
| 1000000000         |
+--------------------+
Fetched 1 row(s) in 17.19s

The following example shows how queries that are not compatible with non-zero MT_DOP settings produce an error when MT_DOP is set:


set mt_dop=1;
MT_DOP set to 1

insert into a1
select * from a2;
ERROR: NotImplementedException: MT_DOP not supported for DML statements.

Related Information:

Compute Stats Statement

Impala Aggregate Functions

NUM_NODES query option

Limit the number of nodes that process a query, typically during debugging.

Type: Number

Allowed values: Only accepts the values 0 (meaning all nodes) or 1 (meaning all work is done on the coordinator node).

Default: 0

Usage notes:

If you are diagnosing a problem that you suspect is due to a timing issue due to distributed query processing, you can set NUM_NODES=1 to verify if the problem still occurs when all the work is done on a single node.

You might set the NUM_NODES option to 1 briefly, during INSERT or CREATE TABLE AS SELECT statements. Normally, those statements produce one or more data files per data node. If the write operation involves small amounts of data, a Parquet table, and/or a partitioned table, the default behavior could produce many small files when intuitively you might expect only a single output file. SET NUM_NODES=1 turns off the distributed aspect of the write operation, making it more likely to produce only one or a few data files.

NUM_ROWS_PRODUCED_LIMIT query option

The NUM_ROWS_PRODUCED_LIMIT query option limits the number of rows produced by a query. A query is canceled when its execution produces more rows than the specified limit set by the NUM_ROWS_PRODUCED_LIMIT option.

This limit only applies when the results are returned to a client, for example, to a SELECT query, but not to an INSERT query.

The default value of 0 specifies that there is no limit on the number of rows produced.

Type: INT

Allowed values: 0 or positive numbers

Default: 0

NUM_SCANNER_THREADS query option

Maximum number of scanner threads (on each node) used for each query. By default, Impala uses as many cores as are available (one thread per core). You might lower this value if queries are using excessive resources on a busy cluster. Impala imposes a maximum value automatically, so a high value has no practical effect.

Type: Number

Default: 0

OPTIMIZE_PARTITION_KEY_SCANS query option

Enables a fast code path for queries that apply simple aggregate functions to partition key columns: MIN(key_column), MAX(key_column), or COUNT(DISTINCT key_column).

Type: BOOLEAN

Default: FALSE (0)

Usage notes:

This optimization speeds up common introspection operations over partition key columns, for example determining the distinct values of partition keys.

This optimization does not apply to SELECT statements that reference columns that are not partition keys. It also only applies when all the partition key columns in the SELECT statement are referenced in one of the following contexts:

Within a MAX() or MAX() aggregate function or as the argument of any aggregate function with the DISTINCT keyword applied.
Within a WHERE, GROUP BY or HAVING clause.

This optimization is enabled by a query option because it skips some consistency checks and therefore can return slightly different partition values if partitions are in the process of being added, dropped, or loaded outside of Impala. Queries might exhibit different behavior depending on the setting of this option in the following cases:

If files are removed from a partition using HDFS or other non-Impala operations, there is a period until the next REFRESH of the table where regular queries fail at run time because they detect the missing files. With this optimization enabled, queries that evaluate only the partition key column values (not the contents of the partition itself) succeed, and treat the partition as if it still exists.
If a partition contains any data files, but the data files do not contain any rows, a regular query considers that the partition does not exist. With this optimization enabled, the partition is treated as if it exists.

If the partition includes no files at all, this optimization does not change the query behavior: the partition is considered to not exist whether or not this optimization is enabled.

OPTIMIZE_SIMPLE_LIMIT query option

This new planner query option OPTIMIZE_SIMPLE_LIMIT optimizes the planning time for simple limit queries by only considering a minimal set of partitions whose file descriptors add up to N (the specified limit). Each file is conservatively estimated to contain 1 row.

This reduces the number of partitions processed by the planner which, according to query profiling, has been the main contributor to the planning time especially for large numbers of partitions. Further, within each partition, only the number of non-empty files that brings the total to N will be considered. For the optimization to be considered, this option should be always set to "true".

This is an opt-in optimization and also applies to subqueries and view.

ORC_SCHEMA_RESOLUTION query option

The ORC_SCHEMA_RESOLUTION query option allows Impala to look up columns within ORC files by column names, rather than by column indices, when necessary.

The allowed values are:

POSITION (0)
NAME (1)

Type: Enum of the above-allowed values

Usage notes:

By default, Impala looks up columns within an ORC file based on the order of columns in the table.

Impact of column renaming

Position-based resolution is resilient to column renaming however, a name-based resolution will fail when you rename a column. The following example demonstrates the impact of column renaming on position-based and name-based resolution.

View the details of the table test_orc

select * from test_orc;
+---------+------------+------------+
| int_col | string_col | double_col |
+---------+------------+------------+
| 0       | A          | 0.5        |
| 1       | B          | 1.5        |
| 2       | C          | 2.5        |
+---------+------------+------------+

Rename the column “double_col” to “new_double_col”
```
alter table test_orc change double_col new_double_col double;
```
Set resolving ORC file columns by Index
```
set orc_schema_resolution=POSITION;
```

View the table

select int_col, string_col, new_double_col from test_orc;
+---------+------------+----------------+
| int_col | string_col | new_double_col |
+---------+------------+----------------+
| 0       | A          | 0.5            |
| 1       | B          | 1.5            |
| 2       | C          | 2.5            |
+---------+------------+----------------+

The above query results show that renaming a column did not impact when you resolved the ORC file columns by POSITION. However, if you resolve ORC file columns by NAME, set orc_schema_resolution=NAME, viewing the table details will fail to fetch the renamed columns.

select int_col, string_col, new_double_col from test_orc;
+---------+------------+----------------+
| int_col | string_col | new_double_col |
+---------+------------+----------------+
| 0       | A          | NULL           |
| 1       | B          | NULL           |
| 2       | C          | NULL           |
+---------+------------+----------------+

PARQUET_COMPRESSION_CODEC query option

Deprecated. Use COMPRESSION_CODEC in Impala 2.0 and later.

PARQUET_ANNOTATE_STRINGS_UTF8 query option

Causes Impala INSERT and CREATE TABLE AS SELECT statements to write Parquet files that use the UTF-8 annotation for STRING columns.

Type: BOOLEAN

Default: FALSE (0)

Usage notes:

By default, Impala represents a STRING column in Parquet as an unannotated binary field.

Impala always uses the UTF-8 annotation when writing CHAR and VARCHAR columns to Parquet files. An alternative to using the query option is to cast STRING values to VARCHAR.

This option is to help make Impala-written data more interoperable with other data processing engines. Impala itself currently does not support all operations on UTF-8 data. Although data processed by Impala is typically represented in ASCII, it is valid to designate the data as UTF-8 when storing on disk, because ASCII is a subset of UTF-8.

PARQUET_ARRAY_RESOLUTION query option

The PARQUET_ARRAY_RESOLUTION query option controls the behavior of the indexed-based resolution for nested arrays in Parquet.

In Parquet, you can represent an array using a 2-level or 3-level representation. The modern, standard representation is 3-level. The legacy 2-level scheme is supported for compatibility with older Parquet files. However, there is no reliable metadata within Parquet files to indicate which encoding was used. It is even possible to have mixed encodings within the same file if there are multiple arrays. The PARQUET_ARRAY_RESOLUTION option controls the process of resolution that is to match every column/field reference from a query to a column in the Parquet file.

The supported values for the query option are:

THREE_LEVEL: Assumes arrays are encoded with the 3-level representation, and does not attempt the 2-level resolution.
TWO_LEVEL: Assumes arrays are encoded with the 2-level representation, and does not attempt the 3-level resolution.
TWO_LEVEL_THEN_THREE_LEVEL: First tries to resolve assuming a 2-level representation, and if unsuccessful, tries a 3-level representation.

All of the above options resolve arrays encoded with a single level.

A failure to resolve a column/field reference in a query with a given array resolution policy does not necessarily result in a warning or error returned by the query. A mismatch might be treated like a missing column (returns NULL values), and it is not possible to reliably distinguish the 'bad resolution' and 'legitimately missing column' cases.

The name-based policy generally does not have the problem of ambiguous array representations. You specify to use the name-based policy by setting the PARQUET_FALLBACK_SCHEMA_RESOLUTION query option to NAME.

Type: Enum of TWO_LEVEL, TWO_LEVEL_THEN_THREE_LEVEL, and THREE_LEVEL

Default: THREE_LEVEL

Examples:

EXAMPLE A: The following Parquet schema of a file can be interpreted as a 2-level or 3-level:


ParquetSchemaExampleA {
  optional group single_element_groups (LIST) {
    repeated group single_element_group {
      required int64 count;
    }
  }
}

The following table schema corresponds to a 2-level interpretation:


CREATE TABLE t (col1 array<struct<f1: bigint>>) STORED AS PARQUET;

Successful query with a 2-level interpretation:


SET PARQUET_ARRAY_RESOLUTION=TWO_LEVEL;
SELECT ITEM.f1 FROM t.col1;

The following table schema corresponds to a 3-level interpretation:


CREATE TABLE t (col1 array<bigint>) STORED AS PARQUET;

Successful query with a 3-level interpretation:


SET PARQUET_ARRAY_RESOLUTION=THREE_LEVEL;
SELECT ITEM FROM t.col1

EXAMPLE B: The following Parquet schema of a file can be only be successfully interpreted as a 2-level:


ParquetSchemaExampleB {
  required group list_of_ints (LIST) {
    repeated int32 list_of_ints_tuple;
  }
}

The following table schema corresponds to a 2-level interpretation:


CREATE TABLE t (col1 array<int>) STORED AS PARQUET;

Successful query with a 2-level interpretation:


SET PARQUET_ARRAY_RESOLUTION=TWO_LEVEL;
SELECT ITEM FROM t.col1

Unsuccessful query with a 3-level interpretation. The query returns NULLs as if the column was missing in the file:


SET PARQUET_ARRAY_RESOLUTION=THREE_LEVEL;
SELECT ITEM FROM t.col1

PARQUET_DICTIONARY_FILTERING query option

The PARQUET_DICTIONARY_FILTERING query option controls whether Impala uses dictionary filtering for Parquet files.

To efficiently process a highly selective scan query, when this option is enabled, Impala checks the values in the Parquet dictionary page and determines if the whole row group can be thrown out.

A column chunk is purely dictionary encoded and can be used by dictionary filtering if any of the following conditions are met:

If the encoding_stats is in the Parquet file, dictionary filtering uses it to determine if there are only dictionary encoded pages (i.e. there are no data pages with an encoding other than PLAIN_DICTIONARY).
If the encoding stats are not present, dictionary filtering looks at the encodings. The column is purely dictionary encoded if both of the conditions satisfy:
- PLAIN_DICTIONARY is present.
- Only PLAIN_DICTIONARY, RLE, or BIT_PACKED encodings are listed.
Dictionary filtering works for the Parquet dictionaries with less than 40000 values if the file was written by Impala 2.9.

In the query runtime profile output for each Impalad instance, the NumDictFilteredRowGroups field in the SCAN node section shows the number of row groups that were skipped based on dictionary filtering.

Note that row groups can be filtered out by Parquet statistics, and in such cases, dictionary filtering will not be considered.

The supported values for the query option are:

TRUE (1): Use dictionary filtering.
FALSE (0): Do not use dictionary filtering
Any other values are treated as FALSE.

Type: BOOLEAN

Default: TRUE (1)

PARQUET_FALLBACK_SCHEMA_RESOLUTION query option

The PARQUET_FALLBACK_SCHEMA_RESOLUTION query option allows Impala to look up columns within Parquet files by column name, rather than column order, when necessary. The allowed values are:

POSITION (0)
NAME (1)

Type: Enum of the above allowed values

Usage notes:

By default, Impala looks up columns within a Parquet file based on the order of columns in the table. The name setting for this option enables behavior for Impala queries similar to the Hive setting parquet.column.index access=false. It also allows Impala to query Parquet files created by Hive with the parquet.column.index.access=false setting in effect.

PARQUET_FILE_SIZE query option

Specifies the maximum size of each Parquet data file produced by Impala INSERT statements in bytes, or with a trailing m or g character to indicate megabytes or gigabytes.

With tables that are small or finely partitioned, the default Parquet block size (formerly 1 GB, now 256 MB in Impala 2.0 and later) could be much larger than needed for each data file. For INSERT operations into such tables, you can increase parallelism by specifying a smaller PARQUET_FILE_SIZE value, resulting in more HDFS blocks that can be processed by different nodes.

Type: Number, with optional unit specifier

Default: 0 (produces files with a target size of 256 MB; files might be larger for very wide tables)

PARQUET_OBJECT_STORE_SPLIT_SIZE

Use the PARQUET_OBJECT_STORE_SPLIT_SIZE query option to control the Parquet split sizes on non-block based stores, e.g. S3, ADLS, etc.

Type: INT

Allowed values: The value must be greater than or equal to 1 MB.

Default: 256 MB

Added in: Impala 3.4 / CDP 1.2

PARQUET_PAGE_ROW_COUNT_LIMIT query option

Use the PARQUET_PAGE_ROW_COUNT_LIMIT query option to set the maximum number of rows that can be written on a single Parquet data page. By default there is no row count limit.

Type: INT

Allowed values: Positive integers

Added in: Impala 3.3

PARQUET_READ_PAGE_INDEX query option

Use the PARQUET_READ_PAGE_INDEX query option to disable or enable using the Parquet page index during scans. The page index contains min/max statistics at the page-level granularity. It can be used to skip pages and rows that do not match the conditions in the WHERE clause.

This option enables the same optimization as the PARQUET_READ_STATISTICS at the finer grained page level.

Impala supports filtering based on Parquet statistics:

Of the types: BOOLEAN, INT, DECIMAL, STRING, TIMESTAMP, DATE
For simple predicates of the forms: <slot> <op> <constant> or <constant> <op> <slot>, where <op> is LT, LE, GE, GT, and EQ

The supported values for the query option are:

TRUE (1): Read the page-level statistics from the Parquet page index during query processing and filter out pages based on the statistics.
FALSE (0): Do not use the Parquet page index.
Any other values are treated as false.

Type: BOOLEAN

Default: TRUE (1)

PARQUET_READ_STATISTICS query option

The PARQUET_READ_STATISTICS query option controls whether to read statistics from Parquet files and use them during query processing.

Parquet stores min/max stats which can be used to skip reading row groups if they don't qualify a certain predicate. When this query option is set to true, Impala reads the Parquet statistics and skips reading row groups that do not match the conditions in the WHERE clause.

Impala supports filtering based on Parquet statistics:

Of the numerical types for the old version of the statistics: BOOLEAN, INTEGER, FLOAT
Of the types for the new version of the statistics (starting in impala 2.8): BOOLEAN, INTEGER, FLOAT, DECIMAL, STRING, TIMESTAMP, DATE
For simple predicates of the forms: <slot> <op> <constant> or <constant> <op> <slot>, where <op> is LT, LE, GE, GT, and EQ

The PARQUET_READ_STATISTICS option provides a workaround when dealing with files that have corrupt Parquet statistics and unknown errors.

In the query runtime profile output for each Impalad instance, the NumStatsFilteredRowGroups field in the SCAN node section shows the number of row groups that were skipped based on Parquet statistics.

The supported values for the query option are:

TRUE (1): Read statistics from Parquet files and use them in query processing.
FALSE (0): Do not use Parquet read statistics.
Any other values are treated as FALSE.

Type: BOOLEAN

Default: TRUE

PARQUET_WRITE_PAGE_INDEX query option

The PARQUET_WRITE_PAGE_INDEX query option disables or enables the Parquet page index writing.

Impala writes page-level statistics into the Parquet page index of the types: INT, DECIMAL, STRING, TIMESTAMP, DATE

The supported values for the query option are:

TRUE (1): Write the Parquet page index when creating Parquet files.
FALSE (0): Do not write the Parquet page index when creating Parquet files.
Any other values are treated as false.

Type: BOOLEAN

Default: TRUE (1)

Added in: Impala 3.3

PREFETCH_MODE query option

Determines whether the prefetching optimization is applied during join query processing.

Allowed values are:

NONE (0)
HT_BUCKET (1): Hash table buckets are prefetched during join query processing.

Type: Enum of the above allowed values

Default: 1 (HT_BUCKET)

QUERY_TIMEOUT_S query option

Sets the idle query timeout value for the session, in seconds. Queries that sit idle for longer than the timeout value are automatically cancelled. If the system administrator specified the --idle_query_timeout startup option, QUERY_TIMEOUT_S must be smaller than or equal to the --idle_query_timeout value.

note

The timeout clock for queries and sessions only starts ticking when the query or session is idle.

For queries, this means the query has results ready but is waiting for a client to fetch the data. A query can run for an arbitrary time without triggering a timeout, because the query is computing results rather than sitting idle waiting for the results to be fetched. The timeout period is intended to prevent unclosed queries from consuming resources and taking up slots in the admission count of running queries, potentially preventing other queries from starting.

For sessions, this means that no query has been submitted for some period of time.

Type: Number

Default: 0 (no timeout if --idle_query_timeout not in effect; otherwise, use --idle_query_timeout value)

REFRESH_UPDATED_HMS_PARTITIONS query option

As the name implies the query option REFRESH_UPDATED_HMS_PARTITIONS is used to refresh any updated HMS partitions.

This option is disabled by default so that the performance is not compromised when refreshing a table. However, for certain corner case scenarios refresh table command does not detect changed partitions. In case of the default refresh, catalogd detects, adds any new partitions and removes any partitions which are not present in HMS anymore. However, it does not update any partitions that changed (eg. change of location). When this query option is enabled, the refresh table command will detect certain changes to the partitions and update them accordingly. Currently, catalogd will update the partitions if any of the following StorageDescriptor properties have been modified. 1. Partition Location 2. Partition Fileformat. 3 SerdeInfo. 4. Partition schema changes. 5. Partition bucketInfo.

Type:BOOLEAN; recognized values are 1 and 0, or true and false; any other value will be interpreted as false.

Default:FALSE (shown as 0 in output of SET statement).

REPLICA_PREFERENCE query option

The REPLICA_PREFERENCE query option lets you distribute the work more evenly if hotspots and bottlenecks persist. It causes the access cost of all replicas of a data block to be considered equal to or worse than the configured value. This allows Impala to schedule reads to suboptimal replicas (e.g. local in the presence of cached ones) in order to distribute the work across more executor nodes.

Allowed values are:

CACHE_LOCAL (0)
DISK_LOCAL (2)
REMOTE (4)

Type: Enum

Default: CACHE_LOCAL (0)

Usage Notes:

By default Impala selects the best replica it can find in terms of access cost. The preferred order is cached, local, and remote. With REPLICA_PREFERENCE, the preference of all replicas are capped at the selected value. For example, when REPLICA_PREFERENCE is set to DISK_LOCAL, cached and local replicas are treated with the equal preference. When set to REMOTE, all three types of replicas, cached, local, remote, are treated with equal preference.

REQUEST_POOL query option

The pool or queue name that queries should be submitted to. Only applies when you enable the Impala admission control feature. Specifies the name of the pool used by requests from Impala to the resource manager.

Type: STRING

Default: An empty string (use the user-to-pool mapping defined by an impalad startup option in the Impala configuration file)

RESOURCE_TRACE_RATIO query option

The RESOURCE_TRACE_RATIO query option specifies the ratio of queries where the CPU usage info will be included in the profiles. Collecting CPU usage and sending it around adds a slight overhead during query execution. This query option lets you control whether to collect additional information to diagnose the resource usage.

For example, setting RESOURCE_TRACE_RATIO=1 adds a trace of the CPU usage to the profile of each query.

Setting RESOURCE_TRACE_RATIO=0.5 means that randomly selected half of all queries will have that information collected by the coordinator and included in the profiles.

Setting RESOURCE_TRACE_RATIO=0 means that CPU usage will not be tracked and included in the profiles.

Values from 0 to 1 are allowed.

Type: Number

Default: 0

Added in: Impala 3.2

RETRY_FAILED_QUERIES query option

Use the RETRY_FAILED_QUERIES query option to control whether or not queries are transparently retried on cluster membership changes.

Cluster membership changes typically occur when an impalad crashes, or if the node is blacklisted by the Impala Coordinator. If a SELECT query fails due to a cluster membership change, the Coordinator will cancel and unregister the running query and then launch a retry of the query. For example, if one of the executor nodes fails during query execution, the query fails but is transparently re-executed, either with the executor node immediately replaced, or with a temporarily reduced number of executor nodes. This feature supports retrying the entire query and NOT the individual query fragments. INSERT and DDL queries will NOT be retried.

Type: BOOLEAN

Default: TRUE

RUNTIME_BLOOM_FILTER_SIZE query option

Size (in bytes) of Bloom filter data structure used by the runtime filtering feature.

Type: INTEGER

Default: 1048576 (1 MB)

Maximum: 16 MB

Usage notes:

This setting affects optimizations for large and complex queries, such as dynamic partition pruning for partitioned tables, and join optimization for queries that join large tables. Larger filters are more effective at handling higher cardinality input sets, but consume more memory per filter.

If your query filters on high-cardinality columns (for example, millions of different values) and you do not get the expected speedup from the runtime filtering mechanism, consider doing some benchmarks with a higher value for RUNTIME_BLOOM_FILTER_SIZE. The extra memory devoted to the Bloom filter data structures can help make the filtering more accurate.

Because the effectiveness of this setting depends so much on query characteristics and data distribution, you typically only use it for specific queries that need some extra tuning, and the ideal value depends on the query. Consider setting this query option immediately before the expensive query and unsetting it immediately afterward.

Kudu consideration:

This query option affects only Bloom filters, not the min/max filters that are applied to Kudu tables. Therefore, it does not affect the performance of queries against Kudu tables.

RUNTIME_FILTER_MAX_SIZE query option

The RUNTIME_FILTER_MAX_SIZE query option adjusts the settings for the runtime filtering feature. This option defines the maximum size for a filter, no matter what the estimates produced by the planner are. This value also overrides any lower number specified for the RUNTIME_BLOOM_FILTER_SIZE query option. Filter sizes are rounded up to the nearest power of two.

Type: INTEGER

Default: 1048576 (1 MB)

Usage notes:

Kudu consideration:

This query option affects only Bloom filters, not the min/max filters that are applied to Kudu tables. Therefore, it does not affect the performance of queries against Kudu tables.

RUNTIME_FILTER_MIN_SIZE query option

The RUNTIME_FILTER_MIN_SIZE query option adjusts the settings for the runtime filtering feature. This option defines the minimum size for a filter, no matter what the estimates produced by the planner are. This value also overrides any lower number specified for the RUNTIME_BLOOM_FILTER_SIZE query option. Filter sizes are rounded up to the nearest power of two.

Type: INTEGER

Default: 0 (meaning use the value from the corresponding impalad startup option)

Usage notes:

Kudu consideration:

This query option affects only Bloom filters, not the min/max filters that are applied to Kudu tables. Therefore, it does not affect the performance of queries against Kudu tables.

RUNTIME_FILTER_MODE query option

The RUNTIME_FILTER_MODE query option adjusts the settings for the runtime filtering feature. It turns this feature on and off, and controls how extensively the filters are transmitted between hosts.

Allowed values are:

OFF (0)
LOCAL (1)
GLOBAL (2)

Type: Enum of the above allowed values

Default: GLOBAL (2)

Usage notes:

The default is GLOBAL setting is recommended for a wide variety of workloads, to provide best performance with out of the box settings.

The lowest setting of LOCAL does a similar level of optimization (such as partition pruning) as in earlier Impala releases. This setting was the default in Impala 2.5, to allow for a period of post-upgrade testing for existing workloads. This setting is suitable for workloads with non-performance-critical queries, or if the coordinator node is under heavy CPU or memory pressure.

You might change the setting to OFF if your workload contains many queries involving partitioned tables or joins that do not experience a performance increase from the runtime filters feature. If the overhead of producing the runtime filters outweighs the performance benefit for queries, you can turn the feature off entirely.

RUNTIME_FILTER_WAIT_TIME_MS query option

The RUNTIME_FILTER_WAIT_TIME_MS query option adjusts the settings for the runtime filtering feature. It specifies a time in milliseconds that each scan node waits for runtime filters to be produced by other plan fragments.

Type: INTEGER

Default: 0 (meaning use the value from the corresponding impalad startup option)

Usage notes:

S3_SKIP_INSERT_STAGING query option

Speeds up INSERT operations on tables or partitions residing on the Amazon S3 filesystem. The tradeoff is the possibility of inconsistent data left behind if an error occurs partway through the operation.

By default, Impala write operations to S3 tables and partitions involve a two-stage process. Impala writes intermediate files to S3, then (because S3 does not provide a rename operation) those intermediate files are copied to their final location, making the process more expensive as on a filesystem that supports renaming or moving files. This query option makes Impala skip the intermediate files, and instead write the new data directly to the final destination.

Type: BOOLEAN

Default: TRUE (1)

Usage notes:

The timing of file deletion during an INSERT OVERWRITE operation makes it impractical to write new files to S3 and delete the old files in a single operation. Therefore, this query option only affects regular INSERT statements that add to the existing data in a table, not INSERT OVERWRITE statements. Use TRUNCATE TABLE if you need to remove all contents from an S3 table before performing a fast INSERT with this option enabled.

Performance improvements with this option enabled can be substantial. The speed increase might be more noticeable for non-partitioned tables than for partitioned tables.

SCAN_BYTES_LIMIT query option

The SCAN_BYTES_LIMIT query option sets a limit on the bytes scanned by HDFS and HBase SCAN operations. If a query is still executing when the query’s coordinator detects that it has exceeded the limit, the query is terminated with an error. The option is intended to prevent runaway queries that scan more data than is intended.

For example, an Impala administrator could set a default value of SCAN_BYTES_LIMIT=100GB for a resource pool to automatically stop queries that scan more than 100 GB of data (see Impala Admission Control and Query Queuing for information about default query options). If a user accidentally omits a partition filter in a WHERE clause and runs a large query that scans a lot of data, the query will be automatically terminated after it scans more data than the SCAN_BYTES_LIMIT.

You can override the default value per-query or per-session, in the same way as other query options, if you do not want the default SCAN_BYTES_LIMIT value to apply to a specific query or session.

Because the checks are done periodically, the query may scan over the limit at times.

Type: Number

Units:

A numeric argument represents memory size in bytes.
Specify a suffix of m or mb for megabytes.
Specify a suffix of g or gb for gigabytes.
If you specify a suffix with unrecognized formats, subsequent queries fail with an error.

Default: 0 (no limit)

Added in: Impala 3.1

SCHEDULE_RANDOM_REPLICA query option

The SCHEDULE_RANDOM_REPLICA query option fine-tunes the scheduling algorithm for deciding which host processes each HDFS data block or Kudu tablet to reduce the chance of CPU hotspots.

By default, Impala estimates how much work each host has done for the query, and selects the host that has the lowest workload. This algorithm is intended to reduce CPU hotspots arising when the same host is selected to process multiple data blocks / tablets. Use the SCHEDULE_RANDOM_REPLICA query option if hotspots still arise for some combinations of queries and data layout.

The SCHEDULE_RANDOM_REPLICA query option only applies to tables and partitions that are not enabled for the HDFS caching.

Type: BOOLEAN

Default: FALSE

SCRATCH_LIMIT query option

Specifies the maximum amount of disk storage, in bytes, that any Impala query can consume on any host using the spill to disk mechanism that handles queries that exceed the memory limit.

Specify the size in bytes, or with a trailing m or g character to indicate megabytes or gigabytes.

A value of zero turns off the spill to disk feature for queries in the current session, causing them to fail immediately if they exceed the memory limit.

The amount of memory used per host for a query is limited by the MEM_LIMIT query option.

The more DataNodes in the cluster, the less memory is used on each host, and therefore also less scratch space is required for queries that exceed the memory limit.

Type: Number, with optional unit specifier

Default: -1 (amount of spill space is unlimited)

SHUFFLE_DISTINCT_EXPRS query option

The SHUFFLE_DISTINCT_EXPRS query option controls the shuffling behavior when a query has both grouping and distinct expressions. Impala can optionally include the distinct expressions in the hash exchange to spread the data among more nodes. However, this plan requires one more hash exchange phase.

It is recommended that you turn off this option if the NDVs of the grouping expressions are high.

Type: BOOLEAN

Default: FALSE

SPOOL_QUERY_RESULTS query Option

Use the SPOOL_QUERY_RESULTS query option to enable query result spooling, which is disabled by default.

Query result spooling controls how rows are returned to the client.

When query result spooling is disabled (SPOOL_QUERY_RESULTS = FALSE), Impala relies on clients to fetch results to trigger the generation of more result row batches until all the result rows have been produced. If a client issues a query without fetching all the results, the query fragments will continue to consume the resources until the query is cancelled and unregistered, potentially tying up resources and cause other queries to wait for extended period of time in admission control.
When query result spooling is enabled (SPOOL_QUERY_RESULTS = TRUE), the result sets of queries are eagerly fetched and spooled, either in memory or on disk.
Once all result rows have been fetched and stored in the spooling location, the resources are freed up. Incoming client fetches can get the data from the spooled results.

Type: INT

Default: FALSE

Added in: Impala 3.4

SPOOL_ALL_RESULTS_FOR _RETRIES

Leave this query option SPOOL_ALL_RESULTS_FOR _RETRIES at its default value “true” for queries that can be retried. This new query option only applies if the query options RETRY_FAILED_QUERIES and SPOOL_QUERY_RESULTS are enabled.

Currently, to avoid incorrect results, query retry will be skipped if any results are returned to the client in the original query. This new query option is introduced, for queries that can be retried, to control spooling all query results before returning any to the client. If the query results cannot be contained in the allocated result-spooling space, results will be returned and query retry will be disabled on the query.

If you set SPOOL_ALL_RESULTS_FOR _RETRIES to false, it will fall back to the original behavior - the client can fetch results when they are ready. For queries that cannot be retried, this option does not impact.

Type: BOOLEAN

Default: TRUE

SUPPORT_START_OVER query option

Leave this setting at its default value. It is a read-only setting, tested by some client applications such as Hue.

If you accidentally change it through impala-shell, subsequent queries encounter errors until you undo the change by issuing UNSET support_start_over.

Type: BOOLEAN

Default: FALSE

SYNC_DDL query option

When enabled, causes any DDL operation such as CREATE TABLE or ALTER TABLE to return only when the changes have been propagated to all other Impala nodes in the cluster by the Impala catalog service. That way, if you issue a subsequent CONNECT statement in impala-shell to connect to a different node in the cluster, you can be sure that other node will already recognize any added or changed tables. (The catalog service automatically broadcasts the DDL changes to all nodes automatically, but without this option there could be a period of inconsistency if you quickly switched to another node, such as by issuing a subsequent query through a load-balancing proxy.)

Although INSERT is classified as a DML statement, when the SYNC_DDL option is enabled, INSERT statements also delay their completion until all the underlying data and metadata changes are propagated to all Impala nodes. Internally, Impala inserts have similarities with DDL statements in traditional database systems, because they create metadata needed to track HDFS block locations for new files and they potentially add new partitions to partitioned tables.

Type: BOOLEAN

Default: FALSE

THREAD_RESERVATION_AGGREGATE_LIMIT query option

The THREAD_RESERVATION_AGGREGATE_LIMIT query option limits the number of reserved threads for a query across all nodes on which it is executing. The option is intended to prevent execution of complex queries that can consume excessive CPU or operating system resources on a cluster. Queries that have more threads than this threshold are rejected by Impala’s admission controller before they start executing.

For example, an Impala administrator could set a default value of THREAD_RESERVATION_AGGREGATE_LIMIT=2000 for a resource pool on a 100 node where they expect only relatively simple queries with less than 20 threads per node to run. This will reject queries that require more than 2000 reserved threads across all nodes, for example a query with 21 fragments running on all 100 nodes of the cluster.

You can override the default value per-query or per-session, in the same way as other query options, if you do not want the default THREAD_RESERVATION_AGGREGATE_LIMIT value to apply to a specific query or session.

Type: Number

Default: 0 (no limit)

Added in: Impala 3.1

THREAD_RESERVATION_LIMIT query option

The THREAD_RESERVATION_LIMIT query option limits the number of reserved threads for a query on each node. The option is intended to prevent execution of complex queries that can consume excessive CPU or operating system resources on a single node. Queries that have more threads per node than this threshold are rejected by Impala’s admission controller before they start executing. You can see the number of reserved threads for a query in its explain plan in the “Per-Host Resource Reservation" line.

For example, an Impala administrator could set a default value of THREAD_RESERVATION_LIMIT=100 for a resource pool where they expect only relatively simple queries to run. This will reject queries that require more than 100 reserved threads on a node, for example, queries with more than 100 fragments.

You can override the default value per-query or per-session, in the same way as other query options, if you do not want the default THREAD_RESERVATION_LIMIT value to apply to a specific query or session.

Type: Number

Default: 3000

Added in: Impala 3.1

TIMEZONE query option

The TIMEZONE query option defines the timezone used for conversions between UTC and the local time. If not set, Impala uses the system time zone where the Coordinator Impalad runs. As query options are not sent to the Coordinator immediately, the timezones are validated only when the query runs.

Impala takes the timezone into a consideration in the following cases:

When calling the NOW() function
When converting between Unix time and timestamp if the use_local_tz_for_unix_timestamp_conversions flag is TRUE
When reading Parquet timestamps written by Hive if the convert_legacy_hive_parquet_utc_timestamps flag is TRUE

Syntax:

SET TIMEZONE=time zone

time zone can be a canonical code or a time zone name defined in IANA Time Zone Database. The value is case-sensitive.

Leading/trailing quotes (') and double quotes (") are stripped.

If time zone is an empty string, the time zone for the query is set to the default time zone of the Impalad Coordinator.

If time zone is NULL or a space character, Impala returns an error when the query is executed.

Type: STRING

Default: The system time zone where the Coordinator impalad runs

Examples:

SET TIMEZONE=UTC;
SET TIMEZONE="Europe/Budapest";

Added in: Impala 3.1

TOPN_BYTES_LIMIT query option

The TOPN_BYTES_LIMIT query option places a limit on the amount of estimated memory that Impala can process for top-N queries.

Top-N queries are the queries that include both ORDER BY and LIMIT clauses. Top-N queries don't spill to disk so have to keep all rows they process in memory, and those queries can cause out-of-memory issues when running with a large limit and an offset. If the Impala planner estimates that a top-N operator will process more bytes than the TOPN_BYTES_LIMIT value, it will replace the top-N operator with the sort operator. Switching to the sort operator allows Impala to spill to disk, thus requiring less memory than top-N, but potentially with performance penalties.

The option has no effect when set to 0 or -1.

Syntax:

SET TOPN_BYTES_LIMIT=limit

Type: Number

Default: 536870912 (512 MB)

Added in: Impala 3.1

USE_DOP_FOR_COSTING query option

The Impala query planner chooses between different techniques for performing join queries, depending on the absolute and relative sizes of the tables. Broadcast distribution is the default where the right-hand table is considered to be smaller than the left-hand table, and its contents are sent to all the other nodes involved in the query. The alternative technique is known as a partitioned distribution, which is more suitable for large tables of roughly equal size. With this technique, portions of each table are sent to appropriate other nodes where those subsets of rows can be processed in parallel. In some cases the current planner may pick broadcast distribution even when the partition distribution will be more optimal.

For the planner to choose the correct distribution method for the best possible performance, 2 new query options have been added in this release.

use_dop_for_costing (Type:BOOLEAN, Default:true)
broadcast_to_partition_factor (Type:DOUBLE, Default:1.0)

When the query option use_dop_for_costing is enabled, the planner incorporates the join operator's degree of parallelism (dop) and broadcast-to-partition factor in the costing of the build side of a join when comparing broadcast vs partition distribution thereby increasing the cost of the broadcast join's build side. The broadcast cost is now dependent on m and C, where:

m = degree of parallelism of the join node and,

C = the broadcast_to_partition_factor

This allows the planner to consider partition distribution where appropriate.

USE_LOCAL_TZ_FOR_UNIX_TIMESTAMP_CONVERSIONS query option

By default, Impala stores and interprets TIMESTAMP values in UTC time zone when writing to data files, reading from data files, or converting to and from system time values through functions.

When you set the use_local_tz_for_unix_timestamp_conversions query option to TRUE, Impala treats the TIMESTAMP values specified in the local time zone. The local time zone is determined in the following order with the TIMESTAMP query option takes the highest precedence:

The TIMESTAMP query option
$TZ environment variable
System time zone where the impalad coordinator runs

The use_local_tz_for_unix_timestamp_conversions query option can be used to fix discrepancy in INTERVAL operations. For example, a TIMESTAMP + INTERVAL n-hours can be affected by Daylight Saving Time, which Impala does not consider by default as these operations are applied as if the timestamp was in UTC. You can use the use_local_tz_for_unix_timestamp_conversions option to fix the issue.

Type: BOOLEAN

UTF8_MODE query option

UTF-8 support allows string functions to recognize the UTF-8 characters, thus processing strings in a compatible way as other engines.

You can use the new query option, UTF8_MODE, to turn on/off the UTF-8 aware behavior. The query option can be set globally, or at per session level. Only queries with UTF8_MODE=true will have UTF-8 aware behaviors. If the query option UTF8_MODE is turned on globally, existing queries that depend on the original binary behavior need to be explicitly set to UTF8_MODE=false.

Type:BOOLEAN

Default:FALSE

Added in:Impala 4.1