Query options
Specify query options in the SET
statement to
apply the settings to the subsequently issued queries.
Some query options are useful in day-to-day operations for improving usability, performance, or flexibility.
Other query options control special-purpose aspects of Impala operation and are intended primarily for advanced debugging or troubleshooting.
Options with Boolean parameters can be set to 1 or
true
to enable, or 0 or false
to
turn off.
You can set query options directly through the JDBC and
ODBC interfaces by using the SET
statement. Formerly,
SET
was only available as a command within the
impala-shell interpreter.
You can set query options for an
impala-shell session by specifying one or more
command-line arguments of the form
--query_option=option=value
.
- ABORT_ON_ERROR
- AGG_MEM_CORRELATION_FACTOR
- ALLOW_ERASURE_CODED_FILES
- APPX_COUNT_DISTINCT
- BATCH_SIZE
- BROADCAST_BYTES_LIMIT
- BUFFER_POOL_LIMIT
- COMPRESSION_CODEC
- COMPUTE_STATS_MIN_SAMPLE_SIZE
- DEBUG_ACTION
- DECIMAL_V2
- DEFAULT_FILE_FORMAT
- DEFAULT_HINTS_INSERT_STATEMENT
- DEFAULT_JOIN_DISTRIBUTION_MODE
- DEFAULT_NDV_SCALE
- DEFAULT_SPILLABLE_BUFFER_SIZE
- DEFAULT_TRANSACTIONAL_TYPE
- DELETE_STATS_IN_TRUNCATE
- DISABLE_CODEGEN
- DISABLE_CODEGEN_ROWS_THRESHOLD
- DISABLE_HBASE_NUM_ROWS_ESTIMATE
- DISABLE_ROW_RUNTIME_FILTERING
- DISABLE_STREAMING_PREAGGREGATIONS
- DISABLE_UNSAFE_SPILLS
- ENABLE_DISTINCT_SEMI_JOIN_OPTIMIZATION
- ENABLE_EXPR_REWRITES
- EXEC_SINGLE_NODE_ROWS_THRESHOLD
- EXEC_TIME_LIMIT_S
- EXPAND_COMPLEX_TYPES
- EXPLAIN_LEVEL
- FETCH_ROWS_TIMEOUT_MS
- HBASE_CACHE_BLOCKS
- HBASE_CACHING
- IDLE_SESSION_TIMEOUT
- JOIN_SELECTIVITY_CORRELATION_FACTOR
- KUDU_READ_MODE
- KUDU_REPLICA_SELECTION
- LARGE_AGG_MEM_THRESHOLD
- LIVE_PROGRESS
- LIVE_SUMMARY
- MAX_CNF_EXPRS
- MAX_ERRORS
- MAX_FRAGMENT_INSTANCES_PER_NODE
- MAX_MEM_ESTIMATE_FOR_ADMISSION
- MAX_NUM_RUNTIME_FILTERS
- MAX_RESULT_SPOOLING_MEM
- MAX_ROW_SIZE
- MAX_SCAN_RANGE_LENGTH
- MAX_SPILLED_RESULT_SPOOLING_MEM
- MEM_LIMIT
- MEM_LIMIT_COORDINATORS
- MIN_SPILLABLE_BUFFER_SIZE
- MT_DOP
- NUM_NODES
- NUM_ROWS_PRODUCED_LIMIT
- NUM_SCANNER_THREADS
- OPTIMIZE_SIMPLE_LIMIT
- OPTIMIZE_PARTITION_KEY_SCANS
- ORC_SCHEMA_RESOLUTION
- PARQUET_COMPRESSION_CODEC
- PARQUET_ANNOTATE_STRINGS_UTF8
- PARQUET_ARRAY_RESOLUTION
- PARQUET_DICTIONARY_FILTERING
- PARQUET_FALLBACK_SCHEMA_RESOLUTION
- PARQUET_FILE_SIZE
- PARQUET_OBJECT_STORE_SPLIT_SIZE
- PARQUET_PAGE_ROW_COUNT_LIMIT
- PARQUET_READ_PAGE_INDEX
- PARQUET_READ_STATISTICS
- PARQUET_WRITE_PAGE_INDEX
- PREFETCH_MODE
- QUERY_TIMEOUT_S
- REPLICA_PREFERENCE
- REQUEST_POOL
- RESOURCE_TRACE_RATIO
- RETRY_FAILED_QUERIES
- RUNTIME_BLOOM_FILTER_SIZE
- RUNTIME_FILTER_MAX_SIZE
- RUNTIME_FILTER_MIN_SIZE
- RUNTIME_FILTER_MODE
- RUNTIME_FILTER_WAIT_TIME_MS
- S3_SKIP_INSERT_STAGING
- SCAN_BYTES_LIMIT
- SCHEDULE_RANDOM_REPLICA
- SCRATCH_LIMIT
- SHUFFLE_DISTINCT_EXPRS
- SPOOL_QUERY_RESULTS
- SPOOL_ALL_RESULTS_FOR_RETRIES
- SUPPORT_START_OVER
- SYNC_DDL
- THREAD_RESERVATION_AGGREGATE_LIMIT
- THREAD_RESERVATION_LIMIT
- TIMEZONE
- TOPN_BYTES_LIMIT
- ABORT_ON_ERROR query option
-
When this option is enabled, Impala cancels a query immediately when any of the nodes encounters an error, rather than continuing and possibly returning incomplete results. This option is disabled by default, to help gather maximum diagnostic information when an error occurs, for example, whether the same problem occurred on all nodes or only a single node. Currently, the errors that Impala can skip over involve data corruption, such as a column that contains a string value when expected to contain an integer value.
To control how much logging Impala does for non-fatal errors when
ABORT_ON_ERROR
is turned off, use theMAX_ERRORS
option.Type:
BOOLEAN
Default:
FALSE
(0
) - AGG_MEM_CORRELATION_FACTOR and LARGE_AGG_MEM_THRESHOLD query options
-
These query options are an improvement in memory estimation for aggregation node and do not replace the existing query option PREAGG_BYTES_LIMIT.
Prior to this release, you used the query option PREAGG_BYTES_LIMIT to control memory usage of preagg node. But it has to be manually tweaked to be more conservative with memory and to directly limit the memory consumption of the preaggregations. When PREAGG_BYTES_LIMIT is not set, the memory estimation of each streaming preaggregation node in query plan can be expanded to consume nearly all of the memory reservation on a host, starving the merge aggregation or possibly other streaming aggregation. This is because the Planner often overestimates aggregation node memory estimates since it uses simple multiplication of NDVs of contributing grouping columns.
To reduce this large memory estimation issue, this patch includes two new query options AGG_MEM_CORRELATION_FACTOR and LARGE_AGG_MEM_THRESHOLD to the existing query option to control memory estimation. If the estimated memory from the default NDV multiplication method exceed LARGE_AGG_MEM_THRESHOLD, recompute the estimated memory again by comparing against the max(NDV) & AGG_MEM_CORRELATION_FACTOR method.
Default values:
LARGE_AGG_MEM_THRESHOLD = 512 MB
AGG_MEM_CORRELATION_FACTOR = 0.5 - ALLOW_ERASURE_CODED_FILES query option
-
Use the
ALLOW_ERASURE_CODED_FILES
query option to enable or disable the support of erasure coded files in Impala. Until Impala is fully tested and certified with erasure coded files, this query option is set toFALSE
by default.When the
ALLOW_ERASURE_CODED_FILES
query option is set toFALSE
, Impala returns an error when a query requires scanning an erasure coded file.Type:
BOOLEAN
Default:
FALSE
(0
)Added in: Impala 3.1
- APPX_COUNT_DISTINCT query option
-
When the
APPX_COUNT_DISTINCT
query option is set toTRUE
, Impala implicitly convertsCOUNT(DISTINCT)
operations to theNDV()
function calls. The resulting count is approximate rather than precise. Enable the query option when a tolerable amount of error is acceptable in order to obtain faster query results than with aCOUNT (DISTINCT)
queries.Type:
BOOLEAN
Default:
FALSE
(0
) - BATCH_SIZE query option
-
Number of rows evaluated at a time by SQL operators. Unspecified or a size of 0 uses a predefined default size. Using a large number improves responsiveness, especially for scan operations, at the cost of a higher memory footprint.
This option is primarily for testing during Impala development, or for use under the direction of Cloudera support.
Type: Number
Default:
0
(meaning the predefined default of 1024) - BROADCAST_BYTES_LIMIT query option
-
Sets the limit for the size of the broadcast input based on estimated size. The Impala planner may in rare cases make a bad choice to broadcast a large table or intermediate result and encounter performance problems due to high memory pressure. Setting this limit will make the planner pick a partition based hash join instead of broadcast and avoid such performance problems.
Type:
INT
Default: The default value is 34359738368 (32 GB). A value of 0 causes the option to be ignored.
Usage notes:
Set this option in bytes. For example: -- Change the limit to 16GB. set broadcast_bytes_limit=17179869184; -- Disable the BROADCAST_BYTES_LIMIT. set broadcast_bytes_limit=0;
- BUFFER_POOL_LIMIT query option
-
Defines a limit on the amount of memory that a query can allocate from the internal buffer pool. The value for this limit applies to the memory on each host, not the aggregate memory across the cluster.
Typically not changed by users, except during diagnosis of out-of-memory errors during queries.
You can set it to an absolute value, e.g.
8GB
, or a relative value, e.g.80%
, based on theMEM_LIMIT
setting.Type:
INT
Default: The default setting for this option is the lower of 80% of the
MEM_LIMIT
setting, or theMEM_LIMIT
setting minus 100 MB.Usage notes:
If queries encounter out-of-memory errors, consider decreasing the
BUFFER_POOL_LIMIT
setting to less than 80% of theMEM_LIMIT setting
. - COMPRESSION_CODEC query option
-
When Impala writes Parquet data files and other supported data files using the
INSERT
statement, the underlying compression is controlled by theCOMPRESSION_CODEC
query option.For the ZSTD compression, the optional compression level can be specified as shown in the syntax below.
Syntax:
SET COMPRESSION_CODEC=codec_name; // Supported for all codecs. SET COMPRESSION_CODEC=codec_name:compression_level; // Only supported for ZSTD.
The allowed values for this query option are
SNAPPY
(the default),GZIP
,ZSTD
,LZ4
, andNONE
.ZSTD
also supports setting a compression level. The lower the level, the faster the speed at the cost of compression ratio. Compression levels from 1 up to 22 are supported forZSTD
. The default compression level 3 is used, if one is not passed using thecompression_codec
query option.The option value is not case-sensitive.
If the option is set to an unrecognized value, all kinds of queries will fail due to the invalid option setting, not just queries involving Parquet tables. (The value
BZIP2
is also recognized, but is not compatible with Parquet tables.)Type:
STRING
Default:
SNAPPY
- COMPUTE_STATS_MIN_SAMPLE_SIZE query option
-
The
COMPUTE_STATS_MIN_SAMPLE_SIZE
query option specifies the minimum number of bytes that will be scanned inCOMPUTE STATS TABLESAMPLE
, regardless of the user-supplied sampling percent. This query option prevents sampling for very small tables where accurate stats can be obtained cheaply without sampling because the minimum sample size is required to get meaningful stats.Type:
INT
Default:
1GB
- DEBUG_ACTION query option
-
Introduces artificial problem conditions within queries. For internal Cloudera debugging and troubleshooting.
Type:
STRING
Default: An empty string
- DECIMAL_V2 query option
-
A query option that changes behavior related to the
DECIMAL
data type. Set this option toFALSE
for backward compatibility to Impala 2.x.Type: Boolean
Default:
TRUE
- DEFAULT_FILE_FORMAT query option
-
Use the
DEFAULT_FILE_FORMAT
query option to set the default table file format. The following values are supported:-
TEXT
(0
) -
RC_FILE
(1
) -
SEQUENCE_FILE
(2
) -
AVRO
(3
) -
PARQUET
(4
) -
KUDU
(5
) -
ORC
(6
)
In impala-shell or Hue, the
SET DEFAULT_FILE_FORMAT
statement will not return an error when the option was set to an unsupported value. Impala validates the value and returns an error when you submitted the next query.Type: Enum as specified above
Default:
TEXT
(0
)Added in: Impala 3.3
-
- DEFAULT_HINTS_INSERT_STATEMENT query option
-
The
DEFAULT_HINTS_INSERT_STATEMENT
query option sets the default hints for theINSERT
statements with no optimizer hint specified.When there are hints specified in an
INSERT
statement, these default hints are ignored.You can specify multiple hints separated by '
:
'.For example:
SET DEFAULT_HINTS_INSERT_STATEMENT=CLUSTERED; SET DEFAULT_HINTS_INSERT_STATEMENT=SHUFFLE; SET DEFAULT_HINTS_INSERT_STATEMENT=NOCLUSTERED:NOSHUFFLE;
The default hints apply to the HDFS and Kudu table formats and are ignored for the HBase table format.
Type:
STRING
Default: No default hint
Added in: Impala 3.3
- DEFAULT_JOIN_DISTRIBUTION_MODE query option
-
Impala optimizes join queries based on the presence of table statistics, which are produced by the Impala
COMPUTE STATS
statement. By default, when a table involved in the join query does not have statistics, Impala uses thebroadcast
technique that transmits the entire contents of the table to all executor nodes participating in the query. If one table involved in a join has statistics and the other does not, the table without statistics is broadcast. If both tables are missing statistics, the table on the right-hand side of the join is broadcast. This behavior is appropriate when the table involved is relatively small, but can lead to excessive network, memory, and CPU overhead if the table being broadcast is large.Because Impala queries frequently involve very large tables, and suboptimal joins for such tables could result in spilling or out-of-memory errors, the setting
DEFAULT_JOIN_DISTRIBUTION_MODE=SHUFFLE
lets you override the default behavior. The shuffle join mechanism divides the corresponding rows of each table involved in a join query using a hashing algorithm, and transmits subsets of the rows to other nodes for processing. Typically, this kind of join is more efficient for joins between large tables of similar size.The setting
DEFAULT_JOIN_DISTRIBUTION_MODE=SHUFFLE
is recommended when setting up and deploying new clusters, because it is less likely to result in serious consequences such as spilling or out-of-memory errors if the query plan is based on incomplete information. This setting is not the default, to avoid changing the performance characteristics of join queries for clusters that are already tuned for their existing workloads.The allowed values are
BROADCAST
(0) orSHUFFLE
(1).Type:
INT
Examples:
The following examples demonstrate appropriate scenarios for each setting of this query option.
-- Create a billion-row table. create table big_table stored as parquet as select * from huge_table limit 1e9; -- For a big table with no statistics, the -- shuffle join mechanism is appropriate. set default_join_distribution_mode=shuffle; ...join queries involving the big table...
-- Create a hundred-row table. create table tiny_table stored as parquet as select * from huge_table limit 100; -- For a tiny table with no statistics, the -- broadcast join mechanism is appropriate. set default_join_distribution_mode=broadcast; ...join queries involving the tiny table...
compute stats tiny_table; compute stats big_table; -- Once the stats are computed, the query option has -- no effect on join queries involving these tables. -- Impala can determine the absolute and relative sizes -- of each side of the join query by examining the -- row size, cardinality, and so on of each table. ...join queries involving both of these tables...
- DEFAULT_NDV_SCALE query option
-
Improving the precision of NDV()
As a cluster admin, you can improve the precision of NDV() using this new query option
DEFAULT_NDV_SCALE
. You may adjust the value in this query option to change the default precision setting for NDV() so that the SQL writers do not have to rewrite the SQLs to adjust NDV’s precision. This new option allows the cluster admin to adjust precision for each Admission Control queue according to cluster's resource usage.Controlling the NDV scale
Until this release, the scale used in NDV(<expr>) functions was fixed at 2. Now the scale can be provided using the newly added query option. This option will not influence the NDV scale for SQL function NDV(<expr>, <scale>) in which the NDV scale is provided by the 2nd argument <scale>
Type:
Number
Allowed value: Range [ 1 .. 10]
Default: 2
Example:
The following is an example of a typical workload with different NDV scales and with a cardinality of 40,090,650.
SET DEFAULT_NDV_SCALE = 10; SET DEFAULT_NDV_SCALE = 5; SET DEFAULT_NDV_SCALE = 2;
Metric Count Distinct NDV2 NDV5 NDV10 Memory(GB) 3.83 1.84 1.85 1.89 Duration(s) 182.89 30.22 29.72 29.24 ErrorRate 0% 1.8% 1.17% 0.06%
- DEFAULT_SPILLABLE_BUFFER_SIZE query option
-
Specifies the default size for a memory buffer used when the spill-to-disk mechanism is activated, for example for queries against a large table with no statistics, or large join operations.
Type:
INT
Default:
2097152
(2 MB)Unit: A numeric argument represents a size in bytes; you can also use a suffix of
m
ormb
for megabytes, org
orgb
for gigabytes. If you specify a value with unrecognized formats, subsequent queries fail with an error.Usage notes:
This query option sets an upper bound on the size of the internal buffer size that can be used during spill-to-disk operations. The actual size of the buffer is chosen by the query planner.
If overall query performance is limited by the time needed for spilling, consider increasing the
DEFAULT_SPILLABLE_BUFFER_SIZE
setting. Larger buffer sizes result in Impala issuing larger I/O requests to storage devices, which might result in higher throughput, particularly on rotational disks.The tradeoff with a large value for this setting is increased memory usage during spill-to-disk operations. Reducing this value may reduce memory consumption.
To determine if the value for this setting is having an effect by capping the spillable buffer size, you can see the buffer size chosen by the query planner for a particular query.
EXPLAIN
the query while the settingEXPLAIN_LEVEL=2
is in effect. - DEFAULT_TRANSACTIONAL_TYPE query option
-
Use the
DEFAULT_TRANSACTIONAL_TYPE
query option to create insert-only transactional tables by default.The supported values are:-
NONE
: The table will not be created as transactional. -
INSERT_ONLY
: The table will be created as transactional.
If either table properties,
transactional
ortransactional_properties
, are set, this query option is ignored.Type: Enum as specified above
Default:
NONE
Added in: Impala 3.3
-
- DELETE_STATS_IN_TRUNCATE query option
-
This query option
DELETE_STATS_IN_TRUNCATE
can be used to delete or retain table statistics. The default value of this option is 1 or true which means table statistics will be deleted as part of truncate operation. However the default may not be an ideal solution in case of non-transactional tables when the table and column statistics for the table are also deleted by default. This can be an expensive operation especially when many truncate table commands are running concurrently. In cases where truncate operation is used to remove the existing data and then reload new data, it is likely that you will compute stats again as soon as the new data is reloaded. This would overwrite the existing statistics and hence the additional time spent by the truncate operation to delete column and table statistics becomes obsolete. To avoid this expensive operation and to improve performance, you can set this query option to false or 0. When set to false, a truncate operation will not delete the table and column statistics for the table.Type:
BOOLEAN
; recognized values are 1 and 0, or true and false;Default:
TRUE
(shown as 1 in output of SET statement). - DISABLE_CODEGEN query option
-
The
DISABLE_CODEGEN
is a debug option, and it's used to work around any issues with Impala's runtime code generation. If a query fails with anillegal instruction
or other hardware-specific message, try settingDISABLE_CODEGEN=true
and running the query again. If the query succeeds only when theDISABLE_CODEGEN
option is turned on, submit the problem to Cloudera support and include that detail in the problem report.Most queries will run significantly slower with
DISABLE_CODEGEN=true
.In Impala 2.10 and higher, the
DISABLE_CODEGEN_ROWS_THRESHOLD
optimisation automatically disables codegen for small queries because short-running queries may run faster without the overhead of codegen.The following values are supported:-
TRUE
or1
: Disables codegen. -
FALSE
or0
: Enables codegen.
Type:
BOOLEAN
Default:
FALSE
(0
) -
- DISABLE_CODEGEN_ROWS_THRESHOLD query option
-
Specify the number of rows processed per Impala daemon which is the cutoff point below which Impala disables native code generation for the whole query. Native code generation is beneficial for queries that process many rows because it reduces the time taken to process of each row. However, generating the native code adds latency to query startup. Therefore, automatically disabling codegen for queries that process relatively small amounts of data can improve query response time.
This setting applies to queries where the number of rows processed can be accurately determined, either through table and column statistics, or by the presence of a
LIMIT
clause. If Impala cannot accurately estimate the number of rows, then this setting does not apply.If a query uses the complex data types
STRUCT
,ARRAY
, orMAP
, then codegen is never automatically disabled regardless of theDISABLE_CODEGEN_ROWS_THRESHOLD
setting.Type: Number
Default:
50000
Usage notes: Typically, you increase the default value to make this optimization apply to more queries. If incorrect or corrupted table and column statistics cause Impala to apply this optimization incorrectly to queries that actually involve substantial work, you might see the queries being slower as a result of codegen being disabled. In that case, recompute statistics with the
COMPUTE STATS
orCOMPUTE INCREMENTAL STATS
statement. If there is a problem collecting accurate statistics, you can turn this feature off by setting the value to 0. - DISABLE_HBASE_NUM_ROWS_ESTIMATE
-
Use the
DISABLE_HBASE_NUM_ROWS_ESTIMATE
query option to disable key sampling of HBase tables in row count and row size estimation.While generating a plan for an HBase query, the planner samples the underlying HBase tables to estimate their row count and row size, and the sampling can negatively impact the planning time. When the HBase table stats do not change much in a short time, disable the sampling by setting the
DISABLE_HBASE_NUM_ROWS_ESTIMATE
query option toTRUE
. And Impala planner will fall back to using Hive Metastore (HMS) table stats instead.When
DISABLE_HBASE_NUM_ROWS_ESTIMATE
query option is set toTRUE
, you need to update the HMS table stats by runningCOMPUTE STATS
. Alternatively, you can manually set table statistics by runningALTER TABLE
.The following values are supported:-
TRUE
or1
: Disables the normal key sampling of HBase tables and uses HMS table stats for estimation. -
FALSE
or0
: Enables the normal sampling of HBase tables.
Type:
BOOLEAN
Default:
FALSE
-
- DISABLE_ROW_RUNTIME_FILTERING query
-
The
DISABLE_ROW_RUNTIME_FILTERING
query option reduces the scope of the runtime filtering feature. Queries still dynamically prune partitions, but do not apply the filtering logic to individual rows within partitions.Only applies to queries against HDFS-based tables using the Parquet file format. For other file formats, Impala only prunes at the level of partitions, not individual rows.
This query option only applies to queries against
Type:
BOOLEAN
Default:
FALSE
(0
)Usage notes:
Impala automatically evaluates whether the per-row filters are being effective at reducing the amount of intermediate data. Therefore, this option is typically only needed for the rare case where Impala cannot accurately determine how effective the per-row filtering is for a query.
Because the runtime filtering feature applies mainly to resource-intensive and long-running queries, only adjust this query option when tuning long-running queries involving some combination of large partitioned tables and joins involving large tables.
Kudu consideration:
When applied to a query involving a Kudu table, this option turns off all runtime filtering for the Kudu table.
- DISABLE_STREAMING_PREAGGREGATIONS query option
-
Turns off the
streaming preaggregation
optimization that is available in Impala 2.5 and higher. This optimization reduces unnecessary work performed by queries that perform aggregation operations on columns with few or no duplicate values, for exampleDISTINCT id_column
orGROUP BY unique_column
. If the optimization causes regressions in existing queries that use aggregation functions, you can turn it off as needed by setting this query option.Type:
BOOLEAN
Default:
FALSE
(0
)Usage notes:
Typically, queries that would require enabling this option involve very large numbers of aggregated values, such as a billion or more distinct keys being processed on each worker node.
- DISABLE_UNSAFE_SPILLS query option
-
Enable this option if you prefer to have queries fail when they exceed the Impala memory limit, rather than write temporary data to disk.
Queries that
spill
to disk typically complete successfully, when in earlier Impala releases they would have failed. However, queries with exorbitant memory requirements due to missing statistics or inefficient join clauses could become so slow as a result that you would rather have them cancelled automatically and reduce the memory usage through standard Impala tuning techniques.This option prevents only
unsafe
spill operations, meaning that one or more tables are missing statistics or the query does not include a hint to set the most efficient mechanism for a join orINSERT ... SELECT
into a partitioned table. These are the tables most likely to result in suboptimal execution plans that could cause unnecessary spilling. Therefore, leaving this option enabled is a good way to find tables on which to run theCOMPUTE STATS
statement.Type:
BOOLEAN
Default:
FALSE
(0
) - ENABLE_DISTINCT_SEMI_JOIN_OPTIMIZATION query option
-
When this query option is enabled the query planner considers inserting a distinct aggregation on the inner side of the join based on whether that aggregation would reduce the number of rows by more than 75%. This optimization reduces the number of potential plan regressions.
- ENABLE_EXPR_REWRITES query option
-
The
ENABLE_EXPR_REWRITES
query option controls whether to enable or disable the query compile time optimizations. These optimizations rewrite the expression trees to a more compact and optimized form that helps avoid redundant expression evaluation at run time. Performance optimizations controlled by this query option include:- Constant folding
- Extracting common conjuncts from disjunctions
- Simplify conditionals with constant conditions
Set the option to
FALSE
or0
to disable the performance optimizations.Type:
BOOLEAN
Default:
TRUE
(1
) - EXEC_SINGLE_NODE_ROWS_THRESHOLD query option
-
Specify the number of rows scanned as the cutoff point below which Impala treats a query as a
small
query, turning off optimizations such as parallel execution and native code generation. The overhead for these optimizations is applicable for queries involving substantial amounts of data, but it makes sense to skip them for queries involving tiny amounts of data. Reducing the overhead for small queries allows Impala to complete them more quickly, keeping admission control slots, CPU, memory, and so on available for resource-intensive queries.This setting applies to queries where the number of rows processed can be accurately determined, either through table and column statistics, or by the presence of a
LIMIT
clause. If Impala cannot accurately estimate the number of rows, then this setting does not apply.In Impala 2.3 and higher, where Impala supports the complex data types
STRUCT
,ARRAY
, andMAP
, if a query refers to any column of those types, the small-query optimization is turned off for that query regardless of theEXEC_SINGLE_NODE_ROWS_THRESHOLD
setting.For a query that is determined to be
small
, all work is performed on the coordinator node. This might result in some I/O being performed by remote reads. The savings from not distributing the query work and not generating native code are expected to outweigh any overhead from the remote reads.Type: Number
Default:
100
Usage notes: Typically, you increase the default value to make this optimization apply to more queries. If incorrect or corrupted table and column statistics cause Impala to apply this optimization incorrectly to queries that actually involve substantial work, you might see the queries being slower as a result of remote reads. In that case, recompute statistics with the
COMPUTE STATS
orCOMPUTE INCREMENTAL STATS
statement. If there is a problem collecting accurate statistics, you can turn this feature off by setting the value to -1. - EXEC_TIME_LIMIT_S query option
-
The
EXEC_TIME_LIMIT_S
query option sets a time limit on query execution in seconds. If a query is still executing when time limit expires, it is automatically canceled. The option is intended to prevent runaway queries that run for much longer than intended.For example, an Impala administrator could set a default value of
EXEC_TIME_LIMIT_S=3600
for a resource pool to automatically stop queries that run for longer than one hour. Then, if a user accidentally runs a large query that executes for more than one hour, it will be automatically killed after the time limit expires to free up resources. Users can override the default value per query or per session if they do not want the defaultEXEC_TIME_LIMIT_S
value to apply to a specific query or a session.Type: Number
Default:
0
(no time limit ) - EXPAND_COMPLEX_TYPES Query Option
-
Older versions of Impala did not support complex types in the select list, and therefore a
SELECT *
statement only expanded to primitive types even when the table contained complex-typed columns. Support for complex types in the select list has since been added, but for backward compatibility the behaviour that*
expressions skip complex types has been kept as the default. If the EXPAND_COMPLEX_TYPES query option is set to true,*
expressions will include complex types as well.Type:BOOLEAN
Default:FALSE
Added in:Impala 4.2
- EXPLAIN_LEVEL query option
-
Controls the amount of detail provided in the output of the
EXPLAIN
statement. The basic output can help you identify high-level performance issues such as scanning a higher volume of data or more partitions than you expect. The higher levels of detail show how intermediate results flow between nodes and how different SQL operations such asORDER BY
,GROUP BY
, joins, andWHERE
clauses are implemented within a distributed query.Type:
STRING
orINT
Default:
1
Arguments:
The allowed range of numeric values for this option is 0 to 3:
-
0
orMINIMAL
: A barebones list, one line per operation. Primarily useful for checking the join order in very long queries where the regularEXPLAIN
output is too long to read easily. -
1
orSTANDARD
: The default level of detail, showing the logical way that work is split up for the distributed query. -
2
orEXTENDED
: Includes additional detail about how the query planner uses statistics in its decision-making process, to understand how a query could be tuned by gathering statistics, using query hints, adding or removing predicates, and so on. In Impala 3.2 and higher, the output also includes the analyzed query with the cast information in the output header, and the implicit cast info in the Predicate section. -
3
orVERBOSE
: The maximum level of detail, showing how work is split up within each node intoquery fragments
that are connected in a pipeline. This extra detail is primarily useful for low-level performance testing and tuning within Impala itself, rather than for rewriting the SQL code at the user level.
Changing the value of this option controls the amount of detail in the output of the
EXPLAIN
statement. The extended information from level 2 or 3 is especially useful during performance tuning, when you need to confirm whether the work for the query is distributed the way you expect, particularly for the most resource-intensive operations such as join queries against large tables, queries against tables with large numbers of partitions, and insert operations for Parquet tables. The extended information also helps to check estimated resource usage when you use the admission control or resource management features.Usage notes:
Read the
EXPLAIN
output from bottom to top. The lowest lines represent the initial work of the query (scanning data files), the lines in the middle represent calculations done on each node and how intermediate results are transmitted from one node to another, and the topmost lines represent the final results being sent back to the coordinator node.The numbers in the left column are generated internally during the initial planning phase and do not represent the actual order of operations, so it is not significant if they appear out of order in the
EXPLAIN
output.At all
EXPLAIN
levels, the plan contains a warning if any tables in the query are missing statistics. Use theCOMPUTE STATS
statement to gather statistics for each table and suppress this warning. See Tables and column statistics for details about how the statistics help query performance.The
PROFILE
command in impala-shell always starts with an explain plan showing full detail, the same as withEXPLAIN_LEVEL=3
. After the explain plan comes the executive summary, the same output as produced by theSUMMARY
command in impala-shell. -
- FETCH_ROWS_TIMEOUT_MS query option
-
Use the
FETCH_ROWS_TIMEOUT_MS
query option to control how long Impala waits for query results when clients fetch rows.When this query option is set to
0
, fetch requests wait indefinitely.The timeout applies both when query result spooling is enabled and disabled:- When result spooling is disabled (
SPOOL_QUERY_RESULTS = FALSE
), the timeout controls how long a client waits for a single row batch to be produced by the coordinator. - When result spooling is enabled ( (
SPOOL_QUERY_RESULTS = TRUE
), a client can fetch multiple row batches at a time, so this timeout controls the total time a client waits for row batches to be produced.
The timeout also applies to fetch requests issued against queries in the 'RUNNING' state. A 'RUNNING' query has no rows available, so any fetch request will wait until the query transitions to the 'FINISHED' state and for it to fetch all requested rows. A query in the 'FINISHED' state means that the rows are available to be fetched.
Type:
INT
Default:
10000
(10 seconds)Added in: Impala 3.4
- When result spooling is disabled (
- HBASE_CACHE_BLOCKS query option
-
Setting this option is equivalent to calling the
setCacheBlocks
method of the class org.apache.hadoop.hbase.client.Scan in an HBase Java application. Helps to control the memory pressure on the HBase RegionServer, in conjunction with theHBASE_CACHING
query option.Type:
BOOLEAN
- HBASE_CACHING query option
-
Setting this option is equivalent to calling the
setCaching
method of the class org.apache.hadoop.hbase.client.Scan in an HBase Java application. Helps to control the memory pressure on the HBase RegionServer, in conjunction with theHBASE_CACHE_BLOCKS
query option.Type:
BOOLEAN
Default:
FALSE
(0
) - IDLE_SESSION_TIMEOUT query option
-
Specify the time in seconds after which an idle session is cancelled. A session is idle when no activity is occurring for any of the queries in that session, and the session has not started any new queries. Once a session is expired, you cannot issue any new query requests to it. The session remains open, but the only operation you can perform is to close it.
The
IDLE_SESSION_TIMEOUT
query option overrides the‑‑idle_session_timeout
startup option. See Setting timeout periods for daemons, queries, and sessions for the‑‑idle_session_timeout
startup option.The
IDLE_SESSION_TIMEOUT
query option allows JDBC/ODBC connections to set the session timeout as a query option with theSET
statement.Type: Number
Default:
0
When this option is set to default:- If
‑‑idle_session_timeout
is not set, the session never expires. - If
‑‑idle_session_timeout
is set, use that timeout value.
- If
- JOIN_SELECTIVITY_CORRELATION_FACTOR
-
Any conjuncts in an inner or outer join, similar to the following example, will be used only when no other equijoin conjunct is present.
SELECT * FROM t1 inner join (SELECT a2, MAX(b2) as max_b2 FROM t2 GROUP BY a2) s2 ON t1.a1 = s2.a2 AND t1.b1 = s2.max_b2
In this example, the stats for the second conjunct involving the MAX expression gets added to the 'otherEqJoinStats' list. These stats were being used only when no other equijoin conjunct (involving base columns) was present leading to over-estimation.
To mitigate this issue, an improvement is made to the cardinality estimate for such cases by considering the conjuncts in the 'otherEqJoinStats list' in combination with the equijoin conjuncts.
A new query option
join_selectivity_correlation_factor
which is a floating point number between 0 and 1 inclusive is introduced in this release. It defaults to 0.0 which preserves the existing behavior of using the Minimum selectivity of the conjuncts. Given multiple join conjuncts C1, C2 ....Cn having comparable selectivity, you may notice over-estimation by several orders of magnitude.Setting the above query option to a value higher than 0 first computes the product of the selectivities: sel(C1) * sel(C2) .. sel(Cn) and then scales it by dividing by the
join_selectivity_correlation_factor
.Note that this setting applies to all the joins in the query. Another approach is to use per-join hints, but there are limitations in the hints approach too.
Default:
"0.0"
- KUDU_READ_MODE query option
-
The
KUDU_READ_MODE
query option allows you to set a desired consistency level for scans of Kudu tables.The following values are supported for the query option:-
"DEFAULT"
: The value of the startup flag,--kudu_read_mode
, is used. -
"READ_LATEST"
: Commonly known as the Read Committed isolation mode, in this mode, Kudu provides no consistency guarantees for this mode, except that all returned rows were committed at some point. -
"READ_AT_SNAPSHOT"
: Kudu will take a snapshot of the current state of the data and perform the scan over the snapshot, possibly after briefly waiting for ongoing writes to complete. This provides "Read Your Writes" consistency within a single Impala session, except in the case of a Kudu leader change. See the Kudu documentation for more details.
Type: String
Default:
"DEFAULT"
Added in: Impala 3.1
-
- KUDU_REPLICA_SELECTION query option
-
With the location-aware placement policy for tablet replicas in Kudu, the replicas are spread across locations so that the failure of tablet servers in one location does not make tablets unavailable. With this advanced placement policy the replicas of a tablet on Kudu tablet servers are assigned with the role as a leader or a follower. By default when Impala attempts to read from the tablet server, it is targeted to the closest tablet server that is to choose "nearby" replicas on the tablet server for scanning. However using this new query option KUDU_REPLICA_SELECTION, the queries can be targeted to the leader-only replica and skip non-leader replicas while scanning. When the new query option KUDU_REPLICA_SELECTION is set as LEADER_ONLY, Impala planner will generate a query plan that enables Impala to scan Kudu tables at the leader-only replicas.
- LIVE_PROGRESS query option
-
When the
LIVE_PROGRESS
query option is set toTRUE
, Impala displays an interactive progress bar showing roughly what percentage of processing has been completed for queries submitted through the impala-shell command. When the query finishes, the progress bar is erased from the impala-shell console output.You can enable this query option within impala-shell by starting the shell with the
--live_progress
command-line option. You can still turn this setting off and on again within the shell through theSET
command.Starting in Impala 3.1, the summary output also includes the queuing status consisting of whether the query was queued and what was the latest queuing reason.
Type:
BOOLEAN
Default:
FALSE (0)
Usage notes:
The output from this query option is printed to standard error. The output is only displayed in interactive mode, that is, not when the
-q
or-f
options are used.Restrictions:
Because the percentage complete figure is calculated using the number of issued and completed
scan ranges
, which occur while reading the table data, the progress bar might reach 100% before the query is entirely finished. For example, the query might do work to perform aggregations after all the table data has been read. If many of your queries fall into this category, consider using theLIVE_SUMMARY
option instead for more granular progress reporting.The
LIVE_PROGRESS
andLIVE_SUMMARY
query options currently do not produce any output duringCOMPUTE STATS
operations.Because the
LIVE_PROGRESS
query option is available only within the impala-shell interpreter, you cannot change the query option through the SQLSET
statement using the JDBC or ODBC interfaces. TheSET
command in impala-shell recognizes the name as a shell-only option. - LIVE_SUMMARY query option
-
When the
LIVE_SUMMARY
query option is set toTRUE
, Impala displays the same output as theSUMMARY
command for queries submitted through the impala-shell command, with the measurements updated in real time as the query progresses. When the query finishes, the finalSUMMARY
output remains visible in the impala-shell console output.You can enable this query option within impala-shell by starting the shell with the
--live_summary
command-line option. You can still turn this setting off and on again within the shell through theSET
command.Starting in Impala 3.1, the summary output also includes the queuing status consisting of whether the query was queued and what was the latest queuing reason.
Type:
Boolean
Default:
FALSE (0)
Usage notes:
The output from this query option is printed to standard error. The output is only displayed in interactive mode, that is, not when the
-q
or-f
options are used.Only a single report is displayed at any one time, with each update overwriting the previous numbers.
The live summary output can be useful for evaluating long-running queries, to evaluate which phase of execution takes up the most time, or if some hosts take much longer than others for certain operations, dragging overall performance down. By making the information available in real time, this feature lets you decide what action to take even before you cancel a query that is taking much longer than normal.
For example, you might see the HDFS scan phase taking a long time, and therefore revisit performance-related aspects of your schema design such as constructing a partitioned table, switching to the Parquet file format, running the
COMPUTE STATS
statement for the table, and so on. Or you might see a wide variation between the average and maximum times for all hosts to perform some phase of the query, and therefore investigate if one particular host needed more memory or was experiencing a network problem.Restrictions:
The
LIVE_PROGRESS
andLIVE_SUMMARY
query options currently do not produce any output duringCOMPUTE STATS
operations.Because the
LIVE_SUMMARY
query option is available only within the impala-shell interpreter, you cannot change the query option through the SQLSET
statement using the JDBC or ODBC interfaces. TheSET
command in impala-shell recognizes the name as a shell-only option. - MAX_CNF_EXPRS query option
-
To optimize the performance of queries, the predicates are reorganized so that they can be applied earlier in the data processing. For example, some predicates can be pushed down to the scan node. This helps improve performance for some complex queries. However the CNF rewrite can lead to significant frontend memory usage and eventually OutOfMemory for a complex query that contains many predicates. To avoid this memory problem while maintaining performance improvement in basic circumstances and avoiding performance regressions on complex queries, this release lowered the default value of
MAX_CNF_EXPRS
from unlimited to 200. - MAX_ERRORS query option
-
Maximum number of non-fatal errors for any particular query that are recorded in the Impala log file. For example, if a billion-row table had a non-fatal data error in every row, you could diagnose the problem without all billion errors being logged. Unspecified or 0 indicates the built-in default value of 100.
This option only controls how many errors are reported. To specify whether Impala continues or halts when it encounters such errors, use the
ABORT_ON_ERROR
option.Type: Number
Default:
0
(meaning 100 errors) - MAX_FRAGMENT_INSTANCES_PER_NODE query option
-
When you executed a query with
COMPUTE_PROCESSING_COST=1
, Impala relied on theMT_DOP
option to decide the degree of parallelism of the scan fragment. Now scan node's processing cost is used as another factor to consider raising scan parallelism beyondMT_DOP
.Scan node cost now includes the number of effective scan ranges. Each scan range is given a weight of (0.5% * min_processing_per_thread), which roughly means that one scan node instance can handle at most 200 scan ranges. The new query option
MAX_FRAGMENT_INSTANCES_PER_NODE
is introduced to cap the maximum number of fragment instances per node. This newly introduced query option works in conjunction withPROCESSING_COST_MIN_THREADS
.Query option
MAX_FRAGMENT_INSTANCES_PER_NODE
is added as an upper bound on scan parallelism ifCOMPUTE_PROCESSING_COST
=true. If the number of scan ranges is fewer than the maximum parallelism allowed by the scan node's processing cost, then the processing cost will be clamped down to (min_processing_per_thread / number of scan ranges). LoweringMAX_FRAGMENT_INSTANCES_PER_NODE
can also clamp down the scan processing cost. For interior fragments, a combination ofMAX_FRAGMENT_INSTANCES_PER_NODE
,PROCESSING_COST_MIN_THREADS
, and the number of available cores per node is accounted for to determine maximum fragment parallelism per node. For the scan fragments, only the first two are considered to encourage frontend to choose a larger executor group as needed.Type:
INT
Default:
128
- MAX_MEM_ESTIMATE_FOR_ADMISSION query option
-
Use the
MAX_MEM_ESTIMATE_FOR_ADMISSION
query option to set an upper limit on the memory estimates of a query as a workaround for over-estimates precluding a query from being admitted.The query option takes effect when all of the below conditions are met:
- Memory-based admission control is enabled for the pool.
- The
MEM_LIMIT
query option is not set at the query, session, resource pool, or global level.
When the above conditions are met, MIN(
MAX_MEM_ESTIMATE_FOR_ADMISSION
, mem_estimate) is used for admission control.Setting the
MEM_LIMIT
query option is usually a better option. Use theMAX_MEM_ESTIMATE_FOR_ADMISSION
query option when it is not feasible to setMEM_LIMIT
for each individual query.Type:
INT
Added in: Impala 3.1
- MAX_NUM_RUNTIME_FILTERS query option
-
The
MAX_NUM_RUNTIME_FILTERS
query option sets an upper limit on the number of runtime filters that can be produced for each query.Type:
INT
Default:
10
Usage notes:
Each runtime filter imposes some memory overhead on the query. Depending on the setting of the
RUNTIME_BLOOM_FILTER_SIZE
query option, each filter might consume between 1 and 16 megabytes per plan fragment. There are typically 5 or fewer filters per plan fragment.Impala evaluates the effectiveness of each filter, and keeps the ones that eliminate the largest number of partitions or rows. Therefore, this setting can protect against potential problems due to excessive memory overhead for filter production, while still allowing a high level of optimization for suitable queries.
Because the runtime filtering feature applies mainly to resource-intensive and long-running queries, only adjust this query option when tuning long-running queries involving some combination of large partitioned tables and joins involving large tables.
Kudu consideration:
This query option affects only Bloom filters, not the min/max filters that are applied to Kudu tables. Therefore, it does not affect the performance of queries against Kudu tables.
- MAX_RESULT_SPOOLING_MEM query option
-
Use the
MAX_RESULT_SPOOLING_MEM
query option to set the maximum amount of memory used when spooling query results.If the amount of memory exceeds this value when spooling query results, all memory will most likely be spilled to disk.
The
MAX_RESULT_SPOOLING_MEM
query option is applicable only when query result spooling is enabled with theSPOOL_QUERY_RESULTS
query option set toTRUE
.Setting the option to
0
or-1
means the memory is unbounded.You cannot set this query option to values below
-1
.Type:
INT
Default:
100 * 1024 * 1024 (100 MB)
Added in: Impala 3.4
- MAX_ROW_SIZE query option
Ensures that Impala can process rows of at least the specified size. (Larger rows might be successfully processed, but that is not guaranteed.) Applies when constructing intermediate or final rows in the result set. This setting prevents out-of-control memory use when accessing columns containing huge strings.
A numeric argument represents a size in bytes; you can also use a suffix ofm
ormb
for megabytes, org
orgb
for gigabytes. If you specify a value with unrecognized formats, subsequent queries fail with an error.Type:
INT
Default:
524288
(512 KB)Usage notes:
If a query fails because it involves rows with long strings and/or many columns, causing the total row size to exceed
MAX_ROW_SIZE
bytes, increase theMAX_ROW_SIZE
setting to accommodate the total bytes stored in the largest row. Examine the error messages for any failed queries to see the size of the row that caused the problem.Impala attempts to handle rows that exceed the
MAX_ROW_SIZE
value where practical, so in many cases, queries succeed despite having rows that are larger than this setting.Specifying a value that is substantially higher than actually needed can cause Impala to reserve more memory than is necessary to run the query.
In a Hadoop cluster with highly concurrent workloads and queries that process high volumes of data, traditional SQL tuning advice about minimizing wasted memory is worth remembering. For example, if a table has
STRING
columns where a single value might be multiple megabytes, make sure that theSELECT
lists in queries only refer to columns that are actually needed in the result set, instead of using theSELECT *
shorthand.- MAX_SCAN_RANGE_LENGTH query option
-
Maximum length of the scan range. Interacts with the number of HDFS blocks in the table to determine how many CPU cores across the cluster are involved with the processing for a query. (Each core processes one scan range.)
Lowering the value can sometimes increase parallelism if you have unused CPU capacity, but a too-small value can limit query performance because each scan range involves extra overhead.
Only applicable to HDFS tables. Has no effect on Parquet tables. Unspecified or 0 indicates backend default, which is the same as the HDFS block size for each table.
Although the scan range can be arbitrarily long, Impala internally uses an 8 MB read buffer so that it can query tables with huge block sizes without allocating equivalent blocks of memory.
The value can include unit specifiers, such as
100m
or100mb
.Type: Number
Default:
0
- MAX_SPILLED_RESULT_SPOOLING_MEM query option
-
Use the
MAX_SPILLED_RESULT_SPOOLING_MEM
query option to set the maximum amount of memory that can be spilled when spooling query results.If the amount of memory exceeds this value when spooling query results, the coordinator fragment will block until the client has consumed enough rows to free up more memory.
The
MAX_SPILLED_RESULT_SPOOLING_MEM
query option is applicable only when query result spooling is enabled with theSPOOL_QUERY_RESULTS
query option set toTRUE
.The value must be greater than or equal to the value of
MAX_RESULT_SPOOLING_MEM
.Setting the option to
0
or-1
means the memory is unbounded.Values below
-1
are not allowed for this query option.Type:
INT
Default:
1024 * 1024 * 1024 (1 GB)
Added in: Impala 3.4
- MEM_LIMIT query option
-
The
MEM_LIMIT
query option defines the maximum amount of memory a query can allocate on each node. The total memory that can be used by a query is theMEM_LIMIT
times the number of nodes.There are two levels of memory limit for Impala. The
‑‑mem_limit
startup option sets an overall limit for the impalad process (which handles multiple queries concurrently). That process memory limit can be expressed either as a percentage of RAM available to the process such as-mem_limit=70%
or as a fixed amount of memory, such as‑‑mem_limit=100gb
. The memory available to the process is based on the host's physical memory and memory limits from Linux Control Groups. For example, if an impalad process is running in a Docker container on a host with 100 GB of memory, the memory available is 100 GB or the Docker container's memory limit, whichever is less.The
MEM_LIMIT
query option, which you set through impala-shell or theSET
statement in a JDBC or ODBC application, applies to each individual query. TheMEM_LIMIT
query option is usually expressed as a fixed size such as10gb
, and must always be less than the impalad memory limit.If query processing approaches the specified memory limit on any node, either the per-query limit or the impalad limit, then the SQL operations will start to reduce their memory consumption, for example by writing the temporary data to disk (known as spilling to disk). The result is a query that completes successfully, rather than failing with an out-of-memory error. The tradeoff is decreased performance due to the extra disk I/O to write the temporary data and read it back in. The slowdown could potentially be significant. Thus, while this feature improves reliability, you should optimize your queries, system parameters, and hardware configuration to make this spilling a rare occurrence.
Type: Number
Units: A numeric argument represents memory size in bytes; you can also use a suffix of
m
ormb
for megabytes, or more commonlyg
orgb
for gigabytes. If you specify a value with unrecognized formats, subsequent queries fail with an error.Default:
0
(unlimited)Usage notes:
The
MEM_LIMIT
setting is primarily useful for production workloads. Impala's Admission Controller can be configured to automatically assign memory limits to queries and limit memory consumption of resource pools.Use the output of the
SUMMARY
command in impala-shell to get a report of memory used for each phase of your most heavyweight queries on each node, and then set aMEM_LIMIT
somewhat higher than that. See for usage information about theSUMMARY
command.The following examples show how to set the
MEM_LIMIT
query option using a fixed number of bytes, or suffixes representing gigabytes or megabytes.[localhost:21000] > set mem_limit=3000000000; MEM_LIMIT set to 3000000000 [localhost:21000] > select 5; Query: select 5 +---+ | 5 | +---+ | 5 | +---+ [localhost:21000] > set mem_limit=3g; MEM_LIMIT set to 3g [localhost:21000] > select 5; Query: select 5 +---+ | 5 | +---+ | 5 | +---+ [localhost:21000] > set mem_limit=3gb; MEM_LIMIT set to 3gb [localhost:21000] > select 5; +---+ | 5 | +---+ | 5 | +---+ [localhost:21000] > set mem_limit=3m; MEM_LIMIT set to 3m [localhost:21000] > select 5; +---+ | 5 | +---+ | 5 | +---+ [localhost:21000] > set mem_limit=3mb; MEM_LIMIT set to 3mb [localhost:21000] > select 5; +---+ | 5 | +---+
The following examples show how unrecognized
MEM_LIMIT
values lead to errors for subsequent queries.[localhost:21000] > set mem_limit=3pb; MEM_LIMIT set to 3pb [localhost:21000] > select 5; ERROR: Failed to parse query memory limit from '3pb'. [localhost:21000] > set mem_limit=xyz; MEM_LIMIT set to xyz localhost:21000] > select 5; Query: select 5 ERROR: Failed to parse query memory limit from 'xyz'.
The following examples shows the automatic query cancellation when the
MEM_LIMIT
value is exceeded on any host involved in the Impala query. First it runs a successful query and checks the largest amount of memory used on any node for any stage of the query. Then it sets an artificially lowMEM_LIMIT
setting so that the same query cannot run.[localhost:21000] > select count(*) from customer; Query: select count(*) from customer +----------+ | count(*) | +----------+ | 150000 | +----------+ [localhost:21000] > select count(distinct c_name) from customer; Query: select count(distinct c_name) from customer +------------------------+ | count(distinct c_name) | +------------------------+ | 150000 | +------------------------+ [localhost:21000] > summary; +--------------+--------+--------+----------+----------+---------+------------+----------+---------------+---------------+ | Operator | #Hosts | #Inst | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. Peak Mem | Detail | +--------------+--------+--------+----------+----------+---------+------------+----------+---------------+---------------+ | 06:AGGREGATE | 1 | 1 | 230.00ms | 230.00ms | 1 | 1 | 16.00 KB | -1 B | FINALIZE | | 05:EXCHANGE | 1 | 1 | 43.44us | 43.44us | 1 | 1 | 0 B | -1 B | UNPARTITIONED | | 02:AGGREGATE | 1 | 1 | 227.14ms | 227.14ms | 1 | 1 | 12.00 KB | 10.00 MB | | | 04:AGGREGATE | 1 | 1 | 126.27ms | 126.27ms | 150.00K | 150.00K | 15.17 MB | 10.00 MB | | | 03:EXCHANGE | 1 | 1 | 44.07ms | 44.07ms | 150.00K | 150.00K | 0 B | 0 B | HASH(c_name) | | 01:AGGREGATE | 1 | 1 | 361.94ms | 361.94ms | 150.00K | 150.00K | 23.04 MB | 10.00 MB | | | 00:SCAN HDFS | 1 | 1 | 43.64ms | 43.64ms | 150.00K | 150.00K | 24.19 MB | 64.00 MB | tpch.customer | +--------------+--------+--------+----------+----------+---------+------------+----------+---------------+---------------+ [localhost:21000] > set mem_limit=15mb; MEM_LIMIT set to 15mb [localhost:21000] > select count(distinct c_name) from customer; Query: select count(distinct c_name) from customer ERROR: Rejected query from pool default-pool: minimum memory reservation is greater than memory available to the query for buffer reservations. Memory reservation needed given the current plan: 38.00 MB. Adjust either the mem_limit or the pool config (max-query-mem-limit, min-query-mem-limit) for the query to allow the query memory limit to be at least 70.00 MB. Note that changing the mem_limit may also change the plan. See the query profile for more information about the per-node memory requirements.
- MEM_LIMIT_COORDINATORS query option
-
The new
MEM_LIMIT_COORDINATORS
query option functions similarly to theMEM_LIMIT
option but sets the query memory limit only on coordinators. This new option addresses the issue related toMEM_LIMIT
and is recommended in scenarios where the query needs higher or lower memory on coordinators compared to the planner estimates. - MIN_SPILLABLE_BUFFER_SIZE query option
-
Specifies the minimum size for a memory buffer used when the spill-to-disk mechanism is activated, for example for queries against a large table with no statistics, or large join operations.
Type:
INT
Default:
65536
(64 KB)Units: A numeric argument represents memory size in bytes; you can also use a suffix of
m
ormb
for megabytes, or more commonlyg
orgb
for gigabytes. If you specify a value with unrecognized formats, subsequent queries fail with an error.Usage notes:
This query option sets a lower bound on the size of the internal buffer size that can be used during spill-to-disk operations. The actual size of the buffer is chosen by the query planner.
If overall query performance is limited by the time needed for spilling, consider increasing the
MIN_SPILLABLE_BUFFER_SIZE
setting. Larger buffer sizes result in Impala issuing larger I/O requests to storage devices, which might result in higher throughput, particularly on rotational disks.The tradeoff with a large value for this setting is increased memory usage during spill-to-disk operations. Reducing this value may reduce memory consumption.
To determine if the value for this setting is having an effect by capping the spillable buffer size, you can see the buffer size chosen by the query planner for a particular query.
EXPLAIN
the query while the settingEXPLAIN_LEVEL=2
is in effect. - MT_DOP query option
-
Sets the degree of parallelism used for certain operations that can benefit from multithreaded execution. You can specify values higher than zero to find the ideal balance of response time, memory usage, and CPU usage during statement processing.
Type:
INT
Default:
0
Because
COMPUTE STATS
andCOMPUTE INCREMENTAL STATS
statements for Parquet tables benefit substantially from extra intra-node parallelism, Impala automatically setsMT_DOP=4
when computing stats for Parquet tables.Range: 0 to 64
- Examples:
-
The following example shows how to run a
COMPUTE STATS
statement against a Parquet table with or without an explicitMT_DOP
setting:-- Explicitly setting MT_DOP to 0 selects the old code path. set mt_dop = 0; MT_DOP set to 0 -- The analysis for the billion rows is distributed among hosts, -- but uses only a single core on each host. compute stats billion_rows_parquet; +-----------------------------------------+ | summary | +-----------------------------------------+ | Updated 1 partition(s) and 2 column(s). | +-----------------------------------------+ drop stats billion_rows_parquet; -- Using 4 logical processors per host is faster. set mt_dop = 4; MT_DOP set to 4 compute stats billion_rows_parquet; +-----------------------------------------+ | summary | +-----------------------------------------+ | Updated 1 partition(s) and 2 column(s). | +-----------------------------------------+ drop stats billion_rows_parquet; -- Unsetting the option reverts back to its default. -- Which for COMPUTE STATS and a Parquet table is 4, -- so again it uses the fast path. unset MT_DOP; Unsetting option MT_DOP compute stats billion_rows_parquet; +-----------------------------------------+ | summary | +-----------------------------------------+ | Updated 1 partition(s) and 2 column(s). | +-----------------------------------------+
The following example shows the effects of setting
MT_DOP
for a query on a Parquet table:set mt_dop = 0; MT_DOP set to 0 -- COUNT(DISTINCT) for a unique column is CPU-intensive. select count(distinct id) from billion_rows_parquet; +--------------------+ | count(distinct id) | +--------------------+ | 1000000000 | +--------------------+ Fetched 1 row(s) in 67.20s set mt_dop = 16; MT_DOP set to 16 -- Introducing more intra-node parallelism for the aggregation -- speeds things up, and potentially reduces memory overhead by -- reducing the number of scanner threads. select count(distinct id) from billion_rows_parquet; +--------------------+ | count(distinct id) | +--------------------+ | 1000000000 | +--------------------+ Fetched 1 row(s) in 17.19s
The following example shows how queries that are not compatible with non-zero
MT_DOP
settings produce an error whenMT_DOP
is set:set mt_dop=1; MT_DOP set to 1 insert into a1 select * from a2; ERROR: NotImplementedException: MT_DOP not supported for DML statements.
Related Information:
- NUM_NODES query option
-
Limit the number of nodes that process a query, typically during debugging.
Type: Number
Allowed values: Only accepts the values 0 (meaning all nodes) or 1 (meaning all work is done on the coordinator node).
Default:
0
Usage notes:
If you are diagnosing a problem that you suspect is due to a timing issue due to distributed query processing, you can set
NUM_NODES=1
to verify if the problem still occurs when all the work is done on a single node.You might set the
NUM_NODES
option to 1 briefly, duringINSERT
orCREATE TABLE AS SELECT
statements. Normally, those statements produce one or more data files per data node. If the write operation involves small amounts of data, a Parquet table, and/or a partitioned table, the default behavior could produce many small files when intuitively you might expect only a single output file.SET NUM_NODES=1
turns off thedistributed
aspect of the write operation, making it more likely to produce only one or a few data files. - NUM_ROWS_PRODUCED_LIMIT query option
-
The
NUM_ROWS_PRODUCED_LIMIT
query option limits the number of rows produced by a query. A query is canceled when its execution produces more rows than the specified limit set by theNUM_ROWS_PRODUCED_LIMIT
option.This limit only applies when the results are returned to a client, for example, to a
SELECT
query, but not to anINSERT
query.The default value of
0
specifies that there is no limit on the number of rows produced.Type:
INT
Allowed values: 0 or positive numbers
Default:
0
- NUM_SCANNER_THREADS query option
-
Maximum number of scanner threads (on each node) used for each query. By default, Impala uses as many cores as are available (one thread per core). You might lower this value if queries are using excessive resources on a busy cluster. Impala imposes a maximum value automatically, so a high value has no practical effect.
Type: Number
Default:
0
- OPTIMIZE_PARTITION_KEY_SCANS query option
-
Enables a fast code path for queries that apply simple aggregate functions to partition key columns:
MIN(key_column)
,MAX(key_column)
, orCOUNT(DISTINCT key_column)
.Type:
BOOLEAN
Default:
FALSE
(0
)Usage notes:
This optimization speeds up common
introspection
operations over partition key columns, for example determining the distinct values of partition keys.This optimization does not apply toSELECT
statements that reference columns that are not partition keys. It also only applies when all the partition key columns in theSELECT
statement are referenced in one of the following contexts:-
Within a
MAX()
orMAX()
aggregate function or as the argument of any aggregate function with theDISTINCT
keyword applied. -
Within a
WHERE
,GROUP BY
orHAVING
clause.
This optimization is enabled by a query option because it skips some consistency checks and therefore can return slightly different partition values if partitions are in the process of being added, dropped, or loaded outside of Impala. Queries might exhibit different behavior depending on the setting of this option in the following cases:
-
If files are removed from a partition using HDFS or other non-Impala operations, there is a period until the next
REFRESH
of the table where regular queries fail at run time because they detect the missing files. With this optimization enabled, queries that evaluate only the partition key column values (not the contents of the partition itself) succeed, and treat the partition as if it still exists. -
If a partition contains any data files, but the data files do not contain any rows, a regular query considers that the partition does not exist. With this optimization enabled, the partition is treated as if it exists.
If the partition includes no files at all, this optimization does not change the query behavior: the partition is considered to not exist whether or not this optimization is enabled.
-
- OPTIMIZE_SIMPLE_LIMIT query option
-
This new planner query option
OPTIMIZE_SIMPLE_LIMIT
optimizes the planning time for simple limit queries by only considering a minimal set of partitions whose file descriptors add up to N (the specified limit). Each file is conservatively estimated to contain 1 row.This reduces the number of partitions processed by the planner which, according to query profiling, has been the main contributor to the planning time especially for large numbers of partitions. Further, within each partition, only the number of non-empty files that brings the total to N will be considered. For the optimization to be considered, this option should be always set to "true".
This is an opt-in optimization and also applies to subqueries and view.
- ORC_SCHEMA_RESOLUTION query option
-
The ORC_SCHEMA_RESOLUTION query option allows Impala to look up columns within ORC files by column names, rather than by column indices, when necessary.
The allowed values are:
- POSITION (0)
- NAME (1)
Type: Enum of the above-allowed values
Usage notes:
By default, Impala looks up columns within an ORC file based on the order of columns in the table.
Impact of column renaming
Position-based resolution is resilient to column renaming however, a name-based resolution will fail when you rename a column. The following example demonstrates the impact of column renaming on position-based and name-based resolution.
- View the details of the table test_orc
select * from test_orc; +---------+------------+------------+ | int_col | string_col | double_col | +---------+------------+------------+ | 0 | A | 0.5 | | 1 | B | 1.5 | | 2 | C | 2.5 | +---------+------------+------------+
- Rename the column “double_col” to
“new_double_col”
alter table test_orc change double_col new_double_col double;
- Set resolving ORC file columns by
Index
set orc_schema_resolution=POSITION;
- View the
table
select int_col, string_col, new_double_col from test_orc; +---------+------------+----------------+ | int_col | string_col | new_double_col | +---------+------------+----------------+ | 0 | A | 0.5 | | 1 | B | 1.5 | | 2 | C | 2.5 | +---------+------------+----------------+
The above query results show that renaming a column did not impact when you resolved the ORC file columns by POSITION. However, if you resolve ORC file columns by NAME,set orc_schema_resolution=NAME
, viewing the table details will fail to fetch the renamed columns.select int_col, string_col, new_double_col from test_orc; +---------+------------+----------------+ | int_col | string_col | new_double_col | +---------+------------+----------------+ | 0 | A | NULL | | 1 | B | NULL | | 2 | C | NULL | +---------+------------+----------------+
- PARQUET_COMPRESSION_CODEC query option
-
Deprecated. Use
COMPRESSION_CODEC
in Impala 2.0 and later. - PARQUET_ANNOTATE_STRINGS_UTF8 query option
-
Causes Impala
INSERT
andCREATE TABLE AS SELECT
statements to write Parquet files that use the UTF-8 annotation forSTRING
columns.Type:
BOOLEAN
Default:
FALSE
(0
)Usage notes:
By default, Impala represents a
STRING
column in Parquet as an unannotated binary field.Impala always uses the UTF-8 annotation when writing
CHAR
andVARCHAR
columns to Parquet files. An alternative to using the query option is to castSTRING
values toVARCHAR
.This option is to help make Impala-written data more interoperable with other data processing engines. Impala itself currently does not support all operations on UTF-8 data. Although data processed by Impala is typically represented in ASCII, it is valid to designate the data as UTF-8 when storing on disk, because ASCII is a subset of UTF-8.
- PARQUET_ARRAY_RESOLUTION query option
-
The
PARQUET_ARRAY_RESOLUTION
query option controls the behavior of the indexed-based resolution for nested arrays in Parquet.In Parquet, you can represent an array using a 2-level or 3-level representation. The modern, standard representation is 3-level. The legacy 2-level scheme is supported for compatibility with older Parquet files. However, there is no reliable metadata within Parquet files to indicate which encoding was used. It is even possible to have mixed encodings within the same file if there are multiple arrays. The
PARQUET_ARRAY_RESOLUTION
option controls the process of resolution that is to match every column/field reference from a query to a column in the Parquet file.The supported values for the query option are:
-
THREE_LEVEL
: Assumes arrays are encoded with the 3-level representation, and does not attempt the 2-level resolution. -
TWO_LEVEL
: Assumes arrays are encoded with the 2-level representation, and does not attempt the 3-level resolution. -
TWO_LEVEL_THEN_THREE_LEVEL
: First tries to resolve assuming a 2-level representation, and if unsuccessful, tries a 3-level representation.
All of the above options resolve arrays encoded with a single level.
A failure to resolve a column/field reference in a query with a given array resolution policy does not necessarily result in a warning or error returned by the query. A mismatch might be treated like a missing column (returns NULL values), and it is not possible to reliably distinguish the 'bad resolution' and 'legitimately missing column' cases.
The name-based policy generally does not have the problem of ambiguous array representations. You specify to use the name-based policy by setting the
PARQUET_FALLBACK_SCHEMA_RESOLUTION
query option toNAME
.Type: Enum of
TWO_LEVEL
,TWO_LEVEL_THEN_THREE_LEVEL
, andTHREE_LEVEL
Default:
THREE_LEVEL
Examples:
EXAMPLE A: The following Parquet schema of a file can be interpreted as a 2-level or 3-level:
ParquetSchemaExampleA { optional group single_element_groups (LIST) { repeated group single_element_group { required int64 count; } } }
The following table schema corresponds to a 2-level interpretation:
CREATE TABLE t (col1 array<struct<f1: bigint>>) STORED AS PARQUET;
Successful query with a 2-level interpretation:
SET PARQUET_ARRAY_RESOLUTION=TWO_LEVEL; SELECT ITEM.f1 FROM t.col1;
The following table schema corresponds to a 3-level interpretation:
CREATE TABLE t (col1 array<bigint>) STORED AS PARQUET;
Successful query with a 3-level interpretation:
SET PARQUET_ARRAY_RESOLUTION=THREE_LEVEL; SELECT ITEM FROM t.col1
EXAMPLE B: The following Parquet schema of a file can be only be successfully interpreted as a 2-level:
ParquetSchemaExampleB { required group list_of_ints (LIST) { repeated int32 list_of_ints_tuple; } }
The following table schema corresponds to a 2-level interpretation:
CREATE TABLE t (col1 array<int>) STORED AS PARQUET;
Successful query with a 2-level interpretation:
SET PARQUET_ARRAY_RESOLUTION=TWO_LEVEL; SELECT ITEM FROM t.col1
Unsuccessful query with a 3-level interpretation. The query returns
NULL
s as if the column was missing in the file:SET PARQUET_ARRAY_RESOLUTION=THREE_LEVEL; SELECT ITEM FROM t.col1
-
- PARQUET_DICTIONARY_FILTERING query option
-
The
PARQUET_DICTIONARY_FILTERING
query option controls whether Impala uses dictionary filtering for Parquet files.To efficiently process a highly selective scan query, when this option is enabled, Impala checks the values in the Parquet dictionary page and determines if the whole row group can be thrown out.
A column chunk is purely dictionary encoded and can be used by dictionary filtering if any of the following conditions are met:- If the
encoding_stats
is in the Parquet file, dictionary filtering uses it to determine if there are only dictionary encoded pages (i.e. there are no data pages with an encoding other than PLAIN_DICTIONARY). - If the encoding stats are not present, dictionary filtering looks at the
encodings. The column is purely dictionary encoded if both of the conditions
satisfy:
- PLAIN_DICTIONARY is present.
- Only PLAIN_DICTIONARY, RLE, or BIT_PACKED encodings are listed.
- Dictionary filtering works for the Parquet dictionaries with less than 40000 values if the file was written by Impala 2.9.
In the query runtime profile output for each Impalad instance, the
NumDictFilteredRowGroups
field in the SCAN node section shows the number of row groups that were skipped based on dictionary filtering.Note that row groups can be filtered out by Parquet statistics, and in such cases, dictionary filtering will not be considered.
The supported values for the query option are:-
TRUE
(1
): Use dictionary filtering. -
FALSE
(0
): Do not use dictionary filtering - Any other values are treated as
FALSE
.
Type:
BOOLEAN
Default:
TRUE
(1
) - If the
- PARQUET_FALLBACK_SCHEMA_RESOLUTION query option
-
The
PARQUET_FALLBACK_SCHEMA_RESOLUTION
query option allows Impala to look up columns within Parquet files by column name, rather than column order, when necessary. The allowed values are:-
POSITION
(0
) -
NAME
(1
)
Type: Enum of the above allowed values
Usage notes:
By default, Impala looks up columns within a Parquet file based on the order of columns in the table. The
name
setting for this option enables behavior for Impala queries similar to the Hive settingparquet.column.index access=false
. It also allows Impala to query Parquet files created by Hive with theparquet.column.index.access=false
setting in effect. -
- PARQUET_FILE_SIZE query option
-
Specifies the maximum size of each Parquet data file produced by Impala
INSERT
statements in bytes, or with a trailingm
org
character to indicate megabytes or gigabytes.With tables that are small or finely partitioned, the default Parquet block size (formerly 1 GB, now 256 MB in Impala 2.0 and later) could be much larger than needed for each data file. For
INSERT
operations into such tables, you can increase parallelism by specifying a smallerPARQUET_FILE_SIZE
value, resulting in more HDFS blocks that can be processed by different nodes.Type: Number, with optional unit specifier
Default:
0
(produces files with a target size of 256 MB; files might be larger for very wide tables) - PARQUET_OBJECT_STORE_SPLIT_SIZE
-
Use the
PARQUET_OBJECT_STORE_SPLIT_SIZE
query option to control the Parquet split sizes on non-block based stores, e.g. S3, ADLS, etc.Type:
INT
Allowed values: The value must be greater than or equal to 1 MB.
Default: 256 MB
Added in: Impala 3.4 / CDP 1.2
- PARQUET_PAGE_ROW_COUNT_LIMIT query option
-
Use the
PARQUET_PAGE_ROW_COUNT_LIMIT
query option to set the maximum number of rows that can be written on a single Parquet data page. By default there is no row count limit.Type:
INT
Allowed values: Positive integers
Added in: Impala 3.3
- PARQUET_READ_PAGE_INDEX query option
-
Use the
PARQUET_READ_PAGE_INDEX
query option to disable or enable using the Parquet page index during scans. The page index contains min/max statistics at the page-level granularity. It can be used to skip pages and rows that do not match the conditions in theWHERE
clause.This option enables the same optimization as the
PARQUET_READ_STATISTICS
at the finer grained page level.Impala supports filtering based on Parquet statistics:
- Of the types:
BOOLEAN
,INT
,DECIMAL
,STRING
,TIMESTAMP
,DATE
- For simple predicates of the forms:
<slot> <op> <constant>
or<constant> <op> <slot>
, where<op>
is LT, LE, GE, GT, and EQ
The supported values for the query option are:-
TRUE
(1
): Read the page-level statistics from the Parquet page index during query processing and filter out pages based on the statistics. -
FALSE
(0
): Do not use the Parquet page index. - Any other values are treated as
false
.
Type:
BOOLEAN
Default:
TRUE
(1
) - Of the types:
- PARQUET_READ_STATISTICS query option
-
The
PARQUET_READ_STATISTICS
query option controls whether to read statistics from Parquet files and use them during query processing.Parquet stores min/max stats which can be used to skip reading row groups if they don't qualify a certain predicate. When this query option is set to
true
, Impala reads the Parquet statistics and skips reading row groups that do not match the conditions in theWHERE
clause.Impala supports filtering based on Parquet statistics:
- Of the numerical types for the old version of the statistics:
BOOLEAN
,INTEGER
,FLOAT
- Of the types for the new version of the statistics (starting in impala 2.8):
BOOLEAN
,INTEGER
,FLOAT
,DECIMAL
,STRING
,TIMESTAMP
,DATE
- For simple predicates of the forms:
<slot> <op> <constant>
or<constant> <op> <slot>
, where<op>
is LT, LE, GE, GT, and EQ
The
PARQUET_READ_STATISTICS
option provides a workaround when dealing with files that have corrupt Parquet statistics and unknown errors.In the query runtime profile output for each
Impalad
instance, theNumStatsFilteredRowGroups
field in the SCAN node section shows the number of row groups that were skipped based on Parquet statistics.The supported values for the query option are:-
TRUE
(1
): Read statistics from Parquet files and use them in query processing. -
FALSE
(0
): Do not use Parquet read statistics. - Any other values are treated as
FALSE
.
Type:
BOOLEAN
Default:
TRUE
- Of the numerical types for the old version of the statistics:
- PARQUET_WRITE_PAGE_INDEX query option
-
The
PARQUET_WRITE_PAGE_INDEX
query option disables or enables the Parquet page index writing.Impala writes page-level statistics into the Parquet page index of the types:
INT
,DECIMAL
,STRING
,TIMESTAMP
,DATE
The supported values for the query option are:-
TRUE
(1
): Write the Parquet page index when creating Parquet files. -
FALSE
(0
): Do not write the Parquet page index when creating Parquet files. - Any other values are treated as
false
.
Type:
BOOLEAN
Default:
TRUE
(1
)Added in: Impala 3.3
-
- PREFETCH_MODE query option
-
Determines whether the prefetching optimization is applied during join query processing.
Allowed values are:NONE
(0
)HT_BUCKET
(1
): Hash table buckets are prefetched during join query processing.
Type: Enum of the above allowed values
Default:
1
(HT_BUCKET
) - QUERY_TIMEOUT_S query option
-
Sets the idle query timeout value for the session, in seconds. Queries that sit idle for longer than the timeout value are automatically cancelled. If the system administrator specified the
--idle_query_timeout
startup option,QUERY_TIMEOUT_S
must be smaller than or equal to the--idle_query_timeout
value.Type: Number
Default:
0
(no timeout if--idle_query_timeout
not in effect; otherwise, use--idle_query_timeout
value) - REPLICA_PREFERENCE query option
-
The
REPLICA_PREFERENCE
query option lets you distribute the work more evenly if hotspots and bottlenecks persist. It causes the access cost of all replicas of a data block to be considered equal to or worse than the configured value. This allows Impala to schedule reads to suboptimal replicas (e.g. local in the presence of cached ones) in order to distribute the work across more executor nodes.Allowed values are:CACHE_LOCAL
(0
)DISK_LOCAL
(2
)REMOTE
(4
)
Type: Enum
Default:
CACHE_LOCAL (0)
Usage Notes:
By default Impala selects the best replica it can find in terms of access cost. The preferred order is cached, local, and remote. With
REPLICA_PREFERENCE
, the preference of all replicas are capped at the selected value. For example, whenREPLICA_PREFERENCE
is set toDISK_LOCAL
, cached and local replicas are treated with the equal preference. When set toREMOTE
, all three types of replicas, cached, local, remote, are treated with equal preference. - REQUEST_POOL query option
-
The pool or queue name that queries should be submitted to. Only applies when you enable the Impala admission control feature. Specifies the name of the pool used by requests from Impala to the resource manager.
Type:
STRING
Default: An empty string (use the user-to-pool mapping defined by an impalad startup option in the Impala configuration file)
- RESOURCE_TRACE_RATIO query option
-
The
RESOURCE_TRACE_RATIO
query option specifies the ratio of queries where the CPU usage info will be included in the profiles. Collecting CPU usage and sending it around adds a slight overhead during query execution. This query option lets you control whether to collect additional information to diagnose the resource usage.For example, setting
RESOURCE_TRACE_RATIO=1
adds a trace of the CPU usage to the profile of each query.Setting
RESOURCE_TRACE_RATIO=0.5
means that randomly selected half of all queries will have that information collected by the coordinator and included in the profiles.Setting
RESOURCE_TRACE_RATIO=0
means that CPU usage will not be tracked and included in the profiles.Values from 0 to 1 are allowed.
Type: Number
Default:
0
Added in: Impala 3.2
- RETRY_FAILED_QUERIES query option
-
Use the
RETRY_FAILED_QUERIES
query option to control whether or not queries are transparently retried on cluster membership changes.Cluster membership changes typically occur when an impalad crashes, or if the node is blacklisted by the Impala Coordinator. If a SELECT query fails due to a cluster membership change, the Coordinator will cancel and unregister the running query and then launch a retry of the query. For example, if one of the executor nodes fails during query execution, the query fails but is transparently re-executed, either with the executor node immediately replaced, or with a temporarily reduced number of executor nodes. This feature supports retrying the entire query and NOT the individual query fragments. INSERT and DDL queries will NOT be retried.
Type:
BOOLEAN
Default:
TRUE
- RUNTIME_BLOOM_FILTER_SIZE query option
-
Size (in bytes) of Bloom filter data structure used by the runtime filtering feature.
Type:
INTEGER
Default:
1048576
(1 MB)Maximum: 16 MB
Usage notes:
This setting affects optimizations for large and complex queries, such as dynamic partition pruning for partitioned tables, and join optimization for queries that join large tables. Larger filters are more effective at handling higher cardinality input sets, but consume more memory per filter.
If your query filters on high-cardinality columns (for example, millions of different values) and you do not get the expected speedup from the runtime filtering mechanism, consider doing some benchmarks with a higher value for
RUNTIME_BLOOM_FILTER_SIZE
. The extra memory devoted to the Bloom filter data structures can help make the filtering more accurate.Because the runtime filtering feature applies mainly to resource-intensive and long-running queries, only adjust this query option when tuning long-running queries involving some combination of large partitioned tables and joins involving large tables.
Because the effectiveness of this setting depends so much on query characteristics and data distribution, you typically only use it for specific queries that need some extra tuning, and the ideal value depends on the query. Consider setting this query option immediately before the expensive query and unsetting it immediately afterward.
Kudu consideration:
This query option affects only Bloom filters, not the min/max filters that are applied to Kudu tables. Therefore, it does not affect the performance of queries against Kudu tables.
- RUNTIME_FILTER_MAX_SIZE query option
-
The
RUNTIME_FILTER_MAX_SIZE
query option adjusts the settings for the runtime filtering feature. This option defines the maximum size for a filter, no matter what the estimates produced by the planner are. This value also overrides any lower number specified for theRUNTIME_BLOOM_FILTER_SIZE
query option. Filter sizes are rounded up to the nearest power of two.Type:
INTEGER
Default:
1048576
(1 MB)Usage notes:
Because the runtime filtering feature applies mainly to resource-intensive and long-running queries, only adjust this query option when tuning long-running queries involving some combination of large partitioned tables and joins involving large tables.
Kudu consideration:
This query option affects only Bloom filters, not the min/max filters that are applied to Kudu tables. Therefore, it does not affect the performance of queries against Kudu tables.
- RUNTIME_FILTER_MIN_SIZE query option
-
The
RUNTIME_FILTER_MIN_SIZE
query option adjusts the settings for the runtime filtering feature. This option defines the minimum size for a filter, no matter what the estimates produced by the planner are. This value also overrides any lower number specified for theRUNTIME_BLOOM_FILTER_SIZE
query option. Filter sizes are rounded up to the nearest power of two.Type:
INTEGER
Default:
0
(meaning use the value from the corresponding impalad startup option)Usage notes:
Because the runtime filtering feature applies mainly to resource-intensive and long-running queries, only adjust this query option when tuning long-running queries involving some combination of large partitioned tables and joins involving large tables.
Kudu consideration:
This query option affects only Bloom filters, not the min/max filters that are applied to Kudu tables. Therefore, it does not affect the performance of queries against Kudu tables.
- RUNTIME_FILTER_MODE query option
-
The
RUNTIME_FILTER_MODE
query option adjusts the settings for the runtime filtering feature. It turns this feature on and off, and controls how extensively the filters are transmitted between hosts.Allowed values are:OFF
(0
)LOCAL
(1
)GLOBAL
(2
)
Type: Enum of the above allowed values
Default:
GLOBAL
(2
)Usage notes:
The default is
GLOBAL
setting is recommended for a wide variety of workloads, to provide best performance without of the box
settings.The lowest setting of
LOCAL
does a similar level of optimization (such as partition pruning) as in earlier Impala releases. This setting was the default in Impala 2.5, to allow for a period of post-upgrade testing for existing workloads. This setting is suitable for workloads with non-performance-critical queries, or if the coordinator node is under heavy CPU or memory pressure.You might change the setting to
OFF
if your workload contains many queries involving partitioned tables or joins that do not experience a performance increase from the runtime filters feature. If the overhead of producing the runtime filters outweighs the performance benefit for queries, you can turn the feature off entirely. - RUNTIME_FILTER_WAIT_TIME_MS query option
-
The
RUNTIME_FILTER_WAIT_TIME_MS
query option adjusts the settings for the runtime filtering feature. It specifies a time in milliseconds that each scan node waits for runtime filters to be produced by other plan fragments.Type:
INTEGER
Default:
0
(meaning use the value from the corresponding impalad startup option)Usage notes:
Because the runtime filtering feature applies mainly to resource-intensive and long-running queries, only adjust this query option when tuning long-running queries involving some combination of large partitioned tables and joins involving large tables
- S3_SKIP_INSERT_STAGING query option
-
Speeds up
INSERT
operations on tables or partitions residing on the Amazon S3 filesystem. The tradeoff is the possibility of inconsistent data left behind if an error occurs partway through the operation.By default, Impala write operations to S3 tables and partitions involve a two-stage process. Impala writes intermediate files to S3, then (because S3 does not provide a
rename
operation) those intermediate files are copied to their final location, making the process more expensive as on a filesystem that supports renaming or moving files. This query option makes Impala skip the intermediate files, and instead write the new data directly to the final destination.Type:
BOOLEAN
Default:
TRUE
(1
)Usage notes:
The timing of file deletion during an
INSERT OVERWRITE
operation makes it impractical to write new files to S3 and delete the old files in a single operation. Therefore, this query option only affects regularINSERT
statements that add to the existing data in a table, notINSERT OVERWRITE
statements. UseTRUNCATE TABLE
if you need to remove all contents from an S3 table before performing a fastINSERT
with this option enabled.Performance improvements with this option enabled can be substantial. The speed increase might be more noticeable for non-partitioned tables than for partitioned tables.
- SCAN_BYTES_LIMIT query option
-
The
SCAN_BYTES_LIMIT
query option sets a limit on the bytes scanned by HDFS and HBase SCAN operations. If a query is still executing when the query’s coordinator detects that it has exceeded the limit, the query is terminated with an error. The option is intended to prevent runaway queries that scan more data than is intended.For example, an Impala administrator could set a default value of
SCAN_BYTES_LIMIT=100GB
for a resource pool to automatically stop queries that scan more than 100 GB of data (see Impala Admission Control and Query Queuing for information about default query options). If a user accidentally omits a partition filter in aWHERE
clause and runs a large query that scans a lot of data, the query will be automatically terminated after it scans more data than theSCAN_BYTES_LIMIT
.You can override the default value per-query or per-session, in the same way as other query options, if you do not want the defaultSCAN_BYTES_LIMIT
value to apply to a specific query or session.Because the checks are done periodically, the query may scan over the limit at times.
Type: Number
Units:- A numeric argument represents memory size in bytes.
- Specify a suffix of
m
ormb
for megabytes. - Specify a suffix of
g
orgb
for gigabytes. - If you specify a suffix with unrecognized formats, subsequent queries fail with an error.
Default:
0
(no limit)Added in: Impala 3.1
- SCHEDULE_RANDOM_REPLICA query option
-
The
SCHEDULE_RANDOM_REPLICA
query option fine-tunes the scheduling algorithm for deciding which host processes each HDFS data block or Kudu tablet to reduce the chance of CPU hotspots.By default, Impala estimates how much work each host has done for the query, and selects the host that has the lowest workload. This algorithm is intended to reduce CPU hotspots arising when the same host is selected to process multiple data blocks / tablets. Use the
SCHEDULE_RANDOM_REPLICA
query option if hotspots still arise for some combinations of queries and data layout.The
SCHEDULE_RANDOM_REPLICA
query option only applies to tables and partitions that are not enabled for the HDFS caching.Type:
BOOLEAN
Default:
FALSE
- SCRATCH_LIMIT query option
-
Specifies the maximum amount of disk storage, in bytes, that any Impala query can consume on any host using the
spill to disk
mechanism that handles queries that exceed the memory limit.Specify the size in bytes, or with a trailing
m
org
character to indicate megabytes or gigabytes.A value of zero turns off the spill to disk feature for queries in the current session, causing them to fail immediately if they exceed the memory limit.
The amount of memory used per host for a query is limited by the
MEM_LIMIT
query option.The more DataNodes in the cluster, the less memory is used on each host, and therefore also less scratch space is required for queries that exceed the memory limit.
Type: Number, with optional unit specifier
Default:
-1
(amount of spill space is unlimited) - SHUFFLE_DISTINCT_EXPRS query option
-
The
SHUFFLE_DISTINCT_EXPRS
query option controls the shuffling behavior when a query has both grouping and distinct expressions. Impala can optionally include the distinct expressions in the hash exchange to spread the data among more nodes. However, this plan requires one more hash exchange phase.It is recommended that you turn off this option if the NDVs of the grouping expressions are high.
Type:
BOOLEAN
Default:
FALSE
- SPOOL_QUERY_RESULTS query Option
-
Use the
SPOOL_QUERY_RESULTS
query option to enable query result spooling, which is disabled by default.Query result spooling controls how rows are returned to the client.- When query result spooling is disabled (
SPOOL_QUERY_RESULTS = FALSE
), Impala relies on clients to fetch results to trigger the generation of more result row batches until all the result rows have been produced. If a client issues a query without fetching all the results, the query fragments will continue to consume the resources until the query is cancelled and unregistered, potentially tying up resources and cause other queries to wait for extended period of time in admission control. - When query result spooling is enabled (
SPOOL_QUERY_RESULTS = TRUE
), the result sets of queries are eagerly fetched and spooled, either in memory or on disk.Once all result rows have been fetched and stored in the spooling location, the resources are freed up. Incoming client fetches can get the data from the spooled results.
Type:
INT
Default:
FALSE
Added in: Impala 3.4
- When query result spooling is disabled (
- SPOOL_ALL_RESULTS_FOR _RETRIES
Leave this query option
SPOOL_ALL_RESULTS_FOR _RETRIES
at its default value “true” for queries that can be retried. This new query option only applies if the query optionsRETRY_FAILED_QUERIES
andSPOOL_QUERY_RESULTS
are enabled.Currently, to avoid incorrect results, query retry will be skipped if any results are returned to the client in the original query. This new query option is introduced, for queries that can be retried, to control spooling all query results before returning any to the client. If the query results cannot be contained in the allocated result-spooling space, results will be returned and query retry will be disabled on the query.
If you set
SPOOL_ALL_RESULTS_FOR _RETRIES
to false, it will fall back to the original behavior - the client can fetch results when they are ready. For queries that cannot be retried, this option does not impact.Type:
BOOLEAN
Default:
TRUE
- SUPPORT_START_OVER query option
-
Leave this setting at its default value. It is a read-only setting, tested by some client applications such as Hue.
If you accidentally change it through impala-shell, subsequent queries encounter errors until you undo the change by issuing
UNSET support_start_over
.Type:
BOOLEAN
Default:
FALSE
- SYNC_DDL query option
-
When enabled, causes any DDL operation such as
CREATE TABLE
orALTER TABLE
to return only when the changes have been propagated to all other Impala nodes in the cluster by the Impala catalog service. That way, if you issue a subsequentCONNECT
statement in impala-shell to connect to a different node in the cluster, you can be sure that other node will already recognize any added or changed tables. (The catalog service automatically broadcasts the DDL changes to all nodes automatically, but without this option there could be a period of inconsistency if you quickly switched to another node, such as by issuing a subsequent query through a load-balancing proxy.)Although
INSERT
is classified as a DML statement, when theSYNC_DDL
option is enabled,INSERT
statements also delay their completion until all the underlying data and metadata changes are propagated to all Impala nodes. Internally, Impala inserts have similarities with DDL statements in traditional database systems, because they create metadata needed to track HDFS block locations for new files and they potentially add new partitions to partitioned tables.Type:
BOOLEAN
Default:
FALSE
- THREAD_RESERVATION_AGGREGATE_LIMIT query option
-
The
THREAD_RESERVATION_AGGREGATE_LIMIT
query option limits the number of reserved threads for a query across all nodes on which it is executing. The option is intended to prevent execution of complex queries that can consume excessive CPU or operating system resources on a cluster. Queries that have more threads than this threshold are rejected by Impala’s admission controller before they start executing.For example, an Impala administrator could set a default value of
THREAD_RESERVATION_AGGREGATE_LIMIT=2000
for a resource pool on a 100 node where they expect only relatively simple queries with less than 20 threads per node to run. This will reject queries that require more than 2000 reserved threads across all nodes, for example a query with 21 fragments running on all 100 nodes of the cluster.You can override the default value per-query or per-session, in the same way as other query options, if you do not want the default
THREAD_RESERVATION_AGGREGATE_LIMIT
value to apply to a specific query or session.Type: Number
Default:
0
(no limit)Added in: Impala 3.1
- THREAD_RESERVATION_LIMIT query option
-
The
THREAD_RESERVATION_LIMIT
query option limits the number of reserved threads for a query on each node. The option is intended to prevent execution of complex queries that can consume excessive CPU or operating system resources on a single node. Queries that have more threads per node than this threshold are rejected by Impala’s admission controller before they start executing. You can see the number of reserved threads for a query in its explain plan in the “Per-Host Resource Reservation" line.For example, an Impala administrator could set a default value of
THREAD_RESERVATION_LIMIT=100
for a resource pool where they expect only relatively simple queries to run. This will reject queries that require more than 100 reserved threads on a node, for example, queries with more than 100 fragments.You can override the default value per-query or per-session, in the same way as other query options, if you do not want the default
THREAD_RESERVATION_LIMIT
value to apply to a specific query or session.Type: Number
Default:
3000
Added in: Impala 3.1
- TIMEZONE query option
-
The
TIMEZONE
query option defines the timezone used for conversions between UTC and the local time. If not set, Impala uses the system time zone where the Coordinator Impalad runs. As query options are not sent to the Coordinator immediately, the timezones are validated only when the query runs.Impala takes the timezone into a consideration in the following cases:- When calling the
NOW()
function - When converting between Unix time and timestamp if the
use_local_tz_for_unix_timestamp_conversions
flag isTRUE
- When reading Parquet timestamps written by Hive if the
convert_legacy_hive_parquet_utc_timestamps
flag isTRUE
Syntax:
SET TIMEZONE=time zone
time zone can be a canonical code or a time zone name defined in IANA Time Zone Database. The value is case-sensitive.
Leading/trailing quotes (') and double quotes (") are stripped.
If time zone is an empty string, the time zone for the query is set to the default time zone of the Impalad Coordinator.
If time zone is
NULL
or a space character, Impala returns an error when the query is executed.Type:
STRING
Default: The system time zone where the Coordinator
impalad
runsExamples:SET TIMEZONE=UTC; SET TIMEZONE="Europe/Budapest";
Added in: Impala 3.1
- When calling the
- TOPN_BYTES_LIMIT query option
-
The
TOPN_BYTES_LIMIT
query option places a limit on the amount of estimated memory that Impala can process for top-N queries.Top-N queries are the queries that include both
ORDER BY
andLIMIT
clauses. Top-N queries don't spill to disk so have to keep all rows they process in memory, and those queries can cause out-of-memory issues when running with a large limit and an offset. If the Impala planner estimates that a top-N operator will process more bytes than theTOPN_BYTES_LIMIT
value, it will replace the top-N operator with the sort operator. Switching to the sort operator allows Impala to spill to disk, thus requiring less memory than top-N, but potentially with performance penalties.The option has no effect when set to
0
or-1
.Syntax:
SET TOPN_BYTES_LIMIT=limit
Type: Number
Default:
536870912
(512 MB)Added in: Impala 3.1