MEM_LIMIT Query Option
The MEM_LIMIT query option defines the maximum amount of memory a query can allocate on each node. The total memory that can be used by a query is the MEM_LIMIT times the number of nodes.
There are two levels of memory limit for Impala. The ‑‑mem_limit startup option sets an overall limit for the impalad process (which handles multiple queries concurrently). That process memory limit can be expressed either as a percentage of RAM available to the process such as -mem_limit=70% or as a fixed amount of memory, such as ‑‑mem_limit=100gb. The memory available to the process is based on the host's physical memory and, since CDH 6.1 / Impala 3.2, memory limits from Linux Control Groups. For example, if an impalad process is running in a Docker container on a host with 100 GB of memory, the memory available is 100 GB or the Docker container's memory limit, whichever is less.
The MEM_LIMIT query option, which you set through impala-shell or the SET statement in a JDBC or ODBC application, applies to each individual query. The MEM_LIMIT query option is usually expressed as a fixed size such as 10gb, and must always be less than the impalad memory limit.
If query processing exceeds the specified memory limit on any node, either the per-query limit or the impalad limit, Impala cancels the query automatically. Memory limits are checked periodically during query processing, so the actual memory in use might briefly exceed the limit without the query being cancelled.
Type: numeric
Units: A numeric argument represents memory size in bytes; you can also use a suffix of m or mb for megabytes, or more commonly g or gb for gigabytes. If you specify a value with unrecognized formats, subsequent queries fail with an error.
Default: 0 (unlimited)
Usage notes:
The MEM_LIMIT setting is primarily useful in a high-concurrency setting, or on a cluster with a workload shared between Impala and other data processing components. You can prevent any query from accidentally using much more memory than expected, which could negatively impact other Impala queries.
Use the output of the SUMMARY command in impala-shell to get a report of memory used for each phase of your most heavyweight queries on each node, and then set a MEM_LIMIT somewhat higher than that. See Using the SUMMARY Report for Performance Tuning for usage information about the SUMMARY command.
Examples:
The following examples show how to set the MEM_LIMIT query option using a fixed number of bytes, or suffixes representing gigabytes or megabytes.
[localhost:21000] > set mem_limit=3000000000; MEM_LIMIT set to 3000000000 [localhost:21000] > select 5; Query: select 5 +---+ | 5 | +---+ | 5 | +---+ [localhost:21000] > set mem_limit=3g; MEM_LIMIT set to 3g [localhost:21000] > select 5; Query: select 5 +---+ | 5 | +---+ | 5 | +---+ [localhost:21000] > set mem_limit=3gb; MEM_LIMIT set to 3gb [localhost:21000] > select 5; +---+ | 5 | +---+ | 5 | +---+ [localhost:21000] > set mem_limit=3m; MEM_LIMIT set to 3m [localhost:21000] > select 5; +---+ | 5 | +---+ | 5 | +---+ [localhost:21000] > set mem_limit=3mb; MEM_LIMIT set to 3mb [localhost:21000] > select 5; +---+ | 5 | +---+
The following examples show how unrecognized MEM_LIMIT values lead to errors for subsequent queries.
[localhost:21000] > set mem_limit=3tb; MEM_LIMIT set to 3tb [localhost:21000] > select 5; ERROR: Failed to parse query memory limit from '3tb'. [localhost:21000] > set mem_limit=xyz; MEM_LIMIT set to xyz [localhost:21000] > select 5; Query: select 5 ERROR: Failed to parse query memory limit from 'xyz'.
The following examples shows the automatic query cancellation when the MEM_LIMIT value is exceeded on any host involved in the Impala query. First it runs a successful query and checks the largest amount of memory used on any node for any stage of the query. Then it sets an artificially low MEM_LIMIT setting so that the same query cannot run.
[localhost:21000] > select count(*) from customer; Query: select count(*) from customer +----------+ | count(*) | +----------+ | 150000 | +----------+ [localhost:21000] > select count(distinct c_name) from customer; Query: select count(distinct c_name) from customer +------------------------+ | count(distinct c_name) | +------------------------+ | 150000 | +------------------------+ [localhost:21000] > summary; +--------------+--------+----------+----------+---------+------------+----------+---------------+---------------+ | Operator | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. Peak Mem | Detail | +--------------+--------+----------+----------+---------+------------+----------+---------------+---------------+ | 06:AGGREGATE | 1 | 230.00ms | 230.00ms | 1 | 1 | 16.00 KB | -1 B | FINALIZE | | 05:EXCHANGE | 1 | 43.44us | 43.44us | 1 | 1 | 0 B | -1 B | UNPARTITIONED | | 02:AGGREGATE | 1 | 227.14ms | 227.14ms | 1 | 1 | 12.00 KB | 10.00 MB | | | 04:AGGREGATE | 1 | 126.27ms | 126.27ms | 150.00K | 150.00K | 15.17 MB | 10.00 MB | | | 03:EXCHANGE | 1 | 44.07ms | 44.07ms | 150.00K | 150.00K | 0 B | 0 B | HASH(c_name) | | 01:AGGREGATE | 1 | 361.94ms | 361.94ms | 150.00K | 150.00K | 23.04 MB | 10.00 MB | | | 00:SCAN HDFS | 1 | 43.64ms | 43.64ms | 150.00K | 150.00K | 24.19 MB | 64.00 MB | tpch.customer | +--------------+--------+----------+----------+---------+------------+----------+---------------+---------------+ [localhost:21000] > set mem_limit=15mb; MEM_LIMIT set to 15mb [localhost:21000] > select count(distinct c_name) from customer; Query: select count(distinct c_name) from customer ERROR: Memory limit exceeded Query did not have enough memory to get the minimum required buffers in the block manager.