Configuring intermediate results caching

Learn about the configurations required to enable the intermediate results cache for Impala queries.

The intermediate results cache is disabled by default. To use it, you must configure the following settings.

If the intermediate results cache storage is shared with other elements, such as the data cache or scratch space, you might need to adjust existing quotas (for example, the --data_cache startup flag) to provide sufficient space.
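The quota adjustment described above is simple subtraction, sketched here in Python. The helper names and the disk, scratch, and cache sizes are hypothetical and chosen only for illustration; they are not recommendations, and the unit suffixes Impala accepts may differ from the three handled here.

```python
# Hypothetical sizing helper; all figures are illustrative, not recommendations.
UNITS = {"MB": 1024**2, "GB": 1024**3, "TB": 1024**4}


def parse_capacity(text):
    """Convert a capacity string such as '20GB' into bytes."""
    text = text.strip().upper()
    for suffix, factor in UNITS.items():
        if text.endswith(suffix):
            return int(text[: -len(suffix)]) * factor
    raise ValueError(f"unrecognized capacity: {text!r}")


def remaining_for_data_cache(disk_total, scratch, tuple_cache):
    """Return the bytes left for the data cache on a shared disk."""
    left = (
        parse_capacity(disk_total)
        - parse_capacity(scratch)
        - parse_capacity(tuple_cache)
    )
    if left < 0:
        raise ValueError("requested quotas exceed the disk size")
    return left


# Assumed example: a 500GB disk shared by scratch space (100GB),
# the intermediate results cache (20GB), and the data cache.
left = remaining_for_data_cache("500GB", "100GB", "20GB")
print(left // UNITS["GB"], "GB left for the data cache")  # prints "380 GB left for the data cache"
```

If the current data cache quota is larger than the result, it would need to be lowered before the intermediate results cache is enabled.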

  1. In Cloudera Manager, click Clusters > Impala > Configuration.
  2. Search for Impala Daemon Command Line Argument Advanced Configuration Snippet (Safety Valve).
  3. Add the following flags:
    --allow_tuple_caching=true --tuple_cache=[directory_path]:[capacity]
    Example:
    --allow_tuple_caching=true --tuple_cache=/cache/impala:20GB
  4. If you must reduce the data cache to provide space for the intermediate results cache, update the --data_cache startup flag with the new capacity.
  5. Enable the cache for queries by default:
    1. Search for Impala Daemon Default Query Options.
    2. Add the query option enable_tuple_cache=true.
  6. Click Save Changes and restart the Impala service.
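When the same settings are applied to several clusters, it can help to generate the startup flags instead of typing them. The following Python sketch assembles the two flags in the [directory_path]:[capacity] format shown in the steps above. The function name is hypothetical, and the two path checks are assumptions made for this example, not documented Impala requirements.

```python
def tuple_cache_flags(directory_path, capacity):
    """Return the two startup flags in the format shown in the steps."""
    # Assumption for this sketch: the cache directory is an absolute path.
    if not directory_path.startswith("/"):
        raise ValueError("expected an absolute directory path")
    # A colon separates the path from the capacity, so reject it in the path.
    if ":" in directory_path:
        raise ValueError("directory path must not contain ':'")
    return [
        "--allow_tuple_caching=true",
        f"--tuple_cache={directory_path}:{capacity}",
    ]


print(" ".join(tuple_cache_flags("/cache/impala", "20GB")))
# prints: --allow_tuple_caching=true --tuple_cache=/cache/impala:20GB
```

The printed line can be pasted into the safety valve field as is.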

Impala now stores intermediate query results in the specified local directory. Subsequent queries with matching plan fragments can retrieve data from the cache, which reduces execution time and resource consumption.

You can monitor cache hits and performance by checking the Impala Query Profile. The profile displays metrics for tuple cache hits under the relevant plan nodes.
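For long profiles, a small filter can pull out the lines that mention the cache. The Python sketch below assumes that you have saved a query profile as a text file. The exact node and counter names depend on the Impala version and are not listed here, so the search term is a parameter and "tuple" is only a guess at a useful default.

```python
import sys


def matching_profile_lines(profile_text, term="tuple"):
    """Return the stripped profile lines that contain term, ignoring case."""
    term = term.lower()
    return [
        line.strip()
        for line in profile_text.splitlines()
        if term in line.lower()
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python filter_profile.py saved_profile.txt
    with open(sys.argv[1], encoding="utf-8") as handle:
        for line in matching_profile_lines(handle.read()):
            print(line)
```

If the filter returns nothing for a query that you expected to hit the cache, try a different search term before concluding that the cache was not used.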