Caching intermediate results
Learn about how Impala uses the intermediate results cache to improve query performance and resource efficiency.
Impala processes every query independently. Because intermediate results are discarded after a query completes, subsequent queries must recompute them even when the underlying data is unchanged.
Caching intermediate results improves latency for repetitive work and frees up resources for other queries.
Unlike full query result caching, this method matches shared work at a finer granularity, so similar queries can reuse each other's intermediate results even when they are not identical.
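The distinction can be sketched in Python (an illustrative model only, not Impala's implementation): keying the cache on individual plan subtrees lets two different queries reuse a shared scan, whereas a full-result cache would treat them as unrelated. The subtree strings and version numbers below are hypothetical.

```python
import hashlib

# A shared in-memory cache of intermediate (per-subtree) results.
cache = {}

def subtree_key(subtree: str, table_version: int) -> str:
    """Cache key for one plan subtree (hypothetical model)."""
    return hashlib.sha256(f"{subtree}|v{table_version}".encode()).hexdigest()

def run(subtrees, table_version):
    """Execute a query's plan subtrees, counting cache hits."""
    hits = 0
    for s in subtrees:
        k = subtree_key(s, table_version)
        if k in cache:
            hits += 1  # intermediate result reused
        else:
            cache[k] = f"result of {s}"
    return hits

# Two similar queries share the same scan subtree but differ in aggregation.
q1_subtrees = ["SCAN sales", "AGG sum(amount)"]
q2_subtrees = ["SCAN sales", "AGG count(*)"]

print(run(q1_subtrees, table_version=1))  # 0 hits: cold cache
print(run(q2_subtrees, table_version=1))  # 1 hit: the shared scan is reused
```

A full-result cache keyed on the whole query text would have scored zero hits on the second query, since the two queries are not identical.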
Caching within the database provides the following benefits:
- Data awareness – The database has direct knowledge of modifications to base tables.
- Security – The database uses authorization information to safely share cached results among users with equivalent privileges.
- Consistency – The query planner detects table changes and prevents the use of stale results, so query results have no lag or staleness.
The cache key incorporates all factors that can impact the query results, including information about the base tables and any query options. When any of those factors change, a new cache entry is generated. For example, if you ingest new data into a base table, the key changes.
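As a sketch of the idea (the names and version numbers here are hypothetical; Impala's actual key derivation is internal), the key can be modeled as a hash over the query, the relevant query options, and the base-table versions, so ingesting data bumps a table version and yields a different key:

```python
import hashlib
import json

def cache_key(query: str, options: dict, table_versions: dict) -> str:
    # Any change to the query, its options, or a base table's version
    # produces a different key, so stale entries are simply never matched.
    payload = json.dumps(
        {"query": query, "options": options, "tables": table_versions},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()

before = cache_key("SELECT count(*) FROM sales", {"mem_limit": "2g"}, {"sales": 41})
after = cache_key("SELECT count(*) FROM sales", {"mem_limit": "2g"}, {"sales": 42})
print(before != after)  # True: new data means a new key, never a stale hit
```

Because a stale entry's key is never looked up again, it needs no explicit invalidation; it simply ages out under the eviction policy described below.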
Administrators do not need to manually refresh or invalidate cache entries.
When the cache reaches its quota, Impala evicts cache entries to make space for new entries. You can specify the eviction policy by using the --tuple_cache_eviction_policy startup flag.
The cache supports the following eviction policies:
- Least Recently Used (LRU) – This is the default policy.
- Low Inter-reference Recency Set (LIRS) – This is a scan-resistant policy with low performance overhead.
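For intuition, the default LRU policy can be sketched with an ordered map (this illustrates the policy only, not Impala's cache implementation):

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: on overflow, evict the least recently used entry."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")           # "a" is now most recently used
cache.put("c", 3)        # evicts "b", the least recently used entry
print(sorted(cache.entries))  # ['a', 'c']
```

A scan-resistant policy such as LIRS differs in that a burst of one-off lookups cannot push out entries with a history of reuse; that trade-off is why it is offered as the alternative to LRU.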
