RUNTIME_BLOOM_FILTER_SIZE Query Option (CDH 5.7 or higher only)

Size (in bytes) of the Bloom filter data structure used by the runtime filtering feature.

Type: integer

Default: 1048576 (1 MB)

Maximum: 16777216 (16 MB)

Added in: CDH 5.7.0 / Impala 2.5.0

Usage notes:

This setting affects optimizations for large and complex queries, such as dynamic partition pruning for partitioned tables and join optimization for queries that join large tables. Larger filters are more effective at handling higher-cardinality input sets, but consume more memory per filter.

If your query filters on high-cardinality columns (for example, millions of different values) and you do not get the expected speedup from the runtime filtering mechanism, consider benchmarking with a higher value for RUNTIME_BLOOM_FILTER_SIZE. The extra memory devoted to the Bloom filter data structures can make the filtering more accurate, because a larger filter produces fewer false positives for the same number of distinct values.
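The relationship between filter size and accuracy can be estimated before running any benchmarks. The following is a rough sketch only: it applies the textbook approximation for a classic Bloom filter, (1 - e^(-k*n/m))^k, where m is the filter size in bits, n is the number of distinct values, and k is the number of hash functions. The function name and the value k = 8 are assumptions made for illustration; Impala's internal filter layout may differ, so treat the numbers as a guide to the trend and not as a prediction of actual behavior.

```python
import math

MB = 1024 * 1024

def bloom_false_positive_rate(filter_bytes, distinct_values, num_hashes=8):
    """Approximate false positive rate of a classic Bloom filter.

    Uses the textbook estimate (1 - e^(-k*n/m))^k. num_hashes=8 is an
    assumed value for illustration, not a documented Impala setting.
    """
    m_bits = filter_bytes * 8
    fill = 1.0 - math.exp(-num_hashes * distinct_values / m_bits)
    return fill ** num_hashes

# Compare candidate RUNTIME_BLOOM_FILTER_SIZE values for a hypothetical
# join key with 5 million distinct values.
for size in (1 * MB, 4 * MB, 16 * MB):
    rate = bloom_false_positive_rate(size, distinct_values=5_000_000)
    print(f"{size:>9} bytes ({size // MB:>2} MB): estimated false positive rate {rate:.6f}")
```

Under these assumptions, the default 1 MB filter is nearly saturated at 5 million distinct values and lets almost every row through, while the 16 MB maximum rejects nearly all non-matching rows. An estimate like this only helps narrow the range of values worth benchmarking.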

Because the runtime filtering feature applies mainly to resource-intensive and long-running queries, adjust this query option only when tuning long-running queries that involve some combination of large partitioned tables and joins between large tables.

Because the effectiveness of this setting depends so much on query characteristics and data distribution, you typically use it only for specific queries that need extra tuning, and the ideal value depends on the query. Consider setting this query option immediately before the expensive query and unsetting it immediately afterward.

Kudu considerations:

This query option affects only Bloom filters, not the min/max filters that are applied to Kudu tables. Therefore, it does not affect the performance of queries against Kudu tables.

Related information:

Runtime Filtering for Impala Queries (CDH 5.7 or higher only), RUNTIME_FILTER_MODE Query Option (CDH 5.7 or higher only), RUNTIME_FILTER_MIN_SIZE Query Option (CDH 5.8 or higher only), RUNTIME_FILTER_MAX_SIZE Query Option (CDH 5.8 or higher only)