SCHEDULE_RANDOM_REPLICA Query Option (CDH 5.7 or higher only)

The SCHEDULE_RANDOM_REPLICA query option fine-tunes the scheduling algorithm that decides which host processes each HDFS data block or Kudu tablet, with the goal of reducing the chance of CPU hotspots.

By default, Impala estimates how much work each host has done for the query, and selects the host that has the lowest workload. This algorithm is intended to reduce CPU hotspots that arise when the same host is selected to process multiple data blocks or tablets. Use the SCHEDULE_RANDOM_REPLICA query option if hotspots still arise for some combinations of queries and data layout.
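The selection rule described above can be illustrated with a small simulation. This is a minimal sketch, not Impala's scheduler: the function name assign_blocks, the input layout, and the seed parameter are hypothetical, and the sketch assumes that enabling the option causes ties between equally loaded hosts to be broken at random rather than by a fixed order. The text above does not spell out that detail.

```python
import random
from collections import defaultdict


def assign_blocks(blocks, random_replica=False, seed=0):
    """Assign each block to one of the hosts that holds a replica of it.

    blocks: list of (size_in_bytes, [replica_hosts]) tuples.
    Returns a dict mapping host -> total bytes assigned for this query.
    """
    rng = random.Random(seed)
    assigned = defaultdict(int)  # estimated work per host so far
    for size, replicas in blocks:
        # Default behavior from the text: prefer the host with the lowest workload.
        lowest = min(assigned[host] for host in replicas)
        candidates = [host for host in replicas if assigned[host] == lowest]
        # Assumption: with the option enabled, ties are broken at random;
        # otherwise the first candidate in list order wins.
        host = rng.choice(candidates) if random_replica else candidates[0]
        assigned[host] += size
    return dict(assigned)


# Six equally sized blocks, each replicated on the same three hosts.
blocks = [(128, ["host_a", "host_b", "host_c"])] * 6
default_plan = assign_blocks(blocks)
random_plan = assign_blocks(blocks, random_replica=True)
```

In this sketch both plans spread the six blocks evenly within one query. The difference shows up for small scans: with a fixed tie-breaking order, a query that reads a single block always lands on the first listed host, whereas random tie-breaking spreads such queries across all replica hosts.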

The SCHEDULE_RANDOM_REPLICA query option applies only to tables and partitions that are not enabled for HDFS caching.

Type: Boolean; recognized values are 1 and 0, or true and false; any other value is interpreted as false
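The value rule above can be sketched as a small helper. The function name parse_boolean_option is hypothetical, and the whitespace trimming and case-insensitive comparison are assumptions made for illustration, not documented Impala behavior.

```python
def parse_boolean_option(raw):
    """Return True for "1" or "true"; every other value maps to False."""
    # Assumption: surrounding whitespace and letter case are ignored.
    value = raw.strip().lower()
    return value in ("1", "true")
```

Under this rule an unrecognized value such as yes does not enable the option, so it is safest to use one of the four recognized values.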

Default: false

Added in: CDH 5.7.0 / Impala 2.5.0

Related information:

Using HDFS Caching with Impala (CDH 5.3 or higher only), Avoiding CPU Hotspots for HDFS Cached Data, REPLICA_PREFERENCE Query Option (CDH 5.9 or higher only)
