1. Tuning for Interactive Hive Queries

The following general guidelines are recommended for interactive Hive queries. The YARN, Tez, and HiveServer2 configuration settings used to implement these guidelines are discussed in more detail in subsequent sections of this document.

  • Limit the number of queues -- Because Capacity Scheduler queues allocate more-or-less fixed percentages of cluster capacity, Hortonworks recommends configuring clusters with a few small-capacity queues for interactive queries. Queue capacity can also be made flexible: for example, batch processes can be allowed to take over the entire cluster at night, when other workloads are not running.

  • Allocate queues based on query duration -- For example, suppose you have two types of commonly used queries: one takes about 5 seconds to run, and the other takes about 45 seconds. If both types were assigned to the same queue, the shorter queries might have to wait behind the longer ones. In this case it makes sense to assign the shorter queries to one queue and the longer queries to another.

  • Reuse containers to increase performance -- Enabling Tez container reuse improves performance by avoiding the overhead of allocating and launching a new container for every task.

  • Configure Query Vectorization -- Query vectorization helps reduce execution time for interactive queries by processing rows in batches of 1,024 rather than one row at a time.

  • Use sessions to allocate resources within individual queues -- Rather than increasing the number of queues, use sessions to divide the resources of each queue among concurrent queries.
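
As a rough illustration, the guidelines above might translate into settings along the following lines. This is a minimal sketch, not a recommended configuration: the queue names (short, long, batch), the capacity percentages, and the session count are assumptions chosen for the example and should be sized for the actual cluster and workload. The property names themselves are standard Capacity Scheduler, Tez, and HiveServer2 settings.

```properties
# --- capacity-scheduler.xml (YARN) ---
# Hypothetical layout: two small interactive queues plus one batch queue.
# Capacities of sibling queues must add up to 100.
yarn.scheduler.capacity.root.queues=short,long,batch
yarn.scheduler.capacity.root.short.capacity=10
yarn.scheduler.capacity.root.long.capacity=10
yarn.scheduler.capacity.root.batch.capacity=80
# Flexible capacity: lets the batch queue grow to the whole cluster
# when the interactive queues are idle (for example, at night).
yarn.scheduler.capacity.root.batch.maximum-capacity=100

# --- tez-site.xml ---
# Reuse containers across tasks instead of launching a new one per task.
tez.am.container.reuse.enabled=true

# --- hive-site.xml ---
hive.execution.engine=tez
# Process rows in batches instead of one row at a time.
hive.vectorized.execution.enabled=true
# Route interactive queries to the two hypothetical interactive queues.
hive.server2.tez.default.queues=short,long
# Number of Tez sessions kept per queue (illustrative value); this
# controls concurrency within a queue without adding more queues.
hive.server2.tez.sessions.per.default.queue=2
# Start the sessions when HiveServer2 starts, so that the first
# queries do not pay the session start-up cost.
hive.server2.tez.initialize.default.sessions=true
```

In this sketch, 5-second queries would be directed to the short queue and 45-second queries to the long queue, so that neither type waits behind the other.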