Apache Hive Performance Tuning

Guidelines for Interactive Queues

The following general guidelines are recommended for interactive Hive queries. The YARN, Tez, and HiveServer2 configuration settings used to implement these guidelines are discussed in more detail in subsequent sections of this guide.

  • Limit the number of queues--Because Capacity Scheduler queues allocate a fixed percentage of cluster capacity, Hortonworks recommends configuring clusters with a few small-capacity queues for interactive queries on HDP 2.2.x and earlier. For HDP 2.3 and later, a single interactive queue is recommended.

  • Allocate queues based on query duration--For example, suppose two applications each issue a different type of frequently run query. One type takes approximately 5 seconds to run, and the other takes approximately 45 seconds. If both types were assigned to the same queue, the shorter-running queries would have to wait for the longer-running ones. In this case, assign the two types of queries to separate queues.

  • Re-use containers to increase performance--Enabling Tez container re-use improves performance by avoiding the overhead of allocating and launching a new container for every task.

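  • Use sessions to allocate resources within individual queues--Running multiple sessions within a queue is preferable to increasing the number of queues, because each additional queue claims its own fixed percentage of cluster capacity.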
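To make the waiting-time argument behind allocating queues by query duration concrete, the following Python sketch models a queue as a single slot that runs one query at a time in arrival order. This is a deliberate simplification (a real Capacity Scheduler queue can run several queries concurrently), and the names `Query` and `fifo_waits` and the arrival times are hypothetical; only the 5-second and 45-second runtimes come from the example in the guidelines.

```python
from dataclasses import dataclass


@dataclass
class Query:
    name: str
    arrival: float  # seconds after the start of the simulation
    runtime: float  # seconds of execution once the query starts


def fifo_waits(queries):
    """Return {name: seconds spent waiting} for queries run one at a time in arrival order.

    Simplifying assumption: the queue behaves like a single slot, so a query
    cannot start until the previous one has finished.
    """
    waits = {}
    clock = 0.0  # time at which the slot next becomes free
    for q in sorted(queries, key=lambda q: q.arrival):
        start = max(clock, q.arrival)
        waits[q.name] = start - q.arrival
        clock = start + q.runtime
    return waits


# Hypothetical workload: a 45 s query arrives first, then a 5 s query 1 s later.
long_q = Query("long", arrival=0.0, runtime=45.0)
short_q = Query("short", arrival=1.0, runtime=5.0)

shared = fifo_waits([long_q, short_q])                        # both in one queue
separate = {**fifo_waits([long_q]), **fifo_waits([short_q])}  # one queue each

print(shared)    # {'long': 0.0, 'short': 44.0}
print(separate)  # {'long': 0.0, 'short': 0.0}
```

With a shared queue, the 5-second query spends 44 seconds waiting behind the 45-second query; with separate queues, neither query waits.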
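The container re-use and session guidelines correspond to configuration properties in `tez-site.xml` and `hive-site.xml`. The Python sketch below only renders those properties in Hadoop's `<configuration>` XML layout. The property names are standard Tez and HiveServer2 settings, but the queue name `interactive`, the session count of `2`, and the helper `to_configuration_xml` are assumptions chosen for illustration, not recommended values.

```python
import xml.etree.ElementTree as ET

# Illustrative values only: the queue name "interactive" and the session
# count are assumptions, not recommendations.
HIVE_SITE = {
    # YARN queue(s) in which HiveServer2 starts its default Tez sessions
    "hive.server2.tez.default.queues": "interactive",
    # Number of Tez sessions kept per default queue
    "hive.server2.tez.sessions.per.default.queue": "2",
    # Start those sessions when HiveServer2 starts instead of on first use
    "hive.server2.tez.initialize.default.sessions": "true",
}

TEZ_SITE = {
    # Allow the Tez ApplicationMaster to run new tasks in already-allocated containers
    "tez.am.container.reuse.enabled": "true",
}


def to_configuration_xml(props):
    """Render a dict of properties as a Hadoop-style <configuration> string (hypothetical helper)."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


print(to_configuration_xml(TEZ_SITE))
print(to_configuration_xml(HIVE_SITE))
```

The output is a single line per file without indentation; paste the `<property>` elements into the corresponding site file, or set the same properties through your cluster management tool.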

The following sections of this chapter describe how to configure these best practices.