Hive Performance Tuning

Setting Up Queues for Mixed Interactive and Batch Workloads

For clusters that run both interactive and batch workloads, you can configure queue capacity to change based on usage or based on time of day.

Setting Up Usage-Based Queue Capacity Change

In general, adjustments for interactive queries do not adversely affect batch queries, so both types of queries can run well together on the same cluster. You can use Capacity Scheduler queues to divide cluster resources between batch and interactive queries. For example, you might set up a configuration that allocates 50% of cluster capacity to a default queue for batch jobs and 25% each to two queues for interactive Hive queries, as shown below:

yarn.scheduler.capacity.root.queues=default,hive1,hive2
yarn.scheduler.capacity.root.default.capacity=50
yarn.scheduler.capacity.root.hive1.capacity=25
yarn.scheduler.capacity.root.hive2.capacity=25
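
The Capacity Scheduler expects the capacities of the queues under a common parent to add up to 100. As a quick sanity check before applying a change, the arithmetic can be sketched in Python. This is an illustrative sketch only: the helper names `parse_properties` and `child_capacities` are hypothetical and not part of Hive or YARN, and the settings are assumed to be available as plain `key=value` text.

```python
def parse_properties(text):
    """Parse simple key=value lines into a dict, skipping blanks and comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def child_capacities(props, parent="root"):
    """Return {queue name: capacity percent} for the direct children of `parent`."""
    prefix = "yarn.scheduler.capacity." + parent
    queues = [q.strip() for q in props[prefix + ".queues"].split(",")]
    return {q: float(props[prefix + "." + q + ".capacity"]) for q in queues}


# The example configuration from this section, as key=value text.
CONFIG = """
yarn.scheduler.capacity.root.queues=default,hive1,hive2
yarn.scheduler.capacity.root.default.capacity=50
yarn.scheduler.capacity.root.hive1.capacity=25
yarn.scheduler.capacity.root.hive2.capacity=25
"""

capacities = child_capacities(parse_properties(CONFIG))
total = sum(capacities.values())

# Sibling queue capacities should account for the whole parent queue.
if total != 100.0:
    raise ValueError("Queue capacities under root sum to %s, expected 100" % total)
print(capacities, total)
```

If a queue is added or resized, rerunning the check shows whether the remaining queues still account for the full cluster.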

The following settings enable the capacity of the batch queue to expand to 100% when the rest of the cluster is idle (for example, at night). The maximum-capacity of the default batch queue is set to 100%, and the user-limit-factor is increased to 2 so that a single user of the queue can occupy up to twice the queue's configured capacity (2 x 50% = 100% of the cluster):

yarn.scheduler.capacity.root.default.maximum-capacity=100
yarn.scheduler.capacity.root.default.user-limit-factor=2
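
The interaction between these two settings can be sketched as a small calculation. This is a simplified model, not the scheduler's actual algorithm: it assumes a single active user and an otherwise idle cluster, and it ignores other limits such as minimum-user-limit-percent. The function name `single_user_ceiling` is hypothetical, and the default arguments reflect the assumption that an unset user-limit-factor behaves as 1 and an unset maximum-capacity behaves as 100.

```python
def single_user_ceiling(capacity, user_limit_factor=1.0, maximum_capacity=100.0):
    """Approximate the largest share of the cluster (in percent) that one user
    can obtain from a queue: the configured capacity multiplied by the
    user-limit-factor, capped by the queue's maximum-capacity."""
    return min(capacity * user_limit_factor, maximum_capacity)


# Batch queue from this section: capacity=50, user-limit-factor=2, maximum-capacity=100.
batch_with_factor = single_user_ceiling(50, user_limit_factor=2, maximum_capacity=100)

# Same queue with the factor left at 1: a single user stays at the configured 50%.
batch_without_factor = single_user_ceiling(50, user_limit_factor=1, maximum_capacity=100)

print(batch_with_factor, batch_without_factor)
```

Under these assumptions, raising maximum-capacity alone is not enough for a single user to fill an idle cluster; the user-limit-factor must also allow that user to exceed the queue's configured capacity.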

Setting Up Time-Based Queue Capacity Change

It is common to allocate capacity to an interactive queue during the day, when business users are active, and to a batch queue at night, when batch workloads typically run. To configure this scenario, use schedule-based policies. Note that schedule-based policies are an alpha Apache feature.