Apache Hive Performance Tuning

Setting Up Interactive Queues for HDP 2.2

In HDP 2.2 and earlier, you set up interactive queues from the command line.

In HDP 2.3 and later, you can use Ambari (a GUI) to set up interactive queues.

Configure Tez Container Reuse

Tez settings can be accessed from Ambari > Tez > Configs > Advanced, or in tez-site.xml. Enabling Tez to reuse containers improves performance by avoiding the memory overhead of reallocating container resources for every task. To achieve this, configure queues to retain resources for a specified amount of time, so that subsequent queries run faster. However, these settings apply globally to all jobs running in the cluster. To ensure that the settings apply to only one application, you must use separate tez-site.xml files on separate HiveServer2 nodes.

For better performance with smaller interactive queues on a busy cluster, retain resources for 5 minutes. On a less busy cluster, or if consistent timing is important, you can retain resources for up to 30 minutes.

Use the following settings in tez-site.xml to configure container reuse in Tez:

  • Tez Application Master Waiting Period (in seconds)--Specifies the amount of time that the Tez Application Master (AM) waits for a directed acyclic graph (DAG) to be submitted before shutting down. For example, to set the waiting period to 15 minutes (15 minutes × 60 seconds per minute = 900 seconds), set the following property to 900:

    tez.session.am.dag.submit.timeout.secs=900
  • Tez min.held-containers--Specifies the minimum number of containers that the AM starts with and retains after a query run is complete. If an AM holds more containers than this minimum, it releases them over time until it reaches the number set for min.held-containers. Set the minimum number of containers to be retained with the following property:

    tez.am.session.min.held-containers=<number_of_minimum_containers_to_retain>

    For example, if you have an application that generates queries that require five to ten containers, set the min.held-containers value to 5.
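
As a minimal sketch, the two properties above might appear in tez-site.xml as follows, assuming the standard Hadoop configuration file layout of property elements inside a configuration element. The values 900 and 5 are the example values from this section, not recommendations for every cluster, and a real tez-site.xml contains other properties that are omitted here.

```xml
<configuration>
  <!-- The Tez AM waits 900 seconds (15 minutes) for the next DAG before shutting down. -->
  <property>
    <name>tez.session.am.dag.submit.timeout.secs</name>
    <value>900</value>
  </property>
  <!-- Retain at least 5 containers after a query run completes (example value only). -->
  <property>
    <name>tez.am.session.min.held-containers</name>
    <value>5</value>
  </property>
</configuration>
```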

For more information on these settings and other Tez configuration settings, see the "Configure Tez" section in the Non-Ambari Cluster Installation Guide.

Configure HiveServer2 Settings

HiveServer2 provides remote, concurrent access to Hive. HiveServer2 settings can be accessed from Ambari > Hive > Configs > Advanced or in hive-site.xml. You must restart HiveServer2 for the updated settings to take effect.

Configure the following settings in hive-site.xml:

  • Hive Execution Engine--Set this to "tez" to execute Hive queries using Tez:

    hive.execution.engine=tez
  • Enable Default Sessions--When enabled, a default session is used for jobs that use HiveServer2 even if they do not use Tez. To enable default sessions, set the following property to "true":

    hive.server2.tez.initialize.default.sessions=true
  • Specify the HiveServer2 Queues--To set multiple queues, use a comma-separated list of queue names. For example, the following specifies the queues "hive1" and "hive2":

    hive.server2.tez.default.queues=hive1,hive2
  • Set the Number of Sessions in Each Queue--Sets the number of sessions for each queue listed in hive.server2.tez.default.queues:

    hive.server2.tez.sessions.per.default.queue=1
  • Set enable.doAs to "false"--When set to "false", the Hive user identity is used instead of the individual user identities for YARN. This setting enhances security and reuse:

    hive.server2.enable.doAs=false
Note

When doAs is set to false, queries execute as the Hive user and not as the end user. When multiple queries run as the Hive user, they can share resources. Otherwise, YARN does not allow resources to be shared across different users. When the Hive user executes all of the queries, a Tez session that was opened for one query and is still holding resources can use those resources for the next query without reallocation.
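
Taken together, the HiveServer2 settings above could be written in hive-site.xml roughly as shown below. This is a sketch that assumes the standard Hadoop configuration file layout. The queue names hive1 and hive2 are the example names used in this section, and the YARN queues are assumed to exist already. Other hive-site.xml properties are omitted.

```xml
<configuration>
  <!-- Run Hive queries on Tez. -->
  <property>
    <name>hive.execution.engine</name>
    <value>tez</value>
  </property>
  <!-- Start default Tez sessions when HiveServer2 starts. -->
  <property>
    <name>hive.server2.tez.initialize.default.sessions</name>
    <value>true</value>
  </property>
  <!-- Comma-separated list of queues (example names from this section). -->
  <property>
    <name>hive.server2.tez.default.queues</name>
    <value>hive1,hive2</value>
  </property>
  <!-- Number of sessions for each queue listed above. -->
  <property>
    <name>hive.server2.tez.sessions.per.default.queue</name>
    <value>1</value>
  </property>
  <!-- Run queries as the Hive user so that sessions and their resources can be reused. -->
  <property>
    <name>hive.server2.enable.doAs</name>
    <value>false</value>
  </property>
</configuration>
```

Restart HiveServer2 after changing these values, as noted at the start of this section.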

For more information about these and other HiveServer2 configuration settings on Tez, see the "Configure Hive and HiveServer2 for Tez" section in the Non-Ambari Cluster Installation Guide.

Adjusting Settings for Increasing Numbers of Concurrent Users

As the number of concurrent users increases, keep the number of queues to a minimum and increase the number of sessions in each queue. For example, for 5-10 concurrent users, 2-5 queues with 1-2 sessions each might be adequate. To set 3 queues with 2 sessions for each queue:

hive.server2.tez.default.queues=hive1,hive2,hive3
hive.server2.tez.sessions.per.default.queue=2

If the number of concurrent users increases to 15, you might achieve better performance by using 5 queues with 3 sessions per queue:

hive.server2.tez.default.queues=hive1,hive2,hive3,hive4,hive5
hive.server2.tez.sessions.per.default.queue=3
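
For reference, the 15-user example might look like this in hive-site.xml, again assuming the standard Hadoop configuration file layout. Because the sessions setting applies to each listed queue, 5 queues with 3 sessions each give 5 × 3 = 15 default sessions, which matches the 15 concurrent users in this example. The queue names are the illustrative names used above.

```xml
<configuration>
  <!-- Five queues (example names) for about 15 concurrent users. -->
  <property>
    <name>hive.server2.tez.default.queues</name>
    <value>hive1,hive2,hive3,hive4,hive5</value>
  </property>
  <!-- Three sessions per queue: 5 queues x 3 sessions = 15 default sessions. -->
  <property>
    <name>hive.server2.tez.sessions.per.default.queue</name>
    <value>3</value>
  </property>
</configuration>
```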

The following table provides general guidelines for the number of queues and sessions for various numbers of concurrent users.

Table 3.2. Queues and Sessions for Increasing Numbers of Concurrent Users

Number of Users   Number of Concurrent Users   Number of Queues   Number of Sessions per Queue
50                5                            2-5                1-2
100               10                           5                  2
150               15                           5                  3
200               20                           5                  4