Performance Tuning Guide

Configure Tez Container Reuse

Tez settings can be accessed from Ambari > Tez > Configs > Advanced or in tez-site.xml. Enabling Tez container reuse improves performance by avoiding the memory overhead of reallocating container resources for every task. To achieve this, have the queue retain resources for a specified amount of time, so that subsequent queries run faster.

For good performance with smaller interactive queries on a busy cluster, you might retain resources for 5 minutes. On a less busy cluster, or if consistent timing is very important, you might hold on to resources for 30 minutes.

The following settings can be used to configure Tez to enable container reuse.

  • Tez Application Master Waiting Period (in seconds) -- Specifies how long the Tez Application Master (AM) waits for a DAG (directed acyclic graph) to be submitted before shutting down. For example, to set the waiting period to 15 minutes (15 minutes x 60 seconds per minute = 900 seconds):

    tez.session.am.dag.submit.timeout.secs=900

  • Tez Min-Held Containers -- Specifies the minimum number of containers that the Tez Application Master (AM) holds on to after running the first query. If an AM holds on to many containers, it releases them incrementally until it reaches the specified number. For example, if you have an application that generates queries that take 5 to 10 containers, set this parameter to 5 so that the AM retains 5 containers:

    tez.am.session.min.held-containers=<number_of_minimum_containers_to_hold>
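
As a rough illustration, the two settings above might appear in tez-site.xml as shown in this sketch. The values are examples only, taken from the text: 900 seconds is the 15-minute waiting period, and 5 is the container count from the 5-to-10-container example. If the waiting period is the setting you use to control how long resources are retained (an assumption here), a 5-minute period would be 300 seconds and a 30-minute period would be 1800 seconds.

```xml
<!-- Sketch of a tez-site.xml fragment. Values are illustrative, not recommendations. -->
<configuration>
  <!-- Other Tez properties omitted. -->

  <property>
    <name>tez.session.am.dag.submit.timeout.secs</name>
    <!-- 15 minutes x 60 seconds per minute = 900 seconds -->
    <value>900</value>
  </property>

  <property>
    <name>tez.am.session.min.held-containers</name>
    <!-- Retain 5 containers for an application whose queries take 5 to 10 -->
    <value>5</value>
  </property>
</configuration>
```

When the cluster is managed through Ambari, make the equivalent change under Ambari > Tez > Configs > Advanced rather than editing the file by hand.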

For more information on these and other Tez configuration settings, see the "Configure Tez" subsection in the "Installing and Configuring Apache Tez" section of the Installing HDP Manually guide.

Note

Do not use the tez.queue.name configuration parameter because it sets all Tez jobs to run on one particular queue.

Confirming Container Reuse

To confirm container reuse, run a query, then reload the UI. You should see some number of containers being used. The second or third time you run the query, no allocation of containers should be needed, and the query should run more quickly.