1.2. Configure Tez Container Reuse

Tez settings can be accessed from Ambari > Tez > Configs > Advanced or in tez-site.xml. Enabling Tez container reuse improves performance by avoiding the memory overhead of reallocating container resources for every task. To achieve this, the queue retains resources for a specified amount of time so that subsequent queries run faster.

For good performance with smaller interactive queries on a busy cluster, you might retain resources for 5 minutes. On a less busy cluster, or if consistent timing is very important, you might hold on to resources for 30 minutes.

The following settings can be used to configure Tez to enable container reuse.

  • Tez Application Master Waiting Period (in seconds) -- Specifies the amount of time in seconds that the Tez Application Master (AM) waits for a DAG (directed acyclic graph) to be submitted before shutting down. For example, to set the waiting period to 15 minutes (15 minutes x 60 seconds per minute = 900 seconds):

    tez.session.am.dag.submit.timeout.secs=900

  • Enable Tez Container Reuse -- This configuration parameter determines whether Tez reuses the same container to run multiple queries. Enabling this parameter improves performance by avoiding the memory overhead of reallocating container resources for every query.

    tez.am.container.reuse.enabled=true

  • Tez Container Holding Period -- Specifies the amount of time in milliseconds that a Tez session will retain its containers. For example, to set the holding period to 15 minutes (15 minutes x 60 seconds per minute x 1000 milliseconds per second = 900000 milliseconds):

    tez.am.container.session.delay-allocation-millis=900000

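    A holding period of a few seconds is preferable when multiple sessions share a queue, because idle containers are returned to the queue sooner. However, a short holding period increases query latency, because released containers must be reallocated for later queries.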
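
If you edit tez-site.xml directly rather than through Ambari, the three settings above might look like the following fragment. This is only a sketch: it assumes the standard Hadoop property name/value layout inside the file's configuration element and reuses the 15-minute example values from this section, which you should adjust for your cluster.

    <!-- tez-site.xml (fragment): example values only, taken from this section -->
    <property>
      <!-- AM waits 15 minutes (900 seconds) for a DAG before shutting down -->
      <name>tez.session.am.dag.submit.timeout.secs</name>
      <value>900</value>
    </property>
    <property>
      <!-- Turn on container reuse -->
      <name>tez.am.container.reuse.enabled</name>
      <value>true</value>
    </property>
    <property>
      <!-- Session holds its containers for 15 minutes (900000 milliseconds) -->
      <name>tez.am.container.session.delay-allocation-millis</name>
      <value>900000</value>
    </property>

Note that the first value is expressed in seconds and the last in milliseconds, so the same 15-minute period appears as 900 and 900000 respectively.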

For more information on these and other Tez configuration settings, see the "Configure Tez" subsection in the Installing HDP Manually guide.

Note

Do not use the tez.queue.name configuration parameter because it sets all Tez jobs to run on one particular queue.

Confirming Container Reuse

To confirm container reuse, run a query, then reload the UI. You should see some number of containers being used. The second or third time you run the query, no allocation of containers should be needed, and the query should run more quickly.