Configuring Tez
Perform the following steps to configure Tez for your Hadoop cluster:
1. Create a tez-site.xml configuration file and place it in the /etc/tez/conf configuration directory. A sample tez-site.xml file is included in the configuration_files/tez folder in the HDP companion files.
2. In the tez-site.xml file, configure the tez.lib.uris property with the HDFS path containing the Tez tarball file:

        ...
        <property>
          <name>tez.lib.uris</name>
          <value>/hdp/apps/<hdp_version>/tez/tez.tar.gz</value>
        </property>
        ...

   Where <hdp_version> is the current HDP version, such as 2.3.6.0.
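The property above goes inside the standard Hadoop-style <configuration> root element of tez-site.xml. The following is a minimal sketch of the complete file, assuming that 2.3.6.0 is substituted for <hdp_version>; adjust the value, and add further properties from Table 8.1, to match your cluster.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Minimal tez-site.xml sketch. Only tez.lib.uris is shown here;
     any parameter from Table 8.1 can be added as another <property> block. -->
<configuration>
  <property>
    <name>tez.lib.uris</name>
    <!-- Assumes HDP 2.3.6.0; replace with the HDP version installed on your cluster. -->
    <value>/hdp/apps/2.3.6.0/tez/tez.tar.gz</value>
  </property>
</configuration>
```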
Table 8.1. Tez Configuration Parameters
Configuration Parameter | Description | Default Value
---|---|---
tez.am.acls.enabled | Enables or disables access control list checks on Application Master (AM) and history data. | true
tez.am.am-rm.heartbeat.interval-ms.max | The maximum heartbeat interval between the AM and RM, in milliseconds. | 250
tez.am.client.am.port-range | Range of ports that the AM can use when binding for client connections. Leave this blank to use all possible ports. | No default setting. The format is a number range.
tez.am.container.idle.release-timeout-max.millis | The maximum amount of time to hold on to a container if no task can be assigned to it immediately. Only active when reuse is enabled. | 20000
tez.am.container.idle.release-timeout-min.millis | The minimum amount of time to hold on to a container that is idle. Only active when reuse is enabled. | 10000
tez.am.container.reuse.enabled | Specifies whether a container should be reused. | true
tez.am.container.reuse.locality.delay-allocation-millis | The amount of time to wait before assigning a container to the next level of locality (NODE -> RACK -> NON_LOCAL). | 250
tez.am.container.reuse.non-local-fallback.enabled | Specifies whether to reuse containers for non-local tasks. Active only if reuse is enabled. | false
tez.am.container.reuse.rack-fallback.enabled | Specifies whether to reuse containers for rack-local tasks. Active only if reuse is enabled. | true
tez.am.launch.cluster-default.cmd-opts | Note: This property should be set only by administrators; it should not be used by non-administrative users. Cluster default Java options for the Tez AppMaster process. These are prepended to the properties specified with tez.am.launch.cmd-opts. | -server -Djava.net.preferIPv4Stack=true -Dhdp.version=${hdp.version}
tez.am.launch.cmd-opts | Command-line options provided during the launch of the Tez AppMaster process. | -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseParallelGC
tez.am.launch.env | Environment settings for the Tez AppMaster process. | LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HADOOP_COMMON_HOME/lib/native/
tez.am.log.level | Root logging level passed to the Tez Application Master. Simple configuration: set a single log level, such as INFO, that applies to all loggers. Advanced configuration: set a global log level followed by overrides for specific classes; all loggers then use the global level except the classes that are explicitly overridden. Note: The global log level must always be the first parameter. | INFO
tez.am.max.app.attempts | Specifies the total number of times that the app master is re-run in case recovery is triggered. | 2 |
tez.am.maxtaskfailures.per.node | The maximum number of allowed task attempt failures on a node before it gets marked as blacklisted. | 10 |
tez.am.modify-acls | Enables the specified users or groups to perform modify operations on the AM, such as submitting DAGs, pre-warming the session, killing DAGs, or shutting down the session. Format: a comma-separated list of users, followed by a white space, and then a comma-separated list of groups. | No default setting
tez.am.resource.cpu.vcores | The number of virtual cores to be used by the AppMaster process. | 1
tez.am.resource.memory.mb | The amount of memory, in MB, to be used by the AppMaster. Used only if the value is not specified explicitly by the DAG definition. | 1536
tez.am.session.min.held-containers | The minimum number of containers that will be held in session mode. Not active in non-session mode. Enables an idle session that is not running a DAG to hold on to a minimum number of containers to provide fast response times for the next DAG. | 0 |
tez.am.task.max.failed.attempts | The maximum number of attempts that can fail for a particular task before the task fails. This does not count killed attempts. A task failure results in a DAG failure. Must be an integer. | 4
tez.am.view-acls | AM view ACLs. This setting enables the specified users or groups to view the status of the AM and all DAGs that run within the AM. Format: a comma-separated list of users, followed by a white space, and then a comma-separated list of groups. | No default setting
tez.cluster.additional.classpath.prefix | Specify additional classpath information to be used for Tez AM and all containers. This will be prepended to the classpath before all framework specific components have been specified. | /usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure |
tez.container.max.java.heap.fraction | A double value. Tez automatically determines the Xmx for the Java virtual machines that are used to run Tez tasks and Application Masters. This is enabled if the Xmx or Xms values have not been specified in the launch command options. Automatic Xmx calculation is preferred because Tez can determine the best value based on the actual allocation of memory to tasks in the cluster. The value should be greater than 0 and less than 1. | 0.8 |
tez.counters.max | The number of allowed counters for the executing DAG. | 2000 |
tez.counters.max.groups | The number of allowed counter groups for the executing DAG. | 1000 |
tez.generate.debug.artifacts | Generates debug artifacts such as a text representation of the submitted DAG plan. | false |
tez.grouping.max-size | Upper size limit (in bytes) of a grouped split, to avoid generating an excessively large split. Replaces tez.am.grouping.max-size. | 1073741824 (1 GB)
tez.grouping.min-size | Lower size limit (in bytes) of a grouped split, to avoid generating too many splits. | 52428800 (50 MB) |
tez.grouping.split-waves | The multiplier for available queue capacity when determining the number of tasks for a vertex. With the default value of 1.7 and 100% of the queue available, Tez generates a number of tasks roughly equal to 170% of the available containers on the queue. | 1.7
tez.history.logging.service.class | The class to be used for logging history data. Set to org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService to log to ATS. Set to org.apache.tez.dag.history.logging.impl.SimpleHistoryLoggingService to log to the filesystem specified by ${fs.defaultFS}. | org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService
tez.lib.uris | Comma-delimited list of the locations of the Tez libraries that are localized for DAGs. Specifying a single .tar.gz or .tgz file assumes that a compressed version of the Tez libraries is being used; it is uncompressed into a tezlibs directory when running containers, and tezlibs/ and tezlibs/lib/ are added to the classpath (after . and .*). If multiple files are specified, they are localized as regular files, and the contents of directories are localized as regular files (non-recursive). | /hdp/apps/<hdp_version>/tez/tez.tar.gz
tez.queue.name | The YARN queue to which Tez jobs are submitted. This property should not be set in tez-site.xml. | No default setting
tez.runtime.compress | Specifies whether intermediate data should be compressed or not. | true |
tez.runtime.compress.codec | The codec to be used if compressing intermediate data. Only applicable if tez.runtime.compress is enabled. | org.apache.hadoop.io.compress.SnappyCodec
tez.runtime.io.sort.factor | The number of streams to merge at once while sorting files. This determines the number of open file handles. | 10 |
tez.runtime.io.sort.mb | The size, in MB, of the sort buffer when output is sorted. | 512
tez.runtime.sorter.class | The sorter implementation to use. Valid values: LEGACY, PIPELINED. The legacy sorter implementation is based on the Hadoop MapReduce shuffle implementation and is restricted to 2 GB memory limits. The pipelined sorter is a more efficient sorter that supports sort buffers larger than 2 GB. | PIPELINED
tez.runtime.sort.spill.percent | The soft limit in the serialization buffer. Once this limit is reached, a thread begins to spill the contents to disk in the background. Note: Collection is not blocked if this threshold is exceeded while a spill is already in progress, so spills can be larger than this threshold when it is set to less than 0.5. | 0.8
tez.runtime.unordered.output.buffer.size-mb | The size, in MB, of the buffer when output is not sorted. | 100
tez.session.am.dag.submit.timeout.secs | Time (in seconds) for which the Tez AM should wait for a DAG to be submitted before shutting down. | 300 |
tez.session.client.timeout.secs | Time (in seconds) to wait for AM to come up when trying to submit a DAG from the client. | -1 |
tez.shuffle-vertex-manager.max-src-fraction | In case of a ScatterGather connection, once this fraction of source tasks have completed, all tasks on the current vertex can be scheduled. Number of tasks ready for scheduling on the current vertex scales linearly between min-fraction and max-fraction. | 0.4 |
tez.shuffle-vertex-manager.min-src-fraction | In case of a ScatterGather connection, the fraction of source tasks which should complete before tasks for the current vertex are scheduled. | 0.2 |
tez.staging-dir | The staging directory used while submitting DAGs. | /tmp/${user.name}/staging |
tez.task.am.heartbeat.counter.interval-ms.max | Time interval, in milliseconds, at which task counters are sent to the AM. | 4000
tez.task.generate.counters.per.io | Sets whether to generate counters per IO or not. Enabling this will rename CounterGroups/CounterNames, making them unique per vertex edge instead of unique per vertex. | true |
tez.task.get-task.sleep.interval-ms.max | Maximum amount of time, in milliseconds, to wait before a task asks the AM for another task. | 200
tez.task.launch.cluster-default.cmd-opts | Note: This property should be set only by administrators; it should not be used by non-administrative users. Cluster default Java options for tasks. These are prepended to the properties specified with tez.task.launch.cmd-opts. | -server -Djava.net.preferIPv4Stack=true -Dhdp.version=${hdp.version}
tez.task.launch.cmd-opts | Java options for tasks. The Xmx value is derived based on tez.task.resource.memory.mb and is 80% of this value by default. Used only if the value is not specified explicitly by the DAG definition. | -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseParallelGC |
tez.task.launch.env | Additional execution environment entries for Tez. This is not an additive property. You must preserve the original value if you want to have access to native libraries. Used only if the value is not specified explicitly by the DAG definition. | LD_LIBRARY_PATH=/usr/hdp/${hdp.version}/hadoop/lib/native:/usr/hdp/${hdp.version}/hadoop/lib/native/Linux-amd64-64/ |
tez.task.log.level | Root logging level passed to the Tez tasks. Simple configuration: set a single log level, such as INFO, that applies to all loggers. Advanced configuration: set a global log level followed by overrides for specific classes; all loggers then use the global level except the classes that are explicitly overridden. Note: The global log level must always be the first parameter. | INFO
tez.task.max-events-per-heartbeat | Maximum number of events to fetch from the AM by the tasks in a single heartbeat. | 500 |
tez.task.resource.cpu.vcores | The number of virtual cores to be used by the Tez tasks. Set this to > 1 if RM Scheduler is configured to support virtual cores. | 1 |
tez.task.resource.memory.mb | The amount of memory, in MB, to be used by launched tasks. Used only if the value is not specified explicitly by the DAG definition. | 1024
tez.use.cluster.hadoop-libs | Specifies whether Tez uses the Hadoop libraries that are installed on the cluster. This property should not be set in tez-site.xml. | false
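Several of the parameters above interact. For example, because the task Xmx is derived as 80% of tez.task.resource.memory.mb by default (see tez.container.max.java.heap.fraction), a 2048 MB task container receives a heap of roughly 1638 MB without any explicit Xmx in tez.task.launch.cmd-opts. The following sketch shows how a few of these parameters could be overridden in tez-site.xml; the values are illustrative examples only, not tuning recommendations.

```xml
<!-- Illustrative overrides; the values below are examples, not recommendations. -->
<property>
  <name>tez.task.resource.memory.mb</name>
  <!-- 2048 MB per task container; with the default heap fraction of 0.8,
       Tez derives a task heap of roughly -Xmx1638m automatically. -->
  <value>2048</value>
</property>
<property>
  <name>tez.am.resource.memory.mb</name>
  <!-- Memory, in MB, for the Tez AppMaster container. -->
  <value>2048</value>
</property>
<property>
  <name>tez.am.log.level</name>
  <!-- Root log level for the Tez Application Master. -->
  <value>INFO</value>
</property>
```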
Note: There are no additional steps required to secure Tez if your cluster is already configured for security.
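On clusters where ACL checks are enabled (tez.am.acls.enabled is true by default), tez.am.view-acls and tez.am.modify-acls take a comma-separated list of users, a single space, and then a comma-separated list of groups, as described in Table 8.1. The sketch below only illustrates that format; the user and group names are hypothetical placeholders.

```xml
<!-- Hypothetical user and group names; the value format is
     "comma-separated users, a space, comma-separated groups" per Table 8.1. -->
<property>
  <name>tez.am.view-acls</name>
  <value>hive,bizops data_engineers,analysts</value>
</property>
<property>
  <name>tez.am.modify-acls</name>
  <value>hive data_engineers</value>
</property>
```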
To monitor the progress of a Tez job or to analyze the history of a Tez job, set up the Tez View in Ambari. For information about setting up the Tez View, see Configuring Your Cluster for Tez View in the HDP Ambari Views Guide.