Configuring Tez
Perform the following steps to configure Tez for your Hadoop cluster:
1. Create a tez-site.xml configuration file and place it in the /etc/tez/conf configuration directory. A sample tez-site.xml file is included in the configuration_files/tez folder in the HDP companion files.

2. In the tez-site.xml file, configure the tez.lib.uris property with the HDFS path containing the Tez tarball file.

       ...
       <property>
         <name>tez.lib.uris</name>
         <value>/hdp/apps/<hdp_version>/tez/tez.tar.gz</value>
       </property>
       ...

Where <hdp_version> is the current HDP version, such as 2.4.2.0.
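
As a concrete illustration, substituting the example version 2.4.2.0 for the version placeholder gives the fragment below. This is only a sketch: the version string is the example value from the text, and the string on an actual cluster may differ (for instance, it may carry a build suffix), so check which directory actually exists under /hdp/apps in HDFS before copying it.

```xml
<!-- Sketch only: 2.4.2.0 is the example version from the text, not necessarily yours. -->
<property>
  <name>tez.lib.uris</name>
  <value>/hdp/apps/2.4.2.0/tez/tez.tar.gz</value>
</property>
```
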
Table 8.1. Tez Configuration Parameters
Configuration Parameter | Description | Default Value |
---|---|---|
tez.am.acls.enabled | Enables or disables access control list checks on Application Master (AM) and history data. | true |
tez.am.am-rm.heartbeat.interval-ms.max | The maximum heartbeat interval between the AM and RM in milliseconds. | 250 |
tez.am.client.am.port-range | Range of ports that the AM can use when binding for client connections. Leave this blank to use all possible ports. | No default setting. The format is a number range. |
tez.am.container.idle.release-timeout-max.millis | The maximum amount of time to hold on to a container if no task can be assigned to it immediately. Only active when reuse is enabled. | 20000 |
tez.am.container.idle.release-timeout-min.millis | The minimum amount of time to hold on to a container that is idle. Only active when reuse is enabled. | 10000 |
tez.am.container.reuse.enabled | Configuration that specifies whether a container should be reused. | true |
tez.am.container.reuse.locality.delay-allocation-millis | The amount of time to wait before assigning a container to the next level of locality. NODE -> RACK -> NON_LOCAL | 250 |
tez.am.container.reuse.non-local-fallback.enabled | Specifies whether to reuse containers for non-local tasks. Active only if reuse is enabled. | false |
tez.am.container.reuse.rack-fallback.enabled | Specifies whether to reuse containers for rack local tasks. Active only if reuse is enabled. | true |
tez.am.launch.cluster-default.cmd-opts | Note: This property should only be set by administrators -- it should not be used by non-administrative users. Cluster default Java options for the Tez AppMaster process. These will be prepended to the properties specified with tez.am.launch.cmd-opts. | -server -Djava.net.preferIPv4Stack=true -Dhdp.version=${hdp.version} |
tez.am.launch.cmd-opts | Command line options that are provided during the launch of the Tez AppMaster process. | -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseParallelGC |
tez.am.launch.env | Environment settings for the Tez AppMaster process. | LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HADOOP_COMMON_HOME/lib/native/ |
tez.am.log.level | Root logging level passed to the Tez Application Master. Simple configuration: set a single log level for all loggers. Advanced configuration: set the log level for all classes, along with a different level for some classes; for example, all loggers at DEBUG except for specific classes that are given their own level. Note: The global log level must always be the first parameter. | INFO |
tez.am.max.app.attempts | Specifies the total number of times that the app master is re-run in case recovery is triggered. | 2 |
tez.am.maxtaskfailures.per.node | The maximum number of allowed task attempt failures on a node before it gets marked as blacklisted. | 10 |
tez.am.modify-acls | Enables the specified users or groups to perform modify operations on the AM, such as submitting DAGs, pre-warming the session, killing DAGs, or shutting down the session. Format: a comma-separated list of users, followed by a white space, and then a comma-separated list of groups. | No default setting |
tez.am.resource.cpu.vcores | The number of virtual cores to be used by the AppMaster. | 1 |
tez.am.resource.memory.mb | The amount of memory to be used by the AppMaster. Used only if the value is not specified explicitly by the DAG definition. | 1536 |
tez.am.session.min.held-containers | The minimum number of containers that will be held in session mode. Not active in non-session mode. Enables an idle session that is not running a DAG to hold on to a minimum number of containers to provide fast response times for the next DAG. | 0 |
tez.am.task.max.failed.attempts | The maximum number of attempts that can fail for a particular task before the task fails. This does not count killed attempts. A task failure results in a DAG failure. Must be an integer. | 4 |
tez.am.view-acls | AM view ACLs. This setting enables the specified users or groups to view the status of the AM and all DAGs that run within the AM. Format: a comma-separated list of users, a white space, and then a comma-separated list of groups. | No default value |
tez.cluster.additional.classpath.prefix | Specify additional classpath information to be used for Tez AM and all containers. This will be prepended to the classpath before all framework specific components have been specified. | /usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure |
tez.container.max.java.heap.fraction | A double value. Tez automatically determines the Xmx for the Java virtual machines that are used to run Tez tasks and Application Masters. This is enabled if the Xmx or Xms values have not been specified in the launch command options. Automatic Xmx calculation is preferred because Tez can determine the best value based on the actual allocation of memory to tasks in the cluster. The value should be greater than 0 and less than 1. | 0.8 |
tez.counters.max | The number of allowed counters for the executing DAG. | 2000 |
tez.counters.max.groups | The number of allowed counter groups for the executing DAG. | 1000 |
tez.generate.debug.artifacts | Generates debug artifacts such as a text representation of the submitted DAG plan. | false |
tez.grouping.max-size | Upper size limit (in bytes) of a grouped split, to avoid generating an excessively large split. Replaces tez.am.grouping.max-size | 1073741824 (1 GB) |
tez.grouping.min-size | Lower size limit (in bytes) of a grouped split, to avoid generating too many splits. | 52428800 (50 MB) |
tez.grouping.split-waves | The multiplier for available queue capacity when determining the number of tasks for a Vertex. With the default value of 1.7 and 100% of the queue available, Tez generates a number of tasks roughly equal to 170% of the available containers on the queue. | 1.7 |
tez.history.logging.service.class | The class to be used for logging history data. Set to org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService to log to ATS. Set to org.apache.tez.dag.history.logging.impl.SimpleHistoryLoggingService to log to the filesystem specified by ${fs.defaultFS}. | org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService |
tez.lib.uris | Comma-delimited list of the locations of the Tez libraries that will be localized for DAGs. Specifying a single .tar.gz or .tgz assumes that a compressed version of the Tez libraries is being used. This is uncompressed into a tezlibs directory when running containers, and tezlibs/;tezlibs/lib/ are added to the classpath (after . and .*). If multiple files are specified, files are localized as regular files, and the contents of directories are localized as regular files (non-recursive). | There is no default value. Set it to the HDFS path of the Tez tarball, as described in the configuration step above this table. |
tez.queue.name | This property should not be set in tez-site.xml. | No default setting |
tez.runtime.compress | Specifies whether intermediate data should be compressed or not. | true |
tez.runtime.compress.codec | The codec to be used if compressing intermediate data. Only applicable if tez.runtime.compress is enabled. | org.apache.hadoop.io.compress.SnappyCodec |
tez.runtime.io.sort.factor | The number of streams to merge at once while sorting files. This determines the number of open file handles. | 10 |
tez.runtime.io.sort.mb | The size of the sort buffer when output is sorted. | 512 |
tez.runtime.sorter.class | Which sorter implementation to use. Valid values: LEGACY, PIPELINED. The legacy sorter implementation is based on the Hadoop MapReduce shuffle implementation and is limited to a 2 GB sort buffer. The pipelined sorter is a more efficient sorter that supports sort buffers larger than 2 GB. | PIPELINED |
tez.runtime.sort.spill.percent | The soft limit in the serialization buffer. Once this limit is reached, a thread begins to spill the contents to disk in the background. Note: Collection will not block if this threshold is exceeded while a spill is already in progress, so spills can be larger than this threshold when it is set to less than 0.5. | 0.8 |
tez.runtime.unordered.output.buffer.size-mb | The size of the buffer when output is not sorted. | 100 |
tez.session.am.dag.submit.timeout.secs | Time (in seconds) for which the Tez AM should wait for a DAG to be submitted before shutting down. | 300 |
tez.session.client.timeout.secs | Time (in seconds) to wait for AM to come up when trying to submit a DAG from the client. | -1 |
tez.shuffle-vertex-manager.max-src-fraction | In case of a ScatterGather connection, once this fraction of source tasks have completed, all tasks on the current vertex can be scheduled. Number of tasks ready for scheduling on the current vertex scales linearly between min-fraction and max-fraction. | 0.4 |
tez.shuffle-vertex-manager.min-src-fraction | In case of a ScatterGather connection, the fraction of source tasks which should complete before tasks for the current vertex are scheduled. | 0.2 |
tez.staging-dir | The staging directory used while submitting DAGs. | /tmp/${user.name}/staging |
tez.task.am.heartbeat.counter.interval-ms.max | Time interval at which task counters are sent to the AM. | 4000 |
tez.task.generate.counters.per.io | Sets whether to generate counters per IO or not. Enabling this will rename CounterGroups/CounterNames, making them unique per vertex edge instead of unique per vertex. | true |
tez.task.get-task.sleep.interval-ms.max | Maximum amount of time, in milliseconds, to wait before a task asks an AM for another task. | 200 |
tez.task.launch.cluster-default.cmd-opts | Note: This property should only be set by administrators -- it should not be used by non-administrative users. Cluster default Java options for tasks. These will be prepended to the properties specified with tez.task.launch.cmd-opts | -server -Djava.net.preferIPv4Stack=true -Dhdp.version=${hdp.version} |
tez.task.launch.cmd-opts | Java options for tasks. The Xmx value is derived based on tez.task.resource.memory.mb and is 80% of this value by default. Used only if the value is not specified explicitly by the DAG definition. | -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseParallelGC |
tez.task.launch.env | Additional execution environment entries for Tez. This is not an additive property. You must preserve the original value if you want to have access to native libraries. Used only if the value is not specified explicitly by the DAG definition. | LD_LIBRARY_PATH=/usr/hdp/${hdp.version}/hadoop/lib/native:/usr/hdp/${hdp.version}/hadoop/lib/native/Linux-amd64-64/ |
tez.task.log.level | Root logging level that is passed to the Tez tasks. Simple configuration: set a single log level for all loggers. Advanced configuration: set the log level for all classes, along with a different level for some classes; for example, all loggers at DEBUG except for specific classes that are given their own level. Note: The global log level must always be the first parameter. | INFO |
tez.task.max-events-per-heartbeat | Maximum number of events to fetch from the AM by the tasks in a single heartbeat. | 500 |
tez.task.resource.cpu.vcores | The number of virtual cores to be used by the Tez tasks. Set this to a value greater than 1 if the RM Scheduler is configured to support virtual cores. | 1 |
tez.task.resource.memory.mb | The amount of memory to be used by launched tasks. Used only if the value is not specified explicitly by the DAG definition. | 1024 |
tez.use.cluster.hadoop-libs | Specifies whether Tez will use the cluster Hadoop libraries. This property should not be set in tez-site.xml. | false |
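
The parameters in Table 8.1 are set as property elements in tez-site.xml. The fragment below is a minimal sketch rather than a recommended configuration. The user and group names (alice, bob, tez-admins) are hypothetical, and the port range is an arbitrary illustration of the number-range format. The per-class log-level syntax (global level first, then semicolon-separated class=LEVEL overrides) is an assumption based on the Apache Tez configuration documentation, not something this table spells out.

```xml
<configuration>
  <!-- ACL format from the table: comma-separated users, a white space,
       then comma-separated groups. Names here are hypothetical. -->
  <property>
    <name>tez.am.view-acls</name>
    <value>alice,bob tez-admins</value>
  </property>

  <!-- Advanced log-level form. Assumed syntax: the global level comes first,
       followed by per-class overrides separated by semicolons. -->
  <property>
    <name>tez.am.log.level</name>
    <value>DEBUG;org.apache.hadoop.ipc=INFO;org.apache.hadoop.security=INFO</value>
  </property>

  <!-- Illustrative number range; leave the property unset to allow all ports. -->
  <property>
    <name>tez.am.client.am.port-range</name>
    <value>50000-50050</value>
  </property>

  <!-- Container reuse settings, shown with the default values listed in the table. -->
  <property>
    <name>tez.am.container.reuse.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>tez.am.container.idle.release-timeout-min.millis</name>
    <value>10000</value>
  </property>
  <property>
    <name>tez.am.container.idle.release-timeout-max.millis</name>
    <value>20000</value>
  </property>
</configuration>
```

Only properties whose values differ from the defaults normally need to appear in the file; the reuse entries are included here simply to show how table rows map to property elements.
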

Note: There are no additional steps required to secure Tez if your cluster is already configured for security.

To monitor the progress of a Tez job or to analyze the history of a Tez job, set up the Tez View in Ambari. For information about setting up the Tez view, see Configuring Your Cluster for Tez View in the HDP Ambari Views Guide.