Chapter 3. Running Pig with the Tez Execution Engine

By default, Apache Pig runs against Apache MapReduce, but administrators and scripters can configure Pig to run against the Apache Tez execution engine, which executes jobs more efficiently and with fewer HDFS reads. You can enable Tez in any of the following ways:

Command Line

Use the -x command-line option: pig -x tez

Pig Properties

Set the following configuration property in the conf/pig.properties file: exectype=tez

Java Option

Set the following Java option for Pig: PIG_OPTS="-Dexectype=tez"

Pig Script

Use the set command: set exectype=tez;

Users and administrators can use the same methods to configure Pig to run against the default MapReduce execution engine.

Command Line

Use the -x command-line option: pig -x mr

Pig Properties

Set the following configuration property in the conf/pig.properties file: exectype=mr

Java Option

Set the following Java option for Pig: PIG_OPTS="-Dexectype=mr"

Pig Script

Use the set command: set exectype=mr;

There are some limitations to running Pig with the Tez execution engine:

  • Queries that include the ORDER BY clause may run more slowly than they do on the MapReduce execution engine.

  • There is currently no user interface that allows users to view the execution plan for Pig jobs running with Tez. To diagnose a failing Pig job, users must read the Application Master and container logs.

Note

Users should configure parallelism before running Pig with Tez. If parallelism is too low, Pig jobs will run slowly. To tune parallelism, add the PARALLEL clause to your Pig statements.
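As an illustration, the following minimal Pig Latin sketch attaches the PARALLEL clause to the reduce-side operators GROUP and ORDER BY. The input path, output path, field names, and the value 20 are hypothetical placeholders; choose a value that suits your data volume and cluster capacity. The script also shows SET default_parallel, which sets a script-wide default for operators that do not specify their own PARALLEL clause.

```pig
-- Hypothetical script-wide default for reduce-side operators
SET default_parallel 20;

-- Hypothetical input path and schema, for illustration only
logs = LOAD '/data/weblogs' USING PigStorage('\t')
       AS (user_id:chararray, url:chararray, bytes:long);

-- Request 20 parallel tasks for this GROUP (overrides default_parallel)
by_user = GROUP logs BY user_id PARALLEL 20;

totals = FOREACH by_user GENERATE group AS user_id, SUM(logs.bytes) AS total_bytes;

-- ORDER BY also accepts a PARALLEL clause
sorted = ORDER totals BY total_bytes DESC PARALLEL 20;

-- Hypothetical output path
STORE sorted INTO '/output/bytes_by_user';
```

A PARALLEL clause on an individual statement takes precedence over default_parallel, so you can set a general default and raise it only for the heaviest operators.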