Chapter 3. Enabling Efficient Execution with Apache Pig and Apache Tez
By default, Apache Pig runs against Apache MapReduce, but administrators and script authors can configure Pig to run against the Apache Tez execution engine, which executes jobs more efficiently and reads from HDFS less often. Pig supports Tez in all of the following ways:
Command Line | Use the -x command-line option: |
Pig Properties | Set the following configuration property in the conf/pig.properties file: |
Java Option | Set the following Java Option for Pig: |
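The three methods in the table above can be sketched in shell as follows. The table names only the -x option and the conf/pig.properties file, so the specifics here are assumptions drawn from standard Pig usage: the engine name `tez`, the property name `exectype`, and the `PIG_OPTS` environment variable. The scratch directory and the `PIG_SCRIPT` variable are hypothetical placeholders.

```shell
# Work in a scratch directory so nothing outside it is touched.
workdir="$(mktemp -d)"
mkdir -p "$workdir/conf"

# Pig Properties: select Tez in conf/pig.properties
# (assumed property name: exectype).
echo "exectype=tez" >> "$workdir/conf/pig.properties"

# Java Option: pass the same property to the JVM that the pig launcher starts
# (assumed variable: PIG_OPTS).
export PIG_OPTS="-Dexectype=tez"

# Command Line: select Tez for a single run with -x.
# PIG_SCRIPT is a hypothetical variable; the run is skipped unless pig is
# installed and the script exists.
if command -v pig >/dev/null 2>&1 && [ -f "${PIG_SCRIPT:-}" ]; then
  pig -x tez "$PIG_SCRIPT"
fi
```

Under the same assumptions, substituting `mapreduce` for `tez` in any of the three places would select the default MapReduce engine instead.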
Users and administrators can use the same methods to configure Pig to run against the default MapReduce execution engine.
Command Line | Use the -x command-line option: |
Pig Properties | Set the following configuration property in the conf/pig.properties file: |
Java Option | Set the following Java Option for Pig: |
Pig Script | Use the |
There are some limitations to running Pig with the Tez execution engine:
Queries that include the ORDER BY clause may run more slowly on Tez than on the MapReduce execution engine.
There is currently no user interface that allows users to view the execution plan for Pig jobs running with Tez. To diagnose a failing Pig job, users must read the Application Master and container logs.
Note: Users should configure parallelism before running Pig with Tez. If parallelism is too low, Pig jobs will run slowly.
To tune parallelism, add the |
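Pig Latin offers two standard controls for reduce-side parallelism: a script-wide `SET default_parallel` and a per-operator `PARALLEL` clause. Whether either is the setting this note refers to is an assumption. The following sketch writes a small script that uses both. The input and output paths, relation names, reducer counts, and the `RUN_PIG` switch are all illustrative.

```shell
# Write a small Pig Latin script to a scratch file.
# Paths, relation names, and reducer counts are illustrative only.
script="$(mktemp)"
cat > "$script" <<'EOF'
-- Script-wide default for reduce-side parallelism.
SET default_parallel 20;

logs    = LOAD 'input/logs' AS (user:chararray, nbytes:long);
-- Per-operator override: PARALLEL applies to this GROUP only.
by_user = GROUP logs BY user PARALLEL 40;
totals  = FOREACH by_user GENERATE group AS user, SUM(logs.nbytes) AS total;
STORE totals INTO 'output/bytes_per_user';
EOF

# Run on Tez only when pig is installed and a run was explicitly requested
# through the hypothetical RUN_PIG switch.
if command -v pig >/dev/null 2>&1 && [ "${RUN_PIG:-0}" = "1" ]; then
  pig -x tez "$script"
fi
```

Suitable values depend on data volume and cluster capacity, so the numbers above are starting points to adjust, not recommendations.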
Running a Pig-on-Tez Job with Oozie
To run a Pig job on Tez using Oozie, perform the following configurations:
Add the following property and value to the job.properties file for the Pig-on-Tez Oozie job:
<property>
  <name>oozie.action.sharelib.for.pig</name>
  <value>pig, hive</value>
</property>
Create the $OOZIE_HOME/conf/action-conf/pig directory and copy the tez-site.xml file into it.
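The two configuration steps can be sketched in shell as follows. Several details are assumptions: that job.properties is a plain Java properties file, so the setting is written as a key=value line and not as XML; the `JOB_PROPERTIES` and `TEZ_SITE` variables; and the `/etc/tez/conf` location. The scratch-directory fallbacks exist only so the sketch can be dry-run safely.

```shell
# Placeholders: point these at the real job directory and installs.
# Without overrides, everything lands in a scratch directory.
scratch="$(mktemp -d)"
JOB_PROPERTIES="${JOB_PROPERTIES:-$scratch/job.properties}"
OOZIE_HOME="${OOZIE_HOME:-$scratch/oozie}"
TEZ_SITE="${TEZ_SITE:-/etc/tez/conf/tez-site.xml}"   # assumed location

# Step 1: make the pig and hive sharelibs available to the Pig action
# (same value as in the property above, written as key=value).
echo "oozie.action.sharelib.for.pig=pig,hive" >> "$JOB_PROPERTIES"

# Step 2: create the action-conf directory and copy tez-site.xml into it.
action_conf_dir="$OOZIE_HOME/conf/action-conf/pig"
mkdir -p "$action_conf_dir"
if [ -f "$TEZ_SITE" ]; then
  cp "$TEZ_SITE" "$action_conf_dir/"
else
  echo "tez-site.xml not found at $TEZ_SITE; set TEZ_SITE and rerun" >&2
fi
```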