3.1.8. Running Existing MapReduce Version 1 Code on YARN

Running the Examples

Most of the MRv1 examples continue to work on YARN, but they now ship in a new .jar file, hadoop-mapreduce-examples-2.x.x.jar. One exception worth mentioning is the sleep example: it used to be in hadoop-examples-1.x.x.jar, but it is not in hadoop-mapreduce-examples-2.x.x.jar. It was moved into the test .jar file hadoop-mapreduce-client-jobclient-2.x.x-tests.jar.
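As a minimal sketch, the sleep example could be launched from the tests .jar file roughly as follows. The .jar path is a hypothetical placeholder, the option values are illustrative, and the script assumes the hadoop command is on the PATH; it skips the run if either is missing.

```shell
#!/bin/sh
# Hypothetical path; substitute the real location and version of the tests jar.
TESTS_JAR="${TESTS_JAR:-/path/to/hadoop-mapreduce-client-jobclient-2.x.x-tests.jar}"

# One map task and one reduce task, each sleeping 1000 ms (illustrative values).
SLEEP_ARGS="-m 1 -r 1 -mt 1000 -rt 1000"

if command -v hadoop >/dev/null 2>&1 && [ -f "$TESTS_JAR" ]; then
    # $SLEEP_ARGS is left unquoted on purpose so it splits into separate options.
    hadoop jar "$TESTS_JAR" sleep $SLEEP_ARGS
else
    echo "Skipping run: hadoop or $TESTS_JAR not available"
fi
```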

That exception aside, users may want to try running hadoop-examples-1.x.x.jar directly on YARN. Running hadoop jar hadoop-examples-1.x.x.jar will still pick the classes in hadoop-mapreduce-examples-2.x.x.jar. This behavior is due to Java first searching for the desired class in the system .jar files. Only if the class is not found there does it go on to search the user .jar files in the classpath. hadoop-mapreduce-examples-2.x.x.jar is installed with the other MRv2 .jar files in the Hadoop classpath, so the desired class (e.g., WordCount) is picked from this 2.x.x .jar file instead. However, it is possible to make Java pick the classes from the .jar file that is specified on the command line. There are two options:

  • Add HADOOP_USER_CLASSPATH_FIRST=true and HADOOP_CLASSPATH=...:hadoop-examples-1.x.x.jar as environment variables, and add mapreduce.job.user.classpath.first = true in mapred-site.xml.

  • Remove the 2.x.x .jar file from the classpath. If it is a multiple-node cluster, the .jar file needs to be removed from the classpath on all the nodes.
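The first option might look like the following sketch. The .jar path, the wordcount program name, and the input and output directories are hypothetical placeholders. The script only sets the two environment variables; the mapreduce.job.user.classpath.first property still has to be set to true in mapred-site.xml separately.

```shell
#!/bin/sh
# Hypothetical location of the MRv1 examples jar; replace with the real path and version.
EXAMPLES_JAR="${EXAMPLES_JAR:-/path/to/hadoop-examples-1.x.x.jar}"

# Ask the hadoop launcher to search user classpath entries before the system jars.
export HADOOP_USER_CLASSPATH_FIRST=true

# Append the MRv1 examples jar to whatever HADOOP_CLASSPATH already holds.
export HADOOP_CLASSPATH="${HADOOP_CLASSPATH:+$HADOOP_CLASSPATH:}$EXAMPLES_JAR"

# Reminder: mapreduce.job.user.classpath.first = true must also be present
# in mapred-site.xml; this script does not edit that file.

if command -v hadoop >/dev/null 2>&1 && [ -f "$EXAMPLES_JAR" ]; then
    # wordcount and the input/output directories are illustrative arguments.
    hadoop jar "$EXAMPLES_JAR" wordcount "${INPUT_DIR:-input}" "${OUTPUT_DIR:-output}"
else
    echo "Skipping run: hadoop or $EXAMPLES_JAR not available"
fi
```

On a multiple-node cluster, the environment variables only affect the node where the job is submitted, which is presumably why the mapred-site.xml property is needed as well.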

Running Apache Pig Scripts on YARN

Apache Pig is one of the two major data processing applications in the Hadoop ecosystem, the other being Hive. Due to significant efforts from the Pig community, existing Pig scripts do not require any modifications. Pig on YARN in Hadoop 0.23 has been supported since Pig version 0.10.0, and Pig working with Hadoop 2.x has been supported starting with Pig version 0.10.1.

Existing Pig scripts that work with Pig version 0.10.1 and beyond will work on YARN without changes. However, versions earlier than Pig 0.10.x may not run directly on YARN due to some incompatible MapReduce APIs and configuration.

Running Apache Hive Queries on YARN

Existing Hive queries do not need any changes to work on YARN starting with Hive-0.10.0, thanks to the work done by the Hive community. Support for Hive on YARN in Hadoop 0.23 and 2.x releases has been in place since Hive-0.10.0. However, as with Pig, earlier versions of Hive may not run directly on YARN, as those Hive releases do not support Hadoop 0.23 and 2.x.

Running Apache Oozie Workflows on YARN

As with Pig and Hive, the Apache Oozie community worked to make sure existing Oozie workflows run in a completely backward-compatible manner. Support for Hadoop 0.23 and 2.x is available starting with Oozie release 3.2.0. Existing Oozie workflows can start taking advantage of YARN in Hadoop 0.23 and 2.x with Oozie 3.2.0 and above.