Chapter 7. Running Multiple MapReduce Versions Using the YARN Distributed Cache

Beginning in HDP 2.2, multiple versions of the MapReduce framework can be deployed using the YARN Distributed Cache. By setting the appropriate configuration properties, you can run jobs using a different version of the MapReduce framework than the one currently installed on the cluster.

Distributed cache ensures that the MapReduce job framework version is consistent throughout the entire job lifecycle. This enables you to maintain consistent results from MapReduce jobs during a rolling upgrade of the cluster. Without using Distributed Cache, a MapReduce job might start with one framework version, but finish with the new (upgrade) version, which could lead to unpredictable results.

YARN Distributed Cache enables you to efficiently distribute large read-only files (text files, archives, .jar files, etc) for use by YARN applications. Applications use URLs (hdfs://) to specify the files to be cached, and the Distributed Cache framework copies the necessary files to the applicable nodes before any tasks for the job are executed. Its efficiency stems from the fact that the files are copied only once per job, and archives are extracted after they are copied to the applicable nodes. Note that Distributed Cache assumes that the files to be cached (and specified via hdfs:// URLs) are already present on the HDFS file system and are accessible by every node in the cluster.

Configuring MapReduce for the YARN Distributed Cache

Copy the tarball that contains the version of MapReduce you would like to use into an HDFS directory that applications can access.
```
$HADOOP_HOME/bin/hdfs dfs -put mapreduce.tar.gz /mapred/framework/
```
In the mapred-site.xml file, set the value of the mapreduce.application.framework.path property URL to point to the archive file you just uploaded. The URL allows you to create an alias for the archive if a URL fragment identifier is specified. In the following example, mr-framework is specified as the alias:
```
<property>
 <name>mapreduce.application.framework.path</name>
 <value>hdfs:/mapred/framework/mapreduce.tar.gz#mr-framework</value>
</property>
```

In the mapred-site.xml file, the default value of the mapreduce.application.classpath uses the ${hdp.version} environment variable to reference the currently installed version of HDP:

<property>
 <name>mapreduce.application.classpath</name>
 <value>$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar</value>
</property>

Change the value of the mapreduce.application.classpath property to reference the applicable version of the MapReduce framework .jar files. In this case we need to replace ${hdp.version} with the applicable HDP version, which in our example is 2.2.0.0-2041. Note that in the following example the mr-framework alias is used in the path references.

<property>
 <name>mapreduce.application.classpath</name> 
<value>$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:/usr/hdp/2.2.0.0-2041/hadoop/lib/hadoop-lzo-0.6.0.2.2.0.0-2041.jar</value>
</property>

With this configuration in place, MapReduce jobs will run on the version 2.2.0.0-2041 framework referenced in the mapred-site.xml file.

You can upload multiple versions of the MapReduce framework to HDFS and create a separate mapred-site.xml file to reference each version of the framework. Users can then run jobs against a specific version by referencing the applicable mapred-site.xml file. The following example would run a MapReduce job on version 2.1 of the MapReduce framework:

hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar pi -conf etc/hdp-2.1.0.0/mapred-site.xml 10 10

You can use the ApplicationMaster log file to confirm that the job ran on the localized version of MapReduce on the Distributed Cache. For example:

2014-06-10 08:19:30,199 INFO [main] org.mortbay.log: Extract jar: file:/<nm-local-dirs>/filecache/10/hadoop-2.3.0.tar.gz/hadoop-2.3.0/share/hadoop/yarn/hadoop-yarn-common-2.3.0.jar!/webapps/mapreduce to /tmp/Jetty_0_0_0_0_42544_mapreduce____.pryk9q/webapp

Limitations

Support for deploying the MapReduce framework via the YARN Distributed Cache currently does not address the job client code used to submit and query jobs. It also does not address the ShuffleHandler code that runs as an auxiliary service within each NodeManager. Therefore, the following limitations apply to MapReduce versions that can be successfully deployed via the Distributed Cache:

The MapReduce version must be compatible with the job client code used to submit and query jobs. If it is incompatible, the job client must be upgraded separately on any node on which jobs are submitted using the new MapReduce version.
The MapReduce version must be compatible with the configuration files used by the job client submitting the jobs. If it is incompatible with that configuration (that is, a new property must be set, or an existing property value must be changed), the configuration must be updated before submitting jobs.
The MapReduce version must be compatible with the ShuffleHandler version running on the cluster nodes. If it is incompatible, the new ShuffleHandler code must be deployed to all nodes in the cluster, and the NodeManagers must be restarted to pick up the new ShuffleHandler code.

Troubleshooting Tips

You can use the ApplicationMaster log file to check the version of MapReduce being used by a running job. For example:

2014-11-20 08:19:30,199 INFO [main] org.mortbay.log: Extract jar: file:/<nm-local-dirs>/filecache/{...}/hadoop-2.6.0.tar.gz/hadoop-2.6.0/share/hadoop/yarn/hadoop-yarn-common-2.6.0.jar!/webapps/mapreduce to /tmp/Jetty_0_0_0_0_42544_mapreduce____.pryk9q/webapp

If shuffle encryption is enabled, MapReduce jobs may fail with the following exception:

2014-10-10 02:17:16,600 WARN [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: Failed to connect to junping-du-centos6.x-3.cs1cloud.internal:13562 with 1 map outputs
javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
 at com.sun.net.ssl.internal.ssl.Alerts.getSSLException(Alerts.java:174)
 at com.sun.net.ssl.internal.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1731)
 at com.sun.net.ssl.internal.ssl.Handshaker.fatalSE(Handshaker.java:241)
 at com.sun.net.ssl.internal.ssl.Handshaker.fatalSE(Handshaker.java:235)
 at com.sun.net.ssl.internal.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1206)
 at com.sun.net.ssl.internal.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:136)
 at com.sun.net.ssl.internal.ssl.Handshaker.processLoop(Handshaker.java:593)
 at com.sun.net.ssl.internal.ssl.Handshaker.process_record(Handshaker.java:529)
 at com.sun.net.ssl.internal.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:925)
 at com.sun.net.ssl.internal.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1170)
 at com.sun.net.ssl.internal.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1197)
 at com.sun.net.ssl.internal.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1181)
 at sun.net.www.protocol.https.HttpsClient.afterConnect(HttpsClient.java:434)
 at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.setNewClient(AbstractDelegateHttpsURLConnection.java:81)
 at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.setNewClient(AbstractDelegateHttpsURLConnection.java:61)
 at sun.net.www.protocol.http.HttpURLConnection.writeRequests(HttpURLConnection.java:584)
 at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1193)
 at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:379)
 at sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:318)
 at org.apache.hadoop.mapreduce.task.reduce.Fetcher.verifyConnection(Fetcher.java:427)
....

To fix this problem, create a sub-directory under $HADOOP_CONF ($HADOOP_HOME/etc/hadoop by default), and copy the ssl-client.xml file to that directory. Add this new directory path (/etc/hadoop/conf/secure) to the MapReduce classpath specified in mapreduce.application.classpath in the mapred-site.xml file.

Legal notices