3.1.7. Running Existing Hadoop Version 1 Applications on YARN

To ease the transition from Hadoop version 1 to YARN, a major goal of YARN and of the MapReduce framework implemented on top of it was to ensure that existing MapReduce applications programmed and compiled against the earlier MapReduce APIs (we'll call these MRv1 applications) can continue to run on YARN with little or no modification (we'll refer to applications written for the new framework as MRv2 applications).

Binary Compatibility of org.apache.hadoop.mapred APIs

For the vast majority of users who use the org.apache.hadoop.mapred APIs, MapReduce on YARN ensures full binary compatibility. These existing applications can run on YARN directly, without recompilation: you can take the .jar files of an existing application that was coded against the mapred APIs and use bin/hadoop to submit them directly to YARN.

Source Compatibility of org.apache.hadoop.mapreduce APIs

Unfortunately, it was difficult to ensure full binary compatibility for existing applications that were compiled against the MRv1 org.apache.hadoop.mapreduce APIs. These APIs have gone through many changes; for example, several classes stopped being abstract classes and were changed to interfaces. The YARN community therefore compromised by supporting only source compatibility for the org.apache.hadoop.mapreduce APIs. Existing applications that use the MapReduce APIs are source compatible and can run on YARN either with no changes, with a simple recompilation against the MRv2 .jar files shipped with Hadoop 2, or with minor updates.
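Why does turning a class into an interface preserve source compatibility but break binary compatibility? The following self-contained sketch illustrates the mechanism; OldContext and NewContext are hypothetical stand-ins for an MRv1-era abstract class and its MRv2 interface replacement, not actual Hadoop types.

```java
public class InterfaceChangeDemo {

    // MRv1 style: an abstract class (a stand-in, not a real Hadoop class).
    static abstract class OldContext {
        abstract String getJobName();
    }

    // MRv2 style: the same contract, now expressed as an interface.
    interface NewContext {
        String getJobName();
    }

    public static void main(String[] args) {
        // Identical *source* compiles against either type, which is why a
        // simple recompilation against the MRv2 .jar files is enough:
        NewContext ctx = () -> "wordcount";
        System.out.println(ctx.getJobName());

        // At the *binary* level the two differ: a call site compiled against
        // OldContext is emitted as an invokevirtual instruction, while a call
        // against NewContext requires invokeinterface. Old MRv1 bytecode run
        // against the new classes fails with IncompatibleClassChangeError.
        System.out.println(NewContext.class.isInterface());  // true
        System.out.println(OldContext.class.isInterface());  // false
    }
}
```

The lambda above is only a compact way to supply an implementation of the interface version; the point is that the calling source is unchanged while the emitted bytecode is not.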

Compatibility of Command-line Scripts

Most of the command-line scripts from Hadoop 1.x should simply work. The only exception is MRAdmin, which was removed from MRv2 because the JobTracker and TaskTracker no longer exist; its functionality is now provided by RMAdmin. The suggested way to invoke MRAdmin (and likewise RMAdmin) is through the command line, even though the APIs can be invoked directly. When MRAdmin commands are executed on YARN, warning messages appear reminding users to use the YARN commands (i.e., the RMAdmin commands). If, on the other hand, an application invokes MRAdmin programmatically, it will break when running on YARN; there is no binary or source compatibility support in that case.

Compatibility Trade-off Between MRv1 and Early MRv2 (0.23.x) Applications

Unfortunately, some APIs can be made compatible either with MRv1 applications or with early MRv2 applications (in particular, applications compiled against Hadoop 0.23), but not with both. Some of these APIs were exactly the same in MRv1 and MRv2 except for a changed return type in a method signature. It was therefore necessary to make compatibility trade-offs between the two.

  • The mapred APIs are compatible with MRv1 applications, which have a larger user base.

  • Where doing so didn’t significantly break Hadoop 0.23 applications, the mapreduce APIs were made compatible with 0.23, and are thus only source compatible with 1.x.

The following table lists the APIs that are incompatible with Hadoop 0.23. Early Hadoop 2 adopters who used these methods from 0.23.x in their custom routines will need to modify their code accordingly. For some problematic methods, an alternative method is provided with the same functionality and a method signature similar to the MRv2 one.

MRv2 Incompatible APIs

Problematic Method (org.apache.hadoop.*)                    Incompatible Return Type Change                                  Alternative Method
util.ProgramDriver#drive                                    void -> int                                                      run
mapred.jobcontrol.Job#getMapredJobID                        String -> JobID                                                  getMapredJobId
mapred.TaskReport#getTaskId                                 String -> TaskID                                                 getTaskID
mapred.ClusterStatus#UNINITIALIZED_MEMORY_VALUE             long -> int                                                      N/A
mapreduce.filecache.DistributedCache#getArchiveTimestamps   long[] -> String[]                                               N/A
mapreduce.filecache.DistributedCache#getFileTimestamps      long[] -> String[]                                               N/A
mapreduce.Job#failTask                                      void -> boolean                                                  killTask(TaskAttemptID, boolean)
mapreduce.Job#killTask                                      void -> boolean                                                  killTask(TaskAttemptID, boolean)
mapreduce.Job#getTaskCompletionEvents                       mapred.TaskCompletionEvent[] -> mapreduce.TaskCompletionEvent[]  N/A
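Every entry in the table is a return-type change, and a small self-contained sketch shows why such a change breaks binary compatibility even when most call sites remain source compatible. OldDriver and NewDriver below are hypothetical stand-ins for the two versions of an API method such as util.ProgramDriver#drive, not actual Hadoop classes.

```java
public class ReturnTypeDemo {

    // Stand-in for the old signature: void drive(String[]).
    static class OldDriver { public void drive(String[] argv) {} }

    // Stand-in for the new signature: int drive(String[]).
    static class NewDriver { public int drive(String[] argv) { return 0; } }

    public static void main(String[] args) throws Exception {
        // The JVM links a call by its full method descriptor, which includes
        // the return type: ([Ljava/lang/String;)V and ([Ljava/lang/String;)I
        // name *different* methods at the binary level. Bytecode compiled
        // against the old signature throws NoSuchMethodError on the new jar.
        System.out.println(
            OldDriver.class.getMethod("drive", String[].class).getReturnType());
        System.out.println(
            NewDriver.class.getMethod("drive", String[].class).getReturnType());

        // Source that ignores the return value compiles against either
        // version, so a recompilation (possibly with minor edits) is
        // typically the only fix needed.
        new NewDriver().drive(new String[0]);
    }
}
```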