To ease the transition from Hadoop version 1 to YARN, a major goal of YARN and of the MapReduce framework implementation on top of YARN was to ensure that existing MapReduce applications programmed and compiled against previous MapReduce APIs (we'll call these MRv1 applications) can continue to run on YARN with little or no modification (we'll refer to these as MRv2 applications).
Binary Compatibility of org.apache.hadoop.mapred APIs
For the vast majority of users who use the org.apache.hadoop.mapred APIs, MapReduce on YARN ensures full binary compatibility. These existing applications can run on YARN directly, without recompilation: you can take the .jar files from your existing applications that are coded against the mapred APIs and use bin/hadoop to submit them directly to YARN.
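For example, a pre-built MRv1 job jar can be submitted unchanged. The jar name, class name, and paths below are hypothetical placeholders, and a running Hadoop 2 (YARN) cluster is assumed:

```shell
# Submit an existing MRv1 jar (compiled against the Hadoop 1.x mapred APIs)
# directly to YARN, with no recompilation. Jar, class, and paths are
# hypothetical placeholders.
bin/hadoop jar wordcount-mrv1.jar com.example.WordCount /user/alice/input /user/alice/output
```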
Source Compatibility of org.apache.hadoop.mapreduce APIs
Unfortunately, it was difficult to ensure full binary compatibility for existing applications that were compiled against the MRv1 org.apache.hadoop.mapreduce APIs. These APIs have gone through many changes; for example, several abstract classes were changed into interfaces. The YARN community therefore compromised by supporting only source compatibility for the org.apache.hadoop.mapreduce APIs. Existing applications that use these MapReduce APIs are source-compatible and can run on YARN either with no changes, with a simple recompilation against the MRv2 .jar files that are shipped with Hadoop 2, or with minor updates.
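As a toy sketch (plain Java with made-up names, not Hadoop code) of why an abstract-class-to-interface change needs only recompilation: the caller's source below compiles identically whether Counter is an abstract class or an interface, but the compiled bytecode differs (invokevirtual vs. invokeinterface), so an old binary would fail at runtime with IncompatibleClassChangeError against the new class library.

```java
// Toy illustration of source compatibility surviving an abstract-class ->
// interface change. The hypothetical MRv1-era shape would have been:
//   public abstract class Counter { public abstract void increment(long n); ... }
// and the hypothetical MRv2-era shape is:
interface Counter {
    void increment(long n);
    long getValue();
}

class LongCounter implements Counter {
    private long value;
    public void increment(long n) { value += n; }
    public long getValue() { return value; }
}

public class SourceCompatDemo {
    // This caller's source is identical for both shapes of Counter, so a
    // simple recompile against the new .jar is all that is required. The
    // bytecode, however, changes between invokevirtual and invokeinterface.
    static long countRecords(Counter counter, int records) {
        for (int i = 0; i < records; i++) {
            counter.increment(1);
        }
        return counter.getValue();
    }

    public static void main(String[] args) {
        System.out.println(countRecords(new LongCounter(), 5)); // prints 5
    }
}
```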
Compatibility of Command-line Scripts
Most of the command-line scripts from Hadoop 1.x should simply work. The only exception is MRAdmin, which was removed from MRv2 because the JobTracker and TaskTracker no longer exist; its functionality is now provided by RMAdmin. The suggested way to invoke MRAdmin (and RMAdmin) is through the command line, even though the APIs can be invoked directly. When MRAdmin commands are executed on YARN, warning messages appear reminding users to use the corresponding YARN (i.e., RMAdmin) commands. If applications invoke MRAdmin programmatically, on the other hand, they will break when running on YARN; there is no support for either binary or source compatibility.
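For example, an administrative queue refresh that was issued through MRAdmin in Hadoop 1.x maps to an equivalent RMAdmin command in YARN (a running cluster is assumed; -refreshQueues is one representative subcommand):

```shell
# Hadoop 1.x: refresh the queue configuration via MRAdmin (JobTracker)
hadoop mradmin -refreshQueues

# YARN: the equivalent RMAdmin command, directed at the ResourceManager
yarn rmadmin -refreshQueues
```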
Compatibility Trade-off Between MRv1 and Early MRv2 (0.23.x) Applications
Unfortunately, some APIs can be made compatible either with MRv1 applications or with early MRv2 applications (in particular, applications compiled against Hadoop 0.23), but not with both. Some APIs are identical in MRv1 and MRv2 except for a return-type change in the method signature. It was therefore necessary to make compatibility trade-offs between the two:
The mapred APIs are compatible with MRv1 applications, which have a larger user base.
Where the mapreduce APIs did not significantly break Hadoop 0.23 applications, they were made compatible with 0.23 but only source-compatible with 1.x.
The following table lists the APIs that are incompatible with Hadoop 0.23. Early Hadoop 2 adopters who used any of these methods in their custom 0.23.x routines will need to modify their code accordingly. For some problematic methods, an alternative method with the same functionality and a similar signature is provided.
MRv2 Incompatible APIs
| Problematic Method (org.apache.hadoop) | Incompatible Return Type Change | Alternative Method |
| --- | --- | --- |
| util.ProgramDriver#drive | void -> int | run |
| mapred.jobcontrol.Job#getMapredJobID | String -> JobID | getMapredJobId |
| mapred.TaskReport#getTaskId | String -> TaskID | getTaskID |
| mapred.ClusterStatus#UNINITIALIZED_MEMORY_VALUE | long -> int | N/A |
| mapreduce.filecache.DistributedCache#getArchiveTimestamps | long[] -> String[] | N/A |
| mapreduce.filecache.DistributedCache#getFileTimestamps | long[] -> String[] | N/A |
| mapreduce.Job#failTask | void -> boolean | killTask(TaskAttemptID, boolean) |
| mapreduce.Job#killTask | void -> boolean | killTask(TaskAttemptID, boolean) |
| mapreduce.Job#getTaskCompletionEvents | mapred.TaskCompletionEvent[] -> mapreduce.TaskCompletionEvent[] | N/A |
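To see why a return-type change such as void -> int is source-compatible for callers that ignore the result, yet binary-incompatible, consider this toy sketch (plain Java with made-up names, not Hadoop code):

```java
// Toy illustration of the void -> int return-type changes in the table.
// A caller that invokes drive(...) as a statement compiles unchanged whether
// drive returns void or int, so the change is source-compatible for such
// callers. The JVM method descriptor changes, however, from
// ([Ljava/lang/String;)V to ([Ljava/lang/String;)I, so a .jar compiled
// against the old signature fails at runtime with NoSuchMethodError:
// that is binary incompatibility.
class Driver {
    // New-style signature; the old-style one was: static void drive(String[] args)
    static int drive(String[] args) {
        return args.length == 0 ? 0 : 1; // hypothetical exit code
    }
}

public class ReturnTypeDemo {
    public static void main(String[] args) {
        // Statement form: identical source against either signature.
        Driver.drive(new String[0]);
        // Expression form: compiles only against the new int-returning signature.
        int exitCode = Driver.drive(new String[] {"job"});
        System.out.println(exitCode); // prints 1
    }
}
```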