MapReduce was the original use case on which Hadoop was founded. In order to graph the World Wide Web and how it changes over time, MapReduce was developed to process this graph with billions of nodes and trillions of edges. Moving this technology to YARN made it a complex application to build due to requirements for data-locality, fault tolerance, and application priorities.
In order to provide data locality, the MapReduce Application Master is required to locate blocks for processing, and then request Containers on these blocks. To implement fault-tolerance, the ability to handle failed map or reduce tasks and request them again on other nodes was needed. Fault-tolerance moved hand-in-hand with the complex intra-application priorities.
The logic to handle complex intra-application priorities for map and reduce tasks had to be built as part of the Application Master. There was no need to start idle reducers before mappers finished processing enough data. Reducers were now under control of the Application Master and were not fixed as in Hadoop Version 1. One rather unique failure mode occurs when a node fails after all the maps have finished. When this happens, the map task must be repeated because the results are unavailable. In many cases all available Containers are being used by the reducer tasks, preventing the spawning of another mapper task to process the missing data. Logically this would create a deadlock with reducers waiting for missing mapper data. The MapReduce Application Master has been designed to detect this situation and, while not ideal, will kill enough reducers to free enough resources for mappers to finish processing the missing data. The killed reducers will start again, allowing the job to complete.