The MapReduce Application Master is implemented as a composition of loosely coupled services that interact with each other through events. Each component acts on the events it receives and sends any required events to other components. This design keeps the Application Master highly concurrent, with little or no synchronization required. Events are dispatched by a central dispatch mechanism, the Dispatcher, and all components register with it. Information is shared across components through AppContext.
In Hadoop 1, the death of the Job Tracker would result in the loss of all jobs, both running and queued. With a YARN MapReduce job, the equivalent of the Job Tracker is the Application Master. Because the Application Master now runs on compute nodes, the number of possible failure scenarios increases. To combat MapReduce Application Master failures, YARN can restart the Application Master a specified number of times and can recover completed tasks. Additionally, much like the Job Tracker, the Application Master keeps metrics for jobs that are currently running. Typically the Application Master tracking URL makes these metrics available, and they can be found in the YARN web UI (see the previous pi example). The following settings can enable MapReduce recovery in YARN.
Enabling Application Master Restarts
To enable Application Master restarts:
You can adjust the yarn.resourcemanager.am.max-retries property in the yarn-site.xml file. The default setting is 2.
You can more directly tune how many times a MapReduce Application Master should restart by adjusting the mapreduce.am.max-attempts property in the mapred-site.xml file. The default setting is 2.
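As a minimal sketch, the two restart limits described above could be set with property entries like the following. Each fragment is assumed to sit inside the file's existing <configuration> element, and the values shown are simply the defaults quoted in the text. The property names are taken from the text above; names can differ between Hadoop releases, so it is worth checking them against the default configuration files shipped with your version.

```xml
<!-- yarn-site.xml: fragment, assumed to be placed inside <configuration> -->
<property>
  <name>yarn.resourcemanager.am.max-retries</name>
  <!-- 2 is the default quoted in the text; raise it to allow more restarts -->
  <value>2</value>
</property>
```

```xml
<!-- mapred-site.xml: fragment, assumed to be placed inside <configuration> -->
<property>
  <name>mapreduce.am.max-attempts</name>
  <!-- 2 is the default quoted in the text; applies to MapReduce Application Masters -->
  <value>2</value>
</property>
```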
Enabling Recovery of Completed Tasks
You can use the yarn.app.mapreduce.am.job.recovery.enable property in the yarn-site.xml file to enable recovery of completed tasks. The default setting is "true".
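Because the default is already "true", an explicit entry is mainly useful for documenting the setting or for turning recovery off. A sketch of such an entry, assuming it is placed inside the <configuration> element of the file named above:

```xml
<!-- Fragment, assumed to be placed inside <configuration> -->
<property>
  <name>yarn.app.mapreduce.am.job.recovery.enable</name>
  <!-- "true" is the default quoted in the text; "false" would disable recovery -->
  <value>true</value>
</property>
```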
The Job History Server
With the Application Master now taking the place of the Job Tracker, a centralized location to store the history of all MapReduce jobs was required. The Job History Server helps fill the void left by the transitory Application Master by hosting the metrics and logs of completed jobs. This new history daemon is not one of the services provided by YARN; it belongs to the MapReduce application framework.