CDH 5 and MapReduce

CDH 5 supports two versions of the MapReduce computation framework: MRv1 and MRv2. The default installation in CDH 5 is MapReduce (MRv2) built on the YARN framework. In this document, we refer to this new MapReduce version as YARN. You can use the instructions later in this section to install:

  • YARN or
  • MapReduce (MRv1) or
  • both implementations.

MapReduce MRv2 (YARN)

The MRv2 YARN architecture splits the two primary responsibilities of the JobTracker (resource management and job scheduling/monitoring) into separate daemons: a global ResourceManager (RM) and per-application ApplicationMasters (AM). With MRv2, the ResourceManager and per-node NodeManagers (NM) form the data-computation framework. The ResourceManager service takes over the JobTracker's cluster-wide resource-management functions, and NodeManagers run on worker hosts instead of TaskTracker daemons. The per-application ApplicationMaster is, in effect, a framework-specific library that negotiates resources from the ResourceManager and works with the NodeManagers to execute and monitor the tasks. For details of this architecture, see Apache Hadoop NextGen MapReduce (YARN).
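To make the division of labor concrete, the following is a toy, in-process Python model of the three roles described above. It is a sketch under simplifying assumptions, not how YARN is implemented: there are no RPCs, heartbeats, or real containers, each host's capacity is a plain container count, and the class, method, and host names (NodeManager, ResourceManager, ApplicationMaster, allocate, launch, status, worker1, app_0001) are hypothetical illustrations rather than Hadoop APIs. What it shows is that the ResourceManager only hands out capacity, while the per-application ApplicationMaster decides which task goes into which granted slot and then asks the NodeManagers what they are running.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """Toy per-host worker daemon (the role TaskTrackers played in MRv1)."""
    host: str
    capacity: int                                 # hypothetical: containers this host can run
    running: list = field(default_factory=list)   # tasks currently launched on this host

    def launch(self, task: str) -> None:
        self.running.append(task)


class ResourceManager:
    """Toy global daemon: arbitrates capacity only; it never tracks task progress."""

    def __init__(self, nodes):
        self.nodes = nodes
        self.allocated = {n.host: 0 for n in nodes}

    def allocate(self, wanted: int) -> list:
        """Grant up to `wanted` containers, returned as the NodeManagers hosting them."""
        grants = []
        for node in self.nodes:
            while len(grants) < wanted and self.allocated[node.host] < node.capacity:
                self.allocated[node.host] += 1
                grants.append(node)
        return grants


class ApplicationMaster:
    """Toy per-application, framework-specific logic: schedules and monitors its own tasks."""

    def __init__(self, app_id: str, tasks: list):
        self.app_id = app_id
        self.pending = list(tasks)
        self.placement = {}

    def run(self, rm: ResourceManager) -> dict:
        grants = rm.allocate(len(self.pending))   # negotiate resources from the RM
        for node in grants:                       # work with NMs to execute tasks
            task = self.pending.pop(0)
            node.launch(f"{self.app_id}:{task}")
            self.placement[task] = node.host
        # In this toy, tasks without a grant simply stay in self.pending.
        return self.placement

    def status(self, nodes) -> dict:
        """Monitor: ask each NodeManager which of this application's tasks it runs."""
        prefix = self.app_id + ":"
        return {n.host: [t for t in n.running if t.startswith(prefix)] for n in nodes}


if __name__ == "__main__":
    nodes = [NodeManager("worker1", capacity=2), NodeManager("worker2", capacity=1)]
    rm = ResourceManager(nodes)
    am = ApplicationMaster("app_0001", ["map-0", "map-1", "reduce-0"])
    print(am.run(rm))  # {'map-0': 'worker1', 'map-1': 'worker1', 'reduce-0': 'worker2'}
    print(am.status(nodes))
```

Keeping placement and monitoring inside the ApplicationMaster, rather than in the ResourceManager, mirrors the split described above: the global daemon stays framework-agnostic, and each application brings its own scheduling logic.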

See also Migrating from MapReduce 1 (MRv1) to MapReduce 2 (MRv2, YARN).