Because YARN has removed the hard-partitioned mapper and reducer slots
of Hadoop Version 1, new capacity calculations are required. There are eight
important parameters for calculating a node’s capacity, specified in
mapred-site.xml and yarn-site.xml:
In mapred-site.xml:
mapreduce.map.memory.mb and mapreduce.reduce.memory.mb
These are the hard limits enforced by Hadoop on each mapper or reducer task.
mapreduce.map.java.opts and mapreduce.reduce.java.opts
The JVM heap size (-Xmx) for the mapper or reducer task. Remember to leave room for the JVM permanent generation and any native libraries used. This value should always be lower than mapreduce.[map|reduce].memory.mb.
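As a minimal sketch, these four properties could be set in mapred-site.xml as follows. The values shown are the example values from the table later in this section and are illustrative only; tune them to your own hardware. The entries belong inside the file’s <configuration> element.

<!-- mapred-site.xml: illustrative per-task memory settings -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>1536</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>2560</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx1024m</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx2048m</value>
</property>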
In yarn-site.xml:
yarn.scheduler.minimum-allocation-mb
The smallest container that YARN will allow.
yarn.scheduler.maximum-allocation-mb
The largest container that YARN will allow.
yarn.nodemanager.resource.memory-mb
The amount of physical memory (RAM) available for Containers on the compute node. It is important that this value be less than the total amount of RAM on the node, because other Hadoop services also require RAM.
yarn.nodemanager.vmem-pmem-ratio
The ratio of virtual memory to physical memory allowed for each Container. The virtual memory limit for a Container can be calculated as:
containerMemoryRequest * vmem-pmem-ratio
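Similarly, a sketch of the corresponding yarn-site.xml entries, again using the illustrative example values from the table below (placed inside the file’s <configuration> element):

<!-- yarn-site.xml: illustrative Container and NodeManager memory settings -->
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>512</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>4096</value>
</property>
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>36864</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>2.1</value>
</property>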
As an example, consider a configuration with the settings in the following table.
Example YARN MapReduce Settings
Property | Value |
mapreduce.map.memory.mb | 1536 |
mapreduce.reduce.memory.mb | 2560 |
mapreduce.map.java.opts | -Xmx1024m |
mapreduce.reduce.java.opts | -Xmx2048m |
yarn.scheduler.minimum-allocation-mb | 512 |
yarn.scheduler.maximum-allocation-mb | 4096 |
yarn.nodemanager.resource.memory-mb | 36864 |
yarn.nodemanager.vmem-pmem-ratio | 2.1 |
With these settings, each map and reduce task has a generous 512MB of overhead for the Container, as evidenced by the difference between mapreduce.[map|reduce].memory.mb and mapreduce.[map|reduce].java.opts.
Next, YARN has been configured to allow a Container no smaller than 512MB and no larger than 4GB. The compute nodes have 36GB of RAM available for Containers. With a virtual memory ratio of 2.1 (the default value), each map task can use up to 3225.6MB of virtual memory and each reduce task up to 5376MB.
This means that the compute node configured for 36GB of Container space can support up to 24 maps or 14 reducers, or any combination of mappers and reducers allowed by the available resources on the node.
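These figures follow directly from the example settings:

Container overhead: 1536 - 1024 = 512MB per map; 2560 - 2048 = 512MB per reduce
Virtual memory limits: 1536 x 2.1 = 3225.6MB per map; 2560 x 2.1 = 5376MB per reduce
Containers per node: 36864 / 1536 = 24 maps; 36864 / 2560 = 14.4, so 14 reducers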
For more information about calculating memory settings, see Determine YARN and MapReduce Memory Configuration Settings.