3.1.4. Calculating the Capacity of a Node

Because YARN has removed the hard-partitioned mapper and reducer slots of Hadoop Version 1, new capacity calculations are required. There are eight important parameters for calculating a node’s capacity, specified in mapred-site.xml and yarn-site.xml:

In mapred-site.xml:

  • mapreduce.map.memory.mb 
    mapreduce.reduce.memory.mb

    These are the hard limits enforced by Hadoop on each mapper or reducer task.

  • mapreduce.map.java.opts
    mapreduce.reduce.java.opts

    The JVM heap size (-Xmx) for the mapper or reducer task. Remember to leave room for the JVM permanent generation and native libraries; this value should therefore always be lower than mapreduce.[map|reduce].memory.mb (a quick check is sketched below).
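
    As an illustration only (this helper is not part of Hadoop, and the 256MB headroom is an assumed safety margin, not a Hadoop parameter), a minimal Python sanity check of the heap-versus-container relationship:

        def heap_fits(container_mb, xmx_mb, headroom_mb=256):
            # True if the -Xmx heap leaves at least headroom_mb inside the
            # container limit for JVM permanent generation and native libraries
            return xmx_mb + headroom_mb <= container_mb

        print(heap_fits(1536, 1024))   # True: 512MB left for JVM overhead
        print(heap_fits(1536, 1536))   # False: no room for JVM overhead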

In yarn-site.xml:

  • yarn.scheduler.minimum-allocation-mb     

    The smallest container that YARN will allow.

  • yarn.scheduler.maximum-allocation-mb     

    The largest container that YARN will allow.

  • yarn.nodemanager.resource.memory-mb    

    The amount of physical memory (RAM) available for Containers on the compute node. This value should be less than the total RAM on the node, because other Hadoop services (and the operating system) also require memory.

  • yarn.nodemanager.vmem-pmem-ratio        

    The ratio of virtual memory to physical memory that each Container is allowed. A Container’s virtual memory limit can be calculated with:

    containerMemoryRequest * vmem-pmem-ratio
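
    A minimal Python sketch of this formula (the function name is illustrative, not a Hadoop API; 2.1 is the YARN default ratio):

        def vmem_limit_mb(container_request_mb, vmem_pmem_ratio=2.1):
            # Virtual memory ceiling enforced for a container of the given size
            return container_request_mb * vmem_pmem_ratio

        print(vmem_limit_mb(1536))   # 3225.6 MB for a 1536MB map container
        print(vmem_limit_mb(2560))   # 5376.0 MB for a 2560MB reduce container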

As an example, consider a configuration with the settings in the following table.

Example YARN MapReduce Settings

Property                               Value
------------------------------------   ---------
mapreduce.map.memory.mb                1536
mapreduce.reduce.memory.mb             2560
mapreduce.map.java.opts                -Xmx1024m
mapreduce.reduce.java.opts             -Xmx2048m
yarn.scheduler.minimum-allocation-mb   512
yarn.scheduler.maximum-allocation-mb   4096
yarn.nodemanager.resource.memory-mb    36864
yarn.nodemanager.vmem-pmem-ratio       2.1

With these settings, each map and reduce task has a generous 512MB of overhead for the Container, as evidenced by the difference between the mapreduce.[map|reduce].memory.mb and the mapreduce.[map|reduce].java.opts values (1536-1024 and 2560-2048, respectively).

Next, YARN has been configured to allow a Container no smaller than 512MB and no larger than 4GB. The compute nodes have 36GB (36864MB) of RAM available for Containers. With a virtual memory ratio of 2.1 (the default value), each map task is allowed up to 3225.6MB (1536 x 2.1) of virtual memory, and each reduce task up to 5376MB (2560 x 2.1).

This means that the compute node configured for 36GB of Container space can support up to 24 maps (36864/1536) or 14 reducers (36864/2560, rounded down), or any combination of mappers and reducers allowed by the available resources on the node.
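
A minimal Python sketch of this capacity arithmetic, using the example settings from the table above (variable names are illustrative):

    # Settings from the example table
    node_mb    = 36864   # yarn.nodemanager.resource.memory-mb
    map_mb     = 1536    # mapreduce.map.memory.mb
    reduce_mb  = 2560    # mapreduce.reduce.memory.mb

    # Node capacity is integer division of Container space by task size
    print(node_mb // map_mb)      # 24 simultaneous map Containers
    print(node_mb // reduce_mb)   # 14 simultaneous reduce Containers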

For more information about calculating memory settings, see Determine YARN and MapReduce Memory Configuration Settings.