Determining HDP Memory Configuration Settings
You can use either of two methods to determine YARN and MapReduce memory configuration settings: the HDP utility script or manual calculation.
The HDP utility script is the recommended method for calculating HDP memory configuration settings, but information about manually calculating YARN and MapReduce memory configuration settings is also provided for reference.
Running the YARN Utility Script
This section describes how to use the yarn-utils.py script to calculate YARN, MapReduce, Hive, and Tez memory allocation settings based on the node hardware specifications. The yarn-utils.py script is included in the HDP companion files. See Download Companion Files.
To run the yarn-utils.py script, execute the following command from the folder containing the script:
python yarn-utils.py options
where options are as follows:
Table 1.5. yarn-utils.py Options
Option | Description
---|---
-c CORES | The number of cores on each host
-m MEMORY | The amount of memory on each host, in gigabytes
-d DISKS | The number of disks on each host
-k HBASE | "True" if HBase is installed; "False" if not

Note: The script requires Python 2.6 to run. You can also use the -h or --help option to display a Help message that describes the options.
Example
Running the following command from the hdp_manual_install_rpm_helper_files-2.6.1.0.$BUILD directory:
python yarn-utils.py -c 16 -m 64 -d 4 -k True
Returns:
Using cores=16 memory=64GB disks=4 hbase=True
Profile: cores=16 memory=49152MB reserved=16GB usableMem=48GB disks=4
Num Container=8
Container Ram=6144MB
Used Ram=48GB
Unused Ram=16GB
yarn.scheduler.minimum-allocation-mb=6144
yarn.scheduler.maximum-allocation-mb=49152
yarn.nodemanager.resource.memory-mb=49152
mapreduce.map.memory.mb=6144
mapreduce.map.java.opts=-Xmx4096m
mapreduce.reduce.memory.mb=6144
mapreduce.reduce.java.opts=-Xmx4096m
yarn.app.mapreduce.am.resource.mb=6144
yarn.app.mapreduce.am.command-opts=-Xmx4096m
mapreduce.task.io.sort.mb=1792
tez.am.resource.memory.mb=6144
tez.am.launch.cmd-opts=-Xmx4096m
hive.tez.container.size=6144
hive.tez.java.opts=-Xmx4096m
Calculating YARN and MapReduce Memory Requirements
This section describes how to manually configure YARN and MapReduce memory allocation settings based on the node hardware specifications.
YARN takes into account all of the available compute resources on each machine in the cluster. Based on the available resources, YARN negotiates resource requests from applications running in the cluster, such as MapReduce. YARN then provides processing capacity to each application by allocating containers. A container is the basic unit of processing capacity in YARN, and is an encapsulation of resource elements such as memory and CPU.
In an Apache Hadoop cluster, it is vital to balance the use of memory (RAM), processors (CPU cores), and disks so that processing is not constrained by any one of these cluster resources. As a general recommendation, allowing for two containers per disk and per core gives the best balance for cluster utilization.
When determining the appropriate YARN and MapReduce memory configurations for a cluster node, you should start with the available hardware resources. Specifically, note the following values on each node:
RAM (amount of memory)
CORES (number of CPU cores)
DISKS (number of disks)
The total available RAM for YARN and MapReduce should take into account the Reserved Memory. Reserved memory is the RAM needed by system processes and other Hadoop processes (such as HBase):
reserved memory = stack memory reserve + HBase memory reserve (if HBase is on the same node)
You can use the values in the following table to determine what you need for reserved memory per node:
Table 1.6. Reserved Memory Recommendations
Total Memory per Node | Recommended Reserved System Memory | Recommended Reserved HBase Memory
---|---|---
4 GB | 1 GB | 1 GB
8 GB | 2 GB | 1 GB
16 GB | 2 GB | 2 GB
24 GB | 4 GB | 4 GB
48 GB | 6 GB | 8 GB
64 GB | 8 GB | 8 GB
72 GB | 8 GB | 8 GB
96 GB | 12 GB | 16 GB
128 GB | 24 GB | 24 GB
256 GB | 32 GB | 32 GB
512 GB | 64 GB | 64 GB
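The reserved-memory lookup described above can be sketched as a small helper. This is an illustrative sketch, not part of HDP: the function name, the dictionary layout, and the choice to round down to the nearest table row are all assumptions.

```python
# Sketch of the reserved-memory lookup (Table 1.6).
# Maps total node RAM in GB -> (reserved system GB, reserved HBase GB).
# The table starts at 4 GB, so smaller nodes are out of scope here.
RESERVED_GB = {
    4: (1, 1), 8: (2, 1), 16: (2, 2), 24: (4, 4),
    48: (6, 8), 64: (8, 8), 72: (8, 8), 96: (12, 16),
    128: (24, 24), 256: (32, 32), 512: (64, 64),
}

def reserved_memory_gb(total_gb, hbase_installed):
    """Total reserved RAM in GB, using the nearest table row at or below total_gb."""
    row = max(k for k in RESERVED_GB if k <= total_gb)
    system, hbase = RESERVED_GB[row]
    return system + (hbase if hbase_installed else 0)
```

For a 48 GB node running HBase this yields 6 + 8 = 14 GB reserved, which matches the worked example later in this section.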
After you determine the amount of memory you need per node, you must determine the maximum number of containers allowed per node:
# of containers = min (2*CORES, 1.8*DISKS, (total available RAM) / MIN_CONTAINER_SIZE)
DISKS is the value for dfs.datanode.data.dir (number of data disks) per machine.
MIN_CONTAINER_SIZE is the minimum container size (in RAM). This value depends on the amount of RAM available; in smaller memory nodes, the minimum container size should also be smaller.
The following table provides the recommended values:
Table 1.7. Recommended Container Size Values
Total RAM per Node | Recommended Minimum Container Size
---|---
Less than 4 GB | 256 MB
Between 4 GB and 8 GB | 512 MB
Between 8 GB and 24 GB | 1024 MB
Above 24 GB | 2048 MB
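The container-count formula and the minimum-container-size lookup can be combined in a short sketch. The function names and the exact boundary handling at 4, 8, and 24 GB are illustrative assumptions; the table itself does not specify which side the boundary values fall on.

```python
def min_container_size_mb(total_ram_gb):
    """Recommended minimum container size (Table 1.7). Boundary handling is assumed."""
    if total_ram_gb < 4:
        return 256
    if total_ram_gb <= 8:
        return 512
    if total_ram_gb <= 24:
        return 1024
    return 2048

def num_containers(cores, disks, available_ram_gb, min_container_gb):
    """# of containers = min(2*CORES, 1.8*DISKS, available RAM / MIN_CONTAINER_SIZE).

    available_ram_gb is total RAM minus reserved memory; the result is
    truncated to a whole number of containers.
    """
    return int(min(2 * cores, 1.8 * disks, available_ram_gb / min_container_gb))
```

With the later worked example (12 cores, 12 disks, 42 GB available, 2 GB minimum container), this returns min(24, 21.6, 21) = 21 containers.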
Finally, you must determine the amount of RAM per container:
RAM per container = max(MIN_CONTAINER_SIZE, (total available RAM) / # of containers)
Using the results of all the previous calculations, you can configure YARN and MapReduce.
Table 1.8. YARN and MapReduce Configuration Values
Configuration File | Configuration Setting | Value Calculation
---|---|---
yarn-site.xml | yarn.nodemanager.resource.memory-mb | = containers * RAM-per-container
yarn-site.xml | yarn.scheduler.minimum-allocation-mb | = RAM-per-container
yarn-site.xml | yarn.scheduler.maximum-allocation-mb | = containers * RAM-per-container
mapred-site.xml | mapreduce.map.memory.mb | = RAM-per-container
mapred-site.xml | mapreduce.reduce.memory.mb | = 2 * RAM-per-container
mapred-site.xml | mapreduce.map.java.opts | = 0.8 * RAM-per-container
mapred-site.xml | mapreduce.reduce.java.opts | = 0.8 * 2 * RAM-per-container
mapred-site.xml | yarn.app.mapreduce.am.resource.mb | = 2 * RAM-per-container
mapred-site.xml | yarn.app.mapreduce.am.command-opts | = 0.8 * 2 * RAM-per-container
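The calculations in the table above can be expressed as a short sketch. The function name and the dictionary output format are illustrative assumptions; the formulas themselves come directly from the table, with heap sizes (java.opts) rendered as -Xmx strings at 0.8 times the container size.

```python
def yarn_mapreduce_config(containers, ram_per_container_mb):
    """Apply the Table 1.8 formulas. Memory values are in MB;
    the java.opts entries become -Xmx heap flags (0.8 * container size)."""
    def xmx(mb):
        return "-Xmx%dm" % int(0.8 * mb)
    total_mb = containers * ram_per_container_mb
    return {
        "yarn.nodemanager.resource.memory-mb": total_mb,
        "yarn.scheduler.minimum-allocation-mb": ram_per_container_mb,
        "yarn.scheduler.maximum-allocation-mb": total_mb,
        "mapreduce.map.memory.mb": ram_per_container_mb,
        "mapreduce.reduce.memory.mb": 2 * ram_per_container_mb,
        "mapreduce.map.java.opts": xmx(ram_per_container_mb),
        "mapreduce.reduce.java.opts": xmx(2 * ram_per_container_mb),
        "yarn.app.mapreduce.am.resource.mb": 2 * ram_per_container_mb,
        "yarn.app.mapreduce.am.command-opts": xmx(2 * ram_per_container_mb),
    }
```

Calling this with 21 containers of 2048 MB reproduces the no-HBase example that follows: 43008 MB for the NodeManager, 4096 MB reduce containers, and so on.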
Note: After installation, both yarn-site.xml and mapred-site.xml are located in the /etc/hadoop/conf folder.
Examples
Assume that your cluster nodes have 12 CPU cores, 48 GB RAM, and 12 disks:
Reserved memory = 6 GB system memory reserve + 8 GB for HBase
Minimum container size = 2 GB
If there is no HBase, then you can use the following calculation:
# of containers = min (2 * 12, 1.8 * 12, (48 - 6) / 2) = min (24, 21.6, 21) = 21
RAM-per-container = max (2, (48 - 6) / 21) = max (2, 2) = 2
Table 1.9. Example Value Calculations Without HBase
Configuration | Value Calculation
---|---
yarn.nodemanager.resource.memory-mb | = 21 * 2 = 42 * 1024 MB
yarn.scheduler.minimum-allocation-mb | = 2 * 1024 MB
yarn.scheduler.maximum-allocation-mb | = 21 * 2 = 42 * 1024 MB
mapreduce.map.memory.mb | = 2 * 1024 MB
mapreduce.reduce.memory.mb | = 2 * 2 = 4 * 1024 MB
mapreduce.map.java.opts | = 0.8 * 2 = 1.6 * 1024 MB
mapreduce.reduce.java.opts | = 0.8 * 2 * 2 = 3.2 * 1024 MB
yarn.app.mapreduce.am.resource.mb | = 2 * 2 = 4 * 1024 MB
yarn.app.mapreduce.am.command-opts | = 0.8 * 2 * 2 = 3.2 * 1024 MB
If HBase is included:
# of containers = min (2 * 12, 1.8 * 12, (48 - 6 - 8) / 2) = min (24, 21.6, 17) = 17
RAM-per-container = max (2, (48 - 6 - 8) / 17) = max (2, 2) = 2
Table 1.10. Example Value Calculations with HBase
Configuration | Value Calculation
---|---
yarn.nodemanager.resource.memory-mb | = 17 * 2 = 34 * 1024 MB
yarn.scheduler.minimum-allocation-mb | = 2 * 1024 MB
yarn.scheduler.maximum-allocation-mb | = 17 * 2 = 34 * 1024 MB
mapreduce.map.memory.mb | = 2 * 1024 MB
mapreduce.reduce.memory.mb | = 2 * 2 = 4 * 1024 MB
mapreduce.map.java.opts | = 0.8 * 2 = 1.6 * 1024 MB
mapreduce.reduce.java.opts | = 0.8 * 2 * 2 = 3.2 * 1024 MB
yarn.app.mapreduce.am.resource.mb | = 2 * 2 = 4 * 1024 MB
yarn.app.mapreduce.am.command-opts | = 0.8 * 2 * 2 = 3.2 * 1024 MB
Notes:
Changing yarn.scheduler.minimum-allocation-mb without also changing yarn.nodemanager.resource.memory-mb (or vice versa) changes the number of containers per node.
If your installation has a large amount of RAM but not many disks or cores, you can free RAM for other tasks by lowering both yarn.scheduler.minimum-allocation-mb and yarn.nodemanager.resource.memory-mb.
With MapReduce on YARN, there are no longer preconfigured static slots for Map and Reduce tasks.
The entire cluster is available for dynamic resource allocation of Map and Reduce tasks as needed by each job. In the previous example cluster, with the previous configurations, YARN is able to allocate up to 10 Mappers (40/4) or 5 Reducers (40/8) on each node (or some other combination of Mappers and Reducers within the 40 GB per node limit).
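The per-node capacity arithmetic in the paragraph above can be checked with a quick sketch. The numbers are taken as given in the text (a 40 GB per-node limit, 4 GB map containers, 8 GB reduce containers); the variable names are illustrative.

```python
# Per-node dynamic allocation check, using the figures quoted above.
node_limit_gb = 40       # working per-node limit from the example
map_container_gb = 4     # map container size assumed in the example
reduce_container_gb = 8  # reduce container size assumed in the example

max_mappers = node_limit_gb // map_container_gb      # 40 / 4 = 10
max_reducers = node_limit_gb // reduce_container_gb  # 40 / 8 = 5
```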