Determining HDP Memory Configuration Settings
You can use either of two methods to determine YARN and MapReduce memory configuration settings: running the HDP utility script or manually calculating the values. The HDP utility script is the recommended method for calculating HDP memory configuration settings, but information about manually calculating YARN and MapReduce memory configuration settings is also provided for reference.
Running the YARN Utility Script
This section describes how to use the hdp-configuration-utils.py script to calculate YARN, MapReduce, Hive, and Tez memory allocation settings based on the node hardware specifications. The hdp-configuration-utils.py script is included in the HDP companion files. See Download Companion Files.
To run the hdp-configuration-utils.py script, execute the following command from the folder containing the script:

hdp-configuration-utils.py options

where options are as follows:
Table 1.5. hdp-configuration-utils.py Options

| Option | Description |
|---|---|
| -c CORES | The number of cores on each host |
| -m MEMORY | The amount of memory on each host, in gigabytes |
| -d DISKS | The number of disks on each host |
| -k HBASE | "True" if HBase is installed; "False" if not |

Note: The script requires python26 to run. You can also use the -h or --help option to display a Help message that describes the options.
Example
Running the following command from the hdp_manual_install_rpm_helper_files-2.5.0.0.1245 directory:
python hdp-configuration-utils.py -c 16 -m 64 -d 4 -k True
Returns:
Using cores=16 memory=64GB disks=4 hbase=True
Profile: cores=16 memory=49152MB reserved=16GB usableMem=48GB disks=4
Num Container=8
Container Ram=6144MB
Used Ram=48GB
Unused Ram=16GB
yarn.scheduler.minimum-allocation-mb=6144
yarn.scheduler.maximum-allocation-mb=49152
yarn.nodemanager.resource.memory-mb=49152
mapreduce.map.memory.mb=6144
mapreduce.map.java.opts=-Xmx4096m
mapreduce.reduce.memory.mb=6144
mapreduce.reduce.java.opts=-Xmx4096m
yarn.app.mapreduce.am.resource.mb=6144
yarn.app.mapreduce.am.command-opts=-Xmx4096m
mapreduce.task.io.sort.mb=1792
tez.am.resource.memory.mb=6144
tez.am.launch.cmd-opts=-Xmx4096m
hive.tez.container.size=6144
hive.tez.java.opts=-Xmx4096m
Calculating YARN and MapReduce Memory Requirements
This section describes how to manually configure YARN and MapReduce memory allocation settings based on the node hardware specifications.
YARN takes into account all of the available compute resources on each machine in the cluster. Based on the available resources, YARN negotiates resource requests from applications running in the cluster, such as MapReduce. YARN then provides processing capacity to each application by allocating containers. A container is the basic unit of processing capacity in YARN, and is an encapsulation of resource elements such as memory and CPU.
In an Apache Hadoop cluster, it is vital to balance the use of memory (RAM), processors (CPU cores), and disks so that processing is not constrained by any one of these cluster resources. As a general recommendation, allowing for two containers per disk and per core gives the best balance for cluster utilization.
When determining the appropriate YARN and MapReduce memory configurations for a cluster node, you should start with the available hardware resources. Specifically, note the following values on each node:
RAM (amount of memory)
CORES (number of CPU cores)
DISKS (number of disks)
The total available RAM for YARN and MapReduce should take into account the Reserved Memory. Reserved memory is the RAM needed by system processes and other Hadoop processes (such as HBase):
reserved memory = stack memory reserve + HBase memory reserve (if HBase is on the same node)
You can use the values in the following table to determine what you need for reserved memory per node:
Table 1.6. Reserved Memory Recommendations

| Total Memory per Node | Recommended Reserved System Memory | Recommended Reserved HBase Memory |
|---|---|---|
| 4 GB | 1 GB | 1 GB |
| 8 GB | 2 GB | 1 GB |
| 16 GB | 2 GB | 2 GB |
| 24 GB | 4 GB | 4 GB |
| 48 GB | 6 GB | 8 GB |
| 64 GB | 8 GB | 8 GB |
| 72 GB | 8 GB | 8 GB |
| 96 GB | 12 GB | 16 GB |
| 128 GB | 24 GB | 24 GB |
| 256 GB | 32 GB | 32 GB |
| 512 GB | 64 GB | 64 GB |
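If you script your cluster sizing, the Table 1.6 lookup can be expressed directly in code. The following Python sketch is illustrative only; the dictionary, the function name, and the fallback to the nearest smaller node size are assumptions for this example, not part of the HDP companion files.

```python
# Sketch of the Table 1.6 lookup: total RAM per node (GB) -> (reserved system GB, reserved HBase GB).
RESERVED_MEMORY_GB = {
    4: (1, 1), 8: (2, 1), 16: (2, 2), 24: (4, 4), 48: (6, 8),
    64: (8, 8), 72: (8, 8), 96: (12, 16), 128: (24, 24),
    256: (32, 32), 512: (64, 64),
}

def reserved_memory_gb(total_ram_gb, hbase_installed):
    """Reserved memory per node: system reserve plus the HBase reserve when HBase is co-located.
    Node sizes not listed in Table 1.6 fall back to the nearest smaller entry (an assumption)."""
    key = max(k for k in RESERVED_MEMORY_GB if k <= total_ram_gb)
    system_gb, hbase_gb = RESERVED_MEMORY_GB[key]
    return system_gb + (hbase_gb if hbase_installed else 0)

print(reserved_memory_gb(48, hbase_installed=True))   # -> 14
print(reserved_memory_gb(48, hbase_installed=False))  # -> 6
```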
After you determine the amount of memory you need per node, you must determine the maximum number of containers allowed per node:
# of containers = min (2*CORES, 1.8*DISKS, (total available RAM) / MIN_CONTAINER_SIZE)
DISKS is the value for dfs.datanode.data.dir (number of data disks) per machine.
MIN_CONTAINER_SIZE is the minimum container size (in RAM). This value depends on the amount of RAM available; in smaller memory nodes, the minimum container size should also be smaller.
The following table provides the recommended values:
Table 1.7. Recommended Container Size Values
| Total RAM per Node | Recommended Minimum Container Size |
|---|---|
| Less than 4 GB | 256 MB |
| Between 4 GB and 8 GB | 512 MB |
| Between 8 GB and 24 GB | 1024 MB |
| Above 24 GB | 2048 MB |
Finally, you must determine the amount of RAM per container:
RAM per container = max(MIN_CONTAINER_SIZE, (total available RAM) / # of containers)
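Putting the two formulas together, the following Python sketch computes the container count and RAM per container for a node. It is illustrative only (it is not the hdp-configuration-utils.py script); the function names and the Table 1.7 lookup are simplified assumptions for this example.

```python
# Illustrative sketch of the container calculations above (not the HDP utility script).
# The minimum-container-size lookup follows Table 1.7.

def min_container_size_gb(total_ram_gb):
    """Recommended minimum container size from Table 1.7, in GB."""
    if total_ram_gb < 4:
        return 0.25   # 256 MB
    if total_ram_gb <= 8:
        return 0.5    # 512 MB
    if total_ram_gb <= 24:
        return 1.0    # 1024 MB
    return 2.0        # 2048 MB

def container_settings(cores, disks, total_ram_gb, reserved_gb):
    """Return (# of containers, RAM per container in GB) using the two formulas above."""
    available_ram_gb = total_ram_gb - reserved_gb
    min_container_gb = min_container_size_gb(total_ram_gb)
    containers = int(min(2 * cores, 1.8 * disks, available_ram_gb / min_container_gb))
    ram_per_container_gb = max(min_container_gb, available_ram_gb / containers)
    return containers, ram_per_container_gb

# Example node: 12 cores, 12 disks, 48 GB RAM, 6 GB reserved (no HBase) -> (21, 2.0)
print(container_settings(cores=12, disks=12, total_ram_gb=48, reserved_gb=6))
```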
Using the results of all the previous calculations, you can configure YARN and MapReduce.
Table 1.8. YARN and MapReduce Configuration Values
| Configuration File | Configuration Setting | Value Calculation |
|---|---|---|
| yarn-site.xml | yarn.nodemanager.resource.memory-mb | = containers * RAM-per-container |
| yarn-site.xml | yarn.scheduler.minimum-allocation-mb | = RAM-per-container |
| yarn-site.xml | yarn.scheduler.maximum-allocation-mb | = containers * RAM-per-container |
| mapred-site.xml | mapreduce.map.memory.mb | = RAM-per-container |
| mapred-site.xml | mapreduce.reduce.memory.mb | = 2 * RAM-per-container |
| mapred-site.xml | mapreduce.map.java.opts | = 0.8 * RAM-per-container |
| mapred-site.xml | mapreduce.reduce.java.opts | = 0.8 * 2 * RAM-per-container |
| mapred-site.xml | yarn.app.mapreduce.am.resource.mb | = 2 * RAM-per-container |
| mapred-site.xml | yarn.app.mapreduce.am.command-opts | = 0.8 * 2 * RAM-per-container |
Note: After installation, both yarn-site.xml and mapred-site.xml are located in the /etc/hadoop/conf folder.
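If you generate these files programmatically, the calculated values map onto standard Hadoop <property> entries. The following Python sketch renders the yarn-site.xml portion of Table 1.8; the helper functions are assumptions for this example, not part of HDP or its companion files.

```python
# Illustrative sketch: render the yarn-site.xml values from Table 1.8 as Hadoop-style
# <property> entries. The helpers below are assumptions for this example, not part of HDP.

def yarn_site_properties(containers, ram_per_container_mb):
    """yarn-site.xml settings from Table 1.8 as (name, value) pairs."""
    return [
        ("yarn.nodemanager.resource.memory-mb", containers * ram_per_container_mb),
        ("yarn.scheduler.minimum-allocation-mb", ram_per_container_mb),
        ("yarn.scheduler.maximum-allocation-mb", containers * ram_per_container_mb),
    ]

def to_property_xml(properties):
    """Format (name, value) pairs as <property> blocks."""
    template = "<property>\n  <name>{}</name>\n  <value>{}</value>\n</property>"
    return "\n".join(template.format(name, value) for name, value in properties)

# Example: 21 containers of 2048 MB each (the no-HBase example below)
print(to_property_xml(yarn_site_properties(containers=21, ram_per_container_mb=2048)))
```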
Examples
Assume that your cluster nodes have 12 CPU cores, 48 GB RAM, and 12 disks:
Reserved memory = 6 GB system memory reserve + 8 GB for HBase
Min container size = 2 GB
If there is no HBase, then you can use the following calculation:
# of containers = min (2*12, 1.8*12, (48-6)/2) = min (24, 21.6, 21) = 21
RAM-per-container = max (2, (48-6)/21) = max (2, 2) = 2
Table 1.9. Example Value Calculations Without HBase
| Configuration | Value Calculation |
|---|---|
| yarn.nodemanager.resource.memory-mb | = 21 * 2 = 42*1024 MB |
| yarn.scheduler.minimum-allocation-mb | = 2*1024 MB |
| yarn.scheduler.maximum-allocation-mb | = 21 * 2 = 42*1024 MB |
| mapreduce.map.memory.mb | = 2*1024 MB |
| mapreduce.reduce.memory.mb | = 2 * 2 = 4*1024 MB |
| mapreduce.map.java.opts | = 0.8 * 2 = 1.6*1024 MB |
| mapreduce.reduce.java.opts | = 0.8 * 2 * 2 = 3.2*1024 MB |
| yarn.app.mapreduce.am.resource.mb | = 2 * 2 = 4*1024 MB |
| yarn.app.mapreduce.am.command-opts | = 0.8 * 2 * 2 = 3.2*1024 MB |
If HBase is included:
# of containers = min (2*12, 1.8*12, (48-6-8)/2) = min (24, 21.6, 17) = 17
RAM-per-container = max (2, (48-6-8)/17) = max (2, 2) = 2
Table 1.10. Example Value Calculations with HBase
| Configuration | Value Calculation |
|---|---|
| yarn.nodemanager.resource.memory-mb | = 17 * 2 = 34*1024 MB |
| yarn.scheduler.minimum-allocation-mb | = 2*1024 MB |
| yarn.scheduler.maximum-allocation-mb | = 17 * 2 = 34*1024 MB |
| mapreduce.map.memory.mb | = 2*1024 MB |
| mapreduce.reduce.memory.mb | = 2 * 2 = 4*1024 MB |
| mapreduce.map.java.opts | = 0.8 * 2 = 1.6*1024 MB |
| mapreduce.reduce.java.opts | = 0.8 * 2 * 2 = 3.2*1024 MB |
| yarn.app.mapreduce.am.resource.mb | = 2 * 2 = 4*1024 MB |
| yarn.app.mapreduce.am.command-opts | = 0.8 * 2 * 2 = 3.2*1024 MB |
Notes:
Changing yarn.scheduler.minimum-allocation-mb without also changing yarn.nodemanager.resource.memory-mb (or the reverse) changes the number of containers per node.
If your installation has a large amount of RAM but not many disks or cores, you can free RAM for other tasks by lowering both yarn.scheduler.minimum-allocation-mb and yarn.nodemanager.resource.memory-mb.
With MapReduce on YARN, there are no longer preconfigured static slots for Map and Reduce tasks.
The entire cluster is available for dynamic resource allocation of Map and Reduce tasks as needed by each job. In the example cluster without HBase, with the preceding configurations, YARN is able to allocate up to 21 Mappers (42/2) or 10 Reducers (42/4) on each node (or some other combination of Mappers and Reducers within the 42 GB per-node limit).