NameNode heap size depends on many factors, such as the number of files, the number of blocks, and the load on the system. The following table provides recommendations for NameNode heap size configuration. These settings should work for typical Hadoop clusters in which the number of blocks is very close to the number of files (the average ratio of blocks per file is generally 1.1 to 1.2). Some clusters may require further tuning of these settings. Also, it is generally better to set the total Java heap to a higher value.
Table 1.11. NameNode Heap Size Settings
Number of files (in millions) | Total Java heap (-Xms and -Xmx) | Young generation size (-XX:NewSize and -XX:MaxNewSize) |
---|---|---|
< 1 | 1024m | 128m |
1-5 | 3072m | 512m |
5-10 | 5376m | 768m |
10-20 | 9984m | 1280m |
20-30 | 14848m | 2048m |
30-40 | 19456m | 2560m |
40-50 | 24320m | 3072m |
50-70 | 33536m | 4352m |
70-100 | 47872m | 6144m |
100-125 | 59648m | 7680m |
125-150 | 71424m | 8960m |
150-200 | 94976m | 8960m |
You should also set -XX:PermSize to 128m and -XX:MaxPermSize to 256m.
The following are the recommended settings for HADOOP_NAMENODE_OPTS in the hadoop-env.sh file (replace the ##### placeholders for -XX:NewSize, -XX:MaxNewSize, -Xms, and -Xmx with the recommended values from the table):
-server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/$USER/hs_err_pid%p.log -XX:NewSize=##### -XX:MaxNewSize=##### -Xms##### -Xmx##### -XX:PermSize=128m -XX:MaxPermSize=256m -Xloggc:/var/log/hadoop/$USER/gc.log-`date +'%Y%m%d%H%M'` -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT ${HADOOP_NAMENODE_OPTS}
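For example, a cluster with roughly 15 million files falls into the 10-20 million row of the table, so the placeholders would be filled in as follows (the cluster size here is only illustrative):

-XX:NewSize=1280m -XX:MaxNewSize=1280m -Xms9984m -Xmx9984m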
If the cluster uses a Secondary NameNode, you should also set HADOOP_SECONDARYNAMENODE_OPTS to HADOOP_NAMENODE_OPTS in the hadoop-env.sh file:
HADOOP_SECONDARYNAMENODE_OPTS=$HADOOP_NAMENODE_OPTS
Another useful HADOOP_NAMENODE_OPTS setting is -XX:+HeapDumpOnOutOfMemoryError. This option causes the JVM to write a heap dump when an out-of-memory error occurs. You should also use -XX:HeapDumpPath to specify the location of the heap dump file. For example:
-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=./etc/heapdump.hprof
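Assuming the full option string shown earlier is already exported as HADOOP_NAMENODE_OPTS in hadoop-env.sh, one way to add the heap dump flags is to append them on a following line; the dump path below is the illustrative one from above, and the chosen location should have enough free space to hold a dump the size of the configured heap:

export HADOOP_NAMENODE_OPTS="${HADOOP_NAMENODE_OPTS} -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=./etc/heapdump.hprof"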