11. Configuring NameNode Heap Size

NameNode heap size depends on many factors, such as the number of files, the number of blocks, and the load on the system. The following table provides recommendations for NameNode heap size configuration. These settings should work for typical Hadoop clusters, where the number of blocks is very close to the number of files (the average ratio of blocks to files is generally 1.1 to 1.2). Some clusters may require further tuning of these settings. When in doubt, it is generally better to set the total Java heap to a higher value.
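To check whether a cluster is "typical" in the sense described above, the blocks-to-files ratio can be computed from the two counts. The following is a minimal sketch; `blocks_per_file` is a hypothetical helper, not a Hadoop command, and it assumes you have already obtained the file and block counts by some other means.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): print the blocks-per-file ratio.
# Arguments: $1 = total number of files, $2 = total number of blocks.
blocks_per_file() {
    awk -v files="$1" -v blocks="$2" 'BEGIN {
        # A ratio is undefined without at least one file.
        if (files <= 0) { exit 1 }
        printf "%.2f\n", blocks / files
    }'
}
```

For example, `blocks_per_file 10000000 12000000` reports the ratio for 10 million files and 12 million blocks. A ratio well above 1.2 suggests the cluster may need more heap than the table recommends.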

 

Table 1.11. NameNode Heap Size Settings

Number of files (millions)   Total Java heap (-Xmx and -Xms)   Young generation size (-XX:NewSize and -XX:MaxNewSize)
< 1                          1024m                             128m
1-5                          3072m                             512m
5-10                         5376m                             768m
10-20                        9984m                             1280m
20-30                        14848m                            2048m
30-40                        19456m                            2560m
40-50                        24320m                            3072m
50-70                        33536m                            4352m
70-100                       47872m                            6144m
70-125                       59648m                            7680m
100-150                      71424m                            8960m
150-200                      94976m                            8960m
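The table can also be expressed as a small lookup function, which may be convenient in provisioning scripts. This is only a sketch: `namenode_heap_for` is a hypothetical helper, not part of Hadoop. It makes two assumptions that the table does not state: a file count on a boundary between two rows is given the larger setting, and the overlapping 70-125 and 100-150 rows are read as covering 100-125 and 125-150 million files respectively.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop).
# Argument: number of files, in whole millions.
# Output:   "<total heap> <young generation size>" taken from the table.
namenode_heap_for() {
    files_millions=$1
    # Reject anything that is not a non-negative whole number.
    case "$files_millions" in
        ''|*[!0-9]*)
            echo "usage: namenode_heap_for <files in millions>" >&2
            return 2 ;;
    esac
    # Assumption: boundary values take the larger of the two settings.
    if   [ "$files_millions" -lt 1 ];   then echo "1024m 128m"
    elif [ "$files_millions" -lt 5 ];   then echo "3072m 512m"
    elif [ "$files_millions" -lt 10 ];  then echo "5376m 768m"
    elif [ "$files_millions" -lt 20 ];  then echo "9984m 1280m"
    elif [ "$files_millions" -lt 30 ];  then echo "14848m 2048m"
    elif [ "$files_millions" -lt 40 ];  then echo "19456m 2560m"
    elif [ "$files_millions" -lt 50 ];  then echo "24320m 3072m"
    elif [ "$files_millions" -lt 70 ];  then echo "33536m 4352m"
    elif [ "$files_millions" -lt 100 ]; then echo "47872m 6144m"
    # Assumption: the overlapping rows are read as 100-125 and 125-150.
    elif [ "$files_millions" -lt 125 ]; then echo "59648m 7680m"
    elif [ "$files_millions" -lt 150 ]; then echo "71424m 8960m"
    elif [ "$files_millions" -le 200 ]; then echo "94976m 8960m"
    else
        echo "no recommendation above 200 million files" >&2
        return 1
    fi
}
```

For example, `namenode_heap_for 15` prints the total heap and young generation size for the 10-20 million row.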


You should also set -XX:PermSize to 128m and -XX:MaxPermSize to 256m.

The following are the recommended settings for HADOOP_NAMENODE_OPTS in the hadoop-env.sh file (replace the ##### placeholder for -XX:NewSize, -XX:MaxNewSize, -Xms, and -Xmx with the recommended values from the table):

-server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/$USER/hs_err_pid%p.log -XX:NewSize=##### -XX:MaxNewSize=##### -Xms##### -Xmx##### -XX:PermSize=128m -XX:MaxPermSize=256m -Xloggc:/var/log/hadoop/$USER/gc.log-`date +'%Y%m%d%H%M'` -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT ${HADOOP_NAMENODE_OPTS}
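As an illustration, the entry might look as follows in hadoop-env.sh for a cluster with 10-20 million files, using the 9984m and 1280m values from the table. The `NN_HEAP` and `NN_YOUNG` variables are hypothetical conveniences introduced here so that each size is written only once; they are not Hadoop settings.

```shell
# Sketch for hadoop-env.sh, assuming a cluster with 10-20 million files
# (table row: total heap 9984m, young generation 1280m).
NN_HEAP=9984m     # hypothetical variable, used for -Xms and -Xmx
NN_YOUNG=1280m    # hypothetical variable, used for -XX:NewSize and -XX:MaxNewSize

# Each continued line ends with a space before the backslash so that the
# options stay separated once the shell joins the lines.
export HADOOP_NAMENODE_OPTS="-server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC \
-XX:ErrorFile=/var/log/hadoop/$USER/hs_err_pid%p.log \
-XX:NewSize=${NN_YOUNG} -XX:MaxNewSize=${NN_YOUNG} -Xms${NN_HEAP} -Xmx${NN_HEAP} \
-XX:PermSize=128m -XX:MaxPermSize=256m \
-Xloggc:/var/log/hadoop/$USER/gc.log-`date +'%Y%m%d%H%M'` \
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps \
-Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT ${HADOOP_NAMENODE_OPTS}"
```

Setting -Xms and -Xmx to the same value, and likewise -XX:NewSize and -XX:MaxNewSize, follows the table, which gives a single value for each pair.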

If the cluster uses a Secondary NameNode, you should also set HADOOP_SECONDARYNAMENODE_OPTS to the same value as HADOOP_NAMENODE_OPTS in the hadoop-env.sh file:

HADOOP_SECONDARYNAMENODE_OPTS=$HADOOP_NAMENODE_OPTS

Another useful HADOOP_NAMENODE_OPTS setting is -XX:+HeapDumpOnOutOfMemoryError. This option causes the JVM to write a heap dump when an out-of-memory error occurs. You should also use -XX:HeapDumpPath to specify the location of the heap dump file. For example:

-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=./etc/heapdump.hprof
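One way to add these options in hadoop-env.sh is to prepend them to the existing value, as sketched below. The `NN_HEAPDUMP_DIR` variable and the directory and file name it points to are hypothetical; choose a location with enough free space, since a heap dump can be about as large as the heap itself.

```shell
# Sketch for hadoop-env.sh. NN_HEAPDUMP_DIR is a hypothetical variable and
# the path is only an example location, not a Hadoop default.
NN_HEAPDUMP_DIR=/var/log/hadoop/$USER

# Prepend the heap dump options while keeping any options already set.
export HADOOP_NAMENODE_OPTS="-XX:+HeapDumpOnOutOfMemoryError \
-XX:HeapDumpPath=${NN_HEAPDUMP_DIR}/namenode-heapdump.hprof ${HADOOP_NAMENODE_OPTS}"
```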
