Tuning Garbage Collection
The Concurrent Mark Sweep (CMS) garbage collection (GC) process includes a set of heuristic rules used to trigger garbage collection. These heuristics make garbage collection less predictable and tend to delay collection until the heap is nearly full, which can force a Full GC (a stop-the-world collection that might pause all application threads).
Ambari sets default parameter values for many properties during cluster deployment. Within the export HADOOP_NAMENODE_OPTS= clause of the hadoop-env template, two parameters that affect the CMS GC process have the following default settings:
-XX:+UseCMSInitiatingOccupancyOnly prevents the use of GC heuristics.
-XX:CMSInitiatingOccupancyFraction=<percent> tells the Java VM when the CMS collector should be triggered.
If this percentage is set too low, the CMS collector runs too often; if it is set too high, the CMS collector is triggered too late, and a concurrent mode failure might occur. The default setting for -XX:CMSInitiatingOccupancyFraction is 70, which means that the CMS collector is triggered when the old generation reaches 70% of its capacity.
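As a rough illustration of how the two parameters might appear together, the following is a trimmed sketch of the NameNode options clause, not the actual template Ambari generates. The -XX:+UseConcMarkSweepGC flag and the trailing reference to the existing variable are assumptions added for the example; a real hadoop-env template carries many more options.

```shell
# Trimmed sketch of the NameNode options clause; a real hadoop-env template
# carries many more options (heap sizes, GC logging, and so on).
# -XX:+UseConcMarkSweepGC is included on the assumption that CMS is the
# active collector, since the two CMS flags have no effect otherwise.
export HADOOP_NAMENODE_OPTS="-XX:+UseConcMarkSweepGC \
-XX:CMSInitiatingOccupancyFraction=70 \
-XX:+UseCMSInitiatingOccupancyOnly \
${HADOOP_NAMENODE_OPTS:-}"
```

With these settings, the collector would rely only on the 70% occupancy threshold rather than on its own heuristics.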
To tune garbage collection by modifying the NameNode CMS GC parameters, follow these steps:
In Ambari Web, browse to Services > HDFS.
Open the Configs tab and browse to Advanced > Advanced hadoop-env.
Edit the hadoop-env template.
Save your configurations and restart, as prompted.
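After the restart, it can be useful to confirm which occupancy fraction an options string actually carries. The helper below is a minimal sketch; the function name cms_fraction is hypothetical and not part of Hadoop or Ambari. It prints the last value found, on the assumption that a later occurrence of the flag overrides an earlier one.

```shell
# Hypothetical helper: print the CMSInitiatingOccupancyFraction value
# found in a space-separated JVM options string (empty output if absent).
cms_fraction() {
  printf '%s\n' "$1" \
    | tr ' ' '\n' \
    | sed -n 's/^-XX:CMSInitiatingOccupancyFraction=\([0-9][0-9]*\)$/\1/p' \
    | tail -n 1   # keep only the last occurrence
}

# Example: inspect a sample options string.
cms_fraction "-Xmx1g -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly"
```

To check a running NameNode, the same function could be given the process command line, for example the output of ps -o args= -p for the NameNode process ID.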
More Information