12.2. Tuning Garbage Collection

The Concurrent Mark-Sweep (CMS) garbage collection (GC) process includes a set of heuristic rules used to trigger garbage collection. These heuristics make garbage collection less predictable and tend to delay collection until the old generation is almost fully occupied. Initiating collection earlier allows it to complete before the old generation is full, and thus avoids a Full GC (which may result in a "stop-the-world" pause).

Ambari sets default values for many properties during cluster deployment. Within the export HADOOP_NAMENODE_OPTS= clause of the hadoop-env template, two parameters that affect the CMS GC process have the following default settings:

  • -XX:+UseCMSInitiatingOccupancyOnly prevents the use of GC heuristics, so that CMS is triggered only by the configured occupancy threshold.

  • -XX:CMSInitiatingOccupancyFraction=<percent> tells the Java VM when CMS should be triggered. In effect, it reserves a buffer in the heap that can fill with data while CMS is running. Derive this percentage from the rate at which memory in the old generation is consumed under production load. If the percentage is set too low, CMS runs too often; if it is set too high, CMS is triggered too late and a concurrent mode failure may occur. The default setting for -XX:CMSInitiatingOccupancyFraction is 70, which means that CMS starts when the old generation is 70% occupied, leaving the remaining 30% as the buffer.
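One way to back-calculate the percentage is sketched below. This is a rough illustration, not a formula from this guide: it assumes the buffer must hold everything promoted into the old generation during one concurrent CMS cycle, multiplied by a safety factor. All four input values are hypothetical; replace them with figures measured from your own GC logs.

```shell
#!/bin/sh
# Hypothetical measurements (assumptions for illustration only).
old_gen_mb=8192        # size of the old generation, in MB
fill_mb_per_s=40       # rate at which the old generation fills under production load
cms_cycle_s=30         # observed duration of one concurrent CMS cycle, in seconds
safety_factor=2        # headroom multiplier for load spikes

# Space that must remain free when CMS starts, so the cycle can finish in time.
buffer_mb=$((fill_mb_per_s * cms_cycle_s * safety_factor))

# Occupancy percentage at which CMS should start (integer division rounds down).
fraction=$(( (old_gen_mb - buffer_mb) * 100 / old_gen_mb ))

echo "-XX:CMSInitiatingOccupancyFraction=${fraction}"
```

With these sample inputs the buffer is 2400 MB and the script prints -XX:CMSInitiatingOccupancyFraction=70. Rounding down is deliberate here, since starting CMS slightly early is cheaper than a concurrent mode failure.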

To modify the NameNode CMS GC parameters:

  1. Using Ambari Web, browse to Services > HDFS.

  2. Open the Configs tab and browse to Advanced > Advanced hadoop-env.

  3. Edit the hadoop-env template.

  4. Save your configurations and restart, as prompted.
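For orientation, the relevant part of the hadoop-env template might look roughly like the fragment below. This is a simplified sketch, assuming the NameNode options variable is spelled HADOOP_NAMENODE_OPTS as in stock Hadoop 2.x; the line that Ambari generates carries many more options (heap sizes, GC logging, and so on), which should be left in place when you edit the two CMS parameters.

```shell
# Simplified excerpt (assumption: the real Ambari-generated line is longer).
# Only the two CMS trigger parameters discussed in this section are of interest.
export HADOOP_NAMENODE_OPTS="-XX:+UseConcMarkSweepGC \
-XX:CMSInitiatingOccupancyFraction=70 \
-XX:+UseCMSInitiatingOccupancyOnly \
${HADOOP_NAMENODE_OPTS:-}"
```

To change the trigger point, edit only the number after -XX:CMSInitiatingOccupancyFraction= and keep -XX:+UseCMSInitiatingOccupancyOnly, so that the configured percentage rather than the JVM heuristics decides when CMS starts.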