Managing and Monitoring a Cluster

Tune HDFS garbage collection

The Concurrent Mark Sweep (CMS) garbage collection (GC) process uses a set of heuristic rules to decide when to trigger garbage collection. These heuristics make garbage collection less predictable and tend to delay collection until capacity is nearly reached, which can force a Full GC (a collection that might pause all application threads).

Ambari sets default values for many properties during cluster deployment. Within the export HADOOP_NAMENODE_OPTS= clause of the hadoop-env template, two parameters that affect the CMS GC process have the following default settings:
  • -XX:+UseCMSInitiatingOccupancyOnly

    prevents the use of GC heuristics.

  • -XX:CMSInitiatingOccupancyFraction=<percent>

    tells the Java VM when the CMS collector should be triggered.
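As a rough illustration, the two flags might appear in the NameNode options clause as shown below. This is a trimmed sketch, not the full Ambari template: the real clause carries many other options that should be left in place, and the value 70 is simply the default discussed in this section.

```shell
# Trimmed sketch of the NameNode options clause in the hadoop-env template.
# Only the two CMS flags discussed in this section are shown; the real
# template sets many more options (heap sizes, GC logging, and so on).
# ${HADOOP_NAMENODE_OPTS:-} keeps any options that were already set.
export HADOOP_NAMENODE_OPTS="-XX:+UseCMSInitiatingOccupancyOnly \
  -XX:CMSInitiatingOccupancyFraction=70 \
  ${HADOOP_NAMENODE_OPTS:-}"
```

With both flags present, the JVM relies only on the configured occupancy percentage, rather than on its own heuristics, to decide when to start a CMS cycle.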

If the <percent> value is set too low, the CMS collector runs too often; if it is set too high, the CMS collector is triggered too late, and a concurrent mode failure might occur, in which case the JVM falls back to a Full GC. The default setting for -XX:CMSInitiatingOccupancyFraction is 70, which means that a CMS collection starts when the old generation of the heap reaches 70% occupancy.

To tune garbage collection by modifying the NameNode CMS GC parameters, follow these steps:

  1. In Ambari Web, browse to Services > HDFS.
  2. Open the Configs tab and browse to Advanced > Advanced hadoop-env.
  3. Edit the hadoop-env template, changing the NameNode CMS GC parameter values as needed.
  4. Save your configuration and restart the affected services, as prompted.
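After the restart, it can be useful to confirm that the NameNode options carry the intended values. The following is a minimal sketch that checks an options string for the two CMS flags. The helper name check_cms_opts and the sample options string are hypothetical and used only for illustration; on a real host, the string would come from the running NameNode's command line instead.

```shell
#!/bin/sh
# check_cms_opts: hypothetical helper (not part of Hadoop or Ambari) that
# inspects a JVM options string for the two CMS flags discussed above.
# Returns 0 if both flags are present, 1 otherwise.
check_cms_opts() {
  # Split the options on spaces so that each flag sits on its own line.
  flags=$(printf '%s\n' "$1" | tr -s ' ' '\n')
  status=0

  # Exact, fixed-string match for the "occupancy only" switch.
  if printf '%s\n' "$flags" | grep -Fqx -- '-XX:+UseCMSInitiatingOccupancyOnly'; then
    echo "UseCMSInitiatingOccupancyOnly: enabled"
  else
    echo "UseCMSInitiatingOccupancyOnly: not set"
    status=1
  fi

  # Extract the numeric percentage; if the flag is repeated, report the last one.
  fraction=$(printf '%s\n' "$flags" |
    sed -n 's/^-XX:CMSInitiatingOccupancyFraction=\([0-9]\{1,3\}\)$/\1/p' |
    tail -n 1)
  if [ -n "$fraction" ]; then
    echo "CMSInitiatingOccupancyFraction: $fraction"
  else
    echo "CMSInitiatingOccupancyFraction: not set"
    status=1
  fi

  return "$status"
}

# Sample options string (an assumption for illustration only).
sample_opts="-Xmx1g -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=70"
check_cms_opts "$sample_opts"
```

For live behavior rather than configured values, the JDK's jstat tool (for example, jstat -gcutil with the NameNode process ID) reports old generation utilization and the Full GC count, which can help show whether the chosen percentage triggers collection too early or too late.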