Scaling Namespaces and Optimizing Data Storage

Properties for configuring centralized caching

You must enable JNI to use centralized caching. In addition, you must set several configuration properties and account for the locked-memory limit imposed by the operating system.

Native libraries

In order to lock block files into memory, the DataNode relies on native JNI code found in libhadoop.so. Be sure to enable JNI if you are using HDFS centralized cache management.
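
You can verify that the native library is available before enabling caching. For example, hadoop checknative reports whether libhadoop.so was found (the path shown here is illustrative and depends on your installation):

hadoop checknative -a
# Expect a line similar to:
# hadoop:  true /usr/hdp/current/hadoop-client/lib/native/libhadoop.so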

Configuration properties

Configuration properties for centralized caching are specified in the hdfs-site.xml file.

Required properties

Only the following property is required:

  • dfs.datanode.max.locked.memory This property determines the maximum amount of memory (in bytes) that a DataNode will use for caching. The "locked-in-memory size" ulimit (ulimit -l) of the DataNode user also needs to be increased to exceed this parameter (for more details, see the OS limits section below). When setting this value, remember that you will need space in memory for other things as well, such as the DataNode and application JVM heaps and the operating system page cache. Example:

    <property>
      <name>dfs.datanode.max.locked.memory</name>
      <value>268435456</value>
    </property>
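
For reference, the example value of 268435456 bytes is 256 MB (256 × 1024 × 1024); a one-line shell check of the arithmetic:

echo $(( 256 * 1024 * 1024 ))   # prints 268435456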

Optional properties

The following properties are not required, but can be specified for tuning.

  • dfs.namenode.path.based.cache.refresh.interval.ms The NameNode will use this value as the number of milliseconds between subsequent cache path re-scans. By default, this parameter is set to 300000, which is five minutes. Example:

    <property>
      <name>dfs.namenode.path.based.cache.refresh.interval.ms</name>
      <value>300000</value>
    </property>
  • dfs.time.between.resending.caching.directives.ms The NameNode will use this value as the number of milliseconds between resending caching directives. Example:

    <property>
      <name>dfs.time.between.resending.caching.directives.ms</name>
      <value>300000</value>
    </property>
  • dfs.datanode.fsdatasetcache.max.threads.per.volume The DataNode will use this value as the maximum number of threads per volume to use for caching new data. By default, this parameter is set to 4. Example:

    <property>
      <name>dfs.datanode.fsdatasetcache.max.threads.per.volume</name>
      <value>4</value>
    </property>
  • dfs.cachereport.intervalMsec The DataNode will use this value as the number of milliseconds between sending a full report of its cache state to the NameNode. By default, this parameter is set to 10000, which is 10 seconds. Example:

    <property>
      <name>dfs.cachereport.intervalMsec</name>
      <value>10000</value>
    </property>
  • dfs.namenode.path.based.cache.block.map.allocation.percent The percentage of the Java heap that will be allocated to the cached blocks map. The cached blocks map is a hash map that uses chained hashing. Smaller maps may be accessed more slowly if the number of cached blocks is large. Larger maps will consume more memory. The default value is 0.25 percent. Example:

    <property>
      <name>dfs.namenode.path.based.cache.block.map.allocation.percent</name>
      <value>0.25</value>
    </property>
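
With these properties in place and the DataNodes restarted, you can exercise centralized caching from the command line. The following is a minimal sketch using the hdfs cacheadmin utility; the pool name testPool and the path /testdir are placeholders:

# Create a cache pool and cache a directory into it
hdfs cacheadmin -addPool testPool
hdfs cacheadmin -addDirective -path /testdir -pool testPool

# Confirm the directive is registered and check cache usage
hdfs cacheadmin -listDirectives -stats
hdfs cacheadmin -listPools -stats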

OS limits

If you get the error "Cannot start datanode because the configured max locked memory size...is more than the datanode's available RLIMIT_MEMLOCK ulimit," this means that the operating system is imposing a lower limit on the amount of memory that you can lock than what you have configured. To fix this, you must adjust the ulimit -l value that the DataNode runs with. This value is usually configured in /etc/security/limits.conf, but this may vary depending on what operating system and distribution you are using.

You have correctly configured this value when you can run ulimit -l from the shell and get back either a higher value than what you have configured or the string "unlimited", which indicates that there is no limit. Typically, ulimit -l returns the memory lock limit in kilobytes (KB), but dfs.datanode.max.locked.memory must be specified in bytes.
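
Because of this unit mismatch, it is easy to compare the two values incorrectly. The following shell sketch converts the ulimit to bytes before comparing (it assumes a numeric limit; the value "unlimited" always passes):

locked_kb=$(ulimit -l)
if [ "$locked_kb" = "unlimited" ]; then
  echo "no locked-memory limit"
else
  # ulimit -l reports kilobytes; dfs.datanode.max.locked.memory is in bytes
  echo "locked-memory limit in bytes: $(( locked_kb * 1024 ))"
fi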

For example, if the value of dfs.datanode.max.locked.memory is set to 128000 bytes:

<property>
  <name>dfs.datanode.max.locked.memory</name>
  <value>128000</value>
</property>

Set the memlock (max locked-in-memory address space) to a slightly higher value. For example, to set memlock to 130 KB for the hdfs user, you would add the following line to /etc/security/limits.conf (limits.conf values are in kilobytes, so 130 comfortably exceeds the 128,000-byte setting above).

hdfs - memlock 130
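
The new limit applies only to fresh login sessions of the hdfs user, so restart the DataNode from a new session after editing /etc/security/limits.conf. A quick way to confirm the setting took effect (the -s /bin/bash option is only needed if the hdfs account has no login shell):

su - hdfs -s /bin/bash -c 'ulimit -l'   # should print 130 or higher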