Properties for configuring centralized caching
You must enable JNI to use centralized caching. In addition, you must configure various properties and consider the operating system's locked memory limit.
Native libraries
In order to lock block files into memory, the DataNode relies on native JNI code found in libhadoop.so. Be sure to enable JNI if you are using HDFS centralized cache management.
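To confirm that the native library can be loaded, you can run the hadoop checknative command; in its output, the hadoop entry should report true along with the path to libhadoop.so (the exact output varies by version and platform).

  # Check which native libraries Hadoop can load; look for "hadoop: true ... libhadoop.so"
  hadoop checknative -a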
Configuration properties
Configuration properties for centralized caching are specified in the hdfs-site.xml file.
Required properties
Only the following property is required:
- dfs.datanode.max.locked.memory
  This property determines the maximum amount of memory (in bytes) that a DataNode will use for caching. The "locked-in-memory size" ulimit (ulimit -l) of the DataNode user also needs to be increased to exceed this parameter (for more details, see the OS limits section below). When setting this value, remember that you will need space in memory for other things as well, such as the DataNode and application JVM heaps and the operating system page cache. Example:

  <property>
    <name>dfs.datanode.max.locked.memory</name>
    <value>268435456</value>
  </property>
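Because ulimit -l typically reports the limit in kilobytes while dfs.datanode.max.locked.memory is specified in bytes, a quick conversion helps when choosing a value. A minimal shell sketch, using the 268435456-byte (256 MB) value from the example above:

  # Convert the configured byte value to KB; ulimit -l must return at least this number
  echo $((268435456 / 1024))    # prints 262144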
Optional properties
The following properties are not required, but can be specified for tuning.
- dfs.namenode.path.based.cache.refresh.interval.ms
  The NameNode will use this value as the number of milliseconds between subsequent cache path re-scans. By default, this parameter is set to 300000, which is five minutes. Example:

  <property>
    <name>dfs.namenode.path.based.cache.refresh.interval.ms</name>
    <value>300000</value>
  </property>
- dfs.time.between.resending.caching.directives.ms
  The NameNode will use this value as the number of milliseconds between resending caching directives. Example:

  <property>
    <name>dfs.time.between.resending.caching.directives.ms</name>
    <value>300000</value>
  </property>
- dfs.datanode.fsdatasetcache.max.threads.per.volume
  The DataNode will use this value as the maximum number of threads per volume to use for caching new data. By default, this parameter is set to 4. Example:

  <property>
    <name>dfs.datanode.fsdatasetcache.max.threads.per.volume</name>
    <value>4</value>
  </property>
- dfs.cachereport.intervalMsec
  The DataNode will use this value as the number of milliseconds between sending a full report of its cache state to the NameNode. By default, this parameter is set to 10000, which is 10 seconds. Example:

  <property>
    <name>dfs.cachereport.intervalMsec</name>
    <value>10000</value>
  </property>
- dfs.namenode.path.based.cache.block.map.allocation.percent
  The percentage of the Java heap that will be allocated to the cached blocks map. The cached blocks map is a hash map that uses chained hashing. Smaller maps may be accessed more slowly if the number of cached blocks is large; larger maps will consume more memory. The default value is 0.25 percent. Example:

  <property>
    <name>dfs.namenode.path.based.cache.block.map.allocation.percent</name>
    <value>0.25</value>
  </property>
OS limits
If you get the error "Cannot start datanode because the configured max locked memory size...is more than the datanode's available RLIMIT_MEMLOCK ulimit," this means that the operating system is imposing a lower limit on the amount of memory that you can lock than what you have configured. To fix this, you must adjust the ulimit -l value that the DataNode runs with. This value is usually configured in /etc/security/limits.conf, but this may vary depending on what operating system and distribution you are using.
You have correctly configured this value when you can run ulimit -l from the shell and get back either a higher value than what you have configured or the string "unlimited," which indicates that there is no limit. Typically, ulimit -l returns the memory lock limit in kilobytes (KB), but dfs.datanode.max.locked.memory must be specified in bytes.
For example, if the value of dfs.datanode.max.locked.memory is set to 128000 bytes:

<property>
  <name>dfs.datanode.max.locked.memory</name>
  <value>128000</value>
</property>
Set the memlock (max locked-in-memory address space) limit to a slightly higher value. For example, to set memlock to 130 KB (130,000 bytes) for the hdfs user, you would add the following line to /etc/security/limits.conf:

hdfs - memlock 130
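To verify that the new limit has taken effect, you can open a shell as the DataNode user and check ulimit -l directly. This sketch assumes the DataNode runs as the hdfs user; note that changes to /etc/security/limits.conf normally apply only to new login sessions.

  # Run ulimit -l as the hdfs user (-s forces a usable shell in case the account uses nologin)
  su -s /bin/bash - hdfs -c 'ulimit -l'    # should print 130 or more, or "unlimited"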