Solr and HDFS - the block cache
Cloudera Search enables Solr to store indexes in an HDFS filesystem. To maintain performance, an HDFS block cache has been implemented using Least Recently Used (LRU) semantics. This enables Solr to cache HDFS index files on read and write, storing the portions of the file in JVM direct memory (off heap) by default, or optionally in the JVM heap.
Batch jobs typically do not use the cache, while Solr servers (when serving queries or indexing documents) should. When running indexing using MapReduce (MR), the MR jobs themselves do not use the block cache. Block write caching is turned off by default and should be left disabled.
The overall size of the cache scales with solr.hdfs.blockcache.slab.count. As index sizes grow, you may need to tune this parameter to maintain optimal performance.
Configure Index Caching
The following parameters control caching. They can be configured at the Solr process level by setting the respective Java system property or by editing solrconfig.xml directly.
If the parameters are set at the collection level (using solrconfig.xml), the first collection loaded by the Solr server takes precedence, and block cache settings in all other collections are ignored. Because you cannot control the order in which collections are loaded, you must set identical block cache settings in every collection's solrconfig.xml file. Block cache parameters set at the collection level in solrconfig.xml also take precedence over parameters set at the process level.
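Because the first collection loaded wins, it can help to verify that every collection carries the same block cache entries before deploying. The following is a minimal sketch, not a Cloudera tool: the function names are hypothetical, and it assumes each solrconfig.xml keeps its block cache entries as named children of a directoryFactory element. It compares the raw text of each entry and does not resolve Java system properties.

```python
import xml.etree.ElementTree as ET

PREFIX = "solr.hdfs.blockcache."


def block_cache_settings(solrconfig_xml: str) -> dict:
    """Return {parameter name: raw text} for block cache entries in one config."""
    root = ET.fromstring(solrconfig_xml)
    settings = {}
    # iter() also matches the root element itself, so a bare
    # <directoryFactory> fragment works as well as a full <config>.
    for factory in root.iter("directoryFactory"):
        for child in factory:
            name = child.get("name", "")
            if name.startswith(PREFIX):
                settings[name] = (child.text or "").strip()
    return settings


def find_mismatches(configs: dict) -> dict:
    """Map each parameter that differs between collections to {collection: value}.

    `configs` maps a collection name to the text of its solrconfig.xml.
    A parameter missing from one collection shows up with the value None.
    """
    per_collection = {c: block_cache_settings(x) for c, x in configs.items()}
    names = set().union(*per_collection.values())
    mismatches = {}
    for name in sorted(names):
        values = {c: s.get(name) for c, s in per_collection.items()}
        if len(set(values.values())) > 1:
            mismatches[name] = values
    return mismatches
```

An empty result from find_mismatches means that the block cache entries are textually identical across the collections supplied.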
Parameter | Cloudera Manager Setting | Default | Description |
---|---|---|---|
solr.hdfs.blockcache.enabled | HDFS Block Cache | true | Enable the block cache. |
solr.hdfs.blockcache.read.enabled | Not directly configurable. If the block cache is enabled, Cloudera Manager automatically enables the read cache. To override this setting, you must use the Solr Service Environment Advanced Configuration Snippet (Safety Valve). | true | Enable the read cache. |
solr.hdfs.blockcache.blocksperbank | HDFS Block Cache Blocks per Slab | 16384 | Number of blocks per cache slab. The size of the cache is 8 KB (the block size) times the number of blocks per slab times the number of slabs. |
solr.hdfs.blockcache.slab.count | HDFS Block Cache Number of Slabs | 1 | Number of slabs per block cache. The size of the cache is 8 KB (the block size) times the number of blocks per slab times the number of slabs. |
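The size formula in the table can be worked through in a few lines. This is a minimal sketch of the arithmetic only. The function names are hypothetical, and the 10 GiB target in the usage note is an arbitrary illustration rather than a sizing recommendation.

```python
BLOCK_SIZE_BYTES = 8 * 1024  # 8 KB block size, as stated in the table


def cache_size_bytes(blocks_per_slab: int = 16384, slab_count: int = 1) -> int:
    """Cache size = block size x blocks per slab x number of slabs."""
    return BLOCK_SIZE_BYTES * blocks_per_slab * slab_count


def slabs_for_target(target_bytes: int, blocks_per_slab: int = 16384) -> int:
    """Smallest slab count whose total cache size is at least target_bytes."""
    slab_bytes = BLOCK_SIZE_BYTES * blocks_per_slab
    # -(-a // b) is ceiling division on integers; never return fewer than 1 slab.
    return max(1, -(-target_bytes // slab_bytes))
```

With the defaults, cache_size_bytes() gives 134217728 bytes, so each slab is 128 MiB. For a hypothetical 10 GiB target, slabs_for_target(10 * 1024**3) gives 80 slabs.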
To change these settings in Cloudera Manager:
- Go to the Solr service.
- Click the Configuration tab.
- In the Search box, type one of the following:
  - HDFS Block Cache, to toggle the value of solr.hdfs.blockcache.enabled and enable or disable the block cache.
  - HDFS Block Cache Blocks per Slab, to configure solr.hdfs.blockcache.blocksperbank and set the number of blocks per cache slab.
  - HDFS Block Cache Number of Slabs, to configure solr.hdfs.blockcache.slab.count and set the number of slabs per block cache.
- Set the new parameter value.
- Restart Solr servers after editing the parameter.
Solr HDFS optimizes caching when performing near-real-time (NRT) indexing using Lucene's NRTCachingDirectory.
A newly created segment is cached only if both of the following conditions are true:
- The segment is the result of a flush or a merge, and the estimated size of the merged segment is <= solr.hdfs.nrtcachingdirectory.maxmergesizemb.
- The total cached bytes is <= solr.hdfs.nrtcachingdirectory.maxcachedmb.
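The two conditions above can be expressed as a small predicate. This is a sketch of the rule as worded in this section, not Lucene's actual implementation, which may differ in detail. The function and argument names are hypothetical, and the defaults of 16 and 192 are the default values documented for the two parameters.

```python
MB = 1024 * 1024


def should_cache_segment(
    estimated_segment_bytes: int,
    total_cached_bytes: int,
    max_merge_size_mb: int = 16,   # solr.hdfs.nrtcachingdirectory.maxmergesizemb
    max_cached_mb: int = 192,      # solr.hdfs.nrtcachingdirectory.maxcachedmb
) -> bool:
    """Return True when a flushed or merged segment qualifies for NRT caching."""
    fits_merge_limit = estimated_segment_bytes <= max_merge_size_mb * MB
    fits_cache_limit = total_cached_bytes <= max_cached_mb * MB
    return fits_merge_limit and fits_cache_limit
```

For example, an 8 MB segment with 100 MB already cached qualifies, while a 32 MB segment does not, regardless of how much is cached.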
Parameter | Default | Description |
---|---|---|
solr.hdfs.nrtcachingdirectory.enable | true | Whether to enable the NRTCachingDirectory. |
solr.hdfs.nrtcachingdirectory.maxcachedmb | 192 | Size of the cache in megabytes. |
solr.hdfs.nrtcachingdirectory.maxmergesizemb | 16 | Maximum segment size to cache, in megabytes. |
The following example shows the relevant section of a solrconfig.xml file with the default values:
<directoryFactory name="DirectoryFactory">
  <bool name="solr.hdfs.blockcache.enabled">${solr.hdfs.blockcache.enabled:true}</bool>
  <int name="solr.hdfs.blockcache.slab.count">${solr.hdfs.blockcache.slab.count:1}</int>
  <bool name="solr.hdfs.blockcache.direct.memory.allocation">${solr.hdfs.blockcache.direct.memory.allocation:true}</bool>
  <int name="solr.hdfs.blockcache.blocksperbank">${solr.hdfs.blockcache.blocksperbank:16384}</int>
  <bool name="solr.hdfs.blockcache.read.enabled">${solr.hdfs.blockcache.read.enabled:true}</bool>
  <bool name="solr.hdfs.nrtcachingdirectory.enable">${solr.hdfs.nrtcachingdirectory.enable:true}</bool>
  <int name="solr.hdfs.nrtcachingdirectory.maxmergesizemb">${solr.hdfs.nrtcachingdirectory.maxmergesizemb:16}</int>
  <int name="solr.hdfs.nrtcachingdirectory.maxcachedmb">${solr.hdfs.nrtcachingdirectory.maxcachedmb:192}</int>
</directoryFactory>
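Each value in the example uses the ${property:default} form, so a Java system property of the same name overrides the default written after the colon. The following sketch is a simplified model of that substitution, intended only to show which value ends up in effect. It is not Solr's own resolver, and the function name is hypothetical.

```python
import re

# Matches ${name} or ${name:default}; group 2 is None when no default is given.
PLACEHOLDER = re.compile(r"\$\{([^:}]+)(?::([^}]*))?\}")


def resolve(text: str, properties: dict) -> str:
    """Replace each ${name:default} with properties[name], else the default."""

    def substitute(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        if name in properties:
            return properties[name]
        if default is not None:
            return default
        raise KeyError(f"no value or default for property {name!r}")

    return PLACEHOLDER.sub(substitute, text)
```

With no properties set, resolve("${solr.hdfs.blockcache.slab.count:1}", {}) returns "1". Passing {"solr.hdfs.blockcache.slab.count": "80"} returns "80" instead.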