Solr and HDFS - the Block Cache
Cloudera Search enables Solr to store indexes in an HDFS filesystem. To maintain performance, an HDFS block cache has been implemented using Least Recently Used (LRU) semantics. This enables Solr to cache HDFS index files on read and write, storing portions of the files in JVM direct memory (off heap) by default, or optionally in the JVM heap.
Batch jobs typically do not use the cache, while Solr servers (when serving queries or indexing documents) should. When running indexing using MapReduce (MR), the MR jobs themselves do not use the block cache. Block write caching is turned off by default and should be left disabled.
The size of the block cache depends in part on solr.hdfs.blockcache.slab.count. As index sizes grow, you may need to tune this parameter to maintain optimal performance.
Configure Index Caching
The following parameters control caching. You can configure them at the Solr process level by setting the respective Java system property, or at the collection level by editing solrconfig.xml directly.
If the parameters are set at the collection level (using solrconfig.xml), the first collection loaded by the Solr server takes precedence, and block cache settings in all other collections are ignored. Because you cannot control the order in which collections are loaded, you must set identical block cache settings in the solrconfig.xml file of every collection. Block cache parameters set at the collection level in solrconfig.xml also take precedence over parameters set at the process level.
Parameter | Cloudera Manager Setting | Default | Description |
---|---|---|---|
solr.hdfs.blockcache.global | Not directly configurable. Cloudera Manager automatically enables the global block cache. To override this setting, you must use the Solr Service Environment Advanced Configuration Snippet (Safety Valve). | true | If enabled, a single HDFS block cache is shared by all SolrCores on a host. If blockcache.global is disabled, each SolrCore on a host creates its own private HDFS block cache. Enabling this parameter simplifies managing HDFS block cache memory. |
solr.hdfs.blockcache.enabled | HDFS Block Cache | true | Enable the block cache. |
solr.hdfs.blockcache.read.enabled | Not directly configurable. If the block cache is enabled, Cloudera Manager automatically enables the read cache. To override this setting, you must use the Solr Service Environment Advanced Configuration Snippet (Safety Valve). | true | Enable the read cache. |
solr.hdfs.blockcache.write.enabled | Not directly configurable. If the block cache is enabled, Cloudera Manager automatically disables the write cache. | false | Enable the write cache. |
solr.hdfs.blockcache.direct.memory.allocation | HDFS Block Cache Off-Heap Memory | true | Enable direct memory allocation. If this is false, heap is used. |
solr.hdfs.blockcache.blocksperbank | HDFS Block Cache Blocks per Slab | 16384 | Number of blocks per cache slab. The size of the cache is 8 KB (the block size) times the number of blocks per slab times the number of slabs. |
solr.hdfs.blockcache.slab.count | HDFS Block Cache Number of Slabs | 1 | Number of slabs per block cache. The size of the cache is 8 KB (the block size) times the number of blocks per slab times the number of slabs. |
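The cache size formula in the table can be checked with a short calculation. The following Python sketch is illustrative only: the function names are hypothetical and are not part of Solr or Cloudera Manager. It applies the stated formula (8 KB block size times blocks per slab times number of slabs) and estimates how many slabs a target cache size would need.

```python
BLOCK_SIZE_BYTES = 8 * 1024  # 8 KB block size, as stated in the table


def block_cache_bytes(blocks_per_bank: int = 16384, slab_count: int = 1) -> int:
    """Return the total block cache size in bytes (hypothetical helper)."""
    return BLOCK_SIZE_BYTES * blocks_per_bank * slab_count


def slabs_for_target(target_bytes: int, blocks_per_bank: int = 16384) -> int:
    """Return the smallest slab count whose cache size is at least target_bytes."""
    slab_bytes = BLOCK_SIZE_BYTES * blocks_per_bank
    return max(1, -(-target_bytes // slab_bytes))  # ceiling division


if __name__ == "__main__":
    mib = 1024 * 1024
    print(block_cache_bytes() // mib)        # defaults give one 128 MiB slab
    print(slabs_for_target(4 * 1024 * mib))  # slabs needed for a 4 GiB cache: 32
```

With the default of 16384 blocks per slab, each slab holds 128 MiB, so the slab count is usually the simpler value to adjust.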
Solr HDFS optimizes caching when performing near real-time (NRT) indexing by using Lucene's NRTCachingDirectory. Lucene caches a newly created segment if both of the following conditions are true:
- The segment is the result of a flush or a merge, and the estimated size of the merged segment is <= solr.hdfs.nrtcachingdirectory.maxmergesizemb.
- The total cached bytes is <= solr.hdfs.nrtcachingdirectory.maxcachedmb.
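The two caching conditions can be expressed as a small predicate. This is a minimal Python sketch, not Lucene's implementation: the function name and arguments are hypothetical, the defaults are the documented parameter defaults, and counting the new segment toward the cached total is an assumption about how the second condition is evaluated.

```python
MB = 1024 * 1024


def should_cache_segment(
    estimated_segment_bytes: int,
    currently_cached_bytes: int,
    max_merge_size_mb: float = 16,  # solr.hdfs.nrtcachingdirectory.maxmergesizemb
    max_cached_mb: float = 192,     # solr.hdfs.nrtcachingdirectory.maxcachedmb
) -> bool:
    """Return True if a flushed or merged segment meets both caching conditions."""
    # Condition 1: the segment itself is small enough to cache.
    fits_size_limit = estimated_segment_bytes <= max_merge_size_mb * MB
    # Condition 2 (assumption): the cache stays within budget once the
    # new segment is added to what is already cached.
    fits_cache_budget = (
        currently_cached_bytes + estimated_segment_bytes <= max_cached_mb * MB
    )
    return fits_size_limit and fits_cache_budget
```

For example, with the defaults a 10 MB segment is cached while the cache is empty, but a 20 MB segment is not, because it exceeds the 16 MB per-segment limit.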
Parameter | Default | Description |
---|---|---|
solr.hdfs.nrtcachingdirectory.enable | true | Whether to enable the NRTCachingDirectory. |
solr.hdfs.nrtcachingdirectory.maxcachedmb | 192 | Size of the cache in megabytes. |
solr.hdfs.nrtcachingdirectory.maxmergesizemb | 16 | Maximum segment size to cache. |
The following example shows a solrconfig.xml file with defaults:
<directoryFactory name="DirectoryFactory">
<bool name="solr.hdfs.blockcache.enabled">${solr.hdfs.blockcache.enabled:true}</bool>
<int name="solr.hdfs.blockcache.slab.count">${solr.hdfs.blockcache.slab.count:1}</int>
<bool name="solr.hdfs.blockcache.direct.memory.allocation">${solr.hdfs.blockcache.direct.memory.allocation:true}</bool>
<int name="solr.hdfs.blockcache.blocksperbank">${solr.hdfs.blockcache.blocksperbank:16384}</int>
<bool name="solr.hdfs.blockcache.read.enabled">${solr.hdfs.blockcache.read.enabled:true}</bool>
<bool name="solr.hdfs.nrtcachingdirectory.enable">${solr.hdfs.nrtcachingdirectory.enable:true}</bool>
<int name="solr.hdfs.nrtcachingdirectory.maxmergesizemb">${solr.hdfs.nrtcachingdirectory.maxmergesizemb:16}</int>
<int name="solr.hdfs.nrtcachingdirectory.maxcachedmb">${solr.hdfs.nrtcachingdirectory.maxcachedmb:192}</int>
</directoryFactory>
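Each value in the example uses the ${property:default} form, so a Java system property set at the process level (for instance -Dsolr.hdfs.blockcache.slab.count=4) replaces the default after the colon. The following Python sketch mimics that substitution for illustration only. The resolve function and its regular expression are hypothetical and do not reproduce Solr's actual parser.

```python
import re
from typing import Mapping

# Matches ${name} or ${name:default}; a simplified, hypothetical pattern.
_PLACEHOLDER = re.compile(r"\$\{([^:}]+)(?::([^}]*))?\}")


def resolve(text: str, system_properties: Mapping[str, str]) -> str:
    """Replace each placeholder with the property value, else its default."""

    def substitute(match):
        name, default = match.group(1), match.group(2)
        if name in system_properties:
            return system_properties[name]  # process-level value wins
        if default is not None:
            return default                  # fall back to the inline default
        raise KeyError(f"no value or default for property {name!r}")

    return _PLACEHOLDER.sub(substitute, text)


if __name__ == "__main__":
    line = "${solr.hdfs.blockcache.slab.count:1}"
    print(resolve(line, {}))                                        # 1
    print(resolve(line, {"solr.hdfs.blockcache.slab.count": "4"}))  # 4
```

Replacing a placeholder with a literal value in solrconfig.xml removes this fallback, which is one way a collection-level setting can override the process-level property.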