Solr server tuning categories

Solr performance tuning is a complex task. Learn about available tuning options that youcan perform either during deployment or at a later stage.

The following tuning categories can be completed either during deployment or at a later stage. It is less important to implement these changes before taking your system into use.

Table 1. Tuning steps
Tuning option Description
Configure the Java heap size

Set the Java heap size for the Solr Server to at least 16 GB for production environments. For more information on memory requirements, see Deployment Planning for Cloudera Search.

Enable multi-threaded faceting Enabling multi-threaded faceting can provide better performance for field faceting. It has no effect on query faceting.
Enable garbage collector (GC) logging To help identify any garbage collector (GC) issues, enable GC logging in production. The overhead is low.
Configure garbage collection Select the garbage collection option that offers best performance in your environment.
Configure index caching Cloudera Search enables Solr to store indexes in an HDFS filesystem. To maintain performance, an HDFS block cache has been implemented using Least Recently Used (LRU) semantics. This enables Solr to cache HDFS index files on read and write, storing the portions of the file in JVM direct memory (off heap) by default, or optionally in the JVM heap.
Tune commit values
Changing commit values may improve performance in certain situations. These changes result in tradeoffs and may not be beneficial in all cases.
  • For hard commit values, the default value of 60000 (60 seconds) is typically effective, though changing this value to 120 seconds may improve performance in some cases. Note, that setting this value to higher values, such as 600 seconds may result in undesirable performance tradeoffs.
  • Consider increasing the auto-soft-commit value from 15000 (15 seconds) to 120000 (120 seconds). You may increase this to the largest value that still meets your requirements.
Tune sharding

In some cases, oversharding can help improve performance including intake speed. If your environment includes massively parallel hardware and you want to use these available resources, consider oversharding. You might increase the number of replicas per host from 1 to 2 or 3. Making such changes creates complex interactions, so you should continue to monitor your system's performance to ensure that the benefits of oversharding outweigh the costs.

Minimize swappiness For better performance, Cloudera recommends setting the Linux swap space on all Solr server hosts as shown below:
sudo sysctl vm.swappiness=1
Consider collection aliasing to deal with massive amounts of timestamped data in streaming-style applications If you need to index and near real time query huge amounts of timestamped data in Solr, such as logs or IoT sensor data, you may consider aliasing as a massively scalable solution. This approach allows for indefinite indexing of data without degradation of performance otherwise experienced due to the continuous growth of a single index.

Additional tuning resources

Practical tuning tips outside the Cloudera Search documentation: