Solr Server Tuning Categories
Solr performance tuning is a complex task. This section gives an overview of the available tuning options, which you can apply either during deployment or at a later stage.
Tuning to Complete During Setup
Some tuning is best completed during the setup of your system, because changing these settings later may require re-indexing.
Tuning option | Description |
---|---|
Configure Lucene version | You can configure Solr to use a specific version of Lucene. This can help ensure that the Lucene version that Search uses includes the latest features and bug fixes. |
Design a schema | When constructing a schema, use data types that most accurately describe the data that the fields will contain. For more information on schemaless and non-schemaless mode, see Deployment Planning for Cloudera Search. |
Configure the Java heap size | Set the Java heap size for the Solr Server to at least 16 GB for production environments. For more information on memory requirements, see Deployment Planning for Cloudera Search. |
General Tuning
The following tuning categories can be completed either during deployment or at a later stage. Implementing these changes before you put your system into production is less critical.
Tuning option | Description |
---|---|
Enable multi-threaded faceting | Enabling multi-threaded faceting can provide better performance for field faceting. It has no effect on query faceting. |
Consider changing batchSize setting if you work with large documents | In most cases, do not change the default batchSize setting of 1000. If you are working with especially large documents, you may consider decreasing the batch size. |
Enable garbage collector (GC) logging | To help identify any garbage collector (GC) issues, enable GC logging in production. The overhead is low and the JVM supports GC log rolling as of 1.6.0_34. |
Configure garbage collection | Select the garbage collection option that offers the best performance in your environment. |
Configure index caching | Cloudera Search enables Solr to store indexes in an HDFS filesystem. To maintain performance, an HDFS block cache has been implemented using Least Recently Used (LRU) semantics. This enables Solr to cache HDFS index files on read and write, storing the portions of the file in JVM direct memory (off heap) by default, or optionally in the JVM heap. |
Tune commit values | Changing commit values may improve performance in certain situations. These changes result in tradeoffs and may not be beneficial in all cases. |
Tune sharding | In some cases, oversharding can help improve performance including intake speed. If your environment includes massively parallel hardware and you want to use these available resources, consider oversharding. You might increase the number of replicas per host from 1 to 2 or 3. Making such changes creates complex interactions, so you should continue to monitor your system's performance to ensure that the benefits of oversharding outweigh the costs. |
Minimize swappiness | For better performance, Cloudera recommends minimizing swappiness on all Solr server hosts by running the following command: sudo sysctl vm.swappiness=1 |
Consider collection aliasing to deal with massive amounts of timestamped data in streaming-style applications | If you need to index and query, in near real time, huge amounts of timestamped data in Solr, such as logs or IoT sensor data, you may consider aliasing as a massively scalable solution. This approach allows for indefinite indexing of data without the performance degradation otherwise caused by the continuous growth of a single index. |
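
As a rough illustration of the multi-threaded faceting option in the table above, the following Python sketch builds a query URL that sets the standard facet.threads request parameter. The host, collection name, and field names are hypothetical placeholders, and the thread count of 4 is an arbitrary example value, not a recommendation. The sketch only constructs the URL and does not contact a server.

```python
from urllib.parse import urlencode


def facet_query_url(base_url, collection, fields, threads):
    """Build a select URL that requests field faceting with facet.threads set."""
    # rows=0 returns only facet counts, which keeps the example response small.
    params = [("q", "*:*"), ("rows", "0"), ("facet", "true")]
    # One facet.field parameter per field; facet.threads applies to field faceting.
    params += [("facet.field", field) for field in fields]
    params.append(("facet.threads", str(threads)))
    return f"{base_url}/{collection}/select?{urlencode(params)}"


# Hypothetical host, collection, and field names, for illustration only.
url = facet_query_url(
    "http://localhost:8983/solr", "collection1", ["category", "author"], 4
)
print(url)
```

Because facet.threads affects only field faceting, a query that uses only facet.query parameters would not be expected to benefit from it.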
Additional Tuning Resources
Practical tuning tips outside the Cloudera Search documentation:
- For information on memory tuning, see Part 1 and Part 2 of Apache Solr Memory Tuning for Production on Cloudera Blog.
- General information on Solr caching is available under Query Settings in SolrConfig in the Apache Solr Reference Guide.
- Information on issues that influence performance is available on the SolrPerformanceFactors page on the Solr Wiki.
- Resource Management describes how to use Cloudera Manager to manage resources, for example with Linux cgroups.
- For information on improving querying performance, see How to make searching faster.
- For information on improving indexing performance, see How to make indexing faster.
- For information on aliasing, see Collection Aliasing: Near Real-Time Search for Really Big Data on Cloudera Blog and Time Routed Aliases in the Apache Solr Reference Guide.
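
The collection aliasing approach mentioned above can be sketched as follows. This minimal Python example assumes a hypothetical naming scheme of one collection per day (logs_YYYYMMDD) and builds a Collections API CREATEALIAS request URL that points an alias at the most recent days. The base URL, alias name, and collection prefix are illustrative assumptions; the sketch only builds the URL and does not contact a server.

```python
from datetime import date, timedelta
from urllib.parse import urlencode


def daily_collections(prefix, newest, days):
    """Return hypothetical per-day collection names, newest first."""
    return [
        f"{prefix}_{(newest - timedelta(days=i)).strftime('%Y%m%d')}"
        for i in range(days)
    ]


def create_alias_url(base_url, alias, collections):
    """Build a CREATEALIAS URL that points the alias at the given collections."""
    params = {
        "action": "CREATEALIAS",
        "name": alias,
        "collections": ",".join(collections),
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


# Hypothetical prefix, alias name, and host, for illustration only.
recent = daily_collections("logs", date(2024, 3, 2), 3)
alias_url = create_alias_url("http://localhost:8983/solr", "logs_recent", recent)
print(alias_url)
```

Re-issuing the request with an updated collection list repoints the alias, so queries against the alias keep working as new daily collections are added and old ones are dropped.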