Known Issues in Apache Solr
Learn about the known issues in Solr, their impact on or changes to functionality, and the available workarounds.
Known Issues
- Changing the default value of Client Connection Registry HBase configuration parameter causes HBase MRIT job to fail
- If the value of the HBase configuration property Client Connection Registry is changed from the default ZooKeeper Quorum to Master Registry, then the YARN job started by HBase MRIT fails with an error message similar to the following:
Caused by: org.apache.hadoop.hbase.exceptions.MasterRegistryFetchException: Exception making rpc to masters [quasar-bmyccr-2.quasar-bmyccr.root.hwx.site,22001,-1]
    at org.apache.hadoop.hbase.client.MasterRegistry.lambda$groupCall$1(MasterRegistry.java:244)
    at org.apache.hadoop.hbase.util.FutureUtils.lambda$addListener$0(FutureUtils.java:68)
    at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
    at java.util.concurrent.CompletableFuture.uniWhenCompleteStage(CompletableFuture.java:792)
    at java.util.concurrent.CompletableFuture.whenComplete(CompletableFuture.java:2153)
    at org.apache.hadoop.hbase.util.FutureUtils.addListener(FutureUtils.java:61)
    at org.apache.hadoop.hbase.client.MasterRegistry.groupCall(MasterRegistry.java:228)
    at org.apache.hadoop.hbase.client.MasterRegistry.call(MasterRegistry.java:265)
    at org.apache.hadoop.hbase.client.MasterRegistry.getMetaRegionLocations(MasterRegistry.java:282)
    at org.apache.hadoop.hbase.client.ConnectionImplementation.locateMeta(ConnectionImplementation.java:900)
    at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:867)
    at org.apache.hadoop.hbase.client.ConnectionImplementation.relocateRegion(ConnectionImplementation.java:850)
    at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegionInMeta(ConnectionImplementation.java:981)
    at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:870)
    at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:319)
    ... 21 more
Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed contacting masters after 1 attempts. Exceptions: java.io.IOException: Call to address=quasar-bmyccr-2.quasar-bmyccr.root.hwx.site/172.27.19.4:22001 failed on local exception: java.io.IOException: java.lang.RuntimeException: Found no valid authentication method from options
    at org.apache.hadoop.hbase.client.MasterRegistry.lambda$groupCall$1(MasterRegistry.java:243)
    ... 35 more
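The sketch below assumes that the Client Connection Registry choice surfaces as the hbase.client.registry.impl property in the deployed client configuration; the property name and file path are assumptions, not taken from this issue. It only illustrates one way to confirm that the default ZooKeeper-based registry is still in effect:
    # Assumption: the Client Connection Registry choice is reflected as
    # hbase.client.registry.impl in the generated client configuration.
    # No output, or a ZooKeeper-based value, means the default is in effect.
    grep -A 2 "hbase.client.registry.impl" /etc/hbase/conf/hbase-site.xml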
- Apache Tika upgrade may break morphlines indexing
- The upgrade of Apache Tika from 1.27 to 2.3.0 brought potentially breaking changes for morphlines indexing. Duplicate and triplicate key names were removed, and certain parser class names were changed (for example, org.apache.tika.parser.jpeg.JpegParser changed to org.apache.tika.parser.image.JpegParser).
- CDPD-28432: HBase Lily indexer REST port does not support SSL
- When using the --http argument of the hbase-indexer command line tool to invoke the Lily indexer through the REST API, you can add, list, or remove indexers as any user, without the need for authentication.
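For example, a minimal sketch (host name, port, and indexer name are placeholders; the port assumes the Lily indexer REST default, and exact option spellings may vary by release) of calls that any user can run successfully without authenticating:
    # Placeholder host, port, and indexer name; no credentials are required.
    hbase-indexer list-indexers --http http://lily-host.example.com:11060/indexer/
    hbase-indexer delete-indexer --name myIndexer --http http://lily-host.example.com:11060/indexer/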
- CDH-77598: Indexing fails with socketTimeout
- Starting from CDH 6.0, the HTTP client library used by Solr has a default socket timeout of 10 minutes. Because of this, if a single request sent from an indexer executor to Solr takes more than 10 minutes to be serviced, the indexing process fails with a timeout error.
This timeout has been raised to 24 hours. Nevertheless, there may still be use cases where even this extended timeout period proves insufficient.
- CDPD-12450: CrunchIndexerTool Indexing fails with socketTimeout
- The HTTP client library uses a socket timeout of 10 minutes. The Spark Crunch Indexer does not override this value, so if a single batch takes more than 10 minutes, the entire indexing job fails. This can happen especially if the morphlines contain DeleteByQuery requests.
- CDPD-29289: HBaseMapReduceIndexerTool fails with socketTimeout
- The HTTP client library uses a socket timeout of 10 minutes. The HBase Indexer does not override this value, so if a single batch takes more than 10 minutes, the entire indexing job fails.
- CDPD-20577: Splitshard operation on HDFS index checks local filesystem and fails
- When performing a shard split on an index that is stored on HDFS, SplitShardCmd still evaluates free disk space on the local file system of the server where Solr is installed. This may cause the command to fail, perceiving that there is not enough disk space to perform the shard split.
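For reference, a minimal sketch of the affected Collections API operation (host, collection, and shard names are placeholders):
    # Placeholder host, collection, and shard; on an HDFS-backed index this call
    # can fail because free space is evaluated on the local file system instead.
    curl "http://solr-host.example.com:8983/solr/admin/collections?action=SPLITSHARD&collection=myCollection&shard=shard1"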
- DOCS-5717: Lucene index handling limitation
- The Lucene index can only be upgraded by one major version. Solr 8 will not open an index that was created with Solr 6 or earlier.
- CDH-22190: CrunchIndexerTool, which includes the Spark indexer, requires specific input file format specifications
- If the --input-file-format option is specified with CrunchIndexerTool, then its argument must be text, avro, or avroParquet, rather than a fully qualified class name.
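A hedged invocation sketch (the jar path, driver class, and other options shown are illustrative and vary by release) using one of the accepted values:
    # Illustrative paths and options; the key point is that --input-file-format
    # takes text, avro, or avroParquet, not a fully qualified class name.
    spark-submit \
      --master yarn --deploy-mode cluster \
      --class org.apache.solr.crunch.CrunchIndexerTool \
      /opt/cloudera/parcels/CDH/jars/search-crunch.jar \
      --morphline-file morphlines.conf \
      --input-file-format avro \
      hdfs:///user/solr/input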
- CDH-26856: Field value class guessing and automatic schema field addition are not supported with the MapReduceIndexerTool or the HBaseMapReduceIndexerTool.
- The MapReduceIndexerTool and the HBaseMapReduceIndexerTool can be used with a Managed Schema created via NRT indexing of documents or via the Solr Schema API. However, neither tool supports adding fields automatically to the schema during ingest.
- CDH-19407: The Browse and Spell Request Handlers are not enabled in schemaless mode
- The Browse and Spell Request Handlers require certain fields to be present in the schema. Since those fields cannot be guaranteed to exist in a Schemaless setup, the Browse and Spell Request Handlers are not enabled by default.
- CDH-17978: Enabling blockcache writing may result in unusable indexes.
- It is possible to create indexes with solr.hdfs.blockcache.write.enabled set to true. Such indexes may appear corrupt to readers, and reading these indexes may irrecoverably corrupt them. Blockcache writing is disabled by default.
- CDH-58276: Users with insufficient Solr permissions may receive a "Page Loading" message from the Solr Web Admin UI.
- Users who are not authorized to use the Solr Admin UI are not given a page explaining that access is denied to them; instead, they receive a web page that never finishes loading.
- CDH-15441: Using MapReduceIndexerTool or HBaseMapReduceIndexerTool multiple times may produce duplicate entries in a collection.
- Repeatedly running the MapReduceIndexerTool on the same set of input files can result in duplicate entries in the Solr collection. This occurs because the tool can only insert documents and cannot update or delete existing Solr documents. This issue does not apply to the HBaseMapReduceIndexerTool unless it is run with more than zero reducers.
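One possible mitigation sketch, assuming the collection can be cleared before the same input is reprocessed (host and collection name are placeholders):
    # Placeholder host and collection; deletes previously indexed documents so a
    # rerun of the tool does not produce duplicates.
    curl "http://solr-host.example.com:8983/solr/myCollection/update?commit=true" \
      -H "Content-Type: text/xml" \
      --data-binary "<delete><query>*:*</query></delete>"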
- CDH-58694: Deleting collections might fail if hosts are unavailable.
- It is possible to delete a collection while some of the hosts that serve it are unavailable. After such a deletion, if the previously unavailable hosts are brought back online, the deleted collection may be restored.
Unsupported Features
- Package Management System
- HTTP/2
- Solr SQL/JDBC
- Graph Traversal
- Cross Data Center Replication (CDCR)
- SolrCloud Autoscaling
- HDFS Federation
- Saving search results
- Solr contrib modules (the Spark, MapReduce, and Lily HBase indexers are not contrib modules but part of Cloudera's distribution of Solr itself, and are therefore supported).