Known Issues in Cloudera Search
Learn about the known issues in Cloudera Search, the impact or changes to the functionality, and the workaround.
Known Issues
- TSB 2022-535: Ranger audit retention settings in Solr are not honored
- Audits in the ranger_audits collection of the Data Lake's Solr service are not deleted after the configured retention period. The default retention period is 90 days.
This is caused by the incorrect order of processors in the configuration (solrconfig.xml) used by the ranger_audits collection.
- Impact
- Audits older than the retention period are not deleted, which leads to:
  - An increase in space utilization under the Solr Data Directory. You can check the location in use through the “Solr Data Directory” setting in the Solr service configuration.
  - Increased memory usage of the Solr service.
  - Longer backup and restore of the Solr collection.
- Action required
- Workaround: update the Solr configuration and clean up old audits.
  This is a two-step process. Note that this process must be completed for any Data Lake that has used an affected runtime version, even if the Data Lake has since been upgraded to a newer runtime. Illustrative sketches of both parts follow below.
  - Part 1: Manually correct the configuration to ensure that the Solr retention value is enforced for new documents as they are added.
  - Part 2: Manually clean up documents that are older than 90 days.
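For Part 1, the relevant piece is the order of update processors in the solrconfig.xml used by ranger_audits: the document-expiration processor must run before the processor that actually writes the document. The chain below is a hedged illustration only, with assumed parameter values; the exact chain and settings shipped with your runtime may differ, so follow the Knowledge article for the authoritative steps.

```
<updateRequestProcessorChain name="updateRequestProcessorChain" default="true">
  <!-- Expiration metadata must be attached before the document is written -->
  <processor class="solr.processor.DocExpirationUpdateProcessorFactory">
    <!-- Illustrative values, not the shipped defaults -->
    <int name="autoDeletePeriodSeconds">86400</int>
    <str name="ttlFieldName">_ttl_</str>
    <str name="expirationFieldName">_expire_at_</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

For Part 2, one way to remove the backlog is a delete-by-query against the ranger_audits collection. A minimal sketch, assuming a Kerberized Solr endpoint at solr-host:8985 and the evtTime field that Ranger audit documents use for the event timestamp:

```
curl -k --negotiate -u : \
  "https://solr-host:8985/solr/ranger_audits/update?commit=true" \
  -H 'Content-Type: text/xml' \
  -d '<delete><query>evtTime:[* TO NOW-90DAYS]</query></delete>'
```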
- Upgrade (recommended)
- Upgrade to runtime 7.2.11+
If the Data Lake is not upgraded, the manual Solr configuration changes will be overwritten if the Data Lake is repaired or upgraded to a runtime version lower than 7.2.11. To ensure the changes persist, upgrade the Data Lake to runtime 7.2.11 or higher. See the Knowledge article below for additional upgrade details. Note that the manual steps in Parts 1 and 2 above still need to be done even if the Data Lake is upgraded.
- Knowledge article
- For the latest update on this issue, see the corresponding Knowledge article: TSB 2022-535: Ranger audit retention settings in Solr are not honored.
- Splitshard of HDFS index checks local filesystem and fails
- When performing a shard split on an index that is stored on HDFS, SplitShardCmd still evaluates free disk space on the local file system of the server where Solr is installed. This may cause the command to fail because it perceives that there is not enough disk space to perform the shard split.
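For context, a shard split is normally requested through the Collections API. A minimal sketch, with an assumed host, collection, and shard name; even though the index lives in HDFS, the free-space check runs on the local filesystem of the Solr node handling the split:

```
# Hypothetical host, collection, and shard names
curl "http://solr-host:8983/solr/admin/collections?action=SPLITSHARD&collection=myCollection&shard=shard1"
```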
- OPSAPS-58059: Solr log rotation counts the number of retained log files daily instead of globally
- With CDP 7.1.1, Search moved to Log4j v2, which changed Solr log rotation behavior in an unwanted way. With the default configuration, Solr log file names include a date and a running index, for example: solr-cmf-solr-SOLR_SERVER-solrserver-1.my.corporation.com.log.out.2020-08-31-9. The number of retained log files is configured in Cloudera Manager; however, the configured number now applies for each day instead of applying globally to all log files of the particular server.
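This follows from how Log4j v2 applies its rollover strategy: when the file pattern contains both a date (%d) and a counter (%i), the configured maximum bounds the counter, which resets at every date rollover. The appender below is a hedged illustration of the pattern, not the exact configuration shipped with Solr:

```
<RollingFile name="RollingFile"
             fileName="${sys:solr.log.dir}/solr.log"
             filePattern="${sys:solr.log.dir}/solr.log.%d{yyyy-MM-dd}-%i">
  <Policies>
    <TimeBasedTriggeringPolicy/>
    <SizeBasedTriggeringPolicy size="32 MB"/>
  </Policies>
  <!-- max bounds %i within each day, so up to 10 files are kept per day -->
  <DefaultRolloverStrategy max="10"/>
</RollingFile>
```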
- DOCS-5717: Lucene index handling limitation
- The Lucene index can only be upgraded by one major version at a time. Solr 8 will not open an index that was created with Solr 6 or earlier; such an index must first be upgraded to the Solr 7 format.
- CDH-82042: Solr service with no added collections causes the upgrade process to fail
- If there are no collections present in Solr, the upgrade fails while performing the bootstrap collections step of the solr-upgrade.sh script with the error message: Failed to execute command Bootstrap Solr Collections on service Solr.
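A hedged workaround sketch, not part of the original advisory: creating a placeholder collection before the upgrade gives the bootstrap step something to process. The configuration and collection names here are hypothetical, and solrctl availability is assumed:

```
# Hypothetical names; remove the collection after the upgrade if unneeded
solrctl config --create placeholderConfig managedTemplate -p immutable=false
solrctl collection --create placeholderCollection -s 1 -c placeholderConfig
```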
- CDH-34050: Collection creation no longer supports automatically selecting a configuration if only one exists
- Before CDH 5.5.0, a collection could be created without specifying a configuration. If no -c value was specified, then:
  - If there was only one configuration, that configuration was chosen.
  - If the collection name matched a configuration name, that configuration was chosen.
  Search now includes multiple built-in configurations. As a result, there is no longer a case in which only one configuration can be chosen by default, so the configuration must be named explicitly, as in the sketch below.
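A hedged example of naming the configuration explicitly with Cloudera's solrctl tool; the collection and configuration names are hypothetical:

```
# Hypothetical names; -c selects the configuration explicitly
solrctl collection --create myCollection -s 2 -c myConfig
```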
- CDH-22190: CrunchIndexerTool, which includes the Spark indexer, requires specific input file format specifications
- If the --input-file-format option is specified with CrunchIndexerTool, its argument must be text, avro, or avroParquet, rather than a fully qualified class name.
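A hedged sketch of an invocation with a valid format value; the jar path, morphline file, pipeline type, and input path are illustrative assumptions:

```
# Illustrative paths and options; note the plain format name for --input-file-format
hadoop jar /opt/cloudera/parcels/CDH/jars/search-crunch.jar \
  org.apache.solr.crunch.CrunchIndexerTool \
  --morphline-file morphline.conf \
  --pipeline-type mapreduce \
  --input-file-format avro \
  hdfs:///user/example/input/docs.avro
```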
- CDH-19923: The quickstart.sh file does not validate ZooKeeper and the NameNode on some operating systems.
- The quickstart.sh file uses the timeout function to determine if ZooKeeper and the NameNode are available. To ensure this check can complete as intended, quickstart.sh determines if the operating system on which the script is running supports timeout. If the script detects that the operating system does not support timeout, the script continues without checking if the NameNode and ZooKeeper are available. If your environment is configured properly or you are using an operating system that supports timeout, this issue does not apply.
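For illustration only, the guarded pattern looks roughly like the following sketch; it is not the script's actual code, and the ZooKeeper host is an assumption:

```
# Sketch of the pattern, not the contents of quickstart.sh
if command -v timeout >/dev/null 2>&1; then
  timeout 10 bash -c 'echo ruok | nc zk-host 2181' || echo "ZooKeeper unreachable"
else
  echo "timeout unavailable; skipping ZooKeeper and NameNode checks"
fi
```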
- CDH-26856: Field value class guessing and automatic schema field addition are not supported with the MapReduceIndexerTool or the HBaseMapReduceIndexerTool.
- The MapReduceIndexerTool and the HBaseMapReduceIndexerTool can be used with a Managed Schema created via NRT indexing of documents or via the Solr Schema API. However, neither tool supports adding fields automatically to the schema during ingest; required fields must be added beforehand, for example through the Schema API as sketched below.
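A hedged sketch of adding a field through the Solr Schema API before running either tool; the host, collection, and field names are assumptions:

```
# Hypothetical host, collection, and field
curl -X POST -H 'Content-Type: application/json' \
  "http://solr-host:8983/solr/myCollection/schema" \
  -d '{"add-field": {"name": "my_field", "type": "text_general", "stored": true}}'
```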
- CDH-19407: The Browse and Spell Request Handlers are not enabled in schemaless mode
- The Browse and Spell Request Handlers require certain fields to be present in the schema. Since those fields cannot be guaranteed to exist in a Schemaless setup, the Browse and Spell Request Handlers are not enabled by default.
- CDH-17978: Enabling blockcache writing may result in unusable indexes.
- It is possible to create indexes with solr.hdfs.blockcache.write.enabled set to true. Such indexes may appear corrupt to readers, and reading them may irrecoverably corrupt the indexes. Blockcache writing is disabled by default.
- CDH-58276: Users with insufficient Solr permissions may receive a "Page Loading" message from the Solr Web Admin UI.
- Users who are not authorized to use the Solr Admin UI are not shown a page explaining that access is denied; instead, they receive a web page that never finishes loading.
- CDH-15441: Using MapReduceIndexerTool or HBaseMapReduceIndexerTool multiple times may produce duplicate entries in a collection.
- Repeatedly running the MapReduceIndexerTool on the same set of input files can result in duplicate entries in the Solr collection. This occurs because the tool can only insert documents and cannot update or delete existing Solr documents. This issue does not apply to the HBaseMapReduceIndexerTool unless it is run with more than zero reducers.
- CDH-58694: Deleting collections might fail if hosts are unavailable.
- It is possible to delete a collection while hosts that contain part of that collection are unavailable. After such a deletion, if the previously unavailable hosts are brought back online, the deleted collection may be restored.
Unsupported Features
- Package Management System
- HTTP/2
- Solr SQL/JDBC
- Graph Traversal
- Cross Data Center Replication (CDCR)
- SolrCloud Autoscaling
- HDFS Federation
- Saving search results
- Solr contrib modules (the Spark, MapReduce, and Lily HBase indexers are not contrib modules but part of the Cloudera Search product itself, and are therefore supported)
Limitations
- Default Solr core names cannot be changed
- Although it is technically possible to give user-defined Solr core names during core creation, this is to be avoided in the context of Cloudera Search. Cloudera Manager expects core names in the default "collection_shardX_replicaY" format. Altering core names leaves Cloudera Manager unable to fetch Solr metrics for the given core, which may eventually corrupt data collection for co-located cores, or even shard- and server-level charts.