Cloudera Search Known Issues

Previously deleted empty shards may reappear after restarting the leader node

If nodes are shut down while a collection is being deleted, some shards from the deleted collection may still exist after the nodes are restarted, but these shards are empty.

Severity: Low

Workaround: To delete these empty shards, manually delete the folder matching the shard. On the nodes on which the shards exist, remove folders under /var/lib/solr/ that match the collection and shard. For example, if you had an empty shard 1 and empty shard 2 in a collection called MyCollection, you might delete all folders matching /var/lib/solr/MyCollection{1,2}_replica*/.
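Before deleting anything, it can help to list the folders that match. The following is a minimal sketch, not a Cloudera-supplied script: the helper name list_shard_dirs is hypothetical, and the folder naming simply follows the example above, so confirm the actual folder names on your nodes first.

```shell
# List candidate shard folders for a collection under a Solr data directory.
# Naming follows the example above (<collection><shard>_replica*); this is an
# assumption, so verify it against the folders that actually exist.
list_shard_dirs() {
  base=$1; collection=$2; shift 2
  for shard in "$@"; do
    for dir in "$base/${collection}${shard}_replica"*/; do
      # Skip the unexpanded pattern when nothing matches.
      [ -d "$dir" ] && printf '%s\n' "${dir%/}"
    done
  done
  return 0
}

# Review the list first, then remove each folder by hand, for example:
#   list_shard_dirs /var/lib/solr MyCollection 1 2
#   rm -r /var/lib/solr/MyCollection1_replica1
```

The function only prints paths. Deletion is left as a separate manual step so that nothing is removed before the list has been reviewed.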

The `quickstart.sh` file does not validate ZooKeeper and the NameNode on some operating systems

The quickstart.sh file uses the timeout command to determine if ZooKeeper and the NameNode are available. To ensure this check can be completed as intended, quickstart.sh first determines whether the operating system on which the script is running supports timeout. If the script detects that the operating system does not support timeout, the script continues without checking whether the NameNode and ZooKeeper are available. If your environment is configured properly or you are using an operating system that supports timeout, this issue does not apply.

Severity: Low

Workaround: This issue only occurs on some operating systems. If timeout is not available, a warning is displayed, but the quickstart continues, and final validation is always done by the MapReduce jobs and Solr commands that the quickstart runs.
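The behavior described above can be sketched roughly as follows. This is an illustration only, not the actual quickstart.sh code. The helper name check_with_timeout and the host name in the usage comment are hypothetical.

```shell
# Run an availability check under a time limit when `timeout` exists;
# otherwise warn and skip the check, as described above.
# Hypothetical helper, not taken from quickstart.sh.
check_with_timeout() {
  seconds=$1; shift
  if command -v timeout >/dev/null 2>&1; then
    timeout "$seconds" "$@"
  else
    echo "WARNING: timeout not found; skipping availability check" >&2
    return 0
  fi
}

# Example: probe a placeholder ZooKeeper host and port with nc, if installed.
#   check_with_timeout 5 nc -z zk01.example.com 2181
```

To find out ahead of time whether a given machine is affected, run `command -v timeout` by hand. If it prints nothing, timeout is not available on that machine.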

— Field value class guessing and Automatic schema field addition are not supported with the MapReduceIndexerTool or the HBaseMapReduceIndexerTool

The MapReduceIndexerTool and the HBaseMapReduceIndexerTool can be used with a Managed Schema created via NRT indexing of documents or via the Solr Schema API. However, neither tool supports adding fields automatically to the schema during ingest.

Severity: Medium

Workaround: Define the schema before running the MapReduceIndexerTool or HBaseMapReduceIndexerTool. In non-schemaless mode, define in the schema using the schema.xml file. In schemaless mode, either define the schema using the Solr Schema API or index sample documents using NRT indexing before invoking the tools. In either case, Cloudera recommends that you verify that the schema is what you expect using the List Fields API command.
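As an illustration of the verification step, the field list can be fetched over HTTP from the Schema API. The sketch below assumes the usual /solr/<collection>/schema/fields endpoint. The helper name schema_fields_url, the host, the port, and the collection name are placeholders.

```shell
# Build the Schema API URL that lists the fields of a collection.
# Host, port, and collection name are placeholders for illustration.
schema_fields_url() {
  printf 'http://%s:%s/solr/%s/schema/fields?wt=json\n' "$1" "$2" "$3"
}

# Fetch and inspect the field list before running the indexer tool:
#   curl -s "$(schema_fields_url localhost 8983 MyCollection)"
# On a Kerberos-enabled cluster, curl typically also needs: --negotiate -u :
```

Comparing the returned field names and types against what the indexing job emits is a quick way to catch a missing field before a long-running job starts.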

— The “Browse” and “Spell” Request Handlers are not enabled in schemaless mode

The “Browse” and “Spell” Request Handlers require certain fields be present in the schema. Since those fields cannot be guaranteed to exist in a Schemaless setup, the “Browse” and “Spell” Request Handlers are not enabled by default.

Severity: Low

Workaround: If you require the “Browse” and “Spell” Request Handlers, add them to the solrconfig.xml configuration file. Generate a non-schemaless configuration to see the usual settings and modify the required fields to fit your schema.

— Using Solr with Sentry may consume more memory than required

The Sentry-enabled solrconfig.xml.secure configuration file does not enable the HDFS global block cache. This does not cause correctness issues, but it can greatly increase the amount of memory that Solr requires.

Severity: Medium

Workaround: Enable the HDFS global block cache by adding the following line to solrconfig.xml.secure under the directoryFactory element:

<str name="solr.hdfs.blockcache.global">${solr.hdfs.blockcache.global:true}</str>
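A quick way to confirm that the line is present after editing is a text search of the file. This is a minimal sketch: the helper name has_global_blockcache is hypothetical, and the file path in the usage comment is an assumption about where your configuration lives.

```shell
# Report whether a config file mentions the global block cache property.
# This is a plain text match, so it does not confirm that the line sits
# under the directoryFactory element; check the placement by eye.
has_global_blockcache() {
  grep -qF 'name="solr.hdfs.blockcache.global"' "$1"
}

# Example (the path is an assumption; use your own config location):
#   has_global_blockcache conf/solrconfig.xml.secure && echo "property present"
```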

— Enabling blockcache writing may result in unusable indexes

It is possible to create indexes with solr.hdfs.blockcache.write.enabled set to true. Such indexes may appear corrupt to readers, and reading them may irrecoverably corrupt them. Blockcache writing is disabled by default.

Severity: Medium

Workaround: Do not enable blockcache writing.

— Solr fails to start when Trusted Realms are added for Solr into Cloudera Manager

Cloudera Manager generates name rules that contain spaces as a result of entries in the Trusted Realms field. These rules do not work with Solr, which prevents Solr from starting.

Severity: Medium

Workaround: Do not use the Trusted Realm field for Solr in Cloudera Manager. To write your own name rule mapping, add an environment variable SOLR_AUTHENTICATION_KERBEROS_NAME_RULES with the mapping. See the Cloudera Manager Security Guide for more information.

— Lily HBase batch indexer jobs fail to launch

A symptom of this issue is an exception similar to the following:

Exception in thread "main" java.lang.IllegalAccessError: class com.google.protobuf.ZeroCopyLiteralByteString cannot access its superclass com.google.protobuf.LiteralByteString
	at java.lang.ClassLoader.defineClass1(Native Method)
	at java.lang.ClassLoader.defineClass(ClassLoader.java:792)
	at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
	at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
	at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	at org.apache.hadoop.hbase.protobuf.ProtobufUtil.toScan(ProtobufUtil.java:818)
	at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.convertScanToString(TableMapReduceUtil.java:433)
	at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:186)
	at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:147)
	at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:270)
	at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:100)
	at com.ngdata.hbaseindexer.mr.HBaseMapReduceIndexerTool.run(HBaseMapReduceIndexerTool.java:124)
	at com.ngdata.hbaseindexer.mr.HBaseMapReduceIndexerTool.run(HBaseMapReduceIndexerTool.java:64)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at com.ngdata.hbaseindexer.mr.HBaseMapReduceIndexerTool.main(HBaseMapReduceIndexerTool.java:51)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

This occurs because an optimization introduced in HBASE-9867 inadvertently added a classloader dependency. To satisfy the new classloader requirements, hbase-protocol.jar must be included in Hadoop's classpath. You can resolve this on a per-job basis by including the jar in the HADOOP_CLASSPATH environment variable when you submit the job.

Severity: High

Workaround: Run the following command before issuing Lily HBase MapReduce jobs. Replace the .jar file names and filepaths as appropriate.

$ export HADOOP_CLASSPATH=</path/to/hbase-protocol>.jar; hadoop jar <MyJob>.jar <MyJobMainClass>
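If HADOOP_CLASSPATH already holds entries that other jobs rely on, a plain assignment overwrites them. The sketch below prepends the jar and keeps whatever was there. The helper name prepend_classpath is hypothetical, and the jar path is the same placeholder used above.

```shell
# Prepend a jar to an existing classpath value without discarding it.
# When the current value is empty, no trailing colon is added.
prepend_classpath() {
  jar=$1; current=$2
  printf '%s%s\n' "$jar" "${current:+:$current}"
}

# Hypothetical usage; the jar location is a placeholder:
#   export HADOOP_CLASSPATH=$(prepend_classpath /path/to/hbase-protocol.jar "$HADOOP_CLASSPATH")
```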

— Users may receive limited error messages on requests in Sentry-protected environment.

Users submit requests which are received by a node. The node that receives the request may be different from the node with the relevant information. In such a case, Solr forwards the request to the appropriate node. Once the correct node receives the request, Sentry may deny access.

Because the request was forwarded, available information may be limited. In such a case, the user's client displays the error message Server returned HTTP response code: 401 for URL: followed by the Solr machine reporting the error.

Severity: Low

Workaround: For complete error information, review the contents of the Solr logs on the machine reporting the error.

— Users with insufficient Solr permissions may receive a "Page Loading" message from the Solr Web Admin UI

Users who are not authorized to use the Solr Admin UI are not given a page explaining that access is denied. Instead, they receive a web page that never finishes loading.

Severity: Low

Workaround: None

— Mapper-only HBase batch indexer fails if configured to use security.

Attempts to complete an HBase batch indexing job fail when Kerberos authentication is enabled and the number of reducers is set to 0.

Workaround: Either disable Kerberos authentication or use one or more reducers.

— Using MapReduceIndexerTool or HBaseMapReduceIndexerTool multiple times may produce duplicate entries in a collection.

Repeatedly running the MapReduceIndexerTool on the same set of input files can result in duplicate entries in the Solr collection. This occurs because the tool can only insert documents and cannot update or delete existing Solr documents.

Severity: Medium

Workaround: To avoid this issue, use HBaseMapReduceIndexerTool with zero reducers. This must be done without Kerberos.
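A zero-reducer invocation might look like the command printed by the sketch below. The helper name print_indexer_cmd is hypothetical, and every path, host, and collection name is a placeholder. The option names (--hbase-indexer-file, --zk-host, --collection, --reducers) should be confirmed against the tool's --help output for your release. The function prints the command rather than running it.

```shell
# Print (not run) a hadoop command line for HBaseMapReduceIndexerTool with
# zero reducers. Every path, host, and name passed in is a placeholder.
print_indexer_cmd() {
  job_jar=$1; indexer_xml=$2; zk_host=$3; collection=$4
  printf 'hadoop jar %s --hbase-indexer-file %s --zk-host %s --collection %s --reducers 0\n' \
    "$job_jar" "$indexer_xml" "$zk_host" "$collection"
}

print_indexer_cmd /path/to/hbase-indexer-mr-job.jar indexer-config.xml \
  zk01.example.com:2181/solr MyCollection
```

Printing the command first makes it easy to review the arguments before submitting the job.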

— Deleting collections may fail if nodes are unavailable.

It is possible to delete a collection when nodes that host some of the collection are unavailable. After such a deletion, if the previously unavailable nodes are brought back online, the deleted collection may be restored.

Severity: Low

Workaround: Ensure all nodes are online before deleting collections.

— Lily HBase Indexer is slow to index new data after restart.

After restarting the Lily HBase Indexer, you can add data to one of the HBase tables. There may be a delay of a few minutes before this newly added data appears in Solr. This delay only occurs with the first HBase addition after a restart. Subsequent additions are not subject to this delay.

Severity: Low

Workaround: None

— Some configurations for Lily HBase Indexers cannot be modified after initial creation.

Newly created Lily HBase Indexers define their configuration using the properties in /etc/hbase-solr/conf/hbase-indexer-site.xml. Therefore, if the properties in the hbase-indexer-site.xml file are incorrectly defined, new indexers do not work properly. Even after correcting the contents of hbase-indexer-site.xml and restarting the indexer service, old, incorrect content persists. This continues to create non-functioning indexers.

Severity: Medium

Workaround:

Warning: This workaround involves completing destructive operations that delete all of your other Lily HBase Indexers.

To resolve this issue:

Connect to each machine running the Lily HBase Indexer service and stop the indexer:
```
service hbase-solr-indexer stop
```
Note: You may need to stop the service on multiple machines.
For each indexer machine, modify the /etc/hbase-solr/conf/hbase-indexer-site.xml file to include valid settings.
Connect to the ZooKeeper machine, invoke the ZooKeeper CLI, and remove all contents of the /ngdata chroot:
```
$ /usr/lib/zookeeper/bin/zkCli.sh
[zk: localhost:2181(CONNECTED) 0] rmr /ngdata
```
Connect to each indexer machine and restart the indexer service.
```
service hbase-solr-indexer start
```

After restarting the client services, ZooKeeper is updated with the correct information stored on the updated clients.

— Saving search results is not supported in this release.

This version of Cloudera Search does not support the ability to save search results.

Severity: Low

Workaround: None

— HDFS Federation is not supported in this release.

This version of Cloudera Search does not support HDFS Federation.

Severity: Low

Workaround: None

— Block Cache Metrics are not supported in this release.

This version of Cloudera Search does not support block cache metrics.

Severity: Low

Workaround: None

— Shard splitting support is experimental.

Shard splitting was added with the recent release of Solr 4.4. Cloudera anticipates that shard splitting will function as expected with Cloudera Search, but this interaction has not been thoroughly tested. Therefore, Cloudera cannot guarantee that issues will not arise when shard splitting is used with Search.

Severity: Low

Workaround: Use shard splitting for test and development purposes, but be aware of the risks of using shard splitting in production environments. To avoid using shard splitting, use the source data to create a new index with a new shard count by re-indexing the data to a new collection. You can do this using the MapReduceIndexerTool.

— Users with `update` access to the administrative collection can elevate their access.

Users are granted access to collections. Access to several collections can be simplified by aliasing a set of collections. Creating an alias requires update access to the administrative collection. Any user with update access to the administrative collection is granted query access to all collections in the resulting alias. This is true even if the user with update access to the administrative collection otherwise would be unable to query the other collections that have been aliased.

Severity: Medium

Workaround: None. Mitigate the risk by limiting the users with update access to the administrative collection.
