This is the documentation for CDH 5.0.x. Documentation for other versions is available at Cloudera Documentation.

Cloudera Search Glossary


commit

An operation that forces documents to be made searchable.
  • hard - A commit that starts the autowarm process, closes old searchers, and opens new ones. It may also trigger replication.
  • soft - Functionality introduced with NRT and SolrCloud that makes documents searchable without the cost of a hard commit.
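
As a rough illustration of the two commit types, the sketch below builds update URLs that ask Solr for a hard or a soft commit using the `commit` and `softCommit` request parameters. The host, port, and collection name are assumptions made up for this example, and `commit_url` is a hypothetical helper, not part of any Solr client.

```python
from urllib.parse import urlencode

# Hypothetical host and collection name, used for illustration only.
BASE = "http://localhost:8983/solr/collection1/update"


def commit_url(soft: bool = False) -> str:
    """Return an update URL that requests a hard commit or a soft commit."""
    params = {"softCommit": "true"} if soft else {"commit": "true"}
    return BASE + "?" + urlencode(params)


hard = commit_url()           # hard commit: new searcher, autowarming
soft = commit_url(soft=True)  # soft commit: documents visible with less work
```

The sketch only constructs the URLs; sending them to a running Solr instance is left out so the example stays self-contained.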

embedded Solr

The ability to execute Solr commands without a separate servlet container. Use of embedded Solr is generally discouraged because it is often chosen in the mistaken belief that HTTP is inherently too expensive to be fast. With Cloudera Search, however, and especially if a MapReduce-based process is adopted, embedded Solr is probably advisable.


faceting

“Counting buckets” for a query. For example, suppose the search is for the term “shoes”. You might want the results to report how many matches fall into each category, such as “X brown, Y red, and Z blue shoes”, counted among the documents that matched the rest of the query.
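
The “counting buckets” idea can be sketched in a few lines. The example below is not how Solr computes facets internally; it only shows the shape of the result, using made-up documents and a hypothetical `facet_counts` helper. In Solr itself, facet counts are requested with query parameters such as `facet=true` and `facet.field`.

```python
from collections import Counter

# Made-up documents that are assumed to have matched the query "shoes".
matches = [
    {"id": 1, "color": "brown"},
    {"id": 2, "color": "red"},
    {"id": 3, "color": "brown"},
    {"id": 4, "color": "blue"},
    {"id": 5, "color": "red"},
    {"id": 6, "color": "brown"},
]


def facet_counts(docs, field):
    """Count how many matching documents fall into each value of `field`."""
    return Counter(doc[field] for doc in docs if field in doc)


counts = facet_counts(matches, "color")
```

Here `counts` maps each color to the number of matching documents, which corresponds to the “X brown, Y red, and Z blue” summary described above.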

filter query (fq)

A clause that limits returned results. For instance, “fq=sex:male” limits results to males. Filter queries are cached and reused.
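
A filter query is passed alongside the main query as a separate `fq` parameter. The sketch below assembles such a request URL; the base URL and collection name are assumptions for illustration, and `select_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def select_url(base, query, filters):
    """Build a select URL with one `q` parameter and zero or more `fq` clauses."""
    params = [("q", query)] + [("fq", clause) for clause in filters]
    return base + "/select?" + urlencode(params)


# Hypothetical Solr location; only the parameter layout matters here.
url = select_url("http://localhost:8983/solr/collection1", "shoes", ["sex:male"])
```

Keeping the restriction in `fq` rather than folding it into `q` lets the filter be cached and reused across different main queries.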

Near Real Time (NRT)

The ability to search documents very soon after they are added to Solr. With SolrCloud, this is largely automatic and measured in seconds.


replica

In SolrCloud, a complete copy of a shard. Each replica is identical, so a search has to query only one replica per shard.
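
Because every replica of a shard holds the same data, a search needs only one replica from each shard. The sketch below illustrates that selection with a random choice; the cluster layout and node names are invented, and the selection policy is an assumption, not a description of how SolrCloud balances load.

```python
import random

# Hypothetical cluster layout: shard name -> URLs of its replicas.
cluster = {
    "shard1": ["http://node1:8983/solr", "http://node2:8983/solr"],
    "shard2": ["http://node3:8983/solr", "http://node4:8983/solr"],
}


def pick_replicas(layout, rng=random):
    """Choose exactly one replica per shard to serve a search request."""
    return {shard: rng.choice(replicas) for shard, replicas in layout.items()}


targets = pick_replicas(cluster)
```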


sharding

Splitting a single logical index into some number of sub-indexes, each of which can be hosted on a separate machine. Solr (and especially SolrCloud) handles querying each shard and assembling the responses into a single, coherent list.
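
The two halves of sharding, routing documents to sub-indexes and merging per-shard results, can be sketched as follows. This is a simplified illustration under stated assumptions: the CRC32-modulo routing is a stand-in and is not Solr's actual document router, and the shard count, scores, and document IDs are made up.

```python
import heapq
import zlib
from itertools import islice

NUM_SHARDS = 3  # assumed shard count for this example


def shard_for(doc_id: str) -> int:
    """Map a document ID to a shard number (stand-in hashing scheme)."""
    return zlib.crc32(doc_id.encode("utf-8")) % NUM_SHARDS


def merge_results(per_shard_hits, rows):
    """Merge per-shard (score, doc_id) lists, each already sorted by
    descending score, and return the top `rows` hits overall."""
    merged = heapq.merge(*per_shard_hits, key=lambda hit: hit[0], reverse=True)
    return list(islice(merged, rows))


# Made-up hits returned by three shards for the same query.
shard_hits = [
    [(0.9, "a"), (0.4, "d")],
    [(0.7, "b")],
    [(0.8, "c"), (0.1, "e")],
]
top = merge_results(shard_hits, 3)
```

Each shard returns its own ranked list, and the merge step produces the single list that the client sees.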


SolrCloud

ZooKeeper-enabled, fault-tolerant, distributed Solr. SolrCloud was introduced in Solr 4.0.


SolrJ

A Java API for interacting with a Solr instance.

Page generated September 3, 2015.