This is the documentation for CDH 5.0.x. Documentation for other versions is available at Cloudera Documentation.

Search Architecture

Search runs as a distributed service on a set of servers, each of which is responsible for a portion of the content to be searched. The content is split into smaller pieces, each piece is copied, and the copies are distributed among the servers. This provides two main advantages:

  • Dividing the content into smaller pieces distributes the task of indexing the content among the servers.
  • Duplicating the pieces allows queries to be scaled more effectively and makes it possible for the system to provide higher levels of availability.
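
The splitting and copying described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the placement logic Search actually uses: the function names `shard_for` and `place_replicas`, the MD5-based hashing, and the round-robin placement are all hypothetical.

```python
import hashlib


def shard_for(doc_id, num_shards):
    """Map a document ID to a piece (shard) with a stable hash (assumed scheme)."""
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_shards


def place_replicas(num_shards, replicas_per_shard, servers):
    """Assign each shard's copies to distinct servers, round-robin (assumed scheme)."""
    if replicas_per_shard > len(servers):
        raise ValueError("need at least one server per copy of a shard")
    placement = {}
    for shard in range(num_shards):
        # Consecutive offsets guarantee the copies of one shard land on different servers.
        placement[shard] = [
            servers[(shard + copy) % len(servers)]
            for copy in range(replicas_per_shard)
        ]
    return placement


if __name__ == "__main__":
    layout = place_replicas(num_shards=3, replicas_per_shard=2,
                            servers=["s1", "s2", "s3"])
    for shard, hosts in layout.items():
        print(f"shard {shard} -> {hosts}")
    print("doc-1 belongs to shard", shard_for("doc-1", 3))
```

With two copies of every shard, losing any single server still leaves one copy of each piece available, and queries for a shard can be spread across its copies.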

Each Search server can handle requests for information. This means that a client can send a request to index documents or carry out a search to any Search server, and that server routes the request to the correct Search server.
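
The routing behavior can be illustrated with a small in-memory model. This is a simplified sketch, not the Search implementation: the `Cluster` and `SearchServer` classes, the hash-based ownership rule, and the single owner per shard (no copies) are assumptions made to keep the example short.

```python
import hashlib


def shard_for(doc_id, num_shards):
    """Map a document ID to a shard with a stable hash (assumed scheme)."""
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_shards


class SearchServer:
    """Hypothetical server that accepts any request and forwards when needed."""

    def __init__(self, name, cluster):
        self.name = name
        self.cluster = cluster
        self.local_docs = {}  # doc_id -> text held by this server

    def index(self, doc_id, text):
        owner = self.cluster.owner_of(doc_id)
        if owner is not self:
            # This server does not own the shard, so route to the one that does.
            return owner.index(doc_id, text)
        self.local_docs[doc_id] = text
        return self.name

    def search_local(self, term):
        return [doc_id for doc_id, text in self.local_docs.items() if term in text]

    def search(self, term):
        # Fan the query out to every server and merge the partial results.
        hits = []
        for server in self.cluster.servers.values():
            hits.extend(server.search_local(term))
        return sorted(hits)


class Cluster:
    """Hypothetical cluster map: one owning server per shard."""

    def __init__(self, server_names, num_shards):
        names = list(server_names)
        self.num_shards = num_shards
        self.servers = {name: SearchServer(name, self) for name in names}
        self.shard_owner = {s: names[s % len(names)] for s in range(num_shards)}

    def owner_of(self, doc_id):
        return self.servers[self.shard_owner[shard_for(doc_id, self.num_shards)]]


if __name__ == "__main__":
    cluster = Cluster(["s1", "s2", "s3"], num_shards=3)
    cluster.servers["s1"].index("doc-1", "distributed search")
    cluster.servers["s2"].index("doc-2", "search architecture")
    # Any server returns the same merged result.
    print(cluster.servers["s3"].search("search"))
```

In this model the client never needs to know which server owns a document: whichever server receives the request either handles it locally or passes it on.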

Page generated September 3, 2015.