Cloudera Search Components
Search interacts with existing CDH components, using many of them to solve different problems. The following table lists the CDH components that contribute to the Search process and describes how each one helps:
| Component | Contribution | Applicable To |
|---|---|---|
| HDFS | Source documents are typically stored in HDFS. These documents are indexed and made searchable. The files that support Search, such as Lucene index files and write-ahead logs, are also stored in HDFS. Using HDFS provides simpler provisioning on a larger base, redundancy, and fault tolerance. Because Search uses HDFS, Search servers are essentially stateless, so node failures have minimal consequences. HDFS also provides additional benefits such as snapshotting, inter-cluster replication, and disaster recovery. | All cases |
| MapReduce | Search includes a pre-built MapReduce-based job. This job can be used for on-demand or scheduled indexing of any supported data set stored in HDFS. It uses cluster resources for scalable batch indexing. | Many cases |
| Flume | Search includes a Flume sink that writes events directly to indexers deployed on the cluster, so data is indexed during ingestion. | Many cases |
| Hue | Search includes a Hue front-end search application that uses standard Solr APIs. The application can interact with data indexed in HDFS. It supports the Solr standard query language, visualization of faceted search functionality, and a typical GUI-based full-text search. | Many cases |
| ZooKeeper | Coordinates distribution of data and metadata, also known as shards. ZooKeeper provides automatic failover, increasing service resiliency. | Many cases |
| HBase | Supports indexing of stored data, extracting columns, column families, and key information as fields. Because HBase does not use secondary indexing, Search can be used to run full-text searches over content in HBase rows and tables. | Some cases |
| Cloudera Manager | Deploys, configures, manages, and monitors the Search processes and resource utilization across services on the cluster. Search does not require Cloudera Manager, but Cloudera Manager helps simplify Search administration. | Some cases |
| Oozie | Automates scheduling and management of indexing jobs. Oozie can check for new data and begin indexing jobs, as required. | Some cases |
| Impala | Further analyzes search results. | Some cases |
| Hive | Further analyzes search results. | Some cases |
| Avro | Includes metadata that Search can use for indexing. | Some cases |
| Sqoop | Ingests data in batch and enables data availability for batch indexing. | Some cases |
| Mahout | Applies machine learning processing to search results. | Some cases |
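
The Hue entry above mentions standard Solr APIs, the Solr standard query language, and faceted search. As a rough illustration of what such a request can look like, the following Python sketch builds a URL for Solr's standard `/select` request handler with optional facet fields. The host name, the collection name `logs`, and the field names are hypothetical placeholders, and the helper `build_select_url` is not part of Cloudera Search or Hue; it only assembles well-known Solr query parameters (`q`, `rows`, `wt`, `facet`, `facet.field`).

```python
from urllib.parse import urlencode


def build_select_url(base_url, collection, query, facet_fields=(), rows=10):
    """Build a URL for Solr's standard /select request handler.

    base_url and collection are deployment-specific; the values used in
    the example below are placeholders, not defaults of any real cluster.
    """
    params = [("q", query), ("rows", str(rows)), ("wt", "json")]
    if facet_fields:
        # Faceting is switched on once, then one facet.field per field.
        params.append(("facet", "true"))
        params.extend(("facet.field", field) for field in facet_fields)
    return f"{base_url.rstrip('/')}/{collection}/select?{urlencode(params)}"


# Hypothetical host, collection, and field names, for illustration only.
url = build_select_url(
    "http://search-host.example.com:8983/solr",
    "logs",
    "text:error",
    facet_fields=["hostname", "level"],
)
print(url)
```

The resulting URL could be fetched with any HTTP client; authentication, TLS, and error handling are left out of this sketch.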