Deploying Cloudera Search
When you deploy Cloudera Search, SolrCloud partitions your data set into multiple indexes and processes and uses ZooKeeper to simplify management. The result is a cluster of coordinating Apache Solr servers.
Installing and Starting ZooKeeper Server
SolrCloud mode uses Apache ZooKeeper as a highly available, central location for cluster management. For a small cluster, running ZooKeeper collocated with the NameNode is recommended. For larger clusters, use multiple ZooKeeper servers.
If you do not already have a ZooKeeper service added to your cluster, add it using the instructions in Adding a Service for Cloudera Manager installations.
Initializing Solr
For Cloudera Manager installations, if you have not yet added the Solr service to your cluster, do so now using the instructions in Adding a Service. The Add a Service wizard automatically configures and initializes the Solr service.
Generating Collection Configuration
To start using Solr and indexing data, you must configure a collection to hold the index. A collection requires the following configuration files:
- solrconfig.xml
- schema.xml
- Any additional files referenced in the XML files
The solrconfig.xml file contains all of the Solr settings for a given collection, and the schema.xml file specifies the schema that Solr uses when indexing documents. For more details on how to configure a collection, see http://wiki.apache.org/solr/SchemaXml.
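To generate a template set of these configuration files in a local directory (the instance directory), run:
$ solrctl instancedir --generate $HOME/solr_configs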
You can customize a collection by directly editing the solrconfig.xml and schema.xml files created in $HOME/solr_configs/conf.
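After customizing the files, upload the instance directory to ZooKeeper so that Solr can use it:
$ solrctl instancedir --create <collection_name> $HOME/solr_configs
To verify that the upload succeeded, list the available instance directories:
$ solrctl instancedir --list
For example, if you used the --create command with the name weblogs, the --list command should return weblogs.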
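The check described above can be scripted. The following is a minimal sketch, not part of Cloudera Search: has_instancedir is a hypothetical helper name, and the sketch assumes that solrctl instancedir --list prints one name per line.

```shell
#!/bin/sh
# Hypothetical helper (not part of solrctl): succeeds if the name passed as $1
# appears as a complete line on standard input.
# Assumption: `solrctl instancedir --list` prints one name per line.
has_instancedir() {
    # -q: no output, -x: match the whole line, -F: treat the name as a literal string
    grep -qxF -- "$1"
}

# Intended use on a Search host (not executed here):
#   solrctl instancedir --list | has_instancedir weblogs && echo "weblogs is available"
```

Matching the whole line keeps a name such as weblogs_old from being mistaken for weblogs.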
Creating Collections
To create a collection, run:
$ solrctl collection --create <collection_name> -s <shard_count>
To use the configuration that you provided to Solr in previous steps, use the same collection name (weblogs in this example). The -s <shard_count> parameter specifies the number of SolrCloud shards you want to partition the collection across. The number of shards cannot exceed the total number of Solr servers in your SolrCloud cluster.
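The shard limit can be checked before the collection is created. The following is a minimal sketch under stated assumptions: valid_shard_count is a hypothetical helper, and you supply the number of Solr servers yourself, because this sketch does not query the cluster.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of solrctl).
# $1 = requested shard count, $2 = number of Solr servers in the cluster.
# Succeeds only if both are non-negative integers and 1 <= $1 <= $2.
valid_shard_count() {
    # Reject empty or non-numeric arguments.
    case "$1" in ''|*[!0-9]*) return 1 ;; esac
    case "$2" in ''|*[!0-9]*) return 1 ;; esac
    [ "$1" -ge 1 ] && [ "$1" -le "$2" ]
}

# Intended use on a Search host (not executed here), assuming 3 Solr servers:
#   valid_shard_count 2 3 && solrctl collection --create weblogs -s 2
```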
To verify that the collection is active, go to http://search01.example.com:8983/solr/<collection_name>/select?q=*%3A*&wt=json&indent=true in a browser, replacing search01.example.com with the hostname of one of the Solr server hosts. For example, for the collection weblogs, the URL is http://search01.example.com:8983/solr/weblogs/select?q=*%3A*&wt=json&indent=true.
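The same verification can be done from a shell instead of a browser. The following is a minimal sketch: select_url is a hypothetical helper that only builds the query URL from a hostname and a collection name, using the port and query string shown above. Fetching the URL with curl is an assumption about your environment, not a documented Cloudera procedure.

```shell
#!/bin/sh
# Hypothetical helper: prints the match-all select URL for a collection.
# $1 = Solr server hostname, $2 = collection name.
# Port 8983 and the query string are taken from the URL in this section.
select_url() {
    # %% in the printf format emits a literal %, producing q=*%3A* (URL-encoded *:*).
    printf 'http://%s:8983/solr/%s/select?q=*%%3A*&wt=json&indent=true\n' "$1" "$2"
}

# Intended use (not executed here; assumes curl is installed and the host is reachable):
#   curl -s "$(select_url search01.example.com weblogs)"
```

Quoting the URL matters: unquoted, the shell would treat each & as a background operator and could expand the * characters.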
You can also view the SolrCloud topology using the URL http://search01.example.com:8983/solr/#/~cloud.
For more information on completing additional collection management tasks, see Managing Cloudera Search.