Deploying Cloudera Search

When you deploy Cloudera Search, SolrCloud partitions your data set into multiple indexes and processes and uses ZooKeeper to simplify management. The result is a cluster of coordinating Apache Solr servers.

Installing and Starting ZooKeeper Server

SolrCloud mode uses Apache ZooKeeper as a highly available, central location for cluster management. For a small cluster, running ZooKeeper collocated with the NameNode is recommended. For larger clusters, use multiple ZooKeeper servers. For more information, see Installing ZooKeeper in a Production Environment.

If you do not already have a ZooKeeper service added to your cluster, add it using the instructions in Adding a Service for Cloudera Manager installations. For package-based unmanaged clusters, see ZooKeeper Installation.

Initializing Solr

For Cloudera Manager installations, if you have not yet added the Solr service to your cluster, do so now using the instructions in Adding a Service. The Add a Service wizard automatically configures and initializes the Solr service.

Configuring ZooKeeper Quorum Addresses

After the ZooKeeper service is running, configure each Solr host with the ZooKeeper quorum addresses. This can be a single address if you have only one ZooKeeper server, or multiple addresses if you are using multiple servers.

Configure the ZooKeeper quorum addresses in /etc/solr/conf/solr-env.sh on each Solr server host. For example:

$ cat /etc/solr/conf/solr-env.sh
export SOLR_ZK_ENSEMBLE=zk01.example.com:2181,zk02.example.com:2181,zk03.example.com:2181/solr
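The value in the example is a comma-separated list of host:port pairs, followed by a single path (/solr) after the last entry. As a minimal sketch, the following helper assembles such a value from a list of hostnames. The function name build_zk_ensemble is hypothetical, and the sketch assumes every ZooKeeper server listens on port 2181, as in the example above.

```shell
#!/bin/sh
# Hypothetical helper: join ZooKeeper hosts into a SOLR_ZK_ENSEMBLE value.
# Usage: build_zk_ensemble <path> <host> [<host> ...]
build_zk_ensemble() {
    zk_path=$1
    shift
    ensemble=""
    for host in "$@"; do
        # Append ":2181" (assumed port) and separate entries with commas.
        ensemble="${ensemble:+$ensemble,}${host}:2181"
    done
    # The path appears once, after the final host:port pair.
    printf '%s%s\n' "$ensemble" "$zk_path"
}

build_zk_ensemble /solr zk01.example.com zk02.example.com zk03.example.com
```

The printed value can then be placed after export SOLR_ZK_ENSEMBLE= in solr-env.sh.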

Configuring Solr for Use with HDFS

To use Solr with your established HDFS service, perform the following configurations:

  1. Configure the HDFS URI for Solr to use as a backing store in /etc/default/solr or /opt/cloudera/parcels/CDH-*/etc/default/solr. On every Solr Server host, edit the following property to configure the location of Solr index data in HDFS:
    SOLR_HDFS_HOME=hdfs://nn01.example.com:8020/solr

    Replace nn01.example.com with the hostname of your HDFS NameNode (as specified by fs.default.name or fs.defaultFS in your /etc/hadoop/conf/core-site.xml file). You might also need to change the port number from the default (8020) if your NameNode runs on a non-default port. On an HA-enabled cluster, ensure that the HDFS URI you use reflects the designated name service used by your cluster. This value must be reflected in fs.default.name (for example, hdfs://nameservice1 or something similar).

  2. In some cases, such as configuring Solr to work with HDFS High Availability (HA), you might want to configure the Solr HDFS client by setting the HDFS configuration directory in /etc/default/solr or /opt/cloudera/parcels/CDH-*/etc/default/solr. On every Solr Server host, locate the appropriate HDFS configuration directory and edit the following property with the absolute path to this directory:
    SOLR_HDFS_CONFIG=/etc/hadoop/conf

    Replace the path with the correct directory containing the proper HDFS configuration files, core-site.xml and hdfs-site.xml.
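Before setting SOLR_HDFS_CONFIG, it can help to confirm that the candidate directory actually contains both files named above. The following is a minimal sketch of such a check; the function name check_hdfs_config_dir is hypothetical and is not part of Cloudera Search.

```shell
#!/bin/sh
# Hypothetical pre-flight check: confirm that a candidate SOLR_HDFS_CONFIG
# directory contains readable core-site.xml and hdfs-site.xml files.
check_hdfs_config_dir() {
    dir=$1
    for f in core-site.xml hdfs-site.xml; do
        if [ ! -r "$dir/$f" ]; then
            echo "missing or unreadable: $dir/$f" >&2
            return 1
        fi
    done
    echo "ok: $dir"
}

# Example call; /etc/hadoop/conf is the path used in the text above.
check_hdfs_config_dir /etc/hadoop/conf || true
```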

Configuring Solr to Use Secure HDFS

If security is enabled, perform the following steps:

  1. Create the Kerberos principals and keytab files for every host in your cluster:
    1. Create the Solr principal using either kadmin or kadmin.local.
      kadmin:  addprinc -randkey solr/fully.qualified.domain.name@YOUR-REALM.COM
      kadmin:  xst -norandkey -k solr.keytab solr/fully.qualified.domain.name

      The -norandkey option of xst is available only in kadmin.local. For more information, see Step 4: Create and Deploy the Kerberos Principals and Keytab Files.

  2. Deploy the Kerberos keytab files on every host in your cluster:
    1. Copy or move the keytab files to a directory that Solr can access, such as /etc/solr/conf.
      $ sudo mv solr.keytab /etc/solr/conf/
      $ sudo chown solr:hadoop /etc/solr/conf/solr.keytab
      $ sudo chmod 400 /etc/solr/conf/solr.keytab
  3. Add Kerberos-related settings to /etc/default/solr or /opt/cloudera/parcels/CDH-*/etc/default/solr on every host in your cluster, substituting appropriate values. For a package-based installation, use something similar to the following:
    SOLR_KERBEROS_ENABLED=true
    SOLR_KERBEROS_KEYTAB=/etc/solr/conf/solr.keytab
    SOLR_KERBEROS_PRINCIPAL=solr/fully.qualified.domain.name@YOUR-REALM.COM
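Because the principal embeds each host's fully qualified domain name, these settings differ from host to host. The following sketch prints the three settings for a given hostname and realm. The function name render_solr_kerberos_settings is hypothetical, and search01.example.com and EXAMPLE.COM are placeholder values; the keytab path is the one used in the steps above.

```shell
#!/bin/sh
# Hypothetical helper: print the three Kerberos settings for one host.
# Arguments: <fully-qualified-hostname> <realm>
render_solr_kerberos_settings() {
    fqdn=$1
    realm=$2
    cat <<EOF
SOLR_KERBEROS_ENABLED=true
SOLR_KERBEROS_KEYTAB=/etc/solr/conf/solr.keytab
SOLR_KERBEROS_PRINCIPAL=solr/${fqdn}@${realm}
EOF
}

# Placeholder hostname and realm; substitute your own values.
render_solr_kerberos_settings search01.example.com EXAMPLE.COM
```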

Creating the /solr Directory in HDFS

Before starting the Cloudera Search server, you must create the /solr directory in HDFS. The Cloudera Search service runs as the solr user by default, so it does not have the required permissions to create a top-level directory.

To create the /solr directory in HDFS:
$ sudo -u hdfs hdfs dfs -mkdir /solr
$ sudo -u hdfs hdfs dfs -chown solr /solr

If you are using a Kerberos-enabled cluster, you must authenticate with the hdfs account or another superuser before creating the directory:

$ kinit hdfs@EXAMPLE.COM
$ hdfs dfs -mkdir /solr
$ hdfs dfs -chown solr /solr

Initializing the ZooKeeper Namespace

Before starting the Cloudera Search server, you must create the solr namespace in ZooKeeper:
$ solrctl init

Starting Solr

Start the Solr service on each host:
$ sudo service solr-server restart

After you start the Cloudera Search server, the Solr server should be running. To verify that all daemons are running, use the jps tool from the Oracle JDK, which you can obtain from the Java SE Downloads page. If you are running a pseudo-distributed HDFS installation and a Solr search installation on one machine, jps shows output similar to the following:
$ sudo jps -lm
31407 sun.tools.jps.Jps -lm
31236 org.apache.catalina.startup.Bootstrap start
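The same check can be scripted. The sketch below reads jps -lm output on standard input and reports whether the org.apache.catalina.startup.Bootstrap line is present. It assumes, as in the output above, that this line corresponds to the Solr server; the function name solr_in_jps is hypothetical.

```shell
#!/bin/sh
# Hypothetical check: read `jps -lm` output on stdin and succeed only if a
# line for the Tomcat bootstrap class (assumed to be the Solr server) exists.
solr_in_jps() {
    grep -q 'org\.apache\.catalina\.startup\.Bootstrap start'
}

# Sample input copied from the output shown above.
sample='31407 sun.tools.jps.Jps -lm
31236 org.apache.catalina.startup.Bootstrap start'

if printf '%s\n' "$sample" | solr_in_jps; then
    echo "Solr server process found"
else
    echo "Solr server process not found"
fi
```

On a live host, the sample input would be replaced by a pipeline such as sudo jps -lm | solr_in_jps.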

Generating Collection Configuration

To start using Solr and indexing data, you must configure a collection to hold the index. A collection requires the following configuration files:

  • solrconfig.xml
  • schema.xml
  • Any additional files referenced in the XML files

The solrconfig.xml file contains all of the Solr settings for a given collection, and the schema.xml file specifies the schema that Solr uses when indexing documents. For more details on how to configure a collection, see http://wiki.apache.org/solr/SchemaXml.

Configuration files for a collection are contained in a directory called an instance directory. To generate a template instance directory, run the following command:
$ solrctl instancedir --generate $HOME/solr_configs

You can customize a collection by directly editing the solrconfig.xml and schema.xml files created in $HOME/solr_configs/conf.

After you complete the configuration, make it available to Solr by running the following command, which uploads the contents of the instance directory to ZooKeeper:
$ solrctl instancedir --create <collection_name> $HOME/solr_configs

Use the solrctl utility to verify that the instance directory uploaded successfully and is available to ZooKeeper. List the uploaded instance directories as follows:
$ solrctl instancedir --list

For example, if you used the --create command to create an instance directory named weblogs, the --list command should return weblogs.
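In a script, the verification can be reduced to an exact-match search of the --list output. The following sketch assumes that the command prints one instance directory name per line; the function name has_instancedir is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read instance directory names on stdin (assumed to
# be one per line) and succeed only on an exact, whole-line match.
has_instancedir() {
    grep -qxF -- "$1"
}

# Stand-in for the output of `solrctl instancedir --list`.
printf 'weblogs\n' | has_instancedir weblogs && echo "weblogs is uploaded"
```

Against a running cluster, the stand-in input would be replaced by solrctl instancedir --list | has_instancedir weblogs.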

Creating Collections

The Solr server does not include any default collections. Create a collection using the following command:
$ solrctl collection --create <collection_name> -s <shard_count>

To use the configuration that you provided to Solr in previous steps, use the same collection name (weblogs in this example). The -s <shard_count> parameter specifies the number of SolrCloud shards you want to partition the collection across. The number of shards cannot exceed the total number of Solr servers in your SolrCloud cluster.
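The shard limit can be checked before the collection is created. The following is a minimal sketch, not part of solrctl; the function name valid_shard_count is hypothetical, and the lower bound of one shard is an assumption added for the sketch.

```shell
#!/bin/sh
# Hypothetical pre-check: accept a shard count that is at least 1 (assumed
# lower bound) and no larger than the number of Solr servers.
# Arguments: <shard_count> <solr_server_count>
valid_shard_count() {
    shards=$1
    servers=$2
    [ "$shards" -ge 1 ] && [ "$shards" -le "$servers" ]
}

if valid_shard_count 3 4; then
    echo "3 shards fit on 4 Solr servers"
fi
```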

To verify that the collection is active, go to http://search01.example.com:8983/solr/<collection_name>/select?q=*%3A*&wt=json&indent=true in a browser. For example, for the collection weblogs, the URL is http://search01.example.com:8983/solr/weblogs/select?q=*%3A*&wt=json&indent=true. Replace search01.example.com with the hostname of one of the Solr server hosts.
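The verification URL follows a fixed pattern, so it can be generated for any host and collection. The sketch below builds it with printf; the function name solr_select_url is hypothetical, and port 8983 is taken from the URL above.

```shell
#!/bin/sh
# Hypothetical helper: build the match-all query URL for a collection.
# Arguments: <solr-host> <collection-name>
solr_select_url() {
    # "%%3A" prints a literal "%3A", the URL-encoded colon in q=*:*
    printf 'http://%s:8983/solr/%s/select?q=*%%3A*&wt=json&indent=true\n' "$1" "$2"
}

solr_select_url search01.example.com weblogs
```

If curl is installed, the generated URL can also be fetched from the command line, for example with curl "$(solr_select_url search01.example.com weblogs)".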

You can also view the SolrCloud topology using the URL http://search01.example.com:8983/solr/#/~cloud.

For more information on completing additional collection management tasks, see Managing Cloudera Search.