Deploying Cloudera Search
When you deploy Cloudera Search, SolrCloud partitions your data set into multiple indexes and processes and uses ZooKeeper to simplify management. The result is a cluster of coordinating Solr servers.
Installing and Starting ZooKeeper Server
SolrCloud mode uses a ZooKeeper service as a highly available, central location for cluster management. For a small cluster, we recommend running a ZooKeeper host co-located with the NameNode. For larger clusters, you may want to run multiple ZooKeeper servers. For more information, see Installing the ZooKeeper Packages.
Initializing Solr
After the ZooKeeper service is running, configure each Solr host with the ZooKeeper Quorum address or addresses, one for each ZooKeeper server. A smaller deployment may have a single address; if you deploy additional ZooKeeper servers, list the address of every server.
Configure the ZooKeeper Quorum address in solr-env.sh. The file location varies by installation type. If you accepted default file locations, the solr-env.sh file can be found in:
- Parcels: /opt/cloudera/parcels/CDH-*/etc/default/solr
- Packages: /etc/default/solr
Edit the SOLR_ZK_ENSEMBLE property to point the host at the ZooKeeper service. You must make this change on every Solr Server host. The following example shows a configuration with three ZooKeeper hosts:
SOLR_ZK_ENSEMBLE=<zkhost1>:2181,<zkhost2>:2181,<zkhost3>:2181/solr
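When the ensemble has several hosts, a small helper can assemble the SOLR_ZK_ENSEMBLE value instead of typing it by hand. This is a minimal sketch: the function name build_zk_ensemble is hypothetical, and it assumes every ZooKeeper server listens on the client port 2181 and that Solr uses the /solr chroot shown in the example.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build a SOLR_ZK_ENSEMBLE value from ZooKeeper host names.
# Assumes port 2181 on every server and the /solr chroot from the example.
build_zk_ensemble() {
  local port=2181 ensemble="" host
  for host in "$@"; do
    # Join host:port pairs with commas.
    if [ -z "$ensemble" ]; then
      ensemble="${host}:${port}"
    else
      ensemble="${ensemble},${host}:${port}"
    fi
  done
  # The chroot is appended once, after the last host:port pair.
  printf '%s/solr\n' "$ensemble"
}
```

For example, build_zk_ensemble zkhost1 zkhost2 zkhost3 prints zkhost1:2181,zkhost2:2181,zkhost3:2181/solr, which you can place after SOLR_ZK_ENSEMBLE= on each Solr Server host.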
Configuring Solr for Use with HDFS
To use Solr with your established HDFS service, perform the following configurations:
- Configure the HDFS URI for Solr to use as a backing store in /etc/default/solr or /opt/cloudera/parcels/CDH-*/etc/default/solr. On every Solr Server host, edit the following property to configure the location of Solr index data in HDFS:
SOLR_HDFS_HOME=hdfs://namenodehost:8020/solr
Replace namenodehost with the hostname of your HDFS NameNode (as specified by fs.default.name or fs.defaultFS in your conf/core-site.xml file). You may also need to change the port number from the default (8020). On an HA-enabled cluster, make sure the HDFS URI reflects the name service designated for your cluster. That name service also appears in fs.default.name: instead of a hostname, you see hdfs://nameservice1 or something similar.
- In some cases, such as configuring Solr to work with HDFS High Availability (HA), you may want to configure the Solr HDFS client by setting the HDFS configuration directory in /etc/default/solr or /opt/cloudera/parcels/CDH-*/etc/default/solr. On every Solr Server host, locate the appropriate HDFS configuration directory and set the following property to the absolute path of that directory:
SOLR_HDFS_CONFIG=/etc/hadoop/conf
Replace the path with the correct directory containing the proper HDFS configuration files, core-site.xml and hdfs-site.xml.
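Both HDFS settings can be sanity-checked with a short script. The sketch below uses hypothetical helper names: solr_hdfs_home appends /solr to the fs.defaultFS value you find in core-site.xml (a NameNode URI, or a name service URI on an HA cluster), and check_hdfs_conf_dir confirms that a candidate SOLR_HDFS_CONFIG directory contains both core-site.xml and hdfs-site.xml.

```bash
#!/usr/bin/env bash
# Sketch: derive SOLR_HDFS_HOME and check the SOLR_HDFS_CONFIG directory.

# Append /solr to an fs.defaultFS URI such as hdfs://namenodehost:8020 or,
# on an HA cluster, hdfs://nameservice1. One trailing slash is removed first.
solr_hdfs_home() {
  local default_fs="${1%/}"
  case "$default_fs" in
    hdfs://*) printf '%s/solr\n' "$default_fs" ;;
    *) echo "not an hdfs:// URI: $1" >&2; return 1 ;;
  esac
}

# Succeed only if the directory holds both core-site.xml and hdfs-site.xml.
check_hdfs_conf_dir() {
  [ -f "$1/core-site.xml" ] && [ -f "$1/hdfs-site.xml" ]
}
```

For example, solr_hdfs_home hdfs://nameservice1 prints hdfs://nameservice1/solr, and check_hdfs_conf_dir /etc/hadoop/conf succeeds only when both files are present in that directory.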
Configuring Solr to Use Secure HDFS
- For information on setting up a secure CDH cluster for CDH 4, see the CDH 4 Security Guide.
- For information on setting up a secure CDH cluster for CDH 5, see the CDH 5 Security Guide.
- Create the Kerberos principals and Keytab files for every host in your cluster:
- Create the Solr principal using either kadmin or kadmin.local.
kadmin: addprinc -randkey solr/fully.qualified.domain.name@YOUR-REALM.COM
kadmin: xst -norandkey -k solr.keytab solr/fully.qualified.domain.name
For more information, see Step 4: Create and Deploy the Kerberos Principals and Keytab Files
- Deploy the Kerberos Keytab files on every host in your cluster:
- Copy or move the keytab files to a directory that Solr can access, such as /etc/solr/conf.
$ sudo mv solr.keytab /etc/solr/conf/
$ sudo chown solr:hadoop /etc/solr/conf/solr.keytab
$ sudo chmod 400 /etc/solr/conf/solr.keytab
- Add Kerberos-related settings to /etc/default/solr or /opt/cloudera/parcels/CDH-*/etc/default/solr on every host in your cluster, substituting appropriate values. For a package-based installation, use something similar to the following:
SOLR_KERBEROS_ENABLED=true
SOLR_KERBEROS_KEYTAB=/etc/solr/conf/solr.keytab
SOLR_KERBEROS_PRINCIPAL=solr/fully.qualified.domain.name@YOUR-REALM.COM
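Because the principal and the Kerberos settings differ only by host name, they can be generated per host. The following is a sketch with hypothetical function names; the realm, host names, and keytab path are placeholders, and keytab_mode_ok only checks that the keytab has the 400 mode set by the chmod command above.

```bash
#!/usr/bin/env bash
# Sketch: per-host Kerberos values for Solr. Realm, host, and keytab path
# are placeholders; substitute your own values.

# Build the service principal solr/<fqdn>@<REALM>.
solr_principal() {
  printf 'solr/%s@%s\n' "$1" "$2"
}

# Print the three Kerberos settings for one host's defaults file.
solr_kerberos_settings() {
  local fqdn="$1" realm="$2" keytab="${3:-/etc/solr/conf/solr.keytab}"
  echo "SOLR_KERBEROS_ENABLED=true"
  echo "SOLR_KERBEROS_KEYTAB=$keytab"
  echo "SOLR_KERBEROS_PRINCIPAL=$(solr_principal "$fqdn" "$realm")"
}

# Succeed only if the keytab is readable by its owner alone (mode 400).
keytab_mode_ok() {
  local mode
  # GNU stat uses -c '%a'; BSD/macOS stat uses -f '%Lp'.
  mode=$(stat -c '%a' "$1" 2>/dev/null || stat -f '%Lp' "$1")
  [ "$mode" = "400" ]
}
```

On a Solr Server host, solr_kerberos_settings "$(hostname -f)" YOUR-REALM.COM prints the three lines to add. The mode check tries both stat syntaxes because the flags differ between GNU and BSD systems.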
Creating the /solr Directory in HDFS
Before starting the Cloudera Search server, you must create the /solr directory in HDFS. The Cloudera Search master runs as solr:solr, so it lacks the permissions required to create a top-level directory. Run the following commands as the hdfs user instead:
$ sudo -u hdfs hadoop fs -mkdir /solr
$ sudo -u hdfs hadoop fs -chown solr /solr
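If you script this step, you may want it to be safe to rerun. This sketch (the function name create_solr_hdfs_dir is hypothetical) checks for the directory with hadoop fs -test -d before calling -mkdir, then sets the owner to solr, running every command as the hdfs user as in the commands above.

```bash
#!/usr/bin/env bash
# Sketch: create the Solr directory in HDFS only if it is missing, then make
# the solr user its owner. Every command runs as the hdfs superuser.
create_solr_hdfs_dir() {
  local dir="${1:-/solr}"
  # "hadoop fs -test -d" exits 0 when the path exists and is a directory.
  if ! sudo -u hdfs hadoop fs -test -d "$dir"; then
    sudo -u hdfs hadoop fs -mkdir "$dir" || return 1
  fi
  sudo -u hdfs hadoop fs -chown solr "$dir"
}
```

Calling create_solr_hdfs_dir with no argument uses /solr; running it a second time skips the mkdir and only reapplies the ownership.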
Initializing the ZooKeeper Namespace
Before starting the Cloudera Search server, initialize the ZooKeeper namespace for Solr:
$ solrctl init
Starting Solr
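Start the Solr service on each Solr Server host:
$ sudo service solr-server restart
To verify that the server is running, list the running Java processes with jps. The output should include org.apache.catalina.startup.Bootstrap, for example:
$ sudo jps -lm
31407 sun.tools.jps.Jps -lm
31236 org.apache.catalina.startup.Bootstrap start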
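The jps check can be scripted, for example as part of a health check. This minimal sketch, with a hypothetical function name, reads jps -lm output on standard input and succeeds if the org.apache.catalina.startup.Bootstrap process from the sample output is listed. It assumes no other Tomcat-based service runs on the same host, since any Bootstrap process would match.

```bash
#!/usr/bin/env bash
# Sketch: decide from "jps -lm" output whether the Solr server is up.
# In the sample output, the server appears as Tomcat's Bootstrap class.
solr_server_running() {
  # Reads jps output on stdin; exit status 0 if a Bootstrap process is listed.
  grep -q 'org\.apache\.catalina\.startup\.Bootstrap'
}
```

For example: sudo jps -lm | solr_server_running && echo "Solr server is running"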
Runtime Solr Configuration
To start using Solr for indexing data, you must configure a collection to hold the index. A collection configuration requires a solrconfig.xml file, a schema.xml file, and any helper files referenced from the XML files. The solrconfig.xml file contains all of the Solr settings for a given collection, and the schema.xml file specifies the schema that Solr uses when indexing documents. For more details on how to configure a collection for your data set, see http://wiki.apache.org/solr/SchemaXml.
To generate a template instance directory containing these files, run:
$ solrctl instancedir --generate $HOME/solr_configs
You can customize it by directly editing the solrconfig.xml and schema.xml files created in $HOME/solr_configs/conf.
These configuration files are compatible with the standard Solr tutorial example documents.
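Before uploading a customized instance directory, a quick check can catch a missing file. This sketch (the function name check_instancedir is hypothetical) verifies that solrconfig.xml and schema.xml exist under the conf subdirectory; it does not validate their contents or check the helper files they reference.

```bash
#!/usr/bin/env bash
# Sketch: confirm an instance directory has the two required config files.
check_instancedir() {
  local missing=0 f
  for f in solrconfig.xml schema.xml; do
    if [ ! -f "$1/conf/$f" ]; then
      echo "missing: $1/conf/$f" >&2
      missing=1
    fi
  done
  # Non-zero exit status if either file is absent.
  return "$missing"
}
```

For example: check_instancedir "$HOME/solr_configs" && solrctl instancedir --create collection1 "$HOME/solr_configs"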
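After you finish customizing the configuration, upload the instance directory to ZooKeeper so that Solr can use it:
$ solrctl instancedir --create collection1 $HOME/solr_configs
To verify that the upload succeeded, list the instance directories available in ZooKeeper:
$ solrctl instancedir --list
If you used the earlier --create command to create collection1, the --list command should return collection1.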
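In a provisioning script, the --list check can be automated. The sketch below assumes that solrctl instancedir --list prints one instance directory name per line; the function name instancedir_listed is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: succeed if a name appears in "solrctl instancedir --list" output
# read on stdin. Assumes one instance directory name per line.
instancedir_listed() {
  # -F: fixed string; -x: whole-line match, so "collection1" does not
  # match "collection10".
  grep -qxF -- "$1"
}
```

For example: solrctl instancedir --list | instancedir_listed collection1 && echo "collection1 uploaded"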
Creating Your First Solr Collection
Create the collection, replacing {{numOfShards}} with the number of shards to split it into:
$ solrctl collection --create collection1 -s {{numOfShards}}
To confirm that the collection is active, query it from a browser. For example, for the server myhost.example.com, browse to http://myhost.example.com:8983/solr/collection1/select?q=*%3A*&wt=json&indent=true. Similarly, you can view the topology of your SolrCloud cluster at a URL similar to http://myhost.example.com:8983/solr/#/~cloud.
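The verification URLs follow a fixed pattern, so they can be generated for any host and collection. This sketch uses hypothetical function names and assumes the port 8983 used in the examples; q=*%3A* is the URL-encoded form of the match-all query *:*.

```bash
#!/usr/bin/env bash
# Sketch: build the collection query URL and the cloud topology URL.
collection_select_url() {
  local host="$1" collection="$2" port="${3:-8983}"
  # %% prints a literal %, so the query becomes q=*%3A* (that is, *:*).
  printf 'http://%s:%s/solr/%s/select?q=*%%3A*&wt=json&indent=true\n' \
    "$host" "$port" "$collection"
}

cloud_view_url() {
  local host="$1" port="${2:-8983}"
  printf 'http://%s:%s/solr/#/~cloud\n' "$host" "$port"
}
```

For example, curl -s "$(collection_select_url myhost.example.com collection1)" fetches the same JSON response you would see in the browser.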
For more information on completing additional collection management tasks, see Managing Solr Using solrctl.