Validating the Cloudera Search Deployment
After installing and deploying Cloudera Search, you can validate the deployment by indexing and querying sample documents. You can think of this as a type of "Hello, World!" for Cloudera Search to make sure that everything is installed and working properly.
Before beginning this process, make sure you have access to the Apache Solr admin web console. If your cluster is Kerberos-enabled, make sure you have access to the solr@EXAMPLE.COM Kerberos principal (where EXAMPLE.COM is your Kerberos realm name).
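On a Kerberos-enabled cluster, you can confirm up front that you can obtain a ticket for the Solr principal. This is a minimal sanity check; replace EXAMPLE.COM with your realm:
# Obtain a ticket for the Solr principal and confirm it is cached
kinit solr@EXAMPLE.COM
klist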
Configuring Sentry for Test Collection
If you have enabled Apache Sentry for authorization, you must have UPDATE permission for the admin=collections object as well as the collection you are creating (test_collection in this example). You can also use the wildcard (*) to grant permissions to create any collection.
For more information on configuring Sentry and granting permissions, see Configuring Sentry Authorization for Cloudera Search.
To grant your user account (jdoe in this example) the necessary permissions:
- Switch to the Sentry admin user (solr in this example) using kinit:
kinit solr@EXAMPLE.COM
- Create a Sentry role for your user account:
solrctl sentry --create-role cloudera_tutorial_role
- Map a group to this role. In this example, user jdoe is a member of the eng group:
solrctl sentry --add-role-group cloudera_tutorial_role eng
- Grant UPDATE privileges to the cloudera_tutorial_role role for the admin=collections object
and test_collection collection:
solrctl sentry --grant-privilege cloudera_tutorial_role 'admin=collections->action=UPDATE'
solrctl sentry --grant-privilege cloudera_tutorial_role 'collection=test_collection->action=UPDATE'
- Grant UPDATE privileges on the Config object so that you can upload the instance configuration on which you will base your collection:
solrctl sentry --grant-privilege cloudera_tutorial_role 'config=test_collection_config->action=UPDATE'
For more information on the Sentry privilege model for Cloudera Search, see Authorization Privilege Model for Cloudera Search.
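If you prefer the wildcard approach mentioned earlier, you can grant UPDATE on all collections instead of naming test_collection explicitly. This is an optional sketch that uses the same privilege syntax as the commands above:
# Grant UPDATE on every collection to the tutorial role (wildcard form)
solrctl sentry --grant-privilege cloudera_tutorial_role 'collection=*->action=UPDATE'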
Creating a Test Collection
- If you are using Kerberos, kinit as the user that has privileges to create the collection:
kinit jdoe@EXAMPLE.COM
Replace EXAMPLE.COM with your Kerberos realm name.
- Make sure that the SOLR_ZK_ENSEMBLE environment variable is set in /etc/solr/conf/solr-env.sh. For example:
cat /etc/solr/conf/solr-env.sh
export SOLR_ZK_ENSEMBLE=zk01.example.com:2181,zk02.example.com:2181,zk03.example.com:2181/solr
If you are using Cloudera Manager, this is automatically set on hosts with a Solr Server or Gateway role.
- Generate configuration files for the collection:
solrctl instancedir --generate $HOME/test_collection_config
- If you are using Sentry for authorization, overwrite solrconfig.xml with solrconfig.xml.secure. If you omit this step,
Sentry authorization is not enabled for the collection:
cp $HOME/test_collection_config/conf/solrconfig.xml.secure $HOME/test_collection_config/conf/solrconfig.xml
This example uses collection-level security only, so you need to disable document-level security, which is enabled by default in the generated instance configuration files. To do this, edit the newly generated solrconfig.xml:
vi $HOME/test_collection_config/conf/solrconfig.xml
Locate the following section:
<searchComponent name="queryDocAuthorization" class="org.apache.solr.handler.component.QueryDocAuthorizationComponent" >
and change
<bool name="enabled">true</bool>
to
<bool name="enabled">false</bool>
For more information, see Providing Document-Level Security Using Sentry and Solr Query Returns no Documents when Executed with a Non-Privileged User.
- Use the ConfigSets API to upload the configuration to ZooKeeper:
(cd $HOME/test_collection_config/conf && zip -r - *) | curl -k --negotiate -u : -X POST --header "Content-Type:application/octet-stream" --data-binary @- "https://search01.example.com:8985/solr/admin/configs?action=UPLOAD&name=test_collection_config"
- Create a new collection with two shards (specified by the -s parameter) using the
named configuration (specified by the -c parameter):
solrctl collection --create test_collection -s 2 -c test_collection_config
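To confirm that the collection was created, you can list the collections known to the cluster. The curl check below is an optional sketch that assumes the same host, port, and Kerberos settings used earlier in this example:
# List all collections; test_collection should appear in the output
solrctl collection --list
# Optionally, query the Collections API for the collection's cluster status (secure cluster shown)
curl -k --negotiate -u : "https://search01.example.com:8985/solr/admin/collections?action=CLUSTERSTATUS&collection=test_collection"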
Indexing Sample Data
Cloudera Search includes sample documents in the exampledocs directory of the Solr documentation package. Run the commands for your installation type and security configuration to index them into test_collection:
- Parcel-based Installation (Security Enabled):
cd /opt/cloudera/parcels/CDH/share/doc/solr-doc*/example/exampledocs
find *.xml -exec curl -i -k --negotiate -u: https://search01.example.com:8985/solr/test_collection/update -H "Content-Type: text/xml" --data-binary @{} \;
- Parcel-based Installation (Security Disabled):
cd /opt/cloudera/parcels/CDH/share/doc/solr-doc*/example/exampledocs
java -Durl=http://search01.example.com:8983/solr/test_collection/update -jar post.jar *.xml
- Package-based Installation (Security Enabled):
cd /usr/share/doc/solr-doc*/example/exampledocs
find *.xml -exec curl -i -k --negotiate -u: https://search01.example.com:8985/solr/test_collection/update -H "Content-Type: text/xml" --data-binary @{} \;
- Package-based Installation (Security Disabled):
cd /usr/share/doc/solr-doc*/example/exampledocs
java -Durl=http://search01.example.com:8983/solr/test_collection/update -jar post.jar *.xml
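To confirm from the command line that the documents were indexed, you can run a count-only query against the collection. This sketch assumes the Kerberos-enabled host and port used above; on a cluster without security, use http on port 8983 and omit -k --negotiate -u ::
# Count indexed documents; numFound should be 32 for the sample data
curl -k --negotiate -u : "https://search01.example.com:8985/solr/test_collection/select?q=*:*&rows=0&wt=json"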
Querying Sample Data
Run a query to verify that the sample data is successfully indexed and that you are able to search it:
- Open the Solr admin web interface in a browser by accessing the following URL:
- Security Enabled: https://search01.example.com:8985/solr
- Security Disabled: http://search01.example.com:8983/solr
- Select Cloud from the left panel.
- Select one of the hosts listed for the test_collection collection.
- From the Core Selector drop-down menu in the left panel, select the test_collection shard.
- Select Query from the left panel and click Execute Query. If you see results
such as the following, indexing was successful:
"response": { "numFound": 32, "start": 0, "maxScore": 1, "docs": [ { "id": "SP2514N", "name": "Samsung SpinPoint P120 SP2514N - hard drive - 250 GB - ATA-133", "manu": "Samsung Electronics Co. Ltd.", "manu_id_s": "samsung", "cat": [ "electronics", "hard drive" ],
Next Steps
After you have verified that Cloudera Search is installed and running properly, you can experiment with other methods of ingesting and indexing data. This tutorial uses tweets to demonstrate batch indexing and near real time (NRT) indexing. Continue on to the next portion of the tutorial:
To learn more about Solr, see the Apache Solr Tutorial.