Installing the Spark Indexer

The Spark indexer uses a Spark or MapReduce ETL batch job to move data from HDFS files into Apache Solr. As part of this process, the indexer uses Morphlines to extract and transform data.
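A Morphlines configuration is a HOCON file that defines a chain of extraction and transformation commands ending with a load into Solr. As a rough illustration only (the file name, command set, and the `SOLR_LOCATOR` variable below are placeholders, not part of this installation procedure), such a file might look like:

```
# morphline.conf -- illustrative sketch of a Morphlines pipeline
morphlines : [
  {
    id : morphline1
    # Import the command libraries the pipeline draws on
    importCommands : ["org.kitesdk.**", "org.apache.solr.morphlines.**"]
    commands : [
      # Read each input record as one line of UTF-8 text
      { readLine { charset : UTF-8 } }
      # Load the resulting record into the Solr collection
      # identified by the SOLR_LOCATOR variable
      { loadSolr { solrLocator : ${SOLR_LOCATOR} } }
    ]
  }
]
```

Refer to the Morphlines reference documentation for the full set of available commands and their options.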

To use the Spark indexer, solr-crunch must be installed on hosts where you want to submit a batch indexing job.

By default, solr-crunch is installed when Cloudera Search is installed from parcels, as in a Cloudera Manager deployment. If you used a package-based installation and the tool is not present on your system, install it using the commands described in this topic.

To install solr-crunch on RHEL systems:

$ sudo yum install solr-crunch

To install solr-crunch on Ubuntu and Debian systems:

$ sudo apt-get install solr-crunch

To install solr-crunch on SLES systems:

$ sudo zypper install solr-crunch

For information on using Spark to batch index documents, see Spark Indexing.
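Once solr-crunch is installed, a batch indexing job is typically submitted with spark-submit. The following is only a sketch: the jar path, ZooKeeper ensemble, morphline file, and HDFS input path are placeholders, and the exact options accepted by the tool are covered in Spark Indexing.

```shell
$ spark-submit \
    --master yarn \
    --class org.apache.solr.crunch.CrunchIndexerTool \
    /opt/cloudera/parcels/CDH/jars/search-crunch.jar \
    -D morphlineVariable.ZK_HOST=zk01.example.com:2181/solr \
    --morphline-file morphline.conf \
    --pipeline-type spark \
    hdfs://nameservice1/user/solr/input
```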