Installing the Spark Indexer

The Spark indexer uses a Spark or MapReduce ETL batch job to move data from HDFS files into Apache Solr. As part of this process, the indexer uses Morphlines to extract and transform data.

To use the Spark indexer, install solr-crunch on every host from which you want to submit a batch indexing job.

By default, solr-crunch is included with Cloudera Search when CDH is installed using parcels in a Cloudera Manager deployment. If you are using a package-based installation and the tool is not present on your system, install it using the commands described in this topic.
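For a package-based installation, one way to check whether solr-crunch is already present is to query the system package database. The following is a minimal sketch, not an official procedure: the helper name pkg_installed is hypothetical, and the check is assumed to apply only to package-based installations, because files delivered through parcels are not normally registered with rpm or dpkg.

```shell
#!/bin/sh
# Hypothetical helper (not part of Cloudera Search): succeeds only if the
# named package is recorded as installed in the system package database.
pkg_installed() {
  pkg="$1"
  if command -v dpkg-query >/dev/null 2>&1; then
    # Ubuntu and Debian: require the "install ok installed" status.
    dpkg-query -W -f='${Status}' "$pkg" 2>/dev/null | grep -q "install ok installed"
  elif command -v rpm >/dev/null 2>&1; then
    # RHEL and SLES: rpm -q exits non-zero when the package is absent.
    rpm -q "$pkg" >/dev/null 2>&1
  else
    # Neither package database tool was found.
    return 2
  fi
}

if pkg_installed solr-crunch; then
  echo "solr-crunch is already installed"
else
  echo "solr-crunch is not installed"
fi
```

If the script reports that the package is not installed, continue with the command for your operating system.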

To install solr-crunch on RHEL systems:

$ sudo yum install solr-crunch

To install solr-crunch on Ubuntu and Debian systems:

$ sudo apt-get install solr-crunch

To install solr-crunch on SLES systems:

$ sudo zypper install solr-crunch
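If you script the installation across a mixed set of hosts, the three commands above can be selected automatically. The following is a sketch under stated assumptions: the helper names install_cmd_for and detect_pkg_manager are hypothetical, and detection relies only on which package manager is found first on the PATH. The script prints the command rather than running it, so you can review it before installing.

```shell
#!/bin/sh
# Hypothetical helper: print the solr-crunch install command for a given
# package manager name (yum, apt-get, or zypper).
install_cmd_for() {
  case "$1" in
    yum)     echo "sudo yum install solr-crunch" ;;
    apt-get) echo "sudo apt-get install solr-crunch" ;;
    zypper)  echo "sudo zypper install solr-crunch" ;;
    *)       echo "unsupported package manager: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: print the first supported package manager on the PATH.
detect_pkg_manager() {
  for pm in yum apt-get zypper; do
    if command -v "$pm" >/dev/null 2>&1; then
      echo "$pm"
      return 0
    fi
  done
  return 1
}

if pm=$(detect_pkg_manager); then
  echo "Run: $(install_cmd_for "$pm")"
else
  echo "No supported package manager found" >&2
fi
```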

For information on using Spark to batch index documents, see Spark Indexing.