The Spark Service
You can install Spark from parcels through the Cloudera Manager installation wizard, which creates and starts the service as part of the first-run installation. See Installing Spark. If you choose not to include the Spark service during installation, you can create it later with the Add Service wizard, which automatically configures and starts the Spark service and the services it depends on. See Adding a Service for instructions.
Note: The following limitations apply:
- The Spark service does not work with secure HDFS.
- This release of Cloudera Manager does not generate a client configuration for running Spark on YARN.
Testing the Spark Service
To test the Spark service, start the Spark shell (spark-shell) on one of the hosts. The Spark Master UI, by default at http://spark-master:18080, shows the Spark shell application, its executors, and its logs.
Within the Spark shell, you can run a word count application. For example:
val file = sc.textFile("hdfs://namenode:8020/path/to/input")
val counts = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
counts.saveAsTextFile("hdfs://namenode:8020/output")
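To check what the word count above should produce before pointing it at HDFS, it can help to run the same steps on a small in-memory sample. The following is a minimal sketch using plain Scala collections rather than Spark. The sample sentences in `lines` are hypothetical, and `groupBy` followed by a sum stands in for `reduceByKey`, which is a Spark operation that plain collections do not provide.

```scala
// Local, Spark-free sketch of the word count using Scala collections.
// `lines` is hypothetical sample input standing in for the HDFS file.
val lines = Seq("to be or not to be", "that is the question")

val counts: Map[String, Int] = lines
  .flatMap(line => line.split(" "))    // split each line into words
  .map(word => (word, 1))              // pair each word with a count of 1
  .groupBy { case (word, _) => word }  // group the pairs by word
  .map { case (word, pairs) => (word, pairs.map(_._2).sum) } // sum the counts per word

// Print one "word<TAB>count" line per distinct word, sorted by word.
counts.toSeq.sortBy(_._1).foreach { case (word, n) => println(s"$word\t$n") }
```

You can paste this into any Scala REPL, including spark-shell, because it does not use the `sc` context. The Spark version writes (word, count) pairs to the output directory, so this local run is only a rough preview of the output format.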
Spark on YARN
In secure clusters, Spark applications can only run on YARN. To run Spark on YARN, stop the Spark service and submit Spark applications to YARN directly. For an example, see Running SparkPi in YARN.