This is the documentation for Cloudera Manager 5.0.x. Documentation for other versions is available at Cloudera Documentation.

The Spark Service

You can install Spark from parcels through the Cloudera Manager installation wizard, which creates and starts the service as part of the first-run installation. See Installing Spark. If you choose not to include the Spark service in the installation wizard, you can create it later with the Add Service wizard, which automatically configures and starts the Spark service and the services it depends on. See Adding a Service for instructions.
  Note: This release has the following limitations:
  • The Spark service does not work with secure HDFS.
  • This release of Cloudera Manager does not generate a client configuration to run Spark on YARN.

Testing the Spark Service

To test the Spark service, start the Spark shell, spark-shell, on one of the hosts. You can view the Spark shell application, its executors, and its logs in the Spark Master UI, which by default is at http://spark-master:18080.

Within the Spark shell, you can run a word count application. For example, substituting the NameNode host and the input and output paths for your cluster:
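// Read the input file from HDFS as a dataset of lines.
val file = sc.textFile("hdfs://namenode:8020/path/to/input")
// Split each line into words, pair each word with 1, and sum the counts per word.
val counts = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
// Write the results to HDFS; the output directory must not already exist.
counts.saveAsTextFile("hdfs://namenode:8020/output")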
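
To see what this pipeline computes without touching a cluster, the same steps can be sketched on an ordinary Scala collection. This is an illustration only, not part of the Spark service test: the sample lines are made up, and because reduceByKey is not available on plain collections, groupBy followed by a sum stands in for it.

```scala
// Hypothetical in-memory input standing in for the lines of the HDFS file.
val lines = Seq("to be or not", "to be")

// Same shape as the Spark job: split lines into words, pair each word
// with 1, then add up the 1s for each distinct word.
// groupBy + sum is a stand-in for Spark's reduceByKey(_ + _).
val counts: Map[String, Int] =
  lines
    .flatMap(line => line.split(" "))
    .map(word => (word, 1))
    .groupBy { case (word, _) => word }
    .map { case (word, pairs) => (word, pairs.map(_._2).sum) }

// Print one "word<TAB>count" line per word, sorted alphabetically.
counts.toSeq.sortBy(_._1).foreach { case (word, n) => println(s"$word\t$n") }
```

Pasted into a Scala REPL (or the Spark shell itself), this prints the four words be, not, or, and to with counts 2, 1, 1, and 2. The Spark version distributes the same steps across executors and writes the pairs to the output directory instead of printing them.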

Spark on YARN

In secure clusters, Spark applications can run only on YARN. To run Spark on YARN, stop the Spark service and submit Spark applications directly to YARN. For an example, see Running SparkPi in YARN.
Page generated September 3, 2015.