This is the documentation for Cloudera Manager 5.1.x. Documentation for other versions is available at Cloudera Documentation.

The Spark Service

You can install Spark from parcels through the Cloudera Manager installation wizard, which creates and starts the service as part of the first-run installation. See Installing Spark. If you choose not to include the Spark service during installation, you can create it later with the Add Service wizard. That wizard automatically configures and starts the Spark service and the services it depends on. See Adding a Service for instructions.
  Note: This release has the following limitations:
  • The Spark service does not work with secure HDFS.
  • This release of Cloudera Manager does not generate a client configuration to run Spark on YARN.

Adding the Spark History Server Role

  1. Go to the Spark service.
  2. Click the Instances tab.
  3. Click the Add Role Instances button.
  4. Select a host in the column under History Server, then click OK.
  5. Click Continue.
  6. Check the checkbox next to the History Server role.
  7. Select Actions for Selected > Start, then click Start to confirm.
  8. Click Close when the action completes.

Testing the Spark Service

To test the Spark service, start the Spark shell, spark-shell, on one of the hosts. You can view the Spark shell application, its executors, and its logs in the Spark Master UI, which by default is at http://spark-master:18080.

Within the Spark shell, you can run a word count application. For example:
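// Read the input file from HDFS as an RDD of lines
val file = sc.textFile("hdfs://namenode:8020/path/to/input")
// Split lines into words, pair each word with 1, then sum the counts per word
val counts = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
// Write the (word, count) pairs to HDFS; the output directory must not already exist
counts.saveAsTextFile("hdfs://namenode:8020/output")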
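
To see what the word count pipeline computes before running it against HDFS, the same logic can be sketched on an ordinary in-memory Scala collection. This is only an illustration and does not use Spark: the object name LocalWordCount, its count method, and the sample lines are hypothetical, and groupBy followed by a per-key sum stands in for reduceByKey(_ + _).

```scala
// Plain Scala sketch of the word count logic; no Spark or HDFS required.
// LocalWordCount and the sample input are hypothetical, for illustration only.
object LocalWordCount {

  // Counts how often each space-separated word occurs across all lines.
  def count(lines: Seq[String]): Map[String, Int] = {
    // flatMap: split every line on single spaces into one flat list of words
    val words = lines.flatMap(line => line.split(" "))
    // map: pair each word with an initial count of 1
    val pairs = words.map(word => (word, 1))
    // Stand-in for reduceByKey(_ + _): group the pairs by word, then sum the 1s
    pairs.groupBy(_._1).map { case (word, ps) => (word, ps.map(_._2).sum) }
  }

  def main(args: Array[String]): Unit = {
    // Sample input standing in for the lines of the HDFS input file
    val lines = Seq("to be or not to be", "that is the question")
    // Print one tab-separated word and count per line, sorted by word
    count(lines).toSeq.sortBy(_._1).foreach { case (word, n) =>
      println(s"$word\t$n")
    }
  }
}
```

The object can be pasted into spark-shell or the Scala REPL and called as LocalWordCount.count(Seq("a b a")). In the Spark version, the same three steps run in parallel across the executors rather than on a single collection.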

Spark on YARN

In secure clusters, Spark applications can only run on YARN. To run Spark on YARN, stop the Spark Master and Worker roles and submit Spark applications to YARN directly. For an example, see Running SparkPi in YARN.
Page generated September 3, 2015.