Managing Spark Using Cloudera Manager

The Spark service is available in two versions: Spark and Spark (Standalone). The previously available Spark service, which runs Spark in standalone mode, has been renamed Spark (Standalone). The Spark (Standalone) service has its own runtime roles: Master and Worker. The current Spark service runs Spark as a YARN application. Both services have a History Server role. In secure clusters, Spark applications can only run on YARN. Cloudera recommends that you use the Spark service.

You can install Spark through the Cloudera Manager Installation wizard using parcels, and have the Spark service added and started as part of that wizard. See Installing Spark.

If you elect not to add the Spark service using the Installation wizard, you can use the Add Service wizard to create the service. The wizard automatically configures dependent services and the Spark service. See Adding a Service for instructions.

When you upgrade from Cloudera Manager 5.1 or lower to Cloudera Manager 5.2 or higher, Cloudera Manager does not migrate an existing Spark service, which runs Spark in standalone mode, to a Spark on YARN service.

How Spark Configurations Are Propagated to Spark Clients

Because the Spark service does not have worker roles, a different mechanism is needed to propagate client configurations to the other hosts in your cluster. In Cloudera Manager, gateway roles fulfill this function. Whether you add the Spark service at installation time or later, ensure that you assign gateway roles to hosts in the cluster. If you do not assign gateway roles, client configurations are not deployed.
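
For example, after client configurations have been deployed to a gateway host, one quick way to confirm that the Spark shell picks them up is to print the resolved configuration from within spark-shell. This is a minimal check; the exact properties listed depend on your cluster's settings:
// Print the configuration the shell was started with; settings from the
// deployed client configuration (for example, spark.master) should appear here.
println(sc.getConf.toDebugString)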

Testing the Spark Service

To test the Spark service, start the Spark shell, spark-shell, on one of the hosts. Within the Spark shell, you can run a word count application. For example:
// Read the input file from HDFS.
val file = sc.textFile("hdfs://namenode:8020/path/to/input")
// Split each line into words, pair each word with a count of 1, and sum the counts per word.
val counts = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
// Write the word counts back to HDFS.
counts.saveAsTextFile("hdfs://namenode:8020/output")
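
To confirm that the application wrote its results, you can read the output directory back in the same shell session. This is a minimal sketch that assumes the same placeholder paths used above:
// Read the saved word counts back from HDFS and print the first few records.
sc.textFile("hdfs://namenode:8020/output").take(10).foreach(println)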

To submit Spark applications to YARN, use the --master yarn flag when you start spark-shell. To see information about the running Spark shell application, go to the Spark History Server UI at http://spark_history_server:18088 or to the YARN applications page in the Cloudera Manager Admin Console.
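
For example, you might start the shell against YARN with a command along the following lines (the exact invocation can vary with your Spark release):
spark-shell --master yarn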

If you are running the Spark (Standalone) service, you can see the Spark shell application, its executors, and logs in the Spark Master UI, by default at http://spark_master:18080.

For more information on running Spark applications, see Running Spark Applications.

Adding the Spark History Server Role

Minimum Required Role: Cluster Administrator (also provided by Full Administrator)

By default, the Spark (Standalone) service does not include a History Server. To add the History Server role:
  1. Go to the Spark service.
  2. Click the Instances tab.
  3. Click the Add Role Instances button.
  4. Select a host in the column under History Server, then click OK.
  5. Click Continue.
  6. Check the checkbox next to the History Server role.
  7. Select Actions for Selected > Start and click Start.
  8. Click Close when the action completes.