Running Spark Applications on Spark Standalone

In CDH 5, Cloudera recommends running Spark applications on a YARN cluster manager instead of on a Spark Standalone cluster manager, for the following benefits:
  • You can dynamically share and centrally configure the same pool of cluster resources among all frameworks that run on YARN.
  • You can use all the features of YARN schedulers for categorizing, isolating, and prioritizing workloads.
  • You choose the number of executors to use; in contrast, Spark Standalone requires each application to run an executor on every host in the cluster (see the example after this list).
  • Spark can run against Kerberos-enabled Hadoop clusters and use secure authentication between its processes.
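
For example, on YARN you can cap the number of executors an application uses with the --num-executors flag of spark-submit. A minimal sketch, assuming a YARN-enabled CDH cluster; the JAR path and argument mirror the SparkPi example later in this section:

    spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn --deploy-mode client \
    --num-executors 4 \
    SPARK_HOME/lib/spark-examples.jar 10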

Running a Spark Shell Application on Spark Standalone

To run the spark-shell or pyspark application on Spark Standalone, use the --master spark://spark_master:spark_master_port flag when you start the application. The Standalone master listens on port 7077 by default.
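
For example, assuming the default master port (replace spark_master with your master host):

    # Interactive Scala shell against a Standalone master
    spark-shell --master spark://spark_master:7077

    # The Python equivalent
    pyspark --master spark://spark_master:7077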

Submitting Spark Applications to Spark Standalone

To submit a Spark application to Spark Standalone, supply the --master and --deploy-mode client arguments to spark-submit.

Example: Running SparkPi on Spark Standalone

  • CDH 5.2 and lower:
    spark-submit --class org.apache.spark.examples.SparkPi --deploy-mode client \
    --master spark://spark_master:spark_master_port \
    SPARK_HOME/examples/lib/spark-examples.jar 10
  • CDH 5.3 and higher:
    spark-submit --class org.apache.spark.examples.SparkPi --deploy-mode client \
    --master spark://spark_master:spark_master_port \
    SPARK_HOME/lib/spark-examples.jar 10
The argument passed after the JAR (10 in these examples) sets the number of partitions, or slices, across which SparkPi distributes its random samples; more slices mean more samples and a closer approximation of pi, as the sketch below illustrates.
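
The following is a condensed sketch of the SparkPi logic, adapted from the Spark examples (the object name and per-slice sample count are illustrative, not the exact shipped code). It estimates pi by sampling random points in the unit square and counting how many land inside the unit circle:

    import org.apache.spark.{SparkConf, SparkContext}

    object SparkPiSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("SparkPiSketch"))
        // The command-line argument becomes the number of partitions ("slices").
        val slices = if (args.length > 0) args(0).toInt else 2
        val n = 100000 * slices  // total random points; more slices = more samples
        val count = sc.parallelize(1 to n, slices).map { _ =>
          val x = math.random * 2 - 1
          val y = math.random * 2 - 1
          if (x * x + y * y <= 1) 1 else 0  // 1 if the point falls inside the unit circle
        }.reduce(_ + _)
        // The circle-to-square area ratio is pi/4, so scale the hit rate by 4.
        println(s"Pi is roughly ${4.0 * count / n}")
        sc.stop()
      }
    }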