Running Spark Applications on Spark Standalone
In CDH 5, Cloudera recommends running Spark applications on a YARN cluster manager instead of on a Spark Standalone cluster
manager, for the following benefits:
- You can dynamically share and centrally configure the same pool of cluster resources among all frameworks that run on YARN.
- You can use all the features of YARN schedulers for categorizing, isolating, and prioritizing workloads.
- You choose the number of executors to use; in contrast, Spark Standalone requires each application to run an executor on every host in the cluster.
- Spark can run against Kerberos enabled Hadoop clusters and use secure authentication between its processes.
Running a Spark Shell Application on Spark Standalone
To run the spark-shell or pyspark application on Spark Standalone, use the --master spark://spark_master:spark_master_port flag when you start the application.
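For example, assuming a Spark Master running on a host named spark_master and listening on the default Standalone port 7077 (both placeholders; substitute the values for your cluster):
spark-shell --master spark://spark_master:7077
pyspark --master spark://spark_master:7077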
Submitting Spark Applications to Spark Standalone
To submit a Spark application to Spark Standalone, supply the --master and --deploy-mode client arguments to spark-submit.
Example: Running SparkPi on Spark Standalone
- CDH 5.2 and lower:
spark-submit --class org.apache.spark.examples.SparkPi --deploy-mode client \
  --master spark://spark_master:spark_master_port \
  SPARK_HOME/examples/lib/spark-examples.jar 10
- CDH 5.3 and higher:
spark-submit --class org.apache.spark.examples.SparkPi --deploy-mode client \
  --master spark://spark_master:spark_master_port \
  SPARK_HOME/lib/spark-examples.jar 10
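If the application completes successfully, SparkPi prints an approximation of pi to the console, along the lines of the following (the exact value varies from run to run):
Pi is roughly 3.14156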