Running Spark Applications on Spark Standalone
In CDH 5, Cloudera recommends running Spark applications on a YARN cluster manager instead of on a Spark Standalone cluster
manager, for the following benefits:
- You can dynamically share and centrally configure the same pool of cluster resources among all frameworks that run on YARN.
- You can use all the features of YARN schedulers for categorizing, isolating, and prioritizing workloads.
- You choose the number of executors to use; in contrast, Spark Standalone requires each application to run an executor on every host in the cluster.
- Spark can run against Kerberos enabled Hadoop clusters and use secure authentication between its processes.
Running a Spark Shell Application on Spark Standalone
To run the spark-shell or pyspark application on Spark Standalone, use the --master spark://spark_master:spark_master_port flag when you start the application.
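For example, assuming a Spark Master running on a host named spark_master and listening on the default Standalone port 7077 (both placeholders; substitute the values for your cluster):
spark-shell --master spark://spark_master:7077
pyspark --master spark://spark_master:7077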
Submitting Spark Applications to Spark Standalone
To submit a Spark application to Spark Standalone, supply the --master and --deploy-mode client arguments to spark-submit.
Example: Running SparkPi on Spark Standalone
- CDH 5.2 and lower:
spark-submit --class org.apache.spark.examples.SparkPi --deploy-mode client \
  --master spark://spark_master:spark_master_port \
  SPARK_HOME/examples/lib/spark-examples.jar 10
- CDH 5.3 and higher:
spark-submit --class org.apache.spark.examples.SparkPi --deploy-mode client \
  --master spark://spark_master:spark_master_port \
  SPARK_HOME/lib/spark-examples.jar 10
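If the application completes successfully, SparkPi prints an approximation of pi to the console, along the lines of the following (the exact value varies from run to run):
Pi is roughly 3.14156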