Example: Running SparkPi on YARN

These examples demonstrate how to use spark-submit to submit the SparkPi Spark example application with various options. In the examples, the argument passed after the JAR controls how close to pi the approximation should be.

In a CDP deployment, SPARK_HOME defaults to /opt/cloudera/parcels/CDH/lib/spark. The shells are also available from /bin.

Running SparkPi in YARN Cluster Mode

To run SparkPi in cluster mode:

spark-submit --class org.apache.spark.examples.SparkPi --master yarn \
--deploy-mode cluster /opt/cloudera/parcels/CDH/jars/spark-examples*.jar 10

The command prints status until the job finishes or you press control-C. Terminating the spark-submit process in cluster mode does not terminate the Spark application as it does in client mode. To monitor the status of the running application, run yarn application -list.

Running SparkPi in YARN Client Mode

To run SparkPi in client mode:

spark-submit --class org.apache.spark.examples.SparkPi --master yarn \
--deploy-mode client SPARK_HOME/lib/spark-examples.jar 10

Running Python SparkPi in YARN Cluster Mode

  1. Unpack the Python examples archive:
    sudo su gunzip SPARK_HOME/lib/python.tar.gz
    sudo su tar xvf SPARK_HOME/lib/python.tar
  2. Run the pi.py file:
    spark-submit --master yarn --deploy-mode cluster SPARK_HOME/lib/pi.py 10