Apache Spark Component Guide

Chapter 4. Running Spark

You can run Spark interactively or from a client program:

  • Submit interactive statements through the Scala, Python, or R shell, or through a high-level notebook such as Zeppelin.

  • Use APIs to create a Spark application that runs interactively or in batch mode, using Scala, Python, R, or Java.

To launch Spark applications on a cluster, you can use the spark-submit script in the Spark bin directory. You can also use the API interactively by launching an interactive shell for Scala (spark-shell), Python (pyspark), or SparkR. Note that each interactive shell automatically creates a SparkContext in a variable called sc. For more information about spark-submit, see the Apache Spark document Submitting Applications.
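As a minimal sketch, the following commands show one way to launch the Scala shell and submit a bundled sample application in batch mode. The installation path, the YARN master setting, and the location of the examples JAR are assumptions; they vary by cluster layout and Spark version:

    # Launch the interactive Scala shell on YARN
    # (the path below assumes an HDP-style layout; adjust for your cluster)
    cd /usr/hdp/current/spark-client
    ./bin/spark-shell --master yarn

    # Inside the shell, the preconfigured SparkContext is available as sc:
    # scala> sc.parallelize(1 to 100).sum()

    # Submit the bundled SparkPi example in batch mode with spark-submit
    # (the examples JAR path is an assumption; its location differs between
    # Spark 1 and Spark 2 installations)
    ./bin/spark-submit \
        --class org.apache.spark.examples.SparkPi \
        --master yarn \
        --deploy-mode client \
        lib/spark-examples*.jar 10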

Alternatively, you can use Livy to submit and manage Spark applications on a cluster. Livy is a Spark service that allows local and remote applications to interact with Apache Spark over an open source REST interface. Livy offers additional multi-tenancy and security functionality. For more information about using Livy to run Spark applications, see Submitting Spark Applications through Livy.
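As an illustration, a batch application can be submitted to Livy with a simple HTTP POST to its /batches endpoint. The host name, the JAR path, and the batch ID below are assumptions for illustration; port 8998 is the Livy default:

    # Submit a batch Spark application through the Livy REST interface
    # (the X-Requested-By header is required when Livy's CSRF protection is enabled)
    curl -X POST \
        -H "Content-Type: application/json" \
        -H "X-Requested-By: admin" \
        -d '{"file": "hdfs:///apps/spark-examples.jar", "className": "org.apache.spark.examples.SparkPi", "args": ["10"]}' \
        http://livy-server.example.com:8998/batches

    # Check the status of the submitted batch (batch ID 0 is an example value)
    curl http://livy-server.example.com:8998/batches/0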

This chapter describes how to specify the Spark version for a Spark application, and how to run Spark 1 and Spark 2 sample programs.