Running Apache Spark Applications

Introduction

You can run Spark interactively or from a client program:

  • Submit interactive statements through the Scala, Python, or R shell, or through a high-level notebook such as Zeppelin.

  • Use APIs to create a Spark application that runs interactively or in batch mode, using Scala, Python, R, or Java.

To launch Spark applications on a cluster, use the spark-submit script in the Spark bin directory. You can also use the API interactively by launching an interactive shell for Scala (spark-shell), Python (pyspark), or R (sparkR). Each interactive shell automatically creates a SparkContext instance in a variable named sc. For more information about spark-submit, see the Apache Spark document "Submitting Applications".
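As a rough illustration of how a spark-submit invocation is put together, the following Python sketch assembles the command line as a list of arguments. The helper name build_spark_submit, the file wordcount.py, the input path, and the default values are hypothetical and chosen only for this example; the --master, --deploy-mode, --class, and --conf options are standard spark-submit options. The sketch only builds and prints the command; it does not run it.

```python
import shlex


def build_spark_submit(app, master="yarn", deploy_mode="cluster",
                       app_args=(), conf=None, main_class=None):
    """Return a spark-submit command as a list of arguments.

    All names and defaults here are illustrative assumptions,
    not values required by Spark.
    """
    cmd = ["spark-submit", "--master", master, "--deploy-mode", deploy_mode]
    if main_class:
        # --class is needed only for Java/Scala applications packaged as a JAR.
        cmd += ["--class", main_class]
    # Each Spark property is passed as a separate --conf key=value pair.
    for key, value in sorted((conf or {}).items()):
        cmd += ["--conf", f"{key}={value}"]
    # The application file comes after all options, followed by its own arguments.
    cmd.append(app)
    cmd += [str(arg) for arg in app_args]
    return cmd


# Hypothetical Python application and input path.
command = build_spark_submit(
    "wordcount.py",
    app_args=["/tmp/input.txt"],
    conf={"spark.executor.memory": "2g"},
)
print(shlex.join(command))
```

On a host with Spark installed, the resulting list could be passed to subprocess.run(command, check=True), provided that spark-submit is on the PATH.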

Alternatively, you can use Livy to submit and manage Spark applications on a cluster. Livy is a Spark service that allows local and remote applications to interact with Apache Spark over an open source REST interface. Livy also offers additional multi-tenancy and security functionality. For more information about using Livy to run Spark applications, see "Submitting Spark Applications through Livy" in this guide.
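To give a sense of what a REST-based submission looks like, the following Python sketch prepares a batch job request for Livy's POST /batches endpoint using only the standard library. The host name livy.example.com, the application path, and the helper names build_batch_request and submit_batch are assumptions made for this example; port 8998 is Livy's usual default but may differ on your cluster, and secured clusters typically require additional authentication that is not shown here.

```python
import json
import urllib.request


def build_batch_request(livy_url, app_file, class_name=None, args=()):
    """Prepare (but do not send) a POST /batches request for Livy."""
    payload = {"file": app_file}          # Path to the application file, e.g. on HDFS.
    if class_name:
        payload["className"] = class_name  # Main class, for Java/Scala applications.
    if args:
        payload["args"] = list(args)       # Arguments passed to the application.
    return urllib.request.Request(
        url=livy_url.rstrip("/") + "/batches",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit_batch(request, timeout=30):
    """Send the prepared request and return Livy's JSON response as a dict."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical Livy endpoint and application path.
request = build_batch_request(
    "http://livy.example.com:8998",
    "hdfs:///apps/wordcount.py",
    args=["/tmp/input.txt"],
)
print(request.get_method(), request.full_url)
```

Calling submit_batch(request) against a reachable Livy server would send the request; the JSON response is expected to include the batch id and its state, which can then be polled through the same REST interface.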