Chapter 4. Developing and Submitting Spark Applications
Apache Spark is designed for fast application development and fast data processing. Spark Core is the underlying execution engine; other services, such as Spark SQL, MLlib, and Spark Streaming, are built on top of Spark Core.
To launch Spark applications on a cluster, you typically use the spark-submit script in the Spark bin directory. You can also use the API interactively by launching an interactive shell for Scala (spark-shell), Python (pyspark), or SparkR (sparkR). Note that each interactive shell automatically creates a SparkContext in a variable called sc.
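For example, the following spark-submit command runs the SparkPi example application that ships with Spark on a YARN cluster. This is an illustrative sketch: the examples JAR path and version suffix vary by installation, so adjust them to match your environment.

    # Run the bundled SparkPi example on YARN in cluster deploy mode;
    # the JAR path and _2.12-3.5.0 suffix here are placeholders.
    ./bin/spark-submit \
        --class org.apache.spark.examples.SparkPi \
        --master yarn \
        --deploy-mode cluster \
        examples/jars/spark-examples_2.12-3.5.0.jar \
        1000

Similarly, because spark-shell creates the sc variable automatically, a quick sanity check of the SparkContext might look like the following hypothetical session:

    scala> // sc is created automatically when the shell starts
    scala> sc.parallelize(1 to 100).count()
    res0: Long = 100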
For more information about getting started with Spark, see the Apache Spark Quick Start. For more extensive information about application development, see the Apache Spark Programming Guide and Submitting Applications.
This chapter describes how to run two sample programs, and then provides guidelines for applications that use the Spark DataFrame API, external libraries, Spark SQL, and Hive user-defined functions.