Chapter 4. Developing Spark Applications
Apache Spark is designed for fast application development and fast processing. Spark Core is the underlying execution engine; other services, such as Spark SQL, MLlib, and Spark Streaming, are built on top of Spark Core.
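To see this layering in code, the following sketch (assuming a Spark 1.6-era API, where a SQLContext is constructed from a SparkContext; the application name is arbitrary) shows Spark SQL running on top of the Spark Core engine:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    // Spark Core: the SparkContext is the entry point to the execution engine.
    val conf = new SparkConf().setAppName("LayeringExample")
    val sc = new SparkContext(conf)

    // Spark SQL: a higher-level service built on the Spark Core context.
    val sqlContext = new SQLContext(sc)
    val df = sqlContext.range(0, 10)  // a simple DataFrame backed by Spark Core RDDs
    df.show()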
To run Spark applications, use the spark-submit script in the Spark bin directory to launch the application on a cluster. Alternatively, to use the API interactively, you can launch an interactive shell for Scala (spark-shell), Python (pyspark), or SparkR. Note: Each interactive shell automatically creates a SparkContext in a variable called sc.
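For example, a typical spark-submit invocation looks like the sketch below; the application class, master, deploy mode, JAR path, and arguments are placeholders to be replaced with values for your own application and cluster:

    ./bin/spark-submit \
      --class com.example.MyApp \
      --master yarn \
      --deploy-mode cluster \
      my-app.jar arg1 arg2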
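As a minimal illustration of the preconfigured sc variable, the following spark-shell session distributes a local range as an RDD and counts its even elements (the variable name is arbitrary):

    scala> val data = sc.parallelize(1 to 1000)  // sc is created automatically by the shell
    scala> data.filter(_ % 2 == 0).count()       // returns 500: the even numbers in 1..1000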
For more information about getting started with Spark, see the Apache Spark Quick Start. For more extensive information about application development, see the Apache Spark Programming Guide and Submitting Applications.
The remainder of this chapter contains basic coding examples. Subsequent chapters describe how to access a range of data sources and analytic capabilities.