Apache Spark Component Guide

Chapter 4. Developing and Submitting Spark Applications

Apache Spark is designed for fast application development and fast job processing. Spark Core is the underlying execution engine; other services, such as Spark SQL, MLlib, and Spark Streaming, are built on top of Spark Core.

To launch Spark applications on a cluster, you typically use the spark-submit script in the Spark bin directory. You can also use the API interactively by launching a shell for Scala (spark-shell), Python (pyspark), or R (sparkR). Each interactive shell automatically creates a SparkContext in a variable called sc.
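
For example, here is a minimal word-count sketch in Python; the file name wordcount.py and the input path are hypothetical. Run interactively in pyspark, the SparkContext lines are unnecessary because sc already exists; as a standalone script, it must create its own:

    # wordcount.py -- a minimal sketch; the input path is hypothetical
    from pyspark import SparkContext

    # In the pyspark shell, `sc` already exists. A standalone script
    # submitted with spark-submit creates its own SparkContext.
    sc = SparkContext(appName="WordCount")

    # Count word occurrences in a text file
    counts = (sc.textFile("/tmp/input.txt")
                .flatMap(lambda line: line.split())
                .map(lambda word: (word, 1))
                .reduceByKey(lambda a, b: a + b))

    for word, count in counts.take(10):
        print(word, count)

    sc.stop()

To run the script on a YARN cluster, you would pass it to spark-submit, for example: spark-submit --master yarn wordcount.py.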

For more information about getting started with Spark, see the Apache Spark Quick Start. For more extensive information about application development, see the Apache Spark Programming Guide and Submitting Applications.

This chapter describes how to run two sample programs, followed by guidelines for applications that use the Spark DataFrame API, external libraries, Spark SQL, and Hive user-defined functions.