Spark cluster execution overview

Spark orchestrates its operations through the driver program. When the driver program is run, the Spark framework initializes executor processes on the cluster hosts that process your data. The following occurs when you submit a Spark application to a cluster:

  1. The driver is launched and invokes the main method in the Spark application.
  2. The driver requests resources from the cluster manager to launch executors.
  3. The cluster manager launches executors on behalf of the driver program.
  4. The driver runs the application. Based on the transformations and actions in the application, the driver sends tasks to executors.
  5. Tasks are run on executors to compute and save results.
  6. If dynamic allocation is enabled, after executors are idle for a specified period, they are released.
  7. When driver's main method exits or calls SparkContext.stop, it terminates any outstanding executors and releases resources from the cluster manager.