Running Apache Spark ApplicationsPDF version

Spark on YARN deployment modes

In YARN, each application instance has an ApplicationMaster process, which is the first container started for that application. The application is responsible for requesting resources from the ResourceManager. Once the resources are allocated, the application instructs NodeManagers to start containers on its behalf. ApplicationMasters eliminate the need for an active client: the process starting the application can terminate, and coordination continues from a process managed by YARN running on the cluster.

In cluster mode, the Spark driver runs in the ApplicationMaster on a cluster host. A single process in a YARN container is responsible for both driving the application and requesting resources from YARN. The client that launches the application does not need to run for the lifetime of the application.

Cluster mode is not well suited to using Spark interactively. Spark applications that require user input, such as spark-shell and pyspark, require the Spark driver to run inside the client process that initiates the Spark application.

In client mode, the Spark driver runs on the host where the job is submitted. The ApplicationMaster is responsible only for requesting executor containers from YARN. After the containers start, the client communicates with the containers to schedule work.

Table 1. Deployment Mode Summary
Mode YARN Client Mode YARN Cluster Mode
Driver runs in Client ApplicationMaster
Requests resources ApplicationMaster ApplicationMaster
Starts executor processes YARN NodeManager YARN NodeManager
Persistent services YARN ResourceManager and NodeManagers YARN ResourceManager and NodeManagers
Supports Spark Shell Yes No