Spark on YARN deployment modes
In YARN, each application instance has an ApplicationMaster process, which is the first container started for that application. The application is responsible for requesting resources from the ResourceManager. Once the resources are allocated, the application instructs NodeManagers to start containers on its behalf. ApplicationMasters eliminate the need for an active client: the process starting the application can terminate, and coordination continues from a process managed by YARN running on the cluster.
Cluster Deployment Mode
Cluster mode is not well suited to using Spark interactively. Spark
applications that require user input, such as
spark-shell
and pyspark
, require the
Spark driver to run inside the client process that initiates the Spark
application.
Client Deployment Mode
Mode | YARN Client Mode | YARN Cluster Mode |
---|---|---|
Driver runs in | Client | ApplicationMaster |
Requests resources | ApplicationMaster | ApplicationMaster |
Starts executor processes | YARN NodeManager | YARN NodeManager |
Persistent services | YARN ResourceManager and NodeManagers | YARN ResourceManager and NodeManagers |
Supports Spark Shell | Yes | No |