Monitoring Spark Applications

Every Spark application launches a web UI, by default on port 4040, that displays information about the application, including:
  • A list of scheduler stages and tasks
  • A summary of RDD sizes and memory usage
  • Environmental information
  • Information about the running executors
You can access this interface by opening http://spark_driver_host:4040 in a web browser. If multiple applications are running on the same host, they bind to successive ports beginning with 4040 (4041, 4042, and so on). The web UI, and the information it displays, is available only while the application is running.
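When several applications share a driver host, it may not be obvious which port a given UI ended up on. The following is a minimal sketch, not part of any Spark tooling, that lists the URLs successive UIs would be expected to use and checks which of them respond. The function names, the default of three ports, and the timeout are assumptions chosen for illustration.

```python
from urllib.request import urlopen


def candidate_ui_urls(host, base_port=4040, count=3):
    """Return the URLs where successive application UIs would be expected.

    count is a hypothetical default: it only limits how many ports are listed.
    """
    return [f"http://{host}:{port}" for port in range(base_port, base_port + count)]


def reachable_ui_urls(urls, timeout=2.0):
    """Return the subset of URLs that answer with a successful HTTP response."""
    live = []
    for url in urls:
        try:
            with urlopen(url, timeout=timeout):
                live.append(url)
        except OSError:
            # URLError and HTTPError are OSError subclasses: nothing is
            # listening on that port, the host is unreachable, or the
            # server returned an error status.
            pass
    return live
```

For example, reachable_ui_urls(candidate_ui_urls("spark_driver_host")) would return only the URLs of UIs that are currently up on that host.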
To see information about all running Spark applications, do one of the following depending on which cluster manager you are using:
  • YARN - Go to the YARN applications page in the Cloudera Manager Admin Console.
  • Spark Standalone - Go to the Spark Master UI, by default at http://spark_master:18080.
For information on completed applications, go to the History Server, by default at http://spark_history_server:18088. In the Cloudera Manager Admin Console, you open the History Server UI as follows:
  1. Go to the Spark service.
  2. Click the History Server Web UI link.
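The same application list can also be read programmatically. Upstream Spark documents a REST API under /api/v1 on the History Server; the sketch below assumes that this endpoint is reachable at the default address given above without authentication or TLS, which may not hold in a secured cluster. The function names are hypothetical, and the JSON fields used (id, name, attempts, completed) are assumed from the upstream API description.

```python
import json
from urllib.request import urlopen


def applications_url(history_server, port=18088):
    """Build the assumed REST endpoint that lists applications."""
    return f"http://{history_server}:{port}/api/v1/applications"


def summarize_applications(payload):
    """Turn the JSON response text into (id, name, completed) tuples."""
    summary = []
    for app in json.loads(payload):
        attempts = app.get("attempts", [])
        # Treat an application as completed only if it has at least one
        # attempt and every attempt is marked completed.
        completed = bool(attempts) and all(
            attempt.get("completed", False) for attempt in attempts
        )
        summary.append((app.get("id"), app.get("name"), completed))
    return summary


def fetch_applications(history_server, port=18088, timeout=5.0):
    """Fetch and summarize the application list from the History Server."""
    with urlopen(applications_url(history_server, port), timeout=timeout) as resp:
        return summarize_applications(resp.read().decode("utf-8"))
```

Calling fetch_applications("spark_history_server") would return one tuple per application known to the History Server, provided the endpoint is exposed as assumed.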

For further information on Spark monitoring, see Monitoring and Instrumentation.