Using Spark History Servers with high availability

You can configure a load balancer for the Spark History Server (SHS) to make it highly available, so that users can access and use the SHS UI without disruption. Learn how to configure the load balancer for SHS and what limitations apply.

You can access the Spark History Server for your Spark cluster from the Cloudera Data Platform (CDP) Management Console. SHS has two main functions:
  • Reads the Spark event logs from storage and displays them in the Spark History Server UI.
  • Cleans up old Spark event log files.
High availability adds a load-balancing layer in front of the two Spark History Servers in a cluster, so that users can reach the Spark History Server UI without failure or disruption even if one instance is down. It also lets you perform rolling restarts of the Spark History Server.
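The routing decision such a load-balancing layer makes can be sketched in a few lines. This is a minimal Python illustration of the idea only, not how HAProxy or Apache Knox Gateway is implemented or configured: requests rotate across two SHS instances, and any instance that fails a health check is skipped. The backend URLs are hypothetical, and the health check assumes each host exposes the standard Spark History Server REST endpoint `/api/v1/applications`.

```python
import urllib.request
from typing import Callable, Optional, Sequence

# Hypothetical SHS backends; substitute your own hosts and port.
SHS_BACKENDS = [
    "http://shs-host-1.example.com:18088",
    "http://shs-host-2.example.com:18088",
]


def http_probe(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if the SHS REST API on base_url answers with HTTP 200."""
    url = f"{base_url}/api/v1/applications?limit=1"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError, refused connections and timeouts all derive from OSError.
        return False


def pick_backend(
    backends: Sequence[str],
    start: int,
    probe: Callable[[str], bool] = http_probe,
) -> Optional[str]:
    """Round-robin from index `start`, skipping backends that fail the probe.

    Returns None when no backend is healthy (or the list is empty).
    """
    n = len(backends)
    for offset in range(n):
        candidate = backends[(start + offset) % n]
        if probe(candidate):
            return candidate
    return None


if __name__ == "__main__":
    # Simulate shs-host-1 being down, for example during a rolling restart.
    fake_health = {SHS_BACKENDS[0]: False, SHS_BACKENDS[1]: True}
    for request_no in range(3):
        print(request_no, pick_backend(SHS_BACKENDS, request_no, probe=fake_health.get))
```

Because an instance that is down is simply skipped, users keep reaching a working UI as long as one SHS answers, which is also what makes rolling restarts transparent. In a real deployment, the external load balancer or Knox Gateway performs this role.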

There are three supported ways to configure the load balancer for Spark History Server:

  • Using an external load balancer, for example, HAProxy.
  • Using an internal load balancer, which requires Apache Knox Gateway.
  • Using multiple Apache Knox Gateways and external load balancers, for example, HAProxy.
