Configuring high availability for SHS with an internal load balancer

Learn how to configure high availability for Spark History Server (SHS) using an internal load balancer. The internal load balancer authenticates users with a username and password through Apache Knox Gateway. The Cloudera Distributed Hadoop (CDH) stack includes the Apache Knox Gateway, which has a built-in load balancer and failover mechanism.

One Knox service must be installed in Cloudera Manager.
This internal load balancer is recommended only for testing purposes, because a single Knox Gateway is itself a single point of failure.
The following Knox topology configuration is automatically generated if there are two Spark History Server clusters in a Cloudera Manager cluster:
knox.example.com:/var/lib/knox/gateway/conf/topologies/cdp-proxy.xml  


    <param>
       <name>SPARK3HISTORYUI</name>
       <value>enabled=true;maxFailoverAttempts=3;failoverSleep=1000</value>
    </param>
    <param>
       <name>SPARKHISTORYUI</name>
       <value>enabled=true;maxFailoverAttempts=3;failoverSleep=1000</value>
    </param>


    <service>
        <role>SPARK3HISTORYUI</role>
        <url>https://shs1.example.com:18489</url>
        <url>https://shs2.example.com:18489</url>
    </service>


    <service>
        <role>SPARKHISTORYUI</role>
        <url>https://shs3.example.com:18488</url>
        <url>https://shs4.example.com:18488</url>
    </service>
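
Each `<value>` string in the topology above packs three settings into one semicolon-separated list. The following is a minimal sketch that splits such a string and prints the individual settings; the helper name `ha_setting` is hypothetical, and the final figure assumes that `failoverSleep` is a delay in milliseconds applied once per failover attempt.

```shell
#!/bin/sh
# Sketch: split a Knox failover parameter value into its settings.
# The value string is copied from the topology; ha_setting is a hypothetical helper.

ha_setting() {
  # $1 = full value string, $2 = key to look up
  local value="$1" key="$2" pair
  local IFS=';'
  for pair in $value; do
    if [ "${pair%%=*}" = "$key" ]; then
      printf '%s\n' "${pair#*=}"
      return 0
    fi
  done
  return 1
}

value='enabled=true;maxFailoverAttempts=3;failoverSleep=1000'
attempts=$(ha_setting "$value" maxFailoverAttempts)
sleep_ms=$(ha_setting "$value" failoverSleep)

echo "max failover attempts: $attempts"
echo "sleep per attempt (ms): $sleep_ms"
# Assumption: one sleep per attempt, so this is an upper bound on time spent sleeping.
echo "total sleep upper bound (ms): $((attempts * sleep_ms))"
```

With the values shown in the topology, the sketch reports 3 attempts, 1000 ms per attempt, and an upper bound of 3000 ms.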
  1. To use the Knox load balancing feature, you must use the Knox Gateway URL. If one of the Spark History Servers is down, the connection will be automatically redirected to the other server. See the example Knox Gateway URLs below:
    Spark2: https://knox.example.com:8443/gateway/cdp-proxy/sparkhistory/
    Spark3: https://knox.example.com:8443/gateway/cdp-proxy/spark3history/
  2. Use these test commands:
    curl --basic --location-trusted -c cookies.dat -b cookies.dat -k -u username:password "https://knox.example.com:8443/gateway/cdp-proxy/sparkhistory/"
    
    
    curl --basic --location-trusted -c cookies.dat -b cookies.dat -k -u username:password "https://knox.example.com:8443/gateway/cdp-proxy/spark3history/"
    
  3. Select the knox_service option on the Spark configuration page. After you run the "Deploy Client Configuration" action, the following configurations are added:
    /etc/spark/conf/spark-defaults.conf:
    
    
    spark.yarn.historyServer.address=https://knox.example.com:8443/gateway/cdp-proxy/sparkhistory/
    
    
    /etc/spark3/conf/spark-defaults.conf:
    
    
    spark.yarn.historyServer.address=https://knox.example.com:8443/gateway/cdp-proxy/spark3history/
    
    
    When a Spark application finishes, this address is passed to the YARN ResourceManager so that the ResourceManager UI can link the application to the Spark History Server UI.
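
The steps above can be checked together with a small script. The following is a minimal sketch, not an official tool: it reads `spark.yarn.historyServer.address` from a deployed `spark-defaults.conf` and requests that address through Knox with the same curl options used in the test commands. The function names `history_address` and `knox_status` are hypothetical, and the sketch assumes the property is written as either `key=value` or `key value`.

```shell
#!/bin/sh
# Sketch: confirm that the deployed client configuration points at Knox
# and that Knox answers for that address. Helper names are hypothetical.

history_address() {
  # $1 = path to spark-defaults.conf; prints the last configured address
  sed -n 's/^[[:space:]]*spark\.yarn\.historyServer\.address[[:space:]=]\{1,\}//p' "$1" | tail -n 1
}

knox_status() {
  # $1 = URL, $2 = user:password; prints the final HTTP status code
  # -k skips certificate verification, as in the test commands above
  curl --basic --location-trusted -k -s -o /dev/null -w '%{http_code}' -u "$2" "$1"
}

# Run the check only when a config file and credentials are passed in.
if [ "$#" -ge 2 ]; then
  url=$(history_address "$1")
  echo "configured address: $url"
  echo "HTTP status via Knox: $(knox_status "$url" "$2")"
fi
```

Saved under a name of your choice (for example, `check_shs.sh`), it could be run as `sh check_shs.sh /etc/spark/conf/spark-defaults.conf username:password`, and again with the `/etc/spark3/conf/spark-defaults.conf` path for Spark 3.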