Configuring high availability for SHS with multiple Knox Gateways

Learn how to configure high availability for the Spark History Server (SHS) by placing an external load balancer, such as HAProxy, in front of multiple Apache Knox Gateway services. A single Knox Gateway is a single point of failure: if it goes down while you are using its URL to reach the SHS user interface, the UI becomes unreachable.

You must have installed two or more Knox Gateway services in Cloudera Manager and one external load balancer, such as HAProxy.

The following sample haproxy.cfg file configures HAProxy for the Knox Gateways and is used throughout this example:
defaults
    mode                    http
    log                     global
    option                  httplog
    option                  dontlognull
    option http-server-close
    option forwardfor       except 127.0.0.0/8
    option                  redispatch
    retries                 5
    timeout http-request    10s
    timeout queue           1m
    timeout connect         3s
    timeout client          1m
    timeout server          1m
    timeout http-keep-alive 10s
    timeout check           10s
    maxconn                 3000

frontend knox-frontend
    bind :8443 ssl crt /etc/haproxy/cm-auto-host_cert_chain_unenckey.pem
    mode http
    stats enable
    option forwardfor
    http-request redirect location https://%[req.hdr(Host)]/gateway/homepage/home/ if { path / }
    default_backend knox-backend

backend knox-backend
    mode http
    option redispatch
    balance leastconn
    option forwardfor
    stick-table type ip size 1m expire 24h
    stick on src
    option httpchk HEAD /gateway/knoxsso/knoxauth/login.html HTTP/1.1\r\nHost:\ hwx.site
    http-check expect status 200
    # Knox nodes
    server shs1.example.com shs1.example.com:8443 check ssl ca-file /var/lib/cloudera-scm-agent/agent-cert/cm-auto-global_cacerts.pem
    server shs2.example.com shs2.example.com:8443 check ssl ca-file /var/lib/cloudera-scm-agent/agent-cert/cm-auto-global_cacerts.pem
    server shs3.example.com shs3.example.com:8443 check ssl ca-file /var/lib/cloudera-scm-agent/agent-cert/cm-auto-global_cacerts.pem
    server shs4.example.com shs4.example.com:8443 check ssl ca-file /var/lib/cloudera-scm-agent/agent-cert/cm-auto-global_cacerts.pem
    server shs5.example.com shs5.example.com:8443 check ssl ca-file /var/lib/cloudera-scm-agent/agent-cert/cm-auto-global_cacerts.pem

frontend stats
    bind *:8404
    stats enable
    stats uri /stats
    stats refresh 1s
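The knox-backend section marks a Knox node as up only when a HEAD request for /gateway/knoxsso/knoxauth/login.html returns exactly 200. The following Python sketch reproduces that check by hand, which can help when a node shows as DOWN on the stats page. It is an illustration, not HAProxy's implementation; the path, the Host header value (hwx.site), and the CA file are copied from the sample haproxy.cfg, and the function names are hypothetical.

```python
import http.client
import ssl
from typing import Optional

# Values copied from the sample haproxy.cfg; adjust them for your cluster.
CHECK_PATH = "/gateway/knoxsso/knoxauth/login.html"
CHECK_HOST_HEADER = "hwx.site"
CA_FILE = "/var/lib/cloudera-scm-agent/agent-cert/cm-auto-global_cacerts.pem"


def check_url(host: str, port: int = 8443) -> str:
    """Return the URL that the health check probes on one Knox node."""
    return f"https://{host}:{port}{CHECK_PATH}"


def is_healthy(status: int) -> bool:
    """Mirror 'http-check expect status 200': only an exact 200 counts as up."""
    return status == 200


def probe(host: str, port: int = 8443, cafile: Optional[str] = CA_FILE,
          timeout: float = 10.0) -> bool:
    """Send one HEAD request to a Knox node and report whether it looks healthy.

    http.client does not follow redirects, so a 302 counts as "down",
    just as it would for the HAProxy check.
    """
    conn = None
    try:
        # A missing CA file raises FileNotFoundError (an OSError): treated as down.
        context = ssl.create_default_context(cafile=cafile)
        conn = http.client.HTTPSConnection(host, port, timeout=timeout, context=context)
        # Passing Host explicitly replaces the header http.client would add.
        conn.request("HEAD", CHECK_PATH, headers={"Host": CHECK_HOST_HEADER})
        return is_healthy(conn.getresponse().status)
    except (OSError, http.client.HTTPException):
        # Refused connections, DNS errors, timeouts and TLS failures mean "down".
        return False
    finally:
        if conn is not None:
            conn.close()
```

For example, probe("shs1.example.com") returns True only if that node answers the login page with HTTP 200 over TLS verified against the given CA bundle.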
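The backend also combines balance leastconn with a source-IP stick table (stick-table type ip, stick on src), so a given client keeps reaching the same Knox node while that node passes its health check. The toy Python model below sketches that interplay under simplifying assumptions: it ignores the 24h expiry and the table size, and its fallback when the remembered node is down only approximates what HAProxy does. The class and method names are hypothetical.

```python
from typing import Dict, Iterable, Optional, Set


class StickyLeastConn:
    """Toy model of 'balance leastconn' plus 'stick on src' (not HAProxy code)."""

    def __init__(self, servers: Iterable[str]) -> None:
        self.active: Dict[str, int] = {s: 0 for s in servers}  # open connections
        self.up: Set[str] = set(self.active)                    # passing health check
        self.stick_table: Dict[str, str] = {}                   # client IP -> server

    def mark_down(self, server: str) -> None:
        """Record a failed health check for one Knox node."""
        self.up.discard(server)

    def pick(self, src_ip: str) -> str:
        """Return the Knox node that serves a new connection from src_ip."""
        server: Optional[str] = self.stick_table.get(src_ip)
        if server not in self.up:
            # No entry yet, or the remembered node is down: choose the healthy
            # node with the fewest open connections (ties broken by name).
            if not self.up:
                raise RuntimeError("no healthy Knox Gateway available")
            server = min(sorted(self.up), key=lambda s: self.active[s])
            self.stick_table[src_ip] = server
        self.active[server] += 1
        return server

    def release(self, server: str) -> None:
        """Close one connection previously opened through pick()."""
        self.active[server] -= 1
```

In this model, a client first lands on the least-loaded node, keeps returning to it, and is moved only when that node is marked down.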

  1. Add the HAProxy URL to the Spark gateway configurations:
    Spark2: In Cloudera Manager > Spark > Configuration, set history_server_load_balancer_url to https://loadbalancer.example.com:8443/gateway/cdp-proxy/sparkhistory/
    Spark3: In Cloudera Manager > Spark3 > Configuration, set history_server_load_balancer_url to https://loadbalancer.example.com:8443/gateway/cdp-proxy/spark3history/
  2. Verify access through the load balancer with these test commands:
    curl --basic --location-trusted -c cookies.dat -b cookies.dat -k -u username:password "https://loadbalancer.example.com:8443/gateway/cdp-proxy/sparkhistory/"

    curl --basic --location-trusted -c cookies.dat -b cookies.dat -k -u username:password "https://loadbalancer.example.com:8443/gateway/cdp-proxy/spark3history/"
  3. On the Spark configuration page, select the Knox-1 option. When you then run Deploy Client Configuration, the following properties are added automatically:
    /etc/spark/conf/spark-defaults.conf:

    spark.yarn.historyServer.address=https://knox.example.com:8443/gateway/cdp-proxy/sparkhistory/

    /etc/spark3/conf/spark-defaults.conf:

    spark.yarn.historyServer.address=https://knox.example.com:8443/gateway/cdp-proxy/spark3history/

    When the Spark application finishes, this address is passed to the YARN ResourceManager, which uses it to link the application in the ResourceManager UI to the Spark History Server UI.
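The curl test commands in step 2 send Basic credentials up front, keep cookies in cookies.dat, follow redirects while resending credentials, and skip TLS verification. The following Python sketch approximates the same request with only the standard library, for environments where scripting the check is easier than calling curl. It is an assumption-laden illustration, not a Cloudera tool: the function names are hypothetical, and "cdp-proxy" is the topology name used in this example.

```python
import base64
import http.cookiejar
import ssl
import urllib.request
from typing import Tuple

# Proxy paths from step 1; "cdp-proxy" is the topology name used in this example.
HISTORY_PATHS = {"spark2": "sparkhistory", "spark3": "spark3history"}


def history_url(lb_host: str, spark: str, port: int = 8443) -> str:
    """Build the load-balancer URL for the Spark2 or Spark3 history UI."""
    return f"https://{lb_host}:{port}/gateway/cdp-proxy/{HISTORY_PATHS[spark]}/"


def basic_auth_header(username: str, password: str) -> str:
    """Encode credentials the way curl --basic -u does."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def fetch(url: str, username: str, password: str,
          cookie_file: str = "cookies.dat", insecure: bool = True) -> Tuple[int, bytes]:
    """Rough equivalent of: curl --basic --location-trusted -c/-b cookies.dat -k -u ..."""
    jar = http.cookiejar.MozillaCookieJar(cookie_file)  # Netscape format, as curl writes
    try:
        jar.load(ignore_discard=True, ignore_expires=True)  # like -b cookies.dat
    except (FileNotFoundError, http.cookiejar.LoadError):
        pass  # no usable cookie file yet: start with an empty jar
    context = ssl.create_default_context()
    if insecure:
        # Like curl -k: skip certificate and host name verification.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar),
        urllib.request.HTTPSHandler(context=context),
    )
    request = urllib.request.Request(url)
    # Sent up front (like --basic). urllib's default redirect handler copies
    # headers added here to the redirected request, roughly like --location-trusted.
    request.add_header("Authorization", basic_auth_header(username, password))
    with opener.open(request, timeout=30) as response:
        status, body = response.status, response.read()
    jar.save(ignore_discard=True, ignore_expires=True)  # like -c cookies.dat
    return status, body
```

For example, fetch(history_url("loadbalancer.example.com", "spark3"), "username", "password") returns the final status and page body; an authentication failure surfaces as urllib.error.HTTPError rather than a returned status.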
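After Deploy Client Configuration in step 3, you can confirm which address each client configuration carries. The Python sketch below reads spark.yarn.historyServer.address from a spark-defaults.conf file. It assumes the simple one-property-per-line layout shown in step 3 (key, then "=", ":" or whitespace, then the value) and deliberately ignores Java-properties escapes and line continuations; the function names are hypothetical.

```python
import re
from pathlib import Path
from typing import Dict, Optional

# Key, then '=', ':' or whitespace, then the value (simplified properties rules).
_PROPERTY = re.compile(r"^([^=:\s]+)\s*[=:\s]\s*(.*)$")
ADDRESS_KEY = "spark.yarn.historyServer.address"


def parse_spark_defaults(text: str) -> Dict[str, str]:
    """Parse spark-defaults.conf content into a dict (no escapes or continuations)."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue  # blank line or comment
        match = _PROPERTY.match(line)
        if match:
            props[match.group(1)] = match.group(2).strip()
    return props


def history_server_address(conf_path: str) -> Optional[str]:
    """Return spark.yarn.historyServer.address from one spark-defaults.conf file."""
    return parse_spark_defaults(Path(conf_path).read_text()).get(ADDRESS_KEY)
```

For example, calling history_server_address on /etc/spark/conf/spark-defaults.conf and /etc/spark3/conf/spark-defaults.conf should return the sparkhistory and spark3history URLs listed in step 3.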