Configuring high availability for SHS with an external load balancer

Learn how to configure high availability for Spark History Server (SHS) using an external load balancer such as HAProxy. The authentication method used is SPNEGO (Kerberos), which requires an external load balancer.

Download and configure an external load balancer. If you do not have a load-balancing proxy, you can experiment with your configuration using HAProxy, a free, open-source load balancer. HAProxy is not a CDH component, and Cloudera does not provide support for HAProxy. Ensure that SPNEGO authentication is enabled in SHS.
  1. In Cloudera Manager, navigate to Spark > Configuration and configure the history_server_load_balancer_url property.
    The history_server_load_balancer_url property configures the following automatically:
    • The load balancer SPNEGO principal is automatically generated and added to the keytab file. For example: HTTP/loadbalancer.example.com@EXAMPLE.COM
    • The value of SPNEGO_PRINCIPAL is automatically set to *, and all HTTP principals are loaded from the automatically generated keytab.
    • The value of spark.yarn.historyServer.address is set to this URL in the appropriate gateway files:
      • Spark2: /etc/spark/conf/spark-defaults.conf
      • Spark3: /etc/spark3/conf/spark-defaults.conf
  2. Test the configuration with the following commands. In the sample haproxy.cfg, port 18488 is the Spark2 frontend and port 18489 is the Spark3 frontend:
    curl --negotiate -u : --location-trusted -c cookies.dat -b cookies.dat -k "https://loadbalancer.example.com:18488"
    curl --negotiate -u : --location-trusted -c cookies.dat -b cookies.dat -k "https://loadbalancer.example.com:18489"
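The two steps above can also be checked from a script. The following is a minimal sketch, not a Cloudera-provided tool: the function names gateway_shs_address and shs_lb_status are hypothetical, and the sketch assumes that spark-defaults.conf uses the usual "key=value" or "key value" layout and that a valid Kerberos ticket (obtained with kinit) is already present.

```shell
# Sketch only: both helper names are hypothetical.

# Print the value of spark.yarn.historyServer.address from a gateway
# spark-defaults.conf file (for example /etc/spark/conf/spark-defaults.conf).
# Assumes the key starts the line and is separated from the value by "=" or whitespace.
gateway_shs_address() {
    awk -F'[= \t]+' '$1 == "spark.yarn.historyServer.address" { print $2 }' "$1"
}

# Print only the final HTTP status code returned through the load balancer.
# Uses the same curl options as the test commands, plus -s, -o and -w so that
# the response body is discarded and just the status code is written.
shs_lb_status() {
    curl --negotiate -u : --location-trusted \
         -c cookies.dat -b cookies.dat -k \
         -s -o /dev/null -w '%{http_code}' "$1"
}
```

For example, gateway_shs_address /etc/spark/conf/spark-defaults.conf should print the load balancer URL configured in step 1, and shs_lb_status "https://loadbalancer.example.com:18488" prints the status code for the Spark2 frontend. A 401 status typically indicates that the SPNEGO negotiation did not succeed.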
This is the sample haproxy.cfg file used in this example:
defaults
    mode                    http
    log                     global
    option                  httplog
    option                  dontlognull
    option                  http-server-close
    option forwardfor       except 127.0.0.0/8
    option                  redispatch
    retries                 5
    timeout http-request    10s
    timeout queue           1m
    timeout connect         3s
    timeout client          1m
    timeout server          1m
    timeout http-keep-alive 10s
    timeout check           10s
    maxconn                 3000

#---------------------------------------------------------------------
# Spark2 frontend which proxies to the Spark2 backends
#---------------------------------------------------------------------
frontend                        spark_front
    bind                        *:18488 ssl crt /var/lib/cloudera-scm-agent/agent-cert/cdep-host_key_cert_chain_decrypted.pem
    default_backend             spark2

#---------------------------------------------------------------------
# source-IP hash balancing between the various backends (keeps each client on the same SHS instance)
#---------------------------------------------------------------------
backend spark2
    balance                     source
    server spark2-1 shs1.example.com:18488 check ssl ca-file /var/lib/cloudera-scm-agent/agent-cert/cm-auto-global_cacerts.pem
    server spark2-2 shs2.example.com:18488 check ssl ca-file /var/lib/cloudera-scm-agent/agent-cert/cm-auto-global_cacerts.pem

#---------------------------------------------------------------------
# Spark3 frontend which proxies to the Spark3 backends
#---------------------------------------------------------------------
frontend                        spark3_front
    bind                        *:18489 ssl crt /var/lib/cloudera-scm-agent/agent-cert/cdep-host_key_cert_chain_decrypted.pem
    default_backend             spark3

#---------------------------------------------------------------------
# source-IP hash balancing between the various backends (keeps each client on the same SHS instance)
#---------------------------------------------------------------------
backend spark3
    balance                     source
    server spark3-1 shs3.example.com:18489 check ssl ca-file /var/lib/cloudera-scm-agent/agent-cert/cm-auto-global_cacerts.pem
    server spark3-2 shs4.example.com:18489 check ssl ca-file /var/lib/cloudera-scm-agent/agent-cert/cm-auto-global_cacerts.pem
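
Before starting or reloading HAProxy with a configuration like the one above, it can help to validate the file first. The following is a minimal sketch: check_haproxy_cfg is a hypothetical helper name, and /etc/haproxy/haproxy.cfg is an assumed default location that may differ on your system. It relies on HAProxy's check mode (haproxy -c -f FILE), which parses the configuration and exits without starting the proxy.

```shell
# Sketch only: check_haproxy_cfg is a hypothetical helper name.
# "haproxy -c -f FILE" runs HAProxy in check mode; a non-zero exit
# status means the configuration could not be validated.
check_haproxy_cfg() {
    cfg="${1:-/etc/haproxy/haproxy.cfg}"   # assumed default location
    if haproxy -c -f "$cfg"; then
        echo "valid: $cfg"
    else
        echo "invalid: $cfg" >&2
        return 1
    fi
}
```

Running check_haproxy_cfg with no argument checks the assumed default path; pass a different path as the first argument to check a file elsewhere.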