Load balancing between Hue and Impala

Condition

You see the following error when running Impala queries from Hue: "Invalid query handle error or Result have expired , rerun the query if needed". You also see either of the following errors in the runcpserver.log file:
  • Invalid query handle
  • Invalid session id

Cause

Hue uses a TCP connection pool (10 connections) for all Thrift traffic to Impala. This means that each Impala session is not guaranteed to use the same TCP connection. Load balancers send a single TCP connection to a single Impalad, but without correct persistence, Impala sessions can be sent to the wrong backend server, causing the errors you see.

Solution

To solve this issue, you must configure your load balancer that is between Hue and Impala to use Source IP persistence. This is not the load balancer in front of Hue on port 8888/8889, this is the load balancer for Impala, defined in the Impala configuration in Cloudera Manager as Impala Daemons Load Balancer. In addition to Source IP persistence, you must also set the timeout in the load balancer for these connections to a bigger value, otherwise the load balancer can close these connections even though Hue is using them and thinks they are active. Cloudera recommends a minium of 6 hours as the timeout value. 12 hours is ideal.

Cloudera also recommends that you split the VIP configurations into 3 different ports, 21000 for impala-shell users, 21050 for JDBC users and then 21051 for Hue instances. This way you only have to configure the high timeout and Source IP persistence for the Hue port 21051.

Configure the HA Proxy as follows:

  1. Open a terminal session and SSH in to the Impala Daemon.
  2. Download and install an HAproxy service by running a command based on your operating system. For example:
    yum install haproxy
  3. Configure HAProxy for each role as follows:
    vi /etc/haproxy/haproxy.cfg
    # For impala-shell users on port 21000.
    #---------------------------------------------------------------------
    # main frontend which proxys to the backends
    #---------------------------------------------------------------------
    frontend  impala_front
        bind                 *:21000 ssl crt /opt/cloudera/security/x509/certkeynopw.pem
        mode                 tcp
        option               tcplog
        default_backend      impala-shell
    
    #---------------------------------------------------------------------
    # round robin balancing between the various backends
    #---------------------------------------------------------------------
    backend impala-shell
        balance              leastconn
        mode                 tcp
        server impalad1 impalad-1.example.com:21000 check ssl ca-file /opt/cloudera/security/truststore/ca-truststore.pem
        server impalad2 impalad-2.example.com:21000 check ssl ca-file /opt/cloudera/security/truststore/ca-truststore.pem
        server impalad3 impalad-3.example.com:21000 check ssl ca-file /opt/cloudera/security/truststore/ca-truststore.pem
    
    
    # For JDBC or ODBC version 2.x driver, use port 21050 instead of 21000.
    #---------------------------------------------------------------------
    # main frontend which proxys to the backends
    #---------------------------------------------------------------------
    frontend  impala_front
        bind                 *:21050 ssl crt /opt/cloudera/security/x509/certkeynopw.pem
        mode                 tcp
        option               tcplog
        default_backend      impala-jdbc
    
    #---------------------------------------------------------------------
    # round robin balancing between the various backends
    #---------------------------------------------------------------------
    backend impala-jdbc
        balance              leastconn
        mode                 tcp
        server impalad1 impalad-1.example.com:21050 check ssl ca-file /opt/cloudera/security/truststore/ca-truststore.pem
        server impalad2 impalad-2.example.com:21050 check ssl ca-file /opt/cloudera/security/truststore/ca-truststore.pem
        server impalad3 impalad-3.example.com:21050 check ssl ca-file /opt/cloudera/security/truststore/ca-truststore.pem
    
    
    # Setup for Hue or other JDBC-enabled applications.
    # In particular, Hue requires SOURCE IP PERSISTANCE
    # The application connects to load_balancer_host:21051, and HAProxy balances
    # connections to the associated hosts, where Impala listens for JDBC
    # requests on port 21050.
    # Notice the timeouts below that do not exist in the other configs
    # these are to stop the connections from being killed even though
    # hue is using them
    #---------------------------------------------------------------------
    # main frontend which proxys to the backends
    #---------------------------------------------------------------------
    frontend  impalajdbc_front
        bind                 *:21051 ssl crt /opt/cloudera/security/x509/certkeynopw.pem
        mode                 tcp
        option               tcplog
      timeout client    720m
      timeout server    720m
        default_backend      impala-hue
    
    #---------------------------------------------------------------------
    # source balancing between the various backends
    #---------------------------------------------------------------------
    backend impala-hue
        balance              source
        mode                 tcp
        server impalad1 impalad-1.example.com:21050 check ssl ca-file /opt/cloudera/security/truststore/ca-truststore.pem
        server impalad2 impalad-2.example.com:21050 check ssl ca-file /opt/cloudera/security/truststore/ca-truststore.pem
        server impalad3 impalad-3.example.com:21050 check ssl ca-file /opt/cloudera/security/truststore/ca-truststore.pem
  4. Create a persistence profile which enables source IP persistence in the F5. Make sure the timeout value is between 6 and 12 hours.
    ltm persistence source-addr source_addr_12h_idle_timeout {
      app-service none
      defaults-from source_addr
      hash-algorithm default
      map-proxies enabled
      mask none
      match-across-pools disabled
      match-across-services disabled
      match-across-virtuals disabled
      override-connection-limit disabled
      timeout 43200
    }
  5. Create virtual profiles for each VIP, impala-shell, impala JDBC, and impala Hue.
    impala-shell profile:
    
    ltm virtual vs-impala-21000 {
      description "Impala 21000 for Shell"
      destination <VIPIP>:21000
      ip-protocol tcp
      mask 255.255.255.255
      pool pool-impala-21000
      profiles {
        fastL4 { }
      }
      source 0.0.0.0/0
      source-address-translation {
        type automap
      }
      translate-address enabled
      translate-port enabled
      vs-index 30
    }
    
    impala JDBC profile:
    
    ltm virtual vs-impala-jdbc-21050 {
      description "Impala 21050 for JDBC"
      destination <VIPIP>:21050
      ip-protocol tcp
      mask 255.255.255.255
      pool pool-impala-21050
      profiles {
        fastL4 { }
      }
      source 0.0.0.0/0
      source-address-translation {
        type automap
      }
      translate-address enabled
      translate-port enabled
      vs-index 31
    }
    
    impala Hue profile:
    
    ltm virtual vs-impala-hue-21051 {
      description "Impala 21051 for Hue"
      destination <VIPIP>:21051
      ip-protocol tcp
      mask 255.255.255.255
      persist {
        source_addr_12h_idle_timeout {
          default yes
        }
      }
      pool pool-impala-21050
      profiles {
        fastL4 { }
      }
      source 0.0.0.0/0
      source-address-translation {
        type automap
      }
      translate-address enabled
      translate-port enabled
      vs-index 32
    }