Cause
Hue uses a TCP connection pool (10 connections) for all Thrift traffic to Impala.
This means that each Impala session is not guaranteed to use the same TCP
connection. Load balancers send a single TCP connection to a single Impalad, but
without correct persistence, Impala sessions can be sent to the wrong backend
server, causing the errors you see.
Solution
To solve this issue, you must configure your load balancer that is
between Hue and Impala to use Source IP persistence. This is not the
load balancer in front of Hue on port 8888/8889, this is the load
balancer for Impala, defined in the Impala configuration in Cloudera
Manager as Impala Daemons Load Balancer. In addition to Source IP
persistence, you must also set the timeout in the load balancer for
these connections to a bigger value, otherwise the load balancer can
close these connections even though Hue is using them and thinks they
are active. Cloudera recommends a minium of 6 hours as the timeout
value. 12 hours is ideal.
Cloudera also recommends that you split the VIP configurations into 3
different ports, 21000 for impala-shell users, 21050 for JDBC users and
then 21051 for Hue instances. This way you only have to configure the
high timeout and Source IP persistence for the Hue port 21051.
Configure the HA Proxy as follows:
-
Open a terminal session and SSH in to the Impala Daemon.
-
Download and install an HAproxy service by running a command based on
your operating system. For example:
-
Configure HAProxy for each role as follows:
vi /etc/haproxy/haproxy.cfg
# For impala-shell users on port 21000.
#---------------------------------------------------------------------
# main frontend which proxys to the backends
#---------------------------------------------------------------------
frontend impala_front
bind *:21000 ssl crt /opt/cloudera/security/x509/certkeynopw.pem
mode tcp
option tcplog
default_backend impala-shell
#---------------------------------------------------------------------
# round robin balancing between the various backends
#---------------------------------------------------------------------
backend impala-shell
balance leastconn
mode tcp
server impalad1 impalad-1.example.com:21000 check ssl ca-file /opt/cloudera/security/truststore/ca-truststore.pem
server impalad2 impalad-2.example.com:21000 check ssl ca-file /opt/cloudera/security/truststore/ca-truststore.pem
server impalad3 impalad-3.example.com:21000 check ssl ca-file /opt/cloudera/security/truststore/ca-truststore.pem
# For JDBC or ODBC version 2.x driver, use port 21050 instead of 21000.
#---------------------------------------------------------------------
# main frontend which proxys to the backends
#---------------------------------------------------------------------
frontend impala_front
bind *:21050 ssl crt /opt/cloudera/security/x509/certkeynopw.pem
mode tcp
option tcplog
default_backend impala-jdbc
#---------------------------------------------------------------------
# round robin balancing between the various backends
#---------------------------------------------------------------------
backend impala-jdbc
balance leastconn
mode tcp
server impalad1 impalad-1.example.com:21050 check ssl ca-file /opt/cloudera/security/truststore/ca-truststore.pem
server impalad2 impalad-2.example.com:21050 check ssl ca-file /opt/cloudera/security/truststore/ca-truststore.pem
server impalad3 impalad-3.example.com:21050 check ssl ca-file /opt/cloudera/security/truststore/ca-truststore.pem
# Setup for Hue or other JDBC-enabled applications.
# In particular, Hue requires SOURCE IP PERSISTANCE
# The application connects to load_balancer_host:21051, and HAProxy balances
# connections to the associated hosts, where Impala listens for JDBC
# requests on port 21050.
# Notice the timeouts below that do not exist in the other configs
# these are to stop the connections from being killed even though
# hue is using them
#---------------------------------------------------------------------
# main frontend which proxys to the backends
#---------------------------------------------------------------------
frontend impalajdbc_front
bind *:21051 ssl crt /opt/cloudera/security/x509/certkeynopw.pem
mode tcp
option tcplog
timeout client 720m
timeout server 720m
default_backend impala-hue
#---------------------------------------------------------------------
# source balancing between the various backends
#---------------------------------------------------------------------
backend impala-hue
balance source
mode tcp
server impalad1 impalad-1.example.com:21050 check ssl ca-file /opt/cloudera/security/truststore/ca-truststore.pem
server impalad2 impalad-2.example.com:21050 check ssl ca-file /opt/cloudera/security/truststore/ca-truststore.pem
server impalad3 impalad-3.example.com:21050 check ssl ca-file /opt/cloudera/security/truststore/ca-truststore.pem
-
Create a persistence profile which enables source IP persistence in the
F5. Make sure the timeout value is between 6 and 12 hours.
ltm persistence source-addr source_addr_12h_idle_timeout {
app-service none
defaults-from source_addr
hash-algorithm default
map-proxies enabled
mask none
match-across-pools disabled
match-across-services disabled
match-across-virtuals disabled
override-connection-limit disabled
timeout 43200
}
-
Create virtual profiles for each VIP, impala-shell, impala JDBC, and
impala Hue.
impala-shell profile:
ltm virtual vs-impala-21000 {
description "Impala 21000 for Shell"
destination <VIPIP>:21000
ip-protocol tcp
mask 255.255.255.255
pool pool-impala-21000
profiles {
fastL4 { }
}
source 0.0.0.0/0
source-address-translation {
type automap
}
translate-address enabled
translate-port enabled
vs-index 30
}
impala JDBC profile:
ltm virtual vs-impala-jdbc-21050 {
description "Impala 21050 for JDBC"
destination <VIPIP>:21050
ip-protocol tcp
mask 255.255.255.255
pool pool-impala-21050
profiles {
fastL4 { }
}
source 0.0.0.0/0
source-address-translation {
type automap
}
translate-address enabled
translate-port enabled
vs-index 31
}
impala Hue profile:
ltm virtual vs-impala-hue-21051 {
description "Impala 21051 for Hue"
destination <VIPIP>:21051
ip-protocol tcp
mask 255.255.255.255
persist {
source_addr_12h_idle_timeout {
default yes
}
}
pool pool-impala-21050
profiles {
fastL4 { }
}
source 0.0.0.0/0
source-address-translation {
type automap
}
translate-address enabled
translate-port enabled
vs-index 32
}