Configure Hue for High Availability

Configuring Hue for High Availability (HA) means configuring Hue, Hive, and Impala.

Configure Hue for High Availability

Prerequisites

  • SSH network access to host machines with a Hue Server/Kerberos Ticket Renewer role.
  • External database configured for each Hue Server. See Hue Databases.

Add Hue Roles

Hue HA requires at least two Hue Server roles and one Load Balancer role. If the cluster authenticates with Kerberos, you also need one Kerberos Ticket Renewer on each host that runs a Hue Server.

  1. Log on to Cloudera Manager.
  2. Go to the Hue service and select Actions > Add Role Instances.
  3. Click Hue Server, assign to one or more hosts, and click OK > Continue.
  4. Click Kerberos Ticket Renewer, assign to each host with a Hue Server, and click OK > Continue.
  5. Click Load Balancer, assign to one or more hosts, and click OK > Continue.
  6. Check each role and select Actions for Selected > Start and click Start.
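
The placement rule described above (a Kerberos Ticket Renewer on every host that runs a Hue Server) can be sanity-checked with a few lines of shell. This is a minimal sketch, not a Cloudera tool: the function name and the host names are hypothetical, and the host lists are assumed to come from your own role assignments in Cloudera Manager.

```shell
# Hypothetical helper: print every Hue Server host that has no
# Kerberos Ticket Renewer. Both arguments are space-separated host lists.
missing_renewers() {
    hue_hosts=$1
    ktr_hosts=$2
    for h in $hue_hosts; do
        case " $ktr_hosts " in
            *" $h "*) ;;         # a renewer is assigned to this host
            *) echo "$h" ;;      # no renewer on this host
        esac
    done
}

# Example with hypothetical host names:
#   missing_renewers "hue1.example.com hue2.example.com" "hue1.example.com"
# reports hue2.example.com as lacking a renewer.
```

An empty result means every Hue Server host in the first list also appears in the second.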

Enable TLS for Hue Load Balancer

  1. Go to Hue > Configuration and search for TLS/SSL.
  2. Check Enable TLS/SSL for Hue for the Hue Server Default Group.
  3. Set other TLS/SSL properties appropriate for your setup. Some to consider are:
    • Hue Load Balancer Port - Apache Load Balancer listens on this port (default is 8889).
    • Path to TLS/SSL Certificate File - Must be multi-domain with CN = Load Balancer in PEM format.
    • Path to TLS/SSL Private Key File - Must be in PEM format.
  4. Click Save Changes and Restart Hue.
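
Because both the certificate and the private key must be in PEM format, it can help to check the files before saving the configuration. The following is a minimal sketch: the function name and the file paths in the comments are hypothetical, and the check only looks for the PEM armor line, so it does not prove that the contents are valid.

```shell
# Return success if the file contains a PEM armor line for the given
# label, for example CERTIFICATE or "PRIVATE KEY". This inspects only
# the header line, not the encoded contents.
is_pem() {
    file=$1
    label=$2
    grep -q -e "-----BEGIN .*${label}-----" "$file"
}

# Example usage (paths and host name are assumptions for illustration):
#   is_pem /path/to/hue-lb-cert.pem CERTIFICATE
#   is_pem /path/to/hue-lb-key.pem "PRIVATE KEY"
# To inspect the subject (CN) of a PEM certificate with OpenSSL:
#   openssl x509 -in /path/to/hue-lb-cert.pem -noout -subject
# To look at what the load balancer presents after the restart:
#   openssl s_client -connect lb-host.example.com:8889 </dev/null
```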

Configure Hive and Impala for High Availability

Prerequisites & Requirements

  • SSH network access to host machines with a HiveServer2 or Impala Daemon role.
  • External database configured for each HiveServer2 and Impala Daemon.
  • Hive/Impala Load Balancer configured with Source IP Persistence.

Source IP Persistence

Without Source IP Persistence, you may encounter the error “Results have expired, rerun the query if needed.”

Hue supports High Availability through a "load balancer" to HiveServer2 and Impala. Because the underlying Hue thrift libraries reuse TCP connections in a pool, a single user session may not always use the same TCP connection. If a TCP connection is balanced away from a HiveServer2 or Impala Daemon instance, the user session and its queries (running or returned) can be lost and trigger the “Results have expired” error.

To prevent sessions from being lost, configure the Hive/Impala Load Balancer with Source IP Persistence so that each Hue instance sends all traffic to a single HiveServer2/Impala instance. Of course, this is not true load balancing, but a configuration for failover High Availability.
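
To see why persistence keyed on the source IP pins each Hue instance to one backend, consider a toy mapping from client address to backend index. This is only an illustration under simplifying assumptions: it is not the hashing algorithm that any particular load balancer uses, and the function name and addresses are hypothetical.

```shell
# Toy illustration: map an IPv4 address to one of N backends by treating
# the address as a 32-bit integer and taking it modulo N. The same source
# address always yields the same backend index.
pick_backend() {
    ip=$1
    n=$2
    old_ifs=$IFS
    IFS=.
    set -- $ip          # split the address into its four octets
    IFS=$old_ifs
    echo $(( ( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ) % $n ))
}

# Example with two backends (hypothetical Hue host addresses):
#   pick_backend 10.0.0.5 2
#   pick_backend 10.0.0.6 2
```

Every connection from one Hue host lands on the same index no matter which pooled TCP connection carries it, which is why the session survives. It is also why a single Hue host never spreads its load across backends.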

To prevent sessions from timing out while in use, add more Hue Server instances so that each can be pinned to a different HiveServer2/Impala instance. For both HiveServer2 and Impala, set the affinity timeout (that is, the timeout to close persisted sessions) to be longer than the Impala query and session timeouts.
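
The timeout relationship above can be expressed as a simple comparison. This is a minimal sketch with a hypothetical function name. It assumes all three values are given in seconds and that the Impala values come from the impalad startup options idle_query_timeout and idle_session_timeout. It does not handle a value of 0, which is generally understood to disable the corresponding Impala timeout.

```shell
# Succeed only if the load balancer affinity timeout is strictly longer
# than both the Impala query timeout and the Impala session timeout.
# All arguments are whole numbers of seconds.
affinity_outlasts_impala() {
    affinity=$1
    query_timeout=$2
    session_timeout=$3
    [ "$affinity" -gt "$query_timeout" ] && [ "$affinity" -gt "$session_timeout" ]
}

# Example with assumed values: a 2 hour affinity timeout against a
# 10 minute query timeout and a 1 hour session timeout.
#   affinity_outlasts_impala 7200 600 3600 && echo "affinity timeout is long enough"
```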

For the best load distribution, create multiple profiles in your load balancer, one per port, for both non-Hue clients and Hue clients. Have non-Hue clients distribute load round-robin, and configure Hue clients with Source IP Persistence on dedicated ports, for example, 21000 for impala-shell, 21050 for impala-jdbc, and 21051 for Hue.

Add Hive and Impala Roles

In Cloudera Manager, add roles for HiveServer2 and Impala Daemon (as in Add Hue Roles):
  1. Configure the cluster with at least two roles for HiveServer2:
    1. Go to the Hive service and select Actions > Add Role Instances.
    2. Click HiveServer2, assign one or more hosts, and click OK > Continue.
    3. Check each role and select Actions for Selected > Start and click Start.
  2. Configure the cluster with at least two roles for Impala Daemon:
    1. Go to the Impala service and select Actions > Add Role Instances.
    2. Click Impala Daemon, assign one or more hosts, and click OK > Continue.
    3. Check each role and select Actions for Selected > Start and click Start.

Install Proxy Service

This is an example of how to add a proxy server for each HiveServer2 and Impala Daemon with multiple profiles.

  1. Install haproxy using the commands for your distribution (RHEL, Ubuntu, or SLES):
    # RHEL
    yum install haproxy
    # Ubuntu
    apt-get install haproxy
    # SLES
    zypper addrepo http://download.opensuse.org/repositories/server:http/SLE_12/server:http.repo
    zypper refresh
    zypper install haproxy
  2. Configure haproxy for each role, for example:
    vi /etc/haproxy/haproxy.cfg
    listen impala-shell
        bind :21001
        mode tcp
        option tcplog
        balance roundrobin
        stick-table type ip size 20k expire 5m
    server impala_0 host shortname-2.domain:21000 check
    server impala_1 host shortname-3.domain:21000 check
    
    listen impala-jdbc
        bind :21051
        mode tcp
        option tcplog
        balance roundrobin
        stick-table type ip size 20k expire 5m
    server impala_0 host shortname-2.domain:21050 check
    server impala_1 host shortname-3.domain:21050 check
    
    listen impala-hue
        bind :21052
        mode tcp
        option tcplog
        balance source
    server impala_0 host shortname-2.domain:21050 check
    server impala_1 host shortname-3.domain:21050 check
    
    listen hiveserver2-jdbc
        bind :10001
        mode tcp
        option tcplog
        balance roundrobin
        stick-table type ip size 20k expire 5m
    server hiveserver2_0 host shortname-1.domain:10000 check
    server hiveserver2_1 host shortname-2.domain:10000 check
    
    listen hiveserver2-hue
        bind :10002
        mode tcp
        option tcplog
        balance source
    server hiveserver2_0 host shortname-1.domain:10000 check
    server hiveserver2_1 host shortname-2.domain:10000 check
    Replace the host shortname and domain placeholders with the values for your environment:
    sed -i "s/host shortname/your host shortname/g" /etc/haproxy/haproxy.cfg
    sed -i "s/domain/your domain/g" /etc/haproxy/haproxy.cfg
  3. Restart haproxy:
    service haproxy restart
  4. Run netstat to ensure your proxies are running:
    netstat -tln | grep LISTEN
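
The check in step 4 can be tied back to the configuration file so that each bind port is verified individually. This is a minimal sketch with hypothetical function names. It assumes the ss utility (from iproute2) is installed and that every bind line has the form shown in the example configuration, with the port after the last colon.

```shell
# Print the port of every "bind" line in an haproxy configuration file.
bind_ports() {
    awk '$1 == "bind" { n = split($2, parts, ":"); print parts[n] }' "$1"
}

# For each bind port, report it if no listening TCP socket uses it.
# Assumes "ss" is available; column 4 of "ss -tln" is the local address.
unbound_ports() {
    for port in $(bind_ports "$1"); do
        ss -tln | awk -v p=":$port" '$4 ~ (p "$") { found = 1 } END { exit !found }' \
            || echo "not listening: $port"
    done
}

# Example:
#   unbound_ports /etc/haproxy/haproxy.cfg
```

Before restarting, haproxy can also validate the file itself with haproxy -c -f /etc/haproxy/haproxy.cfg, which parses the configuration without starting the proxies.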