1. How ZooKeeper Manages HiveServer2 Requests

Multiple HiveServer2 (HS2) instances can register themselves with ZooKeeper, and a client (through its client driver) can then find an HS2 instance through ZooKeeper. When a client requests an HS2 instance, ZooKeeper returns one registered HS2 instance, selected at random.
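
As a rough illustration of the client side: the Hive JDBC driver is typically pointed at the ZooKeeper ensemble rather than at a single HS2 host, using a URL that sets serviceDiscoveryMode=zooKeeper. The sketch below only assembles such a URL. The helper name build_discovery_url, the host names, and the namespace value hiveserver2 are assumptions for illustration; the namespace must match whatever the cluster is actually configured with.

```python
def build_discovery_url(zk_hosts, namespace="hiveserver2"):
    """Assemble a Hive JDBC URL that asks the driver to find HS2 via ZooKeeper.

    zk_hosts  -- list of "host:port" strings for the ZooKeeper ensemble
    namespace -- ZooKeeper namespace the HS2 instances register under
                 (assumed value; check the cluster configuration)
    """
    if not zk_hosts:
        raise ValueError("at least one ZooKeeper host:port is required")
    quorum = ",".join(zk_hosts)
    return (
        f"jdbc:hive2://{quorum}/"
        f";serviceDiscoveryMode=zooKeeper;zooKeeperNamespace={namespace}"
    )


# Hypothetical three-node ensemble.
url = build_discovery_url(["zk1:2181", "zk2:2181", "zk3:2181"])
```

The client never names an HS2 host in this URL; which instance it reaches is decided at connection time.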

This enables the following scenarios:

  • High Availability

    If more than one HS2 instance is registered with ZooKeeper and all but one fail, ZooKeeper passes the client the link to the instance that is still running, and the client can connect successfully. (Failed instances must be restarted manually.)

  • Load Balancing

    If more than one HS2 instance is registered with ZooKeeper, ZooKeeper responds to each client request by passing a link to one of the HS2 instances, chosen at random. Over many requests, this spreads connections roughly evenly across the HS2 instances.

  • Rolling Upgrade

    HS2 instances can be registered based on their version, with HS2s from the new version configured as active and HS2s from the old version as passive. ZooKeeper passes connections only to active HS2s, so over time the old version participates in fewer and fewer sessions. When the old version is no longer participating at all, it can be removed.

    For further information, see the Rolling Upgrade Guide.
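
The high-availability and load-balancing behavior described above can be sketched with a toy in-memory registry. This is not ZooKeeper or the Hive driver; the Registry class, simulate_picks, and the addresses are hypothetical names used only to show that uniform random selection spreads requests roughly evenly, and that a lone surviving instance receives every request.

```python
import random
from collections import Counter


class Registry:
    """Toy stand-in for the set of HS2 instances registered in ZooKeeper."""

    def __init__(self, rng=None):
        self._instances = set()
        self._rng = rng or random.Random()

    def register(self, address):
        self._instances.add(address)

    def deregister(self, address):
        # Models an instance that failed or was taken out of service.
        self._instances.discard(address)

    def pick(self):
        """Return one registered instance, chosen uniformly at random."""
        if not self._instances:
            raise LookupError("no HS2 instance is registered")
        # Sort first so results are reproducible for a seeded rng.
        return self._rng.choice(sorted(self._instances))


def simulate_picks(addresses, requests, seed=None):
    """Count how often each address is returned over `requests` lookups."""
    registry = Registry(random.Random(seed))
    for address in addresses:
        registry.register(address)
    return Counter(registry.pick() for _ in range(requests))


# Load balancing: three instances each receive about a third of the lookups.
counts = simulate_picks(["hs2-a:10000", "hs2-b:10000", "hs2-c:10000"], 3000, seed=0)

# High availability: after two failures, every lookup returns the survivor.
registry = Registry()
for address in ("hs2-a:10000", "hs2-b:10000", "hs2-c:10000"):
    registry.register(address)
registry.deregister("hs2-a:10000")
registry.deregister("hs2-b:10000")
survivor = registry.pick()  # always "hs2-c:10000"
```

Note that random selection balances the number of connections, not the cost of the queries each connection runs, so the balance is only approximate.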

Not handled:

  • Automatic Failover

    If an HS2 instance fails while a client is connected, the session is lost. Because this situation needs to be handled at the client, there is no automatic failover; the client needs to reconnect using ZooKeeper.
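
Because failover is left to the client, a client typically wraps its work in a reconnect loop. The following is a minimal sketch under stated assumptions, not driver code: lookup, connect, and work are hypothetical callables supplied by the caller, and Python's built-in ConnectionError stands in for whatever exception a real driver raises when the HS2 instance goes away.

```python
def run_with_reconnect(lookup, connect, work, max_attempts=3):
    """Run work(session); if the connection is lost, look up an HS2 again and retry.

    lookup  -- returns the address of a registered HS2 (for example, via ZooKeeper)
    connect -- opens a new session to that address
    work    -- does the client's work on the session; must be safe to restart,
               because state from the lost session is not recovered
    """
    last_error = None
    for _ in range(max_attempts):
        address = lookup()  # a fresh lookup may return a different instance
        try:
            session = connect(address)
            return work(session)
        except ConnectionError as exc:
            last_error = exc  # session lost; try again with a new lookup
    raise ConnectionError(f"gave up after {max_attempts} attempts") from last_error
```

Since the lost session's state is gone, work should re-issue any session-level settings and restart the interrupted statement from the beginning.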