About Atlas High Availability

Atlas supports multiple instances of the service in an active-passive configuration.

You can deploy multiple instances of the service on different physical hosts in a cluster. One of these instances is automatically selected as the 'active' instance to respond to user requests. All other instances are deemed 'passive'. If the 'active' instance becomes unavailable, either because it is deliberately stopped or because of an unexpected failure, one of the other instances automatically switches to 'active' mode and starts servicing requests.

The 'active' instance is the only instance that can respond to user requests. A 'passive' instance does not service requests itself; it redirects any request it receives (using an HTTP redirect) to the current 'active' instance. However, all instances, both active and passive, respond to admin status requests, which return heartbeat information about that instance.
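Because every instance answers admin status requests, a client or monitoring script can poll each host to find out which one is currently active. The following is a minimal sketch, not part of Atlas itself. It assumes the status endpoint is /api/atlas/admin/status and that it returns JSON with a "Status" field such as "ACTIVE" or "PASSIVE"; verify both against your Atlas version. The host URLs and the find_active helper are hypothetical, and authentication is omitted.

```python
import json
from typing import Callable, Iterable, Optional
from urllib.request import urlopen

# Assumption: admin status endpoint path; confirm for your Atlas version.
STATUS_PATH = "/api/atlas/admin/status"


def fetch_status(base_url: str, timeout: float = 5.0) -> str:
    """Return the Status value reported by one instance (assumed JSON field)."""
    with urlopen(base_url.rstrip("/") + STATUS_PATH, timeout=timeout) as resp:
        return json.load(resp)["Status"]


def find_active(
    base_urls: Iterable[str],
    fetch: Callable[[str], str] = fetch_status,
) -> Optional[str]:
    """Return the first base URL whose instance reports ACTIVE, else None."""
    for url in base_urls:
        try:
            if fetch(url) == "ACTIVE":
                return url
        except (OSError, ValueError, KeyError):
            # Unreachable host or unexpected response body: try the next one.
            continue
    return None
```

For example, find_active(["http://atlas-1:21000", "http://atlas-2:21000"]) would return the URL of whichever hypothetical host reports ACTIVE, or None if neither does. The fetch parameter exists only so that the lookup logic can be exercised without a live cluster.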

How Atlas High Availability works

Atlas uses ZooKeeper to keep track of all available Atlas service instances within the cluster. It uses ZooKeeper's leader election to determine which instance should be active. Once the leader is elected, the other instances set themselves as passive and do not participate in servicing requests. Atlas thus achieves an active-passive configuration for high availability (HA). The active instance is also the one that processes the notification messages from the hooks.

When an instance goes down or is turned off, ZooKeeper's leader election starts and elects a new leader. ZooKeeper thereby ensures that one instance is always available for processing requests, provided at least one instance is still running.
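The failover behavior described above can be illustrated with a toy, in-memory model. This is only a sketch of the general idea behind a common ZooKeeper leader-election recipe (each participant holds a sequence number, and the lowest surviving number is the leader). It does not use ZooKeeper and is not Atlas's actual implementation; the ToyElection class and the instance names are hypothetical.

```python
from typing import Dict, Optional


class ToyElection:
    """In-memory stand-in for sequence-numbered election participants."""

    def __init__(self) -> None:
        self._next_seq = 0
        self._members: Dict[str, int] = {}

    def join(self, instance_id: str) -> None:
        """Register an instance with the next sequence number."""
        self._members[instance_id] = self._next_seq
        self._next_seq += 1

    def leave(self, instance_id: str) -> None:
        """Remove an instance, as if it was stopped or had failed."""
        self._members.pop(instance_id, None)

    def active(self) -> Optional[str]:
        """The surviving instance with the lowest sequence number leads."""
        if not self._members:
            return None
        return min(self._members, key=self._members.__getitem__)

    def mode(self, instance_id: str) -> str:
        """Report ACTIVE for the leader and PASSIVE for everyone else."""
        return "ACTIVE" if instance_id == self.active() else "PASSIVE"


election = ToyElection()
for name in ("atlas-1", "atlas-2", "atlas-3"):
    election.join(name)

first_leader = election.active()   # atlas-1 joined first
election.leave("atlas-1")          # simulate stopping the active instance
second_leader = election.active()  # atlas-2 takes over
```

In this model, stopping the active instance is the only event that changes which instance is active, and the remaining instances stay passive until it is their turn.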

A passive instance takes over the real (active) work only when the current active instance goes down.

Currently, the only way to change an instance from active to passive is to shut down the active instance; another instance then receives a message (through ZooKeeper) to make itself active.