Hadoop High Availability
Also available as:
PDF
loading table of contents...

Deploy Oozie with an HA Cluster

You can configure multiple Oozie servers against the same database to provide High Availability (HA) for the Oozie service. You need the following prerequisites:

  • A database that supports multiple concurrent connections. In order to have full HA, the database should also have HA support, or it becomes a single point of failure.

    [Note]Note

    The default derby database does not support this.

  • A ZooKeeper ensemble. Apache ZooKeeper is a distributed, open-source coordination service for distributed applications; the Oozie servers use it for coordinating access to the database and communicating with each other. In order to have full HA, there should be at least 3 ZooKeeper servers. Find more information about ZooKeeper here.

  • Multiple Oozie servers.

    [Important]Important

    While not strictly required, you should configure all ZooKeeper servers to have identical properties.

  • A Loadbalancer, Virtual IP, or Round-Robin DNS. This is used to provide a single entry-point for users and for callbacks from the JobTracker. The load balancer should be configured for round-robin between the Oozie servers to distribute the requests. Users (using either the Oozie client, a web browser, or the REST API) should connect through the load balancer. In order to have full HA, the load balancer should also have HA support, or it becomes a single point of failure. For information about how to set up your Oozie servers to handle failover, see Configuring Oozie Failover.