Deploying Oozie with an HA Cluster
You can configure multiple Oozie servers against the same database to provide High Availability (HA) for the Oozie service.
You need the following prerequisites:
- A database that supports multiple concurrent connections. For full HA, the database should also have HA support; otherwise it becomes a single point of failure.
Note
The default Derby database does not support multiple concurrent connections.
- A ZooKeeper ensemble. Apache ZooKeeper is a distributed, open-source coordination service for distributed applications; the Oozie servers use it for coordinating access to the database and communicating with each other. In order to have full HA, there should be at least 3 ZooKeeper servers.
- Multiple Oozie servers.
Important
While not strictly required, you should configure all Oozie servers to have identical properties.
- A load balancer, virtual IP, or round-robin DNS. This is used to provide a single entry point for users and for callbacks from the JobTracker.
The load balancer should be configured for round-robin between the Oozie servers to distribute the requests. Users (using either the Oozie client, a web browser, or the REST API) should connect through the load balancer. In order to have full HA, the load balancer should also have HA support, or it becomes a single point of failure.
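Two of the prerequisites above can be illustrated with a short sketch. It assumes that ZooKeeper stays available only while a majority of its servers are running, which is why an ensemble of three is the smallest that survives the loss of one server, and it shows the order in which a round-robin load balancer would hand out requests. The host names and the helper names `tolerated_failures` and `round_robin` are hypothetical and exist only for illustration; this is not part of Oozie or ZooKeeper.

```python
from itertools import cycle


def tolerated_failures(ensemble_size: int) -> int:
    """Return how many ZooKeeper servers can fail while a majority remains."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return (ensemble_size - 1) // 2


def round_robin(servers):
    """Return an endless iterator that cycles through the servers in order."""
    if not servers:
        raise ValueError("at least one Oozie server is required")
    return cycle(servers)


# Hypothetical Oozie server URLs sitting behind the load balancer.
oozie_servers = [
    "http://oozie1.example.com:11000/oozie",
    "http://oozie2.example.com:11000/oozie",
]

# Compare ensemble sizes: one or two servers tolerate no failure at all.
for size in (1, 2, 3, 5):
    print(size, "ZooKeeper server(s) tolerate", tolerated_failures(size), "failure(s)")

# Show which server each of four consecutive requests would reach.
picker = round_robin(oozie_servers)
first_four = [next(picker) for _ in range(4)]
print(first_four)
```

Under this assumption, a two-server ensemble offers no more fault tolerance than a single server, so adding the third server is what removes ZooKeeper as a single point of failure.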