Oozie High Availability
In CDH 5, you can configure multiple active Oozie servers against the same database, providing high availability for Oozie. This is supported with both MRv1 and MRv2 (YARN).
Requirements
The requirements for Oozie high availability are:
- An external database that supports multiple concurrent connections. The default Derby database does not support multiple concurrent connections. In addition, the database should be configured for HA (for example, Oracle RAC). Oozie HA still works with a non-HA database, but the database then becomes a single point of failure: if it fails, all Oozie servers stop working.
- On every host where an Oozie server will run, the JDBC JAR should be placed in /var/lib/oozie/ or in the location referenced by the corresponding environment variable (for example, CLOUDERA_ORACLE_CONNECTOR_JAR when using Oracle).
- ZooKeeper, which the Oozie servers use for distributed locks, to coordinate concurrent access to the database, and for service discovery, so that the servers can locate each other for log aggregation.
- A load balancer (preferably with HA support, for example HAProxy), virtual IP, or round-robin DNS that:
  - Provides a single entry point for users, so they don't have to choose between, or even be aware of, multiple Oozie servers.
  - Receives callbacks from the ApplicationMaster or JobTracker when a job is done. Callbacks are best-effort and serve only as hints: if a callback does not get through, the Oozie servers eventually (by default within 10 minutes) contact the JobTracker themselves, so nothing is lost or stuck.
  The load balancer should itself be HA. It should be configured for round-robin distribution and should not take the actual load on any of the Oozie servers into account.
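Because callbacks are only hints, a lost callback delays Oozie rather than breaking it. The following Python sketch is a hypothetical model of that reasoning, not Oozie's implementation: it assumes a fixed fallback check every 600 seconds (matching the 10-minute default mentioned above), with checks aligned to multiples of that interval, and returns when Oozie notices a finished job.

```python
"""Hypothetical model: completion callbacks as hints plus a polling fallback.

Assumptions (not Oozie internals): fallback checks run at
t = 0, POLL_INTERVAL_S, 2 * POLL_INTERVAL_S, ... (seconds).
"""
import math
from typing import Optional

POLL_INTERVAL_S = 600  # assumed fallback interval: the 10-minute default


def detection_time(finished_at: float,
                   callback_at: Optional[float],
                   poll_interval: float = POLL_INTERVAL_S) -> float:
    """Return the time at which Oozie learns that a job has finished.

    finished_at: when the job finished on the cluster.
    callback_at: when the callback reached an Oozie server, or None if lost.
    """
    # First fallback check at or after the moment the job finished.
    next_check = math.ceil(finished_at / poll_interval) * poll_interval
    if callback_at is None:
        # Callback lost: the fallback check still picks the job up.
        return next_check
    # Whichever signal arrives first wins; the other becomes redundant.
    return min(callback_at, next_check)
```

For example, a job finishing at t = 1000 is noticed at 1002 if its callback arrives then, and at 1200 if the callback is lost. Either way the extra delay stays below one interval, which is why the text can treat callbacks as best-effort.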
For information on setting up SSL communication with Oozie HA enabled, see Additional Considerations when Configuring SSL for Oozie HA.
Enabling Oozie High Availability Using Cloudera Manager
Minimum Required Role: Full Administrator
Enabling Oozie High Availability
- Ensure that the requirements are satisfied.
- In the Cloudera Manager Admin Console, go to the Oozie service.
- Select . A screen showing the hosts that are eligible to run an additional Oozie server displays. The host where the current Oozie server is running is not available as a choice.
- Select the host where you want the additional Oozie server to be installed, and click Continue.
- Specify the host and port of the Oozie load balancer, and click Continue. Cloudera Manager executes a set of commands that stops the Oozie servers, adds another Oozie server, initializes the Oozie server High Availability state in ZooKeeper, configures Hue to reference the Oozie load balancer, and restarts the Oozie servers and dependent services.
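After the wizard finishes, one way to confirm that requests through the load balancer reach a working Oozie server is the Oozie Web Services API status endpoint (GET /oozie/v1/admin/status, which returns JSON such as {"systemMode":"NORMAL"}). The Python sketch below assumes a non-Kerberized cluster, the default Oozie port 11000, and the /oozie context path; LB_URL is a hypothetical load-balancer address.

```python
"""Sketch: query Oozie's system status through the load balancer.

Assumptions: no Kerberos/SPNEGO authentication, default port 11000,
/oozie context path. LB_URL is a hypothetical address.
"""
import json
import urllib.request

LB_URL = "http://oozie-lb.example.com:11000/oozie"  # hypothetical load balancer


def status_url(base_url: str) -> str:
    """Build the admin status endpoint from an Oozie base URL."""
    return base_url.rstrip("/") + "/v1/admin/status"


def parse_system_mode(body: str) -> str:
    """Extract the systemMode field (for example NORMAL) from the JSON body."""
    return json.loads(body)["systemMode"]


def check(base_url: str = LB_URL, timeout: float = 5.0) -> str:
    """Send GET to the status endpoint and return the reported system mode."""
    with urllib.request.urlopen(status_url(base_url), timeout=timeout) as resp:
        return parse_system_mode(resp.read().decode("utf-8"))
```

Calling check() several times in a row sends requests through the round-robin balancer to different Oozie servers; each call should report NORMAL. A secured cluster needs an authenticating HTTP client instead of plain urllib.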
Disabling Oozie High Availability
- In the Cloudera Manager Admin Console, go to the Oozie service.
- Select . A screen showing the hosts running the Oozie servers displays.
- Select the Oozie server (host) that you want to remain as the single Oozie server, and click Continue. Cloudera Manager executes a set of commands that stops the Oozie service, removes the additional Oozie servers, configures Hue to reference the Oozie service, and restarts the Oozie service and dependent services.
Enabling Oozie High Availability Using the Command Line
For more information, including installation and configuration instructions for Oozie HA using the command line, see https://archive.cloudera.com/cdh5/cdh/5/oozie.
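As a rough illustration of what the command-line configuration involves, the Python sketch below renders HA-related oozie-site.xml properties. The property and service names follow the upstream Apache Oozie HA documentation as we understand it; newer Oozie versions may list additional services, so verify against the documentation for your CDH release. The ZooKeeper quorum and load-balancer URL are hypothetical.

```python
"""Sketch: render HA-related <property> entries for oozie-site.xml.

Names follow the upstream Apache Oozie HA docs; verify them for your
Oozie version. Host names below are hypothetical.
"""
import xml.etree.ElementTree as ET

ZK_QUORUM = "zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181"  # hypothetical
LB_BASE_URL = "http://oozie-lb.example.com:11000/oozie"  # hypothetical

HA_PROPERTIES = {
    # ZooKeeper-backed services that coordinate the Oozie servers.
    "oozie.services.ext": ",".join([
        "org.apache.oozie.service.ZKLocksService",
        "org.apache.oozie.service.ZKXLogStreamingService",
        "org.apache.oozie.service.ZKJobsConcurrencyService",
    ]),
    # ZooKeeper ensemble shared by all Oozie servers.
    "oozie.zookeeper.connection.string": ZK_QUORUM,
    # Base URL advertised for callbacks and clients: the load balancer.
    "oozie.base.url": LB_BASE_URL,
}


def to_property_elements(props: dict[str, str]) -> list[ET.Element]:
    """Build one <property><name/><value/></property> element per entry."""
    elements = []
    for name, value in props.items():
        prop = ET.Element("property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
        elements.append(prop)
    return elements


def render(props: dict[str, str]) -> str:
    """Serialize the elements, one per line, for pasting inside <configuration>."""
    return "\n".join(ET.tostring(e, encoding="unicode")
                     for e in to_property_elements(props))
```

The same properties would be applied on every Oozie server, so that all servers share one ZooKeeper namespace and advertise the load balancer rather than their own host names.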