Configuring Oozie

This section explains how to configure Oozie. It provides procedures for configuring the proper version of Oozie for new installations and after upgrades.

Configuring which Hadoop Version to Use

The Oozie client does not interact directly with Hadoop MapReduce, and so it does not require any MapReduce configuration.

The Oozie server can work with either MRv1 or YARN. It cannot work with both simultaneously.

You set the MapReduce version the Oozie server works with by means of the alternatives command (or update-alternatives, depending on your operating system). As well as distinguishing between YARN and MRv1, the commands differ depending on whether or not you are using SSL.
  • To use YARN (without SSL):
    alternatives --set oozie-tomcat-deployment /etc/oozie/tomcat-conf.http
  • To use YARN (with SSL):
    alternatives --set oozie-tomcat-deployment /etc/oozie/tomcat-conf.https
  • To use MRv1 (without SSL):
    alternatives --set oozie-tomcat-deployment /etc/oozie/tomcat-conf.http.mr1
  • To use MRv1 (with SSL):
    alternatives --set oozie-tomcat-deployment /etc/oozie/tomcat-conf.https.mr1
CAUTION:
Make this change only while the Oozie server is stopped.

If you change the MapReduce version while the Oozie server is running workflows that use the other version (the one you are changing from, for example MRv1), all of those jobs will fail.
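
The four choices above differ only in the MapReduce version and the SSL setting. As a minimal sketch, the mapping could be captured in a small shell helper. The function name oozie_tomcat_conf and its yarn|mr1 and yes|no arguments are illustrative assumptions, not part of Oozie; the paths are the ones listed above. The last line only prints the command rather than running it.

```shell
# Map a MapReduce version (yarn|mr1) and an SSL flag (yes|no) to the
# tomcat-conf directory listed in this section.
# oozie_tomcat_conf is a hypothetical helper name, not an Oozie tool.
oozie_tomcat_conf() {
    case "$1:$2" in
        yarn:no)  echo /etc/oozie/tomcat-conf.http ;;
        yarn:yes) echo /etc/oozie/tomcat-conf.https ;;
        mr1:no)   echo /etc/oozie/tomcat-conf.http.mr1 ;;
        mr1:yes)  echo /etc/oozie/tomcat-conf.https.mr1 ;;
        *) echo "usage: oozie_tomcat_conf yarn|mr1 yes|no" >&2; return 2 ;;
    esac
}

# Print (do not run) the command for YARN with SSL.
echo "alternatives --set oozie-tomcat-deployment $(oozie_tomcat_conf yarn yes)"
```

On systems that provide update-alternatives instead of alternatives, only the command name in the printed line would change.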

Configuring Oozie after Upgrading from an Earlier CDH 5 Release

Step 1: Update Configuration Files

  1. Edit the new Oozie CDH 5 oozie-site.xml, and set all customizable properties to the values you set in the previous oozie-site.xml.
  2. If necessary, do the same for the oozie-log4j.properties, oozie-env.sh, and adminusers.txt files.

Step 2: Upgrade the Database

Oozie CDH 5 provides a command-line tool to perform the database schema and data upgrade. The tool uses Oozie configuration files to connect to the database and perform the upgrade.

The database upgrade tool works in two modes: it can do the upgrade in the database or it can produce an SQL script that a database administrator can run manually. If you use the tool to perform the upgrade, you must do it as a database user who has permissions to run DDL operations in the Oozie database.

  • To run the Oozie database upgrade tool against the database:
    $ sudo -u oozie /usr/lib/oozie/bin/ooziedb.sh upgrade -run
    You will see output such as this (the output of the script may differ slightly depending on the database vendor):
    Validate DB Connection
    DONE
    Check DB schema exists
    DONE
    Verify there are not active Workflow Jobs
    DONE
    Check OOZIE_SYS table does not exist
    DONE
    Get Oozie DB version
    DONE
    Upgrade SQL schema
    DONE
    Upgrading to db schema for Oozie 4.0.0-cdh5.0.0
    Update db.version in OOZIE_SYS table to 3
    DONE
    Converting text columns to bytea for all tables
    DONE
    Get Oozie DB version
    DONE
    
    Oozie DB has been upgraded to Oozie version '4.0.0-cdh5.0.0'
    
    The SQL commands have been written to: /tmp/ooziedb-8676029205446760413.sql
    
  • To create the upgrade script:
    $ sudo -u oozie /usr/lib/oozie/bin/ooziedb.sh upgrade -sqlfile SCRIPT
    For example:
    $ sudo -u oozie /usr/lib/oozie/bin/ooziedb.sh upgrade -sqlfile oozie-upgrade.sql
    You should see output such as the following (the output of the script may differ slightly depending on the database vendor):
    Validate DB Connection
    DONE
    Check DB schema exists
    DONE
    Verify there are not active Workflow Jobs
    DONE
    Check OOZIE_SYS table does not exist
    DONE
    Get Oozie DB version
    DONE
    Upgrade SQL schema
    DONE
    Upgrading to db schema for Oozie 4.0.0-cdh5.0.0
    Update db.version in OOZIE_SYS table to 3
    DONE
    Converting text columns to bytea for all tables
    DONE
    Get Oozie DB version
    DONE
    
    The SQL commands have been written to: oozie-upgrade.sql
    
    WARN: The SQL commands have NOT been executed, you must use the '-run' option
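
Because the two modes produce nearly identical output, a script that wraps the tool may want to tell them apart. The following is a rough sketch that relies only on the message text shown in the sample output above. The helper names ooziedb_sql_pending and ooziedb_sql_file are hypothetical, and the exact wording of the messages may differ between Oozie versions.

```shell
# Return 0 if ooziedb.sh output (read from stdin) says the SQL was only
# written to a file and still has to be executed.
# Hypothetical helper; it matches the WARN text shown in the sample output.
ooziedb_sql_pending() {
    grep -q "SQL commands have NOT been executed"
}

# Print the path of the generated SQL file found in the same output.
ooziedb_sql_file() {
    sed -n 's/^ *The SQL commands have been written to: //p'
}

# Example use against a live installation (not run here):
#   out=$(sudo -u oozie /usr/lib/oozie/bin/ooziedb.sh upgrade -sqlfile oozie-upgrade.sql)
#   printf '%s\n' "$out" | ooziedb_sql_pending \
#       && echo "Hand $(printf '%s\n' "$out" | ooziedb_sql_file) to the DBA"
```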

Step 3: Upgrade the Oozie Shared Library

The Oozie installation bundles two shared libraries, one for MRv1 and one for YARN. Make sure you install the right one for the MapReduce version you are using:

  • The shared library file for YARN is oozie-sharelib-yarn.tar.gz.
  • The shared library file for MRv1 is oozie-sharelib-mr1.tar.gz.

To upgrade the shared library, proceed as follows.

  1. Delete the Oozie shared libraries from HDFS. For example:
    $ sudo -u oozie hadoop fs -rmr /user/oozie/share
  2. Install the Oozie CDH 5 shared libraries. For example:
    $ sudo oozie-setup sharelib create -fs <FS_URI> -locallib /usr/lib/oozie/oozie-sharelib-yarn.tar.gz
    where FS_URI is the HDFS URI of the filesystem that the shared library should be installed on (for example, hdfs://<HOST>:<PORT>).
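
The example above installs the YARN shared library. As a sketch of how the choice could be made explicit, the helper below picks the tarball by MapReduce version and prints the matching install command. The function names and the host namenode.example.com:8020 are illustrative assumptions, and the sketch assumes the MRv1 tarball sits in /usr/lib/oozie/ alongside the YARN one.

```shell
# Pick the bundled sharelib tarball for a MapReduce version (yarn|mr1).
# Assumes both tarballs live in /usr/lib/oozie/.
oozie_sharelib_tarball() {
    case "$1" in
        yarn|mr1) echo "/usr/lib/oozie/oozie-sharelib-$1.tar.gz" ;;
        *) echo "usage: oozie_sharelib_tarball yarn|mr1" >&2; return 2 ;;
    esac
}

# Print (do not run) the install command for a filesystem URI and version.
oozie_sharelib_cmd() {
    sharelib_tarball=$(oozie_sharelib_tarball "$2") || return 2
    echo "sudo oozie-setup sharelib create -fs $1 -locallib $sharelib_tarball"
}

# Hypothetical NameNode address, shown for illustration only.
oozie_sharelib_cmd hdfs://namenode.example.com:8020 mr1
```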

Step 4: Start the Oozie Server

Now you can start Oozie:
$ sudo service oozie start

Check Oozie's oozie.log to verify that Oozie has started successfully.
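
In addition to reading the log, the Oozie command-line client can report the server's system mode. The sketch below checks the output of oozie admin -status for NORMAL mode. The helper name oozie_is_normal is hypothetical, and the URL in the usage comment assumes the server listens on localhost at port 11000; adjust it for your host and for SSL.

```shell
# Return 0 if `oozie admin -status` output (read from stdin) reports
# that the server is in NORMAL system mode.
# oozie_is_normal is a hypothetical helper name.
oozie_is_normal() {
    grep -q '^System mode: NORMAL'
}

# Example use against a live server (URL is an assumption, not run here):
#   oozie admin -oozie http://localhost:11000/oozie -status | oozie_is_normal \
#       && echo "Oozie is up"
```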

Step 5: Upgrade the Oozie Client

Although older Oozie clients work with the new Oozie server, you need to install the new version of the Oozie client in order to use all the functionality of the Oozie server.

To upgrade the Oozie client, if you have not already done so, follow the steps under Installing Oozie.

Configuring Oozie after a New Installation

When you install Oozie from an RPM or Debian package, Oozie server creates all configuration, documentation, and runtime files in the standard Linux directories, as follows.

Type of File        Where Installed
------------        ---------------
binaries            /usr/lib/oozie/
configuration       /etc/oozie/conf/
documentation       SLES: /usr/share/doc/packages/oozie/
                    other platforms: /usr/share/doc/oozie/
examples TAR.GZ     SLES: /usr/share/doc/packages/oozie/
                    other platforms: /usr/share/doc/oozie/
sharelib TAR.GZ     /usr/lib/oozie/
data                /var/lib/oozie/
logs                /var/log/oozie/
temp                /var/tmp/oozie/
PID file            /var/run/oozie/
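
To confirm that a package installation created the directories listed above, a quick check along the following lines may help. It is only a sketch: oozie_missing_dirs is a hypothetical helper, it covers the platform-independent directories only (the documentation path varies by platform), and the optional root prefix exists purely so the check can be tried against a scratch directory.

```shell
# Print each standard Oozie directory that does not exist.
# $1 is an optional root prefix (empty means the real filesystem root).
oozie_missing_dirs() {
    for oozie_dir in /usr/lib/oozie /etc/oozie/conf /var/lib/oozie \
                     /var/log/oozie /var/tmp/oozie /var/run/oozie; do
        [ -d "$1$oozie_dir" ] || echo "$oozie_dir"
    done
}

# Example use (not run here): no output means every directory is present.
#   oozie_missing_dirs
```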

Deciding Which Database to Use

Oozie has a built-in Derby database, but Cloudera recommends that you use a PostgreSQL, MySQL, or Oracle database instead, for the following reasons:
  • Derby runs in embedded mode and it is not possible to monitor its health.
  • It is not clear how to implement a live backup strategy for the embedded Derby database, though it may be possible.
  • Under load, Cloudera has observed locks and rollbacks with the embedded Derby database which don't happen with server-based databases.
See Supported Databases for tested database versions.

Configuring Oozie to Use PostgreSQL

Install PostgreSQL 8.4.x or 9.0.x.

Create the Oozie user and Oozie database.

For example, using the PostgreSQL psql command-line tool:

$ psql -U postgres
Password for user postgres: *****

postgres=# CREATE ROLE oozie LOGIN ENCRYPTED PASSWORD 'oozie' 
 NOSUPERUSER INHERIT CREATEDB NOCREATEROLE;
CREATE ROLE

postgres=# CREATE DATABASE "oozie" WITH OWNER = oozie
 ENCODING = 'UTF8'
 TABLESPACE = pg_default
 LC_COLLATE = 'en_US.UTF8'
 LC_CTYPE = 'en_US.UTF8'
 CONNECTION LIMIT = -1;
CREATE DATABASE

postgres=# \q

Configure PostgreSQL to accept network connections for the oozie user.

  1. Edit the postgresql.conf file and set the listen_addresses property to *, to make sure that the PostgreSQL server starts listening on all your network interfaces. Also make sure that the standard_conforming_strings property is set to off.
  2. Edit the PostgreSQL data/pg_hba.conf file as follows:
    host    oozie         oozie         0.0.0.0/0             md5

Reload the PostgreSQL configuration.

$ sudo -u postgres pg_ctl reload -s -D /opt/PostgreSQL/8.4/data
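
Before reloading, it can be worth confirming that postgresql.conf really carries the two settings described in step 1. The following sketch reads the last uncommented assignment of a key from a configuration file. The helper name pg_conf_value is hypothetical, the parsing is deliberately simple (one key = value per line, optional single quotes, optional trailing comment), and the file path in the usage comment is an assumption based on the data directory used above.

```shell
# Print the effective value of a key in a postgresql.conf-style file.
# $1 = path to the file, $2 = key name. Commented-out lines are ignored,
# and when a key appears more than once the last assignment is used.
pg_conf_value() {
    sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*\([^#]*\).*/\1/p" "$1" \
        | tail -n 1 \
        | sed "s/[[:space:]]*\$//; s/^'//; s/'\$//"
}

# Example use (path is an assumption, not run here):
#   conf=/opt/PostgreSQL/8.4/data/postgresql.conf
#   [ "$(pg_conf_value "$conf" listen_addresses)" = "*" ] || echo "listen_addresses is not *"
#   [ "$(pg_conf_value "$conf" standard_conforming_strings)" = "off" ] \
#       || echo "standard_conforming_strings is not off"
```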

Configure Oozie to use PostgreSQL.

Edit the oozie-site.xml file as follows:

...
    <property>
        <name>oozie.service.JPAService.jdbc.driver</name>
        <value>org.postgresql.Driver</value>
    </property>
    <property>
        <name>oozie.service.JPAService.jdbc.url</name>
        <value>jdbc:postgresql://localhost:5432/oozie</value>
    </property>
    <property>
        <name>oozie.service.JPAService.jdbc.username</name>
        <value>oozie</value>
    </property>
    <property>
        <name>oozie.service.JPAService.jdbc.password</name>
        <value>oozie</value>
    </property>
    ...

Configuring Oozie to Use MySQL

Install and start MySQL 5.x.

Create the Oozie database and Oozie MySQL user.

For example, using the MySQL mysql command-line tool:

$ mysql -u root -p
Enter password:

mysql> create database oozie default character set utf8;
Query OK, 1 row affected (0.00 sec)

mysql> grant all privileges on oozie.* to 'oozie'@'localhost' identified by 'oozie';
Query OK, 0 rows affected (0.00 sec)

mysql> grant all privileges on oozie.* to 'oozie'@'%' identified by 'oozie';
Query OK, 0 rows affected (0.00 sec)

mysql> exit
Bye

Configure Oozie to use MySQL.

Edit properties in the oozie-site.xml file as follows:

...
    <property>
        <name>oozie.service.JPAService.jdbc.driver</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>oozie.service.JPAService.jdbc.url</name>
        <value>jdbc:mysql://localhost:3306/oozie</value>
    </property>
    <property>
        <name>oozie.service.JPAService.jdbc.username</name>
        <value>oozie</value>
    </property>
    <property>
        <name>oozie.service.JPAService.jdbc.password</name>
        <value>oozie</value>
    </property>
    ...

Add the MySQL JDBC Driver JAR to Oozie

Copy or symbolically link the MySQL JDBC driver JAR into one of the following directories:
  • For installations that use packages: /var/lib/oozie/
  • For installations that use parcels: /opt/cloudera/parcels/CDH/lib/oozie/lib/
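
As a sketch of the symbolic-link option, the helper below links a driver JAR into a target directory after checking that both exist. The function name oozie_link_driver is hypothetical, and the JAR location in the usage comment is an assumption that depends on how the MySQL connector was installed.

```shell
# Symlink a JDBC driver JAR into an Oozie library directory.
# $1 = path to the driver JAR, $2 = target directory.
oozie_link_driver() {
    [ -f "$1" ] || { echo "no such JAR: $1" >&2; return 1; }
    [ -d "$2" ] || { echo "no such directory: $2" >&2; return 1; }
    ln -sf "$1" "$2/$(basename "$1")"
}

# Example use for a package installation (JAR path is an assumption):
#   sudo sh -c '. ./link-driver.sh; oozie_link_driver /usr/share/java/mysql-connector-java.jar /var/lib/oozie'
```

The same helper would apply to the Oracle driver JAR, with /var/lib/oozie/ as the target.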

Configuring Oozie to Use Oracle

Install and start Oracle 11g.

Use Oracle's instructions.

Create the Oozie Oracle user and grant privileges.

The following example uses the Oracle sqlplus command-line tool, and shows the privileges Cloudera recommends.

$ sqlplus system@localhost

Enter password: ******

SQL> create user oozie identified by oozie default tablespace users temporary tablespace temp;

User created.

SQL> grant alter any index to oozie;
grant alter any table to oozie;
grant alter database link to oozie;
grant create any index to oozie;
grant create any sequence to oozie;
grant create database link to oozie;
grant create session to oozie;
grant create table to oozie;
grant drop any sequence to oozie;
grant select any dictionary to oozie;
grant drop any table to oozie;
grant create procedure to oozie;
grant create trigger to oozie;

SQL> exit

$

Configure Oozie to use Oracle.

Edit the oozie-site.xml file as follows.

...
    <property>
        <name>oozie.service.JPAService.jdbc.driver</name>
        <value>oracle.jdbc.OracleDriver</value>
    </property>
    <property>
        <name>oozie.service.JPAService.jdbc.url</name>
        <value>jdbc:oracle:thin:@//myhost:1521/oozie</value>
    </property>
    <property>
        <name>oozie.service.JPAService.jdbc.username</name>
        <value>oozie</value>
    </property>
    <property>
        <name>oozie.service.JPAService.jdbc.password</name>
        <value>oozie</value>
    </property>
    ...

Add the Oracle JDBC driver JAR to Oozie.

Copy or symbolically link the Oracle JDBC driver JAR into the /var/lib/oozie/ directory.
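
The oozie-site.xml fragments for PostgreSQL, MySQL, and Oracle differ only in the driver class and the shape of the JDBC URL. The sketch below collects those two values per vendor, using exactly the class names and URL forms shown in this section. The function names are hypothetical, and the host, port, and database arguments are whatever your environment uses.

```shell
# Print the JDBC driver class for a vendor (postgresql|mysql|oracle).
oozie_jdbc_driver() {
    case "$1" in
        postgresql) echo org.postgresql.Driver ;;
        mysql)      echo com.mysql.jdbc.Driver ;;
        oracle)     echo oracle.jdbc.OracleDriver ;;
        *) echo "unknown vendor: $1" >&2; return 2 ;;
    esac
}

# Print the JDBC URL for a vendor, host, port, and database (or service) name.
oozie_jdbc_url() {
    case "$1" in
        postgresql|mysql) echo "jdbc:$1://$2:$3/$4" ;;
        oracle)           echo "jdbc:oracle:thin:@//$2:$3/$4" ;;
        *) echo "unknown vendor: $1" >&2; return 2 ;;
    esac
}

# Values matching the PostgreSQL example in this section.
oozie_jdbc_driver postgresql
oozie_jdbc_url postgresql localhost 5432 oozie
```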

Creating the Oozie Database Schema

After configuring Oozie database information and creating the corresponding database, create the Oozie database schema. Oozie provides a database tool for this purpose.

The Oozie database tool works in two modes: it can create the database schema directly, or it can produce an SQL script that a database administrator can run to create the schema manually. If you use the tool to create the database schema, you must have the permissions needed to execute DDL operations.

To run the Oozie database tool against the database

$ sudo -u oozie /usr/lib/oozie/bin/ooziedb.sh create -run

You should see output such as the following (the output of the script may differ slightly depending on the database vendor):

Validate DB Connection.
DONE
Check DB schema does not exist
DONE
Check OOZIE_SYS table does not exist
DONE
Create SQL schema
DONE
DONE
Create OOZIE_SYS table
DONE

Oozie DB has been created for Oozie version '4.0.0-cdh5.0.0'

The SQL commands have been written to: /tmp/ooziedb-5737263881793872034.sql

To create the SQL script

Run /usr/lib/oozie/bin/ooziedb.sh create -sqlfile SCRIPT. For example:

$ sudo -u oozie /usr/lib/oozie/bin/ooziedb.sh create -sqlfile oozie-create.sql

You should see output such as the following (the output of the script may differ slightly depending on the database vendor):

Validate DB Connection.
DONE
Check DB schema does not exist
DONE
Check OOZIE_SYS table does not exist
DONE
Create SQL schema
DONE
DONE
Create OOZIE_SYS table
DONE

Oozie DB has been created for Oozie version '4.0.0-cdh5.0.0'

The SQL commands have been written to: oozie-create.sql

WARN: The SQL commands have NOT been executed, you must use the '-run' option

Enabling the Oozie Web Console

To enable the Oozie web console, download and add the ExtJS library to the Oozie server. If you have not already done this, proceed as follows.

Step 1: Download the Library

Download the ExtJS version 2.2 library from https://archive.cloudera.com/gplextras/misc/ext-2.2.zip and place it in a convenient location.

Step 2: Install the Library

Extract the ext-2.2.zip file into /var/lib/oozie.
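
A quick way to confirm the extraction is to look for the unpacked directory. This sketch assumes the archive unpacks into a top-level ext-2.2 directory, which is worth verifying against your copy of the zip file. The helper name oozie_extjs_present is hypothetical.

```shell
# Return 0 if an ext-2.2 directory exists under the given directory
# (default /var/lib/oozie). Assumes the zip unpacks into ext-2.2/.
oozie_extjs_present() {
    [ -d "${1:-/var/lib/oozie}/ext-2.2" ]
}

# Example use (not run here):
#   oozie_extjs_present || echo "ExtJS not found under /var/lib/oozie"
```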

Configuring Oozie with Kerberos Security

To configure Oozie with Kerberos security, see Oozie Authentication.

Installing the Oozie Shared Library in Hadoop HDFS

The Oozie installation bundles the Oozie shared library, which contains all of the necessary JARs to enable workflow jobs to run streaming, DistCp, Pig, Hive, and Sqoop actions.

The Oozie installation bundles two shared libraries, one for MRv1 and one for YARN. Make sure you install the right one for the MapReduce version you are using:

  • The shared library file for MRv1 is oozie-sharelib-mr1.tar.gz.
  • The shared library file for YARN is oozie-sharelib-yarn.tar.gz.

To install the Oozie shared library in Hadoop HDFS in the oozie user home directory

$ sudo -u hdfs hadoop fs -mkdir /user/oozie
$ sudo -u hdfs hadoop fs -chown oozie:oozie /user/oozie
$ sudo oozie-setup sharelib create -fs <FS_URI> -locallib /usr/lib/oozie/oozie-sharelib-yarn.tar.gz

where FS_URI is the HDFS URI of the filesystem that the shared library should be installed on (for example, hdfs://<HOST>:<PORT>).

Configuring Support for Oozie Uber JARs

An uber JAR is a JAR that contains other JARs with dependencies in a lib/ folder inside the JAR. You can configure the cluster to handle uber JARs properly for the MapReduce action (as long as it does not include any streaming or pipes) by setting the following property in the oozie-site.xml file:

...
    <property>
        <name>oozie.action.mapreduce.uber.jar.enable</name>
        <value>true</value>
    </property>
...

When this property is set, users can use the oozie.mapreduce.uber.jar configuration property in their MapReduce workflows to notify Oozie that the specified JAR file is an uber JAR.
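
On the workflow side, the oozie.mapreduce.uber.jar property goes into the configuration of the MapReduce action. As a sketch, the helper below emits a property element in the same layout used throughout this section, with basic XML escaping of the name and value. The function names are hypothetical, and the HDFS path in the example call is made up for illustration.

```shell
# Escape the characters &, <, and > for use in XML text content.
xml_escape() {
    printf '%s' "$1" | sed 's/&/\&amp;/g; s/</\&lt;/g; s/>/\&gt;/g'
}

# Print a <property> element for a name ($1) and value ($2).
oozie_property() {
    printf '<property>\n    <name>%s</name>\n    <value>%s</value>\n</property>\n' \
        "$(xml_escape "$1")" "$(xml_escape "$2")"
}

# Hypothetical uber JAR location, shown for illustration only.
oozie_property oozie.mapreduce.uber.jar /user/alice/apps/wordcount-uber.jar
```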

Configuring Oozie to Run against a Federated Cluster

To run Oozie against a federated HDFS cluster using ViewFS, configure the oozie.service.HadoopAccessorService.supported.filesystems property in oozie-site.xml as follows:

<property>
     <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
     <value>hdfs,viewfs</value>
</property>
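
The value of this property is a comma-separated list of URI schemes. The sketch below mimics that check for a single URI, which can be handy when validating workflow paths before submission. The helper name oozie_fs_supported is hypothetical and the matching is a plain string comparison, not a reproduction of Oozie's own logic.

```shell
# Return 0 if the scheme of URI $2 appears in the comma-separated
# scheme list $1 (for example "hdfs,viewfs").
oozie_fs_supported() {
    uri_scheme=${2%%://*}
    case ",$1," in
        *",$uri_scheme,"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical ViewFS path, shown for illustration only.
oozie_fs_supported hdfs,viewfs viewfs://cluster/user/oozie && echo "supported"
```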