Migrating workflows directly created in Oozie to Cloudera

The Oozie workflows present on HDFS must be migrated to Cloudera.

Oozie workflows created manually or outside of the Hue Workflow editor require specific review and manual updates. No automated tool or process is available for updating manually created workflows and property files.

The Oozie workflows are stored in HDFS on CDH and HDP clusters. Migrating the Oozie workflows from these clusters to Cloudera SaaS involves a separate workflow process.

On HDP clusters, you can use the DistCp tool to migrate the Oozie workflows present in HDFS. On CDH clusters, you can use the Replication Manager App to migrate the Oozie workflows present in HDFS.
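As a rough illustration of the DistCp route, the following sketch builds a `hadoop distcp` command line for one workflow directory. The NameNode addresses and the `/user/etl/workflows` path are hypothetical placeholders, and `build_distcp_command` is a helper invented for this example. The sketch only assembles the command, it does not run it.

```python
import shlex
from typing import List


def build_distcp_command(source_uri: str, target_uri: str, update: bool = True) -> List[str]:
    """Return the argument list for a `hadoop distcp` copy of one HDFS directory."""
    command = ["hadoop", "distcp"]
    if update:
        # -update copies only files that are missing or differ on the target.
        command.append("-update")
    command += [source_uri, target_uri]
    return command


# Hypothetical NameNode addresses and workflow directory, for illustration only.
cmd = build_distcp_command(
    "hdfs://source-nn.example.com:8020/user/etl/workflows",
    "hdfs://target-nn.example.com:8020/user/etl/workflows",
)
print(shlex.join(cmd))
```

Building the command as a list keeps it safe to pass to `subprocess.run` later, once the source and target paths have been verified on both clusters.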

Specifically, migrating Oozie workflows created outside of the Hue Workflow editor requires some manual updates. Before you proceed further, you must understand:

  • During the migration process, you must copy all Oozie job files (workflow.xml, job.properties, and any supporting JARs) to the target cluster.

  • Which Oozie workflow files must be copied or migrated to your cluster. Identify the workflow.xml file and job.properties file for each Oozie workload that must be migrated. These files are stored in HDFS and must be copied to your cluster.

  • The job.properties file must be updated with the appropriate cluster endpoints.

  • The workflow.xml file might also need to be updated. For example, a workflow that uses the legacy “hive action” must be updated to use the newer “hive2 action”.

Depending on where you have stored your Oozie workflow data in HDFS, note the following information:

The workflow.xml file and any job JAR files reside in HDFS on the source cluster. They must be copied to the cluster you are migrating to.

The job.properties file for a job contains a reference to the location where the workflow files are stored. During migration, the job.properties file must be updated with the settings and locations of the new target environment.
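The job.properties update described above can be sketched as a small text rewrite. This is a minimal example, assuming the file uses plain `key=value` lines. `oozie.wf.application.path` is the Oozie property that points to the workflow location; `nameNode` and `jobTracker` are conventional variable names that your files may or may not use, and all host names are hypothetical. `update_job_properties` is a helper invented for this example.

```python
from typing import Dict


def update_job_properties(text: str, overrides: Dict[str, str]) -> str:
    """Replace the values of selected keys in job.properties text; keep other lines as-is."""
    remaining = dict(overrides)
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        # Comments and blank lines pass through unchanged.
        if stripped and not stripped.startswith(("#", "!")) and "=" in stripped:
            key = stripped.split("=", 1)[0].strip()
            if key in remaining:
                line = f"{key}={remaining.pop(key)}"
        out.append(line)
    # Append any override whose key was not already present in the file.
    out.extend(f"{key}={value}" for key, value in remaining.items())
    return "\n".join(out) + "\n"


# Hypothetical source-cluster file content, for illustration only.
source = """\
# Source cluster settings
nameNode=hdfs://source-nn.example.com:8020
jobTracker=source-rm.example.com:8032
oozie.wf.application.path=${nameNode}/user/etl/workflows/daily
"""

updated = update_job_properties(source, {
    "nameNode": "hdfs://target-nn.example.com:8020",
    "jobTracker": "target-rm.example.com:8032",
})
print(updated)
```

Because `oozie.wf.application.path` in this sample is expressed relative to `${nameNode}`, only the endpoint values change. A file that hard-codes the full HDFS URI would need that key overridden as well.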

  • Whether the component-level code needs to be refactored. For example, an Oozie job that executes Spark 1.6 code must be updated to the newer Spark 2.4 version. For more information, see Migrating Spark workloads to Cloudera.
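To help identify which workflow.xml files need the manual updates listed above, a script can flag actions that use the legacy Hive action or a Spark action. The following is a minimal sketch, assuming the actions declare the standard `uri:oozie:hive-action:` and `uri:oozie:spark-action:` namespaces. The sample workflow is abbreviated and hypothetical, and `find_actions_to_review` is a helper invented for this example. It only reports candidates for review and does not rewrite anything.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def find_actions_to_review(workflow_xml: str) -> List[Tuple[str, str]]:
    """Return (action name, reason) pairs for actions that likely need manual review."""
    root = ET.fromstring(workflow_xml)
    findings = []
    for element in root.iter():
        # ElementTree renders namespaced tags as "{namespace}localname".
        if element.tag.rsplit("}", 1)[-1] != "action":
            continue
        name = element.get("name", "<unnamed>")
        for child in element:
            if child.tag.startswith("{uri:oozie:hive-action:"):
                findings.append((name, "legacy hive action; consider hive2"))
            elif child.tag.startswith("{uri:oozie:spark-action:"):
                findings.append((name, "spark action; check the Spark version of the job code"))
    return findings


# Abbreviated, hypothetical workflow used only to exercise the function.
sample = """\
<workflow-app xmlns="uri:oozie:workflow:0.5" name="daily-etl">
  <start to="load"/>
  <action name="load">
    <hive xmlns="uri:oozie:hive-action:0.5">
      <script>load.hql</script>
    </hive>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail"><message>failed</message></kill>
  <end name="end"/>
</workflow-app>
"""

for action_name, reason in find_actions_to_review(sample):
    print(f"{action_name}: {reason}")
```

Matching on the action namespace rather than the element name alone keeps actions that already use `uri:oozie:hive2-action:` out of the report.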