Migrating workflows directly created in Oozie to CDP One

The oozie workflows present on HDFS must be migrated to CDP One.

Oozie workflows created manually or outside of the Hue Workflow editor require specific review and manual updates. No automated tool or a process is required for updating manually created workflows and property files.

The Oozie workflows are available in HDFS within CDH and HDP clusters. To migrate the Oozie workflows from these clusters to CDP SaaS involves a separate workflow process.

While using the HDP clusters, you can use the DistCp tool to migrate the Oozie workflows present in HDFS. While using the CDH clusters, you can employ the Replication Manager App to migrate the Oozie workflows present in HDFS

Specifically, some manual updates are required to process the Oozie migration created outside of the Hue Workflow. Before you proceed further, you must understand:

  • During the migration process you must copy across all Oozie job files (workflow.xml, job.properties, and any supporting JARs).

  • Which Oozie workflow files must be copied or migrated to CDP One. Identify the workflow.xml file and job.properties file for each Oozie workload that must be migrated. These files are stored in HDFS and must be copied to the CDP One endpoint.

  • The job.properties file must be updated with the appropriate CDP One endpoints.

  • Optionally, the workflow.xml file needs to be updated. For example, while currently using the legacy “hive action” requires an update to the newer “hive2 action”.

Depending on where (the location) you have stored your Oozie workflow data in HDFS, note the following information:

The workflow.xml and any job JAR files reside within HDFS in the source cluster. These will have to be copied across into a CDP One endpoint.

The job.properties file for a job contains a reference to the location of where the workflow files are stored. The job.properties file will need to be updated during a migration with the new target environment settings / locations.

  • If required, the refactoring of the component level code. For example, Oozie job executing Spark 1.6 code requires an update to the newer Spark 2.4 version. For more information, see Migrating Spark workloads to CDP.