Migrating workflows directly created in Oozie to CDP
Oozie workflows stored in HDFS must be migrated to CDP.
Oozie workflows created manually or outside of the Hue Workflow editor require specific review and manual updates. No automated tool or process is available for updating manually created workflows and property files.
On CDH and HDP clusters, Oozie workflows are stored in HDFS. Migrating them from these clusters to CDP SaaS involves a separate process.
For HDP clusters, you can use the DistCp tool to migrate the Oozie workflows stored in HDFS. For CDH clusters, you can use the Replication Manager App to migrate them.
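The DistCp route for HDP clusters can be sketched as follows. This is a minimal illustration, not a Cloudera-provided tool: it assumes the CDP One target is reachable through an HDFS-style URI, and the NameNode hostnames, ports, and workflow paths in the example are hypothetical. It builds one hadoop distcp -update command per workflow directory so that each directory's contents land at the same path on the target.

```python
import shlex
from typing import List


def distcp_commands(source_nn: str, target_nn: str, paths: List[str]) -> List[List[str]]:
    """Return one `hadoop distcp` argument list per workflow directory.

    source_nn and target_nn are NameNode URIs such as "hdfs://host:8020".
    """
    commands = []
    for path in paths:
        rel = path.strip("/")
        src = f"{source_nn.rstrip('/')}/{rel}"
        dst = f"{target_nn.rstrip('/')}/{rel}"
        # -update copies only files that are missing or differ on the target,
        # and copies the contents of src into dst (no extra nested directory).
        commands.append(["hadoop", "distcp", "-update", src, dst])
    return commands


if __name__ == "__main__":
    # Hypothetical source (HDP) and target NameNode URIs and workflow paths.
    for cmd in distcp_commands(
        "hdfs://hdp-nn.example.com:8020",
        "hdfs://cdp-nn.example.com:8020",
        ["/user/etl/workflows/daily_load", "/user/etl/workflows/hourly_agg"],
    ):
        # Print a shell-ready command; it could also be run with
        # subprocess.run(cmd, check=True) on a host with a configured Hadoop client.
        print(shlex.join(cmd))
```

Running one command per directory keeps a failed copy isolated to a single workflow. With -update, DistCp copies the contents of each source directory into the matching target directory rather than nesting the source directory one level deeper.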
Workflows created outside of the Hue Workflow editor require some manual updates during migration. Before you proceed, make sure you understand the following:
- During the migration process, you must copy across all Oozie job files, including job.properties and any supporting JARs.
- Which Oozie workflow files must be copied or migrated to CDP One. Identify the job.properties file for each Oozie workload that must be migrated. These files are stored in HDFS and must be copied to the CDP One endpoint.
- The job.properties file must be updated with the appropriate CDP One endpoints.
- The workflow.xml file needs to be updated. For example, a workflow that currently uses the legacy “hive action” requires an update to the newer “
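Updating the endpoints in a job.properties file can be sketched with a small Python function. The sketch assumes the file follows the common Oozie convention of defining nameNode and jobTracker properties that workflow.xml references as ${nameNode} and ${jobTracker}; your files may use different keys, and the hostnames and ports below are hypothetical. The parser is deliberately simple: it understands key=value lines and # or ! comments, but not line continuations or the other separators allowed in Java properties files.

```python
from typing import Dict


def update_job_properties(text: str, overrides: Dict[str, str]) -> str:
    """Return job.properties text with the keys in `overrides` set to new values.

    Simplified parser (an assumption, not full Java properties syntax):
    only 'key=value' lines and '#' / '!' comment lines are understood.
    """
    out_lines = []
    replaced = set()
    for line in text.splitlines():
        stripped = line.strip()
        # Keep blank lines, comments, and anything that is not key=value unchanged.
        if not stripped or stripped.startswith(("#", "!")) or "=" not in stripped:
            out_lines.append(line)
            continue
        key = stripped.split("=", 1)[0].strip()
        if key in overrides:
            out_lines.append(f"{key}={overrides[key]}")
            replaced.add(key)
        else:
            out_lines.append(line)
    # Append overridden keys that the original file did not define.
    for key, value in overrides.items():
        if key not in replaced:
            out_lines.append(f"{key}={value}")
    return "\n".join(out_lines) + "\n"


if __name__ == "__main__":
    # Hypothetical job.properties from the source cluster.
    source = (
        "# daily_load job\n"
        "nameNode=hdfs://hdp-nn.example.com:8020\n"
        "jobTracker=hdp-rm.example.com:8050\n"
        "queueName=default\n"
        "oozie.wf.application.path=${nameNode}/user/etl/workflows/daily_load\n"
    )
    # Hypothetical CDP One endpoints.
    print(update_job_properties(source, {
        "nameNode": "hdfs://cdp-nn.example.com:8020",
        "jobTracker": "cdp-rm.example.com:8032",
    }), end="")
```

Because oozie.wf.application.path in this example is written as ${nameNode}/..., updating nameNode also points the application path at the new cluster. A property holding a hard-coded hdfs:// URI would need its own override.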
Depending on where you have stored your Oozie workflow data in HDFS, note the following:
- The workflow.xml file and any job JAR files reside in HDFS on the source cluster. They must be copied to a CDP One endpoint.
- The job.properties file for a job contains a reference to the location where the workflow files are stored. During migration, this file must be updated with the settings of the new target environment.
- If required, refactor the component-level code. For example, an Oozie job that runs Spark 1.6 code must be updated to Spark 2.4. For more information, see Migrating Spark workloads to CDP.
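Once the job.properties and workflow.xml files have been updated for the target environment, a quick scan for leftover references to the source cluster can catch a location that was missed, such as a hard-coded hdfs:// path. The following sketch is an illustration only: it assumes you have the file contents available as strings (for example, after copying them locally), the host names are hypothetical, and it matches host names literally, so it will not catch references hidden behind variables defined elsewhere.

```python
import re
from typing import Dict, List, Tuple


def find_stale_references(
    files: Dict[str, str], source_hosts: List[str]
) -> List[Tuple[str, int, str]]:
    """Return (file name, line number, stripped line) for each line that still
    mentions one of the source-cluster host names."""
    if not source_hosts:
        return []
    # Escape host names so characters such as '.' and '-' match literally.
    pattern = re.compile("|".join(re.escape(host) for host in source_hosts))
    hits = []
    for name, text in files.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((name, lineno, line.strip()))
    return hits


if __name__ == "__main__":
    # Hypothetical contents of migrated files, one of which was missed.
    migrated = {
        "job.properties": (
            "nameNode=hdfs://cdp-nn.example.com:8020\n"
            "oozie.wf.application.path=hdfs://hdp-nn.example.com:8020/user/etl/wf\n"
        ),
        "workflow.xml": '<workflow-app name="daily_load">\n</workflow-app>\n',
    }
    for name, lineno, line in find_stale_references(migrated, ["hdp-nn.example.com"]):
        print(f"{name}:{lineno}: {line}")
```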