Migrating workflows directly created in Oozie to Cloudera
The Oozie workflows present on HDFS must be migrated to Cloudera.
Oozie workflows created manually or outside of the Hue Workflow editor require specific review and manual updates. No automated tool or process is required for updating manually created workflows and property files.
The Oozie workflows are stored in HDFS on CDH and HDP clusters. Migrating the Oozie workflows from these clusters to Cloudera SaaS involves a separate process.
On HDP clusters, you can use the DistCp tool to migrate the Oozie workflows present in HDFS. On CDH clusters, you can use the Replication Manager App to migrate the Oozie workflows present in HDFS.
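As a minimal sketch of the DistCp route, the following Python snippet assembles a `hadoop distcp` invocation for one workflow directory. The host names and paths are hypothetical placeholders, and the helper `build_distcp_command` is introduced here for illustration only; it builds the command but does not run it.

```python
from typing import List


def build_distcp_command(source_uri: str, target_uri: str) -> List[str]:
    """Return the argument list for a `hadoop distcp` copy of one directory."""
    if not source_uri or not target_uri:
        raise ValueError("both source and target URIs are required")
    # -update copies only files that are missing or different at the target,
    # so the command can be re-run after a partial copy.
    return ["hadoop", "distcp", "-update", source_uri, target_uri]


# Hypothetical hosts and paths, for illustration only.
command = build_distcp_command(
    "hdfs://source-nn.example.com:8020/user/etl/workflows",
    "hdfs://target-nn.example.com:8020/user/etl/workflows",
)
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run` on a host where the `hadoop` client is configured for both clusters.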
Some manual updates are required to migrate Oozie workflows created outside of the Hue Workflow editor. Before you proceed, you must understand the following:
- During the migration process, you must copy across all Oozie job files (workflow.xml, job.properties, and any supporting JARs).
- Which Oozie workflow files must be copied or migrated to your cluster. Identify the workflow.xml file and job.properties file for each Oozie workload that must be migrated. These files are stored in HDFS and must be copied to your cluster.
- The job.properties file must be updated with the appropriate cluster endpoints.
- Optionally, the workflow.xml file needs to be updated. For example, a workflow that currently uses the legacy “hive action” requires an update to the newer “hive2 action”.
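To help identify which workflow.xml files still use the legacy hive action, a small scan like the following may be useful. It is a sketch under stated assumptions: it treats any child element named `hive` in a namespace beginning with `uri:oozie:hive-action:` as legacy, the helper name `find_legacy_hive_actions` is hypothetical, and the sample workflow is abbreviated and not schema-validated. The conversion itself stays manual.

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespace prefix used by the legacy hive action schema versions.
LEGACY_HIVE_NS_PREFIX = "uri:oozie:hive-action:"


def find_legacy_hive_actions(workflow_xml: str) -> List[str]:
    """Return the names of <action> nodes that wrap a legacy <hive> element."""
    root = ET.fromstring(workflow_xml)
    names = []
    for node in root.iter():
        # ElementTree renders namespaced tags as "{namespace}localname".
        if node.tag != "action" and not node.tag.endswith("}action"):
            continue
        for child in node:
            is_legacy = (
                child.tag.startswith("{" + LEGACY_HIVE_NS_PREFIX)
                and child.tag.endswith("}hive")
            )
            if is_legacy:
                names.append(node.get("name", "<unnamed>"))
    return names


# Abbreviated sample workflow with one legacy and one hive2 action.
SAMPLE = """<workflow-app xmlns="uri:oozie:workflow:0.5" name="demo-wf">
  <start to="load"/>
  <action name="load">
    <hive xmlns="uri:oozie:hive-action:0.5">
      <script>load.hql</script>
    </hive>
    <ok to="report"/>
    <error to="fail"/>
  </action>
  <action name="report">
    <hive2 xmlns="uri:oozie:hive2-action:0.1">
      <jdbc-url>jdbc:hive2://hs2.example.com:10000/default</jdbc-url>
      <script>report.hql</script>
    </hive2>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail"><message>failed</message></kill>
  <end name="end"/>
</workflow-app>"""

legacy = find_legacy_hive_actions(SAMPLE)
print(legacy)
```

Each action name reported this way marks a place in workflow.xml to review by hand.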
Depending on where you have stored your Oozie workflow data in HDFS, note the following information:
- The workflow.xml and any job JAR files reside within HDFS in the source cluster. These will have to be copied across into the cluster you are migrating to.
- The job.properties file for a job contains a reference to the location of where the workflow files are stored. The job.properties file will need to be updated during a migration with the new target environment settings and locations.
- If required, refactor the component-level code. For example, an Oozie job executing Spark 1.6 code requires an update to the newer Spark 2.4 version. For more information, see Migrating Spark workloads to Cloudera.
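The job.properties update described above can be sketched as a simple key rewrite. The property names `nameNode` and `jobTracker` are common conventions rather than anything this page prescribes, the endpoint values are hypothetical, and the helper `rewrite_properties` is illustrative. It handles plain `key=value` lines only, not every form the Java properties format allows (such as line continuations).

```python
from typing import Dict


def rewrite_properties(text: str, overrides: Dict[str, str]) -> str:
    """Replace the values of selected keys, keeping every other line as-is."""
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        # Leave blank lines, comments, and non key=value lines untouched.
        if not stripped or stripped.startswith("#") or "=" not in stripped:
            out.append(line)
            continue
        key = stripped.split("=", 1)[0].strip()
        if key in overrides:
            out.append(f"{key}={overrides[key]}")
        else:
            out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical source-cluster job.properties content.
SOURCE = """\
# source cluster settings
nameNode=hdfs://old-nn.example.com:8020
jobTracker=old-rm.example.com:8032
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/etl/workflows/demo
"""

# Hypothetical target-cluster endpoints.
updated = rewrite_properties(
    SOURCE,
    {
        "nameNode": "hdfs://new-nn.example.com:8020",
        "jobTracker": "new-rm.example.com:8032",
    },
)
print(updated)
```

Because the application path in this sample is expressed through `${nameNode}`, it follows the new endpoint without a separate edit; a hard-coded path would need its own override.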