During the Oozie workflow migration, the job definitions, job properties, and other
Oozie job-related data are migrated from a CDH or CDP Private Cloud Base cluster to a Data
Hub cluster.
Before the migration, the source cluster is scanned to
collect the workflows, coordinators, and bundles, and to discover the relations between them.
You also have the option to parse the Hive SQL files to obtain the names of the related
databases and tables. During the migration process, the Oozie jobs on the
source cluster are not affected and can remain in the running state. When the migration is
finished, the job definitions are stored in the S3 bucket and the job properties are stored
in the local file system.
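As an illustration of what "relations between them" means, Oozie coordinators and bundles point at the applications they launch through `<app-path>` elements. The following is a minimal sketch of reading those references, not a description of how CMA performs the scan. The coordinator definition, its name, and its paths are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical coordinator definition, used only for illustration.
COORDINATOR_XML = """
<coordinator-app name="daily-report" frequency="${coord:days(1)}"
                 start="2024-01-01T00:00Z" end="2025-01-01T00:00Z"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.4">
  <action>
    <workflow>
      <app-path>${nameNode}/user/etl/apps/report-wf</app-path>
    </workflow>
  </action>
</coordinator-app>
"""


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def referenced_app_paths(xml_text: str) -> list[str]:
    """Return the text of every <app-path> element in a definition."""
    root = ET.fromstring(xml_text)
    return [
        (elem.text or "").strip()
        for elem in root.iter()
        if local_name(elem.tag) == "app-path"
    ]


print(referenced_app_paths(COORDINATOR_XML))
```

The same function can be applied to a bundle definition, where each `<coordinator>` element carries its own `<app-path>`.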
- Ensure that CMA is set up correctly using the steps in Setting up CMA server.
- Ensure that you have met the requirements detailed in Reviewing prerequisites before
migration.
- Ensure that you have a CDH 5, CDH 6 or CDP Private Cloud Base cluster registered
as a source from which you want to migrate your Oozie workflows. If you do not
have a source cluster yet, complete the steps in Registering source clusters.
- Ensure that you have a Data Hub cluster registered as a destination cluster to
which you want to migrate your Oozie workflows. If you do not have a destination
cluster yet, complete the steps in Registering destination clusters.
-
On the Clusters page, click the CDH or CDP Private Cloud Base cluster that you want to
use for the migration.
-
Click Start Scanning to open the Scan
Settings.
-
Select Oozie workflow scan.
-
Provide the Number of latest days to scan to
define the period from which the Oozie jobs are collected.
-
Click Scan selected.
You are redirected to the scanning progress, where you can monitor
whether the scanning process succeeds or encounters any errors.
-
When the scan is finished, click Oozie Job Definitions to view the collected job
definitions.
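The effect of the Number of latest days to scan setting can be pictured as a cutoff date: only jobs newer than the cutoff are collected. The sketch below assumes that the window is measured against each job's start time, which the documentation does not state. The job records and field names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def jobs_in_window(jobs, days, now):
    """Keep jobs whose start time is within the last `days` days before `now`."""
    cutoff = now - timedelta(days=days)
    return [job for job in jobs if job["started"] >= cutoff]


# Hypothetical job records; "started" is an assumed field name.
now = datetime(2024, 6, 30, tzinfo=timezone.utc)
jobs = [
    {"id": "wf-old", "started": datetime(2024, 5, 1, tzinfo=timezone.utc)},
    {"id": "wf-recent", "started": datetime(2024, 6, 25, tzinfo=timezone.utc)},
]

recent = jobs_in_window(jobs, days=7, now=now)
print([job["id"] for job in recent])  # ['wf-recent']
```

A larger value widens the window and collects more jobs, at the cost of a longer scan.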
You have the option to analyze the Hive scripts when you migrate Oozie jobs
that depend on Hive SQL files. In this case, CMA scans for and identifies the location
of the SQL files, which are stored either in HDFS or in other custom directories, and
adds the SQL files to the migration plan.
- Enable Run Hive3Parser.
- Select the Oozie jobs to analyze.
- Click
.
After the scan is completed, the Hive scripts related to the selected Oozie
jobs are listed under the
Hive SQL tab.
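To give a rough idea of what obtaining database and table names from a Hive script involves, here is a deliberately simplified, regex-based sketch. It is not the Hive3Parser that CMA runs: a real parser understands the full HiveQL grammar, while this pattern only looks at identifiers that follow a few keywords and would, for example, mistake CTE names for tables. The SQL statement and the fallback to a `default` database are assumptions for illustration.

```python
import re

# Identifier after FROM, JOIN, INTO [TABLE], or OVERWRITE TABLE,
# with an optional "database." prefix.
TABLE_REF = re.compile(
    r"\b(?:FROM|JOIN|INTO(?:\s+TABLE)?|OVERWRITE\s+TABLE)\s+(?:(\w+)\.)?(\w+)",
    re.IGNORECASE,
)


def referenced_tables(sql: str) -> set[tuple[str, str]]:
    """Return (database, table) pairs; unqualified tables map to 'default'."""
    return {(db or "default", table) for db, table in TABLE_REF.findall(sql)}


# Hypothetical Hive script content.
SQL = """
INSERT INTO TABLE sales.daily_totals
SELECT o.day, SUM(o.amount)
FROM sales.orders o
JOIN customers c ON o.cust_id = c.id
GROUP BY o.day
"""

for db, table in sorted(referenced_tables(SQL)):
    print(f"{db}.{table}")
```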
-
Add the Oozie job definitions to Collections.
Collections are a way to sort and bundle the job
definitions into groups for the migration. You can create more collections
in addition to the
Default collection based on your requirements.
The Hive scripts that belong to the Oozie job definitions are automatically
added to the same collection.
After you finish sorting the job
definitions into collections, you can start the migration process by creating
the migration plan.
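The grouping described above can be modeled as a mapping from collection name to job definitions, with each job's Hive scripts following it into the same collection. This is only a conceptual sketch: the function, the job names, and the rule that unassigned jobs stay in the Default collection are assumptions, not CMA behavior confirmed by this document.

```python
from collections import defaultdict

DEFAULT = "Default"


def build_collections(job_names, assignments, scripts_by_job):
    """Group jobs by collection; each job's Hive scripts follow the job."""
    collections = defaultdict(lambda: {"jobs": [], "hive_scripts": []})
    for job in job_names:
        # Assumption: a job without an explicit assignment stays in Default.
        target = collections[assignments.get(job, DEFAULT)]
        target["jobs"].append(job)
        target["hive_scripts"].extend(scripts_by_job.get(job, []))
    return dict(collections)


# Hypothetical job definitions and scripts.
collections = build_collections(
    job_names=["daily-report", "hourly-ingest", "cleanup"],
    assignments={"daily-report": "finance", "hourly-ingest": "finance"},
    scripts_by_job={"daily-report": ["report.hql"]},
)
print(collections["finance"])
print(collections[DEFAULT])
```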
-
Click Create Migration or select .
-
Select the source cluster, and click Next.
-
Select the destination cluster, and click
Next.
-
Select the type of migration, and click
Next.
-
Select the collections that you want to migrate, and click
Next.
You can select whether the migration should Run Now
or be completed in a Scheduled Run.
Run Now means that the Oozie job definitions
in the selected collections are migrated as soon as the
process starts. When you choose the Scheduled Run,
you can select the start date of the migration, and set the frequency at
which the migration process runs.
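A Scheduled Run is defined by a start date and a frequency. The sketch below shows how those two values determine the run times, assuming a fixed interval between runs. The function name and the example values are hypothetical and do not reflect the scheduler that CMA uses internally.

```python
from datetime import datetime, timedelta


def scheduled_runs(start, every, count):
    """Return the first `count` run times, beginning at `start`."""
    return [start + i * every for i in range(count)]


# Hypothetical schedule: daily at 02:00, starting on 1 July 2024.
runs = scheduled_runs(datetime(2024, 7, 1, 2, 0), timedelta(days=1), 3)
for run in runs:
    print(run.isoformat())
```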
-
Provide the Knox token to access Cloudera
Manager of the Data Hub cluster in CDP Public Cloud.
- Navigate to the destination Data Hub cluster.
- Select Knox Token from the list of
services.
- Click Token generation, and provide the
name and lifetime of the token.
- Click Generate Token.
- Copy the generated token, and navigate back to the migration
plan. Paste the token into the Knox Token
field.
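Because the token has a limited lifetime, it can be useful to confirm that it outlasts the planned migration, especially for a Scheduled Run. The following sketch assumes that the generated token is a JWT with a standard `exp` claim. It only decodes the payload and does not verify the signature. The sample token is built locally for illustration and is not a real Knox token.

```python
import base64
import json
from datetime import datetime, timezone


def jwt_expiry(token: str) -> datetime:
    """Read the `exp` claim from a JWT payload without verifying it."""
    payload_b64 = token.split(".")[1]
    # Restore the base64 padding that JWT encoding strips.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(padded))
    return datetime.fromtimestamp(claims["exp"], tz=timezone.utc)


def _segment(obj) -> str:
    """Encode a dict as an unpadded base64url JWT segment (demo helper)."""
    raw = json.dumps(obj).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# Hypothetical token with a placeholder signature segment.
sample_token = ".".join(
    [_segment({"alg": "none"}), _segment({"exp": 1719792000}), "signature"]
)
print(jwt_expiry(sample_token).isoformat())
```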
-
To include a service preparation step in the migration process, enable
Oozie service configuring to prepare
Oozie service on destination cluster for running jobs.
You can set the paths used by Oozie services. These paths are used
when configuring the Oozie service for migration.
-
Click Next.
An overview of the migration plan is displayed. At this point, you can
go back and change any configuration if the information is not correct.
If the information is correct, click
Create.
-
Click Go to Migrations when the migration plan is
successfully created.
-
Click the CDH to CDP PC or CDP Private
Cloud Base to CDP PC migration to start the migration.
The steps that are going to be completed during the migration are displayed.
-
Review and configure the Oozie job definitions under
Configuration before starting the migration
process.
-
Select a job definition to list the corresponding Job
properties and Workflow.
The original and proposed values are filled out based on the source
and destination cluster information.
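The comparison between original and proposed values can be pictured as a unified diff of the job properties. The sketch below uses Python's standard `difflib` module to produce one. The property values and the bucket name are hypothetical, and the output format is not claimed to match the CMA diff view.

```python
import difflib

# Hypothetical original and proposed job properties.
original = [
    "nameNode=hdfs://ns1",
    "oozie.wf.application.path=${nameNode}/user/etl/apps/report-wf",
]
proposed = [
    "nameNode=s3a://example-bucket",
    "oozie.wf.application.path=${nameNode}/user/etl/apps/report-wf",
]

diff = list(
    difflib.unified_diff(
        original, proposed, fromfile="original", tofile="proposed", lineterm=""
    )
)
print("\n".join(diff))
```

Lines starting with `-` come from the original properties and lines starting with `+` are the proposed replacements.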
-
Modify the values of the job definition based on the warnings
highlighted in the Workflow diff view. You can save
the job definition changes using the Save
button.
CMA typically looks for configuration values that are related to
service endpoints, Kerberos principals, and so on. These configuration
values are used to update the file locations and other configurations
accordingly. While the automatic changes can work without modification,
make sure that you review the proposed values and update the configurations
based on the destination cluster requirements. Review the following
properties and values before the migration:
- HDFS file paths changed to S3 or ABFS
- Hostnames
- Service settings
- Paths to user-related directories
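As an example of the first item in the list, an HDFS path has to be re-rooted under an object store location. The following is a minimal sketch of such a rewrite, useful for sanity-checking a proposed value by hand. The function, the bucket name, and the base prefix are hypothetical, and CMA's own mapping rules may differ.

```python
from urllib.parse import urlsplit


def to_object_store(path: str, base: str) -> str:
    """Re-root an hdfs:// URI (or a bare absolute path) under `base`."""
    parts = urlsplit(path)
    if parts.scheme not in ("", "hdfs"):
        return path  # already points at another file system
    return base.rstrip("/") + "/" + parts.path.lstrip("/")


# Hypothetical destination location.
BASE = "s3a://example-bucket/oozie"

print(to_object_store("hdfs://ns1/user/etl/apps/report-wf", BASE))
print(to_object_store("/user/etl/lib", BASE))
print(to_object_store("s3a://other-bucket/data", BASE))
```

Paths that contain unresolved variables such as `${nameNode}` are not handled by this sketch and need to be reviewed manually.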
-
Click Save property changes to update the
configurations.
You have the option to save the changes for only the edited jobs or
apply the changes to all of the jobs.
-
Click
to start
migration.
During the Hive SQL migration, the Hive scripts are copied to the Hive S3
bucket on the destination cluster. When the Hive SQL migration is finished,
click ![](../images/ic-play.png) to start
preparing the Oozie service on the destination cluster for running the jobs that
are stored in S3. When the service preparation is finished, click
![](../images/ic-play.png) to start uploading the
job definitions and configurations to the local file system and S3
bucket.
When all of the steps are successfully completed, the migration of Oozie job
definitions from CDH or CDP Private Cloud Base to CDP Public Cloud is finished. You can
restart the Oozie jobs on the destination Data Hub cluster using the command-line interface
(CLI) or Hue.
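For the CLI route, a job is typically submitted and started with the standard `oozie job ... -config ... -run` command. The sketch below only assembles that command line. The Oozie URL and the properties file name are hypothetical, and any authentication that the Data Hub cluster requires is outside the scope of this example.

```python
import shlex


def oozie_run_command(oozie_url: str, properties_file: str) -> list[str]:
    """Build the argument list for submitting and starting an Oozie job."""
    return [
        "oozie", "job",
        "-oozie", oozie_url,         # Oozie server endpoint (assumed value)
        "-config", properties_file,  # migrated job properties file
        "-run",
    ]


# Hypothetical endpoint and properties file.
cmd = oozie_run_command("https://datahub.example.com/oozie", "job.properties")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run` on a host where the Oozie client is installed.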