During the Oozie workflow migration, the job definitions, job properties, and other
Oozie job-related data are migrated from a CDH or Cloudera Private Cloud Base cluster to a Cloudera Data Hub cluster.
Before the migration, the source cluster is scanned to
collect the workflows, coordinators, and bundles, and to discover the relations between them.
You can also parse the Hive SQL files to obtain the names of the related databases and
tables. The Oozie jobs on the source cluster are not affected by the migration process
and can remain in a running state. When the migration is finished, the job
definitions are stored in the S3 bucket and the job properties are stored in the local
file system.
-
On the Clusters page, click the CDH or Cloudera Private Cloud Base
cluster that you want to use for the migration.
-
Click Start Scanning to open the Scan
Settings.
-
Select Oozie workflow scan.
-
Provide the Number of latest days to scan to
define the period from which the Oozie jobs are collected.
-
Click Scan selected.
You are redirected to the scanning progress page, where you can monitor
whether the scan succeeded or encountered any errors.
-
When the scan is finished, click Oozie Job Definitions to view the
collected job definitions.
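As a rough illustration of how the Number of latest days to scan value bounds the collected jobs, the following Python sketch computes the start of the scan period and keeps only the jobs that fall inside it. The job names, the timestamps, and the choice of comparing against a job's last run time are assumptions made for the example; the criteria that Cloudera Migration Assistant actually applies are not described here.

```python
from datetime import datetime, timedelta, timezone


def scan_window_start(days, now):
    """Return the earliest timestamp covered by a scan of the latest `days` days."""
    if days < 1:
        raise ValueError("days must be at least 1")
    return now - timedelta(days=days)


def jobs_in_window(jobs, days, now):
    """Keep the names of jobs whose (hypothetical) last run is inside the window."""
    start = scan_window_start(days, now)
    return [name for name, last_run in jobs if last_run >= start]


# Hypothetical jobs: (name, last run time). Not taken from a real cluster.
now = datetime(2024, 6, 30, tzinfo=timezone.utc)
jobs = [
    ("daily-etl", datetime(2024, 6, 29, tzinfo=timezone.utc)),
    ("quarterly-report", datetime(2024, 3, 31, tzinfo=timezone.utc)),
]

print(jobs_in_window(jobs, 30, now))  # ['daily-etl']
```

A larger value widens the period, so jobs that run infrequently are more likely to be collected.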
You have the option to analyze the Hive scripts when you migrate Oozie jobs
that depend on Hive SQL files. In this case,
Cloudera Migration Assistant
identifies the location of the SQL files, whether they are stored in HDFS or in other custom
directories, and adds the SQL files to the migration plan.
- Enable Run Hive3Parser.
- Select the Oozie jobs to analyze.
- Click .
After the scan is completed, the Hive scripts related to the selected Oozie
jobs are listed under
the Hive SQL tab.
-
Add the Oozie job definitions to Collections.
Collections are a way to sort and bundle the job
definitions into groups for the migration. You can create more collections
in addition to the
Default collection based on your requirements.
The Hive scripts that belong to the Oozie job definitions are automatically
added to the same collection.
After you finish sorting the job
definitions into collections, you can start the migration process by creating
the migration plan.
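The grouping behavior described above can be pictured with a small Python sketch. It is only a conceptual model: the function name, the data shapes, the job names, and the collection name wave-1 are hypothetical, and the sketch does not reflect how Cloudera Migration Assistant stores collections internally. It shows unassigned jobs staying in the Default collection and Hive scripts following their job definition.

```python
def build_collections(jobs, assignments):
    """Group job definitions into collections.

    jobs: job name -> list of Hive script paths the job depends on.
    assignments: job name -> collection name; unassigned jobs go to "Default".
    Hive scripts are placed in the same collection as their job definition.
    """
    collections = {}
    for job in sorted(jobs):
        name = assignments.get(job, "Default")
        entry = collections.setdefault(name, {"jobs": [], "hive_scripts": []})
        entry["jobs"].append(job)
        entry["hive_scripts"].extend(jobs[job])
    return collections


# Hypothetical job definitions and their Hive SQL files.
jobs = {
    "daily-etl": ["/apps/etl/load.sql"],
    "cleanup": [],
    "weekly-report": ["/apps/report/agg.sql"],
}
assignments = {"daily-etl": "wave-1", "weekly-report": "wave-1"}

for name, content in build_collections(jobs, assignments).items():
    print(name, content)
```

Grouping jobs that share Hive scripts or schedules into the same collection keeps related definitions together when the collections are selected for a migration plan.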
-
Click Create Migration or select .
-
Select the source cluster, and click Next.
-
Select the destination cluster, and click
Next.
-
Select the type of migration, and click
Next.
-
Select the collections that you want to migrate, and click
Next.
You can choose whether the migration starts with Run Now
or is completed in a Scheduled Run.
Run Now means that the Oozie job definitions
in the selected collections are migrated as soon as the
process starts. When you choose Scheduled Run,
you can select the start date of the migration and set the frequency
at which the migration process runs.
-
Provide the Knox token to access Cloudera Manager of the Cloudera Data Hub cluster in Cloudera
Public Cloud.
- Navigate to the destination Cloudera Data Hub
cluster.
- Select Knox Token from the list of
services.
- Click Token generation, and provide the
name and lifetime of the token.
- Click Generate Token.
- Copy the generated token, and navigate back to the migration
plan. Paste the token into the Knox Token
field.
-
To include a service preparation step in the migration process,
enable Oozie service configuring to prepare
Oozie service on destination cluster for running jobs.
You can set the paths used by the Oozie services. These paths are used
when the Oozie service is configured for the migration.
-
Click Next.
An overview of the migration plan is displayed. At this point, you can
go back and change any configuration if the information is not correct.
If the information is correct, click
Create.
-
Click Go to Migrations when the migration plan is
successfully created.
-
Click the CDH to Cloudera Public Cloud or
Cloudera Private Cloud Base to
Cloudera Public Cloud
migration to start the migration.
The steps that are going to be completed during the migration are displayed.
-
Review and configure the Oozie job definitions under
Configuration before starting the migration
process.
-
Select a job definition to list the corresponding Job
properties and Workflow.
The original and proposed values are filled out based on the source
and destination cluster information.
-
Modify the values of the job definition based on the warnings
highlighted in the Workflow diff view, and click
Save to save the job definition
changes.
Cloudera Migration Assistant typically looks for configuration
values that are related to service endpoints, Kerberos principals, and
similar cluster-specific settings, and uses them to propose updated file locations
and other configurations. Although the automatic changes generally work
without modification, review the proposed values and update
the configurations based on the destination cluster requirements. Review
the following properties and values before the
migration:
- HDFS file paths changed to S3 or ABFS
- Hostnames
- Service settings
- Paths to user-related directories
-
Click Save property changes to update the
configurations.
You have the option to save the changes for only the edited jobs or
apply the changes to all of the jobs.
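To make the review of original and proposed values more concrete, the following Python sketch rewrites HDFS URIs in a set of job properties to S3 locations and lists only the entries that change. It is an illustration under stated assumptions, not the logic used by Cloudera Migration Assistant: the bucket name example-bucket, the host names, the property values, and the rule of keeping the original path under the bucket are all hypothetical.

```python
from urllib.parse import urlparse


def propose_value(value, bucket):
    """Propose an s3a:// location for an hdfs:// URI; leave other values as is."""
    parsed = urlparse(value)
    if parsed.scheme != "hdfs":
        return value
    # Assumption: the HDFS path is kept unchanged under the destination bucket.
    return f"s3a://{bucket}{parsed.path}"


def review_properties(properties, bucket):
    """Return (key, original, proposed) for every property that would change."""
    changes = []
    for key, original in properties.items():
        proposed = propose_value(original, bucket)
        if proposed != original:
            changes.append((key, original, proposed))
    return changes


# Hypothetical job properties from a source cluster.
properties = {
    "nameNode": "hdfs://nn1.example.com:8020",
    "oozie.wf.application.path": "hdfs://nn1.example.com:8020/user/etl/workflows/daily",
    "queueName": "default",
}

for key, original, proposed in review_properties(properties, "example-bucket"):
    print(f"{key}: {original} -> {proposed}")
```

Listing only the changed entries mirrors the purpose of a diff view: values such as queue names pass through untouched, while every rewritten path is surfaced for manual confirmation.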
-
Click to start
migration.
During the Hive SQL migration, the Hive scripts are copied to the Hive S3
bucket on the destination cluster. When the Hive SQL Migration is finished,
click
to start
preparing the Oozie service on the destination cluster for running the jobs that
are stored in S3. When the service preparation is finished, click
to start uploading the
job definitions and configurations to the local file system and S3
bucket.
When all of the steps are successfully completed, the migration of Oozie job
definitions from CDH or Cloudera Private Cloud Base to Cloudera
Public Cloud is finished. You can restart the Oozie jobs on the
destination Cloudera Data Hub cluster using the command line interface
(CLI) or Hue.
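For the CLI route, a job is commonly submitted with the standard Oozie client command `oozie job -oozie <url> -config <file> -run`. The Python sketch below only assembles and prints such a command so that it can be reviewed before it is run. The Oozie URL and the properties file path are placeholders, and authentication on the destination cluster (for example Kerberos or access through Knox) is not covered and depends on your environment.

```python
import shlex


def oozie_run_command(oozie_url, properties_file):
    """Build the argument list for submitting and starting an Oozie job."""
    return [
        "oozie", "job",
        "-oozie", oozie_url,         # Oozie server URL (placeholder below)
        "-config", properties_file,  # migrated job properties on the local file system
        "-run",
    ]


# Placeholder values: replace with the endpoint and path of your own cluster.
command = oozie_run_command(
    "https://oozie.example.com:11443/oozie",
    "/home/etl/daily/job.properties",
)

# Print the command for review instead of executing it.
print(shlex.join(command))
```

To run the command from Python after review, the same argument list could be passed to `subprocess.run(command, check=True)` on a host where the Oozie client is installed.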