Migrating Oozie workflows

During the Oozie workflow migration, the job definitions, job properties, and other Oozie job-related data are migrated from a CDH cluster to a Data Hub cluster.

Before the migration, the source cluster is scanned to collect the workflows, coordinators, and bundles, and to discover the relations between them. You also have the option to parse the Hive SQL files to obtain the names of the related databases and tables. During the migration process, the Oozie jobs on the source cluster are not affected and can remain in a running state. When the migration is finished, the job definitions are stored in the S3 bucket and the job properties are stored in the local filesystem.
  • Ensure that CMA is set up correctly using the steps in Setting up CMA server.
  • Ensure that you have met the requirements detailed in Reviewing prerequisites before migration.
  • Ensure that you have a CDH 5 or CDH 6 cluster registered as a source from which you want to migrate your Oozie workflows. If you do not have a source cluster yet, complete the steps in Registering source clusters.
  • Ensure that you have a Data Hub cluster registered as a target cluster to which you want to migrate your Oozie workflows. If you do not have a target cluster yet, complete the steps in Registering target clusters.
  1. On the Clusters page, click the CDH cluster you want to use for the migration.
  2. Click Start Scanning to open the Scan Settings.
  3. Select Oozie workflow scan.
    1. Provide the Number of latest days to scan to define the period from which the Oozie jobs are collected.
    2. Click Scan selected.
      You are redirected to the scanning progress page, where you can monitor whether the scanning process was successful or encountered any errors.
  4. Click on Oozie Job Definitions to view the collected job definitions when the scan is finished.
    You have the option to analyze the Hive scripts when you migrate Oozie jobs that depend on Hive SQL files. In this case, CMA scans for and identifies the location of the SQL files, which can be stored either in HDFS or in other custom directories, and adds the SQL files to the migration plan.
    1. Enable Run Hive3Parser.
    2. Select the Oozie jobs to analyze.
    3. Click .
    After the scan is completed, the Hive scripts related to the selected Oozie jobs are listed under the Hive SQL tab.
  5. Add the Oozie job definitions to Collections.
    Collections are a way to sort and bundle the job definitions into groups for the migration. You can create more collections in addition to the Default collection based on your requirements. The Hive scripts that belong to the Oozie job definitions are automatically added to the same collection.

    After you have finished sorting the job definitions into collections, you can start the migration process by creating the migration plan.

  6. Click Create Migration or select Migrations > Start Your First Migration.
    1. Select the source cluster, and click Next.
    2. Select the target cluster, and click Next.
    3. Select the type of migration, and click Next.
    4. Select the collections that you want to migrate, and click Next.
      You can select whether the migration should Run Now or be completed in a Scheduled Run. Run Now means that the Oozie job definitions in the selected collections are migrated as soon as the process starts. When choosing the Scheduled Run, you can select the start date of the migration and set the frequency at which the migration process should run.
    5. Provide the Knox token to access Cloudera Manager of the Data Hub cluster in CDP Public Cloud.
      1. Navigate to the target Data Hub cluster.
      2. Select Knox Token from the list of services.
      3. Click Token generation, and provide the name and lifetime of the token.
      4. Click Generate Token.
      5. Copy the generated token, navigate back to the migration plan, and paste the token into the Knox Token field.
    6. To include a service preparation step in the migration process, enable Oozie service configuring to prepare Oozie service on target cluster for running jobs.
      You can set the paths used by Oozie services. These paths are used when configuring the Oozie service for migration.
    7. Click Next.
      An overview of the migration plan is displayed. At this point, you can go back and change any configuration if the information is not correct. If the information is correct, click Create.
  7. Click Go to Migrations when the migration plan is successfully created.
  8. Click on the CDH to CDP PC migration to start the migration.
    The steps that are going to be completed during the migration are displayed.
  9. Review and configure the Oozie job definitions under Configuration before starting the migration process.
    1. Select a job definition to list the corresponding Job properties and Workflow.
      The original and proposed values are filled out based on the source and target cluster information.
    2. Modify the values of the job definition based on the warnings highlighted in the Workflow diff view. You can save the job definition changes using the Save button.
      CMA typically looks for configuration values that are related to service endpoints, Kerberos principals, and so on. These configuration values are used to update the file locations and other configurations accordingly. Although the automatic changes are expected to work as proposed, review the proposed values and update the configurations based on the target cluster requirements. The following properties and values should be reviewed before the migration:
      • HDFS file paths changed to S3
      • Hostnames
      • Service settings
      • Paths to user-related directories
    3. Click Save property changes to update the configurations.
      You have the option to save the changes for only the edited jobs or apply the changes to all of the jobs.
  10. Click to start migration.
    During the Hive SQL migration, the Hive scripts are copied to the Hive S3 bucket on the target. When the Hive SQL Migration is finished, click to start preparing the Oozie service on the target cluster for running the jobs that are stored in S3. When the service preparation is finished, click to start uploading the job definitions and configurations to the local file system and S3 bucket.
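
The review in step 9 can be rehearsed offline on a copy of the job properties. The following is a minimal sketch, not CMA's own logic: it assumes that HDFS URIs map to the `s3a://` scheme with the same path, and the function names and the bucket name `example-bucket` are hypothetical. It lists each property whose value is an HDFS URI next to a proposed S3 value, similar in spirit to the original and proposed values shown in the Configuration view.

```python
from typing import Dict, Tuple
from urllib.parse import urlparse


def propose_s3_value(value: str, bucket: str) -> str:
    """Return the value with an hdfs:// URI rewritten to s3a://<bucket>.

    Assumption: the path below the NameNode authority is kept unchanged.
    Values that are not hdfs:// URIs are returned as-is.
    """
    parsed = urlparse(value)
    if parsed.scheme != "hdfs":
        return value
    return f"s3a://{bucket}{parsed.path}"


def propose_changes(
    properties: Dict[str, str], bucket: str
) -> Dict[str, Tuple[str, str]]:
    """Map each property that would change to (original, proposed)."""
    changes = {}
    for key, original in properties.items():
        proposed = propose_s3_value(original, bucket)
        if proposed != original:
            changes[key] = (original, proposed)
    return changes


if __name__ == "__main__":
    # Hypothetical job properties, for illustration only.
    job_properties = {
        "oozie.wf.application.path": "hdfs://nameservice1/user/etl/apps/daily",
        "queueName": "default",
    }
    for key, (old, new) in propose_changes(job_properties, "example-bucket").items():
        print(f"{key}: {old} -> {new}")
```

Hostnames, service settings, and user-related directories are not covered by this sketch and still need a manual check against the target cluster.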
When all of the steps are successfully completed, the migration of Oozie job definitions from CDH to CDP Public Cloud is finished. You can restart the Oozie jobs on the target Data Hub cluster using the command line interface (CLI) or Hue.
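
As an illustration of the CLI route, the sketch below assembles the standard `oozie job ... -run` command for a migrated job. The Oozie URL and the properties file path are placeholders that depend on your Data Hub cluster, and the helper name is hypothetical. The sketch only builds and prints the command and does not execute it.

```python
import shlex
from typing import List


def build_oozie_run_command(oozie_url: str, properties_path: str) -> List[str]:
    """Build the argument list for submitting and starting an Oozie job.

    Uses the Oozie CLI form: oozie job -oozie <url> -config <file> -run
    """
    return ["oozie", "job", "-oozie", oozie_url, "-config", properties_path, "-run"]


if __name__ == "__main__":
    # Placeholder values; replace them with the target cluster's Oozie
    # endpoint and the location of the migrated job properties file.
    command = build_oozie_run_command(
        "https://oozie.example.com:11443/oozie",
        "/path/to/job.properties",
    )
    print(shlex.join(command))
```

To run the command from Python, the list can be passed to `subprocess.run(command, check=True)` on a host where the Oozie client is installed and authenticated.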