Migrating from source cluster to destination cluster

After registering the source and destination clusters, and labeling the scanned datasets and workloads on the source cluster, you can start the migration process.

Because migrating data to S3 can take a long time, you can perform multiple migrations between a source and destination cluster to move the data in stages. You can also choose to migrate only part of your data as opposed to all of it. A single CMA server is designed to handle multiple migrations.
  1. Click Migrations on the left navigation pane.
  2. Click Start Your First Migration.
  3. Select Cloudera Distributed Hadoop 5, Cloudera Distributed Hadoop 6, or CDP Private Cloud Base as the Source Type.
    The registered source cluster is selected by default. You can select any other cluster using the drop-down menu. If you have not registered a source cluster at this point, click New Source and complete the steps in Registering the source cluster.
  4. Click Next.
    CDP Public Cloud and the registered destination cluster are selected by default. You can select any other cluster using the drop-down menu. If you have not registered a destination cluster at this point, click New Target and complete the steps in Registering the destination cluster.
  5. Click Next.
  6. Click Next to confirm the migration path.
  7. Select one or more labels to migrate to the destination cluster.
    You can select whether the migration should Run Now or be completed as a Scheduled Run. Run Now means that all of the datasets and workloads selected with the labels are migrated as soon as the process starts. When choosing Scheduled Run, you can select the start date of the migration and set the frequency at which the migration process should run.
  8. Enable YARN migration if required, and provide the Knox Token to access Cloudera Manager of the Data Hub cluster in CDP Public Cloud. You must also set the S3 Bucket Base Path for HDFS or the Cloud Storage Path when migrating HDFS data.
    The remaining settings on the Configurations page are automatically filled out, but can be changed based on your requirements.
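    For example, assuming the target location is expressed as an s3a:// URI (the bucket name and path below are hypothetical), the S3 Bucket Base Path might look similar to the following:
      s3a://migration-target-bucket/cdh-hdfs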
  9. Click Next.
  10. Review the information on the Overview page and ensure that the information is correct.
    At this point, you can go back and change any configuration if the information is not correct.
  11. Click Create to save the migration plan. You can follow the progress of creating the migration plan.
  12. Click Go to Migrations, and select the created CDH to CDP PC or CDP Private Cloud Base to CDP PC migration.
  13. Click Run First Step to start the migration.
    You can see the status and steps of the migration process.

    The Master Table shows a read-only version of the label and the related datasets, and the Configuration details the migration configurations.

    The Data & Metadata Migration executes the data migration of the labeled datasets with Replication Manager.

    You can also view the migration process of the data and workloads based on the selected services. For example, the Hive SQL Migration replicates the Hive SQL queries that were fixed to be Hive-compliant during the Hive Workload migration steps.

    The Finalization waits until all the Replication Manager policies complete their jobs. If the label is created as a frequently scheduled migration, the Replication Manager waits only for the first jobs.

    When migrating from CDP Private Cloud Base to CDP Public Cloud, you need to manually export and import the Ranger policies from the source cluster to the destination cluster using the following curl commands:
    • Exporting policies
      • To export all policies:
        curl -X GET --header "text/json" -H "Content-Type: text/json" -o file.json -u [***USERNAME***]:[***PASSWORD***] "http://[***HOSTNAME***]:[***RANGER PORT***]/service/plugins/policies/exportJson"
      • To export policies for a specific HDFS resource:
        curl -X GET --header "text/json" -H "Content-Type: text/json" -o file.json -u [***USERNAME***]:[***PASSWORD***] "http://[***HOSTNAME***]:[***RANGER PORT***]/service/plugins/policies/exportJson?resource%3Apath=[***PATH NAME***]"
      • To export policies for a specific resource, such as a Hive database and column:
        curl -X GET --header "text/json" -H "Content-Type: text/json" -o file.json -u [***USERNAME***]:[***PASSWORD***] "http://[***HOSTNAME***]:[***RANGER PORT***]/service/plugins/policies/exportJson?resource%3Adatabase=[***DATABASE NAME***]&resource%3Acolumn=[***COLUMN NAME***]"
    • Importing policies
      • To import policies from a JSON file without servicesMap:
        curl -i -X POST -H "Content-Type: multipart/form-data" -F 'file=@/path/file.json' -u [***USERNAME***]:[***PASSWORD***] http://[***HOSTNAME***]:[***RANGER PORT***]/service/plugins/policies/importPoliciesFromFile?isOverride=true
      • To import policies from a JSON file with servicesMap:
        curl -i -X POST -H "Content-Type: multipart/form-data" -F 'file=@/path/file.json' -F 'servicesMapJson=@/path/servicesMapping.json' -u [***USERNAME***]:[***PASSWORD***] http://[***HOSTNAME***]:[***RANGER PORT***]/service/plugins/policies/importPoliciesFromFile?isOverride=true
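    As an illustration, the following sketch chains these commands together. The host names and service names are hypothetical, the Ranger Admin port is assumed to be the default 6080, and the servicesMapping.json format shown (a JSON map of source Ranger service names to destination service names) should be verified against your Ranger instances before use:
      # Export all Ranger policies from the source cluster (hypothetical host name).
      curl -X GET --header "text/json" -H "Content-Type: text/json" -o file.json -u [***USERNAME***]:[***PASSWORD***] "http://source-ranger.example.com:6080/service/plugins/policies/exportJson"

      # Create a servicesMapping.json that maps source service names to destination service names
      # (assumed format and hypothetical service names).
      printf '{ "cl1_hive": "cl2_hive", "cl1_hadoop": "cl2_hadoop" }' > servicesMapping.json

      # Import the exported policies into the destination cluster, remapping the service names.
      curl -i -X POST -H "Content-Type: multipart/form-data" -F 'file=@file.json' -F 'servicesMapJson=@servicesMapping.json' -u [***USERNAME***]:[***PASSWORD***] "http://destination-ranger.example.com:6080/service/plugins/policies/importPoliciesFromFile?isOverride=true"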
The datasets and workloads selected are migrated from CDH or CDP Private Cloud Base to CDP Public Cloud.