Cloudera Data Science Workbench to Cloudera AI migration strategy
Two approaches are available for migrating Cloudera Data Science Workbench (CDSW) to Cloudera AI on premises. The details on these approaches help you determine which migration tool better suits your needs.
- CDSW migration tool
- Project migration command-line utility tool
CDSW migration tool
The CDSW migration tool, built into Cloudera AI on premises, performs a cluster-level replication of the entire CDSW cluster to a new Cloudera AI Workbench. When the migration is complete, all users of the CDSW cluster cut over to the Cloudera AI Workbench at the same time.
- The user starts the migration in the Cloudera AI user interface.
- When the migration is triggered for the first time, the replication copies the project files from the source CDSW to the target Cloudera AI Workbench in the background. The source CDSW cluster remains usable during the replication.
- Check Migration Readiness to verify that CDSW and Cloudera AI are prepared for the migration.
- Once the initial migration is complete, the target Cloudera AI Workbench transitions to the Validation Started state, indicating that it is now available. At this stage, you can begin validating the projects in Cloudera AI.
- The migration can be triggered multiple times to replicate changes from the source CDSW to the target Cloudera AI Workbench. Note that any changes made in the target Cloudera AI Workbench are overwritten.
- When you are ready for the cutover, trigger a final migration, which stops the source CDSW cluster and performs the final replication. After this point, the source CDSW cluster is inaccessible.
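The workflow above behaves like a small state machine: repeated migrations refresh the target, and the final migration is a one-way step. The following Python sketch is a toy model for reasoning about that lifecycle, not a Cloudera API. Apart from Validation Started, the state names are hypothetical. The model also assumes that each run overwrites the target copies of replicated files while leaving target-only files in place, a detail the documentation does not specify.

```python
from enum import Enum


class MigrationState(Enum):
    # Hypothetical states, except VALIDATION_STARTED, which mirrors the
    # "Validation Started" state named in the documentation.
    NOT_STARTED = "Not Started"
    REPLICATING = "Replicating"
    VALIDATION_STARTED = "Validation Started"
    CUTOVER_COMPLETE = "Cutover Complete"


class MigrationModel:
    """Toy model of the cluster-level migration lifecycle (not a Cloudera API)."""

    def __init__(self) -> None:
        self.state = MigrationState.NOT_STARTED
        self.source_accessible = True
        self.target_files: dict[str, str] = {}

    def migrate(self, source_files: dict[str, str], final: bool = False) -> None:
        if self.state is MigrationState.CUTOVER_COMPLETE:
            raise RuntimeError("final migration already done; source is inaccessible")
        if final:
            # The final migration stops the source cluster for the last replication.
            self.source_accessible = False
        self.state = MigrationState.REPLICATING
        # Replicated files overwrite their target copies, so edits made only
        # in the target are lost (assumption: target-only files are kept).
        self.target_files.update(source_files)
        self.state = (MigrationState.CUTOVER_COMPLETE if final
                      else MigrationState.VALIDATION_STARTED)
```

For example, editing a file in the target and then re-running `migrate` restores the source version, which is why validation changes should not be made in the target before the final cutover.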
Project migration command-line utility tool
The Cloudera AI command-line utility tool is a Cloudera open-source tool that handles migration at the project level. Although this approach carries lower risk than the CDSW migration tool, it requires more effort to set up the target Cloudera AI Workbench, and the migration process is likely to take longer.
The Cloudera AI command-line utility tool is built using Python and requires Python 3.10. The tool must be installed on a staging host, referred to as a bastion host. Setting up this host involves considerations for both the source CDSW cluster and the target Cloudera AI cluster.
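Before installing the utility on the bastion host, it can help to confirm the interpreter version. The following minimal sketch does that check. The function name `python_version_ok` is hypothetical, and the `allow_newer` default assumes that Python releases later than 3.10 are acceptable; set it to `False` if your version of the tool requires exactly 3.10.

```python
import sys


def python_version_ok(version_info=sys.version_info, required=(3, 10), allow_newer=True):
    """Return True if the interpreter satisfies the utility's Python requirement.

    allow_newer=True is an assumption that later releases also work;
    allow_newer=False demands exactly the required major.minor version.
    """
    current = tuple(version_info[:2])  # compare (major, minor) only
    if allow_newer:
        return current >= required
    return current == required


if __name__ == "__main__":
    if not python_version_ok():
        sys.exit(f"Python {sys.version.split()[0]} found; Python 3.10 is required")
    print("Python version OK for the migration utility")
```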
The Cloudera AI command-line utility tool works by exporting projects from the source CDSW and importing them into the target Cloudera AI Workbench.
The tool is designed for data scientists to migrate their own projects. However, administrators can also use the tool to perform batch migrations.
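A batch migration repeats the same export-then-import pattern for each project. The following Python sketch shows that loop under stated assumptions. The `export_project` and `import_project` hooks are hypothetical stand-ins for however the utility is actually invoked, not its real commands or options. The sketch also assumes that an export produces a local path on the bastion host. It records failures per project so that one broken project does not stop the rest of the batch.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class BatchResult:
    migrated: list[str] = field(default_factory=list)
    failed: dict[str, str] = field(default_factory=dict)


def migrate_projects(
    projects: list[str],
    export_project: Callable[[str], str],
    import_project: Callable[[str, str], None],
) -> BatchResult:
    """Export each project from the source, then import it into the target.

    export_project and import_project are hypothetical hooks, not the
    utility's actual API.
    """
    result = BatchResult()
    for name in projects:
        try:
            # Export from the source CDSW (assumed to return a local path
            # on the bastion host).
            export_path = export_project(name)
            # Import the exported copy into the target Cloudera AI Workbench.
            import_project(name, export_path)
        except Exception as exc:  # record the failure and continue the batch
            result.failed[name] = str(exc)
        else:
            result.migrated.append(name)
    return result
```

Keeping the hooks as parameters lets the same loop be exercised with fakes before it is pointed at real clusters.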
This tool is not limited to CDSW migration. It can also handle migration from Cloudera Machine Learning to Cloudera AI or from Cloudera AI on premises to Cloudera AI on cloud.
