CML service with Data Lake upgrades

A CDP environment has two services that can be upgraded independently: the FreeIPA service and the Data Lake service. CML workspaces run in CDP environments. The FreeIPA service provides identity management, and the Data Lake service provides SDX capabilities to CML workspaces.

This document answers frequently asked questions about the behavior of CML workspaces during a Data Lake (DL) upgrade. It does not cover FreeIPA upgrades.

What kinds of DL upgrades are possible?

The Data Lake service supports the following upgrades:

  • Hotfix upgrades
  • Version upgrades
  • OS version upgrades

All of these upgrades can be started from the CDP Data Lake service UI with a single click, or with the CDP CLI. DL upgrades require downtime. The upgrade process preserves the DL state across the upgrade.

The shape of the Data Lake cannot change during a DL upgrade. For example, a LIGHT_DUTY DL cannot be changed to MEDIUM_DUTY_HA as part of an upgrade.

What is DL migration and is it supported?

Changing the DL shape is called DL migration. For example, moving a DL from the LIGHT_DUTY shape to the MEDIUM_DUTY_HA shape is a DL migration.

DL migration is not yet supported in CDP.
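The distinction between an upgrade and a migration can be expressed as a small check. This is a minimal sketch, not CDP code: the names `DataLakeShape`, `classify_change`, and `is_supported` are hypothetical, and the enum lists only the two shapes named in this document.

```python
from enum import Enum


class DataLakeShape(Enum):
    # Only the two shapes mentioned in this document; others may exist.
    LIGHT_DUTY = "LIGHT_DUTY"
    MEDIUM_DUTY_HA = "MEDIUM_DUTY_HA"


def classify_change(current: DataLakeShape, target: DataLakeShape) -> str:
    """Return 'upgrade' if the shape is unchanged, 'migration' otherwise."""
    return "upgrade" if current is target else "migration"


def is_supported(current: DataLakeShape, target: DataLakeShape) -> bool:
    """Per this document, only same-shape changes (upgrades) are supported."""
    return classify_change(current, target) == "upgrade"
```

For example, `is_supported(DataLakeShape.LIGHT_DUTY, DataLakeShape.MEDIUM_DUTY_HA)` is false because the shape changes, which makes the operation a migration.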

What happens if a DL upgrade/migration fails?

There is no automated backup and restore process for the DL. Take a backup of the DL before the upgrade starts. If the DL upgrade fails, the recommended option is to delete only the failed DL using the CDP CLI (cdp datalake delete-datalake --datalake-name <dl name>) and then recreate the DL using the CDP CLI. Once the DL is recreated, restore the DL state from the backup. This is a manual process; refer to the DL documentation for the manual backup and restore procedure.

Do not delete the environment at any point while recovering from a failed DL upgrade. Deleting an environment makes every CML workspace running in it unusable.
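The recovery sequence for a failed upgrade can be laid out as an ordered plan. The sketch below is illustrative only: `Step`, `delete_datalake_command`, and `recovery_plan` are hypothetical names, and the code builds the delete command quoted above without executing it. The recreate and restore steps are left as manual steps because their exact commands and flags are not given in this document.

```python
import shlex
from typing import List, NamedTuple, Optional


class Step(NamedTuple):
    description: str
    command: Optional[List[str]]  # None means a manual step (see DL documentation)


def delete_datalake_command(datalake_name: str) -> List[str]:
    """Build the argv for the delete command quoted in this document."""
    if not datalake_name.strip():
        raise ValueError("datalake_name must not be empty")
    return ["cdp", "datalake", "delete-datalake", "--datalake-name", datalake_name]


def recovery_plan(datalake_name: str) -> List[Step]:
    """Ordered recovery steps after a failed DL upgrade."""
    return [
        Step("Delete only the failed DL (never the environment)",
             delete_datalake_command(datalake_name)),
        Step("Recreate the DL with the CDP CLI", None),
        Step("Restore the DL state from the pre-upgrade backup", None),
    ]


if __name__ == "__main__":
    # "my-datalake" is a placeholder name used only for illustration.
    for number, step in enumerate(recovery_plan("my-datalake"), start=1):
        rendered = shlex.join(step.command) if step.command else "(manual step)"
        print(f"{number}. {step.description}: {rendered}")
```

Keeping the plan as data rather than running commands directly makes it easy to review the sequence before anything destructive happens.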

Can the environment be deleted and recreated if something goes wrong during DL upgrades/migrations?

No. An environment with experiences running inside it cannot be deleted at any point. If you delete the environment, all the experiences in it (such as CML workspaces) must be deleted as well.

For CML workspaces, there is no automated backup and restore option. Customers can manually back up and restore workspaces by following the available runbook. Unless a CML workspace is manually backed up and restored according to the runbook, deleting it loses all of its state, and you must start from a fresh workspace. Deleting an environment that contains CML workspaces is therefore not recommended. Instead, upgrade/migrate the DL within the same environment by recreating it through the DL API (with the CDP CLI, as suggested above).

What are the CML workspace prerequisites for DL upgrades/migrations?

Take the following steps before upgrading/migrating the DL:

  • Upgrade CML workspaces to the latest version (if an upgrade is available).
  • Stop any jobs, sessions, experiments, or other workloads in the CML workspace that need DL access before performing the DL upgrade.
  • Announce to the team that there will be planned downtime for CML workspaces during the DL upgrade.

Are the CML workspaces operational during DL upgrades/migrations?

It is recommended NOT to use CML workspaces during DL upgrades.

However, the following behavior has been observed in CML workspaces during DL upgrades:

  • CML workspaces remain accessible during DL upgrades, and users can log in to them.
  • Users can launch sessions, jobs, experiments, models, etc. that do not require DL access. For example, jobs that do not require IDBroker or SDX/HMS access function normally.
  • Any compute that requires IDBroker or SDX/HMS access will fail.
  • Any scheduled jobs that require IDBroker or SDX access will fail.
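
The observed behavior above amounts to a simple rule: a workload is at risk during a DL upgrade only if it depends on IDBroker or SDX/HMS. The following sketch shows one way to identify the workloads to stop beforehand. The `Workload` type and its fields are hypothetical and are not part of any CML API; the dependency flags would have to be filled in by whoever knows the workloads.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Workload:
    name: str
    kind: str                     # e.g. "session", "job", "experiment", "model"
    needs_idbroker: bool = False  # reaches cloud storage through IDBroker
    needs_sdx: bool = False       # uses SDX services such as HMS


def needs_datalake(workload: Workload) -> bool:
    """A workload depends on the DL if it uses IDBroker or SDX/HMS."""
    return workload.needs_idbroker or workload.needs_sdx


def expected_to_fail(workloads: List[Workload]) -> List[str]:
    """Names of workloads expected to fail while the DL is being upgraded."""
    return [w.name for w in workloads if needs_datalake(w)]


if __name__ == "__main__":
    # Illustrative workloads only.
    workloads = [
        Workload("local-notebook", "session"),
        Workload("nightly-hive-export", "job", needs_sdx=True),
        Workload("s3-ingest", "job", needs_idbroker=True),
    ]
    print("Stop before the DL upgrade:", expected_to_fail(workloads))
```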

Do we need to make any changes to CML workspaces after the DL is upgraded/migrated successfully?

No. No action is required on CML workspaces after a successful DL upgrade; they will function as they did before the upgrade. Just announce to the team that they can start using CML workspaces as usual.