Creating a Backup for Disaster Recovery for Cloudera Data Science Workbench

Cloudera strongly recommends that you create backups regularly and before every upgrade. Cloudera is not responsible for any data loss.

  1. Stop Cloudera Data Science Workbench.
    Use one of the following procedures, depending on your Cloudera Data Science Workbench version.
    Cloudera Data Science Workbench 1.4.2 or lower

    Do not stop or restart Cloudera Data Science Workbench without using the cdsw_protect_stop_restart.sh script. This helps avoid the data loss issue detailed in TSB-346.

    Run the script on your master host and stop Cloudera Data Science Workbench (instructions below) only when instructed to do so by the script. Then proceed with step 2 of this process.

    Cloudera Data Science Workbench 1.4.3 or higher

    Depending on your deployment, use one of the following sets of instructions to stop the application.

    • CSD - Log in to Cloudera Manager. On the Home > Status tab, click the dropdown arrow to the right of the CDSW service and select Stop. Wait for the action to complete.

      OR

    • RPM - Run the following command on the master host:
      cdsw stop
  2. After stopping CDSW, and before running the following tar command, wait 2-5 minutes (depending on your disk speed) to ensure that all data from CDSW is successfully written to the disks. Otherwise the tar command may not capture all recent changes.
  3. To create the backup, run the following command on the master host:
    tar cvzf cdsw.tar.gz /var/lib/cdsw/*
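  4. (Optional) If needed, use the following command to unpack the tar bundle. Because tar removes the leading / from member names when it creates the archive, extract relative to / so that the files are written back under /var/lib/cdsw.
    tar xvzf cdsw.tar.gz -C /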
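
Steps 2 and 3 above can be combined into a small helper, sketched below under stated assumptions. The function name backup_cdsw_data, its arguments, and the 300-second default wait are hypothetical and are not part of the cdsw command-line tool. The sketch assumes that CDSW has already been stopped as described in step 1, and it adds an integrity check that the original procedure does not include.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the cdsw CLI): wait, archive, then verify.
# Usage: backup_cdsw_data SRC_DIR OUT_FILE [WAIT_SECONDS]
backup_cdsw_data() {
  local src="$1" out="$2" wait_seconds="${3:-300}"

  # Refuse to continue if the source directory is missing.
  [ -d "$src" ] || { echo "no such directory: $src" >&2; return 1; }

  # Ask the kernel to flush buffered writes, then wait as step 2 advises.
  # The 300-second default is an assumption (upper end of the 2-5 minute range).
  sync
  sleep "$wait_seconds"

  # Archive the directory itself rather than "$src"/* so that hidden
  # top-level entries are included as well.
  tar czf "$out" "$src" || return 1

  # Listing the archive reads it end to end and fails if it is
  # truncated or corrupt.
  tar tzf "$out" > /dev/null || return 1

  echo "backup written to $out"
}
```

On an RPM deployment running version 1.4.3 or higher, this might be called on the master host as `cdsw stop` followed by `backup_cdsw_data /var/lib/cdsw cdsw.tar.gz`. Writing the archive to a file system other than the one that holds /var/lib/cdsw keeps the backup from consuming the space it is meant to protect.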