Creating a Backup for Disaster Recovery for Cloudera Data Science Workbench
Cloudera strongly recommends taking regular backups, as well as a backup before any upgrade, and is not responsible for any data loss.
- Stop Cloudera Data Science Workbench.
Use one of the following procedures, depending on your Cloudera Data Science Workbench version.
- Cloudera Data Science Workbench 1.4.2 or lower
Do not stop or restart Cloudera Data Science Workbench without using the cdsw_protect_stop_restart.sh script. This helps avoid the data loss issue detailed in TSB-346.
Run the script on your master host and stop Cloudera Data Science Workbench (instructions below) only when the script instructs you to do so. Then proceed with step 2 of this process (creating the backup).
- Cloudera Data Science Workbench 1.4.3 or higher
Depending on your deployment, use one of the following sets of instructions to stop the application.
- CSD - Log in to Cloudera Manager. On the Home > Status tab, click the dropdown menu to the right of the CDSW service and select Stop. Wait for the action to complete.
OR
- RPM - Run the following command on the master host:
cdsw stop
- After stopping CDSW, wait 2-5 minutes (depending on your disk speed) before running the tar command below, so that all CDSW data is written to disk. Otherwise, the tar command may not capture all recent changes.
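For RPM deployments, the stop-then-wait sequence can be scripted. The following is a minimal sketch, not an official Cloudera tool: stop_and_settle and the CDSW_STOP_CMD variable are hypothetical names introduced here, and the default wait of 300 seconds is simply the upper end of the 2-5 minute range. The call to sync asks the kernel to flush buffered writes; it complements the wait rather than replacing it. On CDSW 1.4.2 or lower, stop the application only as directed by the cdsw_protect_stop_restart.sh script, not with this helper.

```shell
# Hypothetical helper: stop the application, flush buffers, then wait.
# The body runs in a subshell, so its variables do not leak to the caller.
stop_and_settle() (
  wait_seconds="${1:-300}"                 # assumed default: 5 minutes
  stop_cmd="${CDSW_STOP_CMD:-cdsw stop}"   # override hook for dry runs (assumption)

  # Intentionally unquoted so "cdsw stop" splits into command and argument
  $stop_cmd || exit 1

  sync                    # ask the kernel to write buffered data to disk
  sleep "$wait_seconds"   # give the disks time to settle before running tar
)
```

Calling `stop_and_settle 120` on the master host would stop the application and wait two minutes before returning.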
- To create the backup, run the following command on the master host:
tar cvzf cdsw.tar.gz /var/lib/cdsw/*
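For regular backups, a timestamped archive that is verified after creation may be more convenient. The sketch below is a variation on the command above, not the documented procedure: backup_dir is a hypothetical helper, and it uses `-C <dir> .` instead of the `/var/lib/cdsw/*` glob. As a result, member names are stored relative to the source directory, and hidden files (which `*` does not match by default) are included.

```shell
# Hypothetical helper: archive a directory and verify that the archive is readable.
# Prints the archive path on success. The destination should be outside the source.
backup_dir() (
  src="${1:-/var/lib/cdsw}"    # directory to back up
  dest="${2:-.}"               # directory that receives the archive
  archive="${dest}/cdsw-$(date +%Y%m%d-%H%M%S).tar.gz"

  # -C switches into $src first, so "." captures everything, including dotfiles
  tar czf "$archive" -C "$src" . || exit 1

  # Listing reads the whole archive, which catches truncated or corrupt files
  tar tzf "$archive" > /dev/null || exit 1

  printf '%s\n' "$archive"
)
```

For example, `backup_dir /var/lib/cdsw /backups` (where /backups is a placeholder destination) writes an archive such as /backups/cdsw-20240101-120000.tar.gz. An archive made this way unpacks in place with `-C /var/lib/cdsw`.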
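- (Optional) If needed, use the following command to unpack the tar bundle. Note that tar stores the archived paths without the leading / (as var/lib/cdsw/...), so with -C /var/lib/cdsw the files are extracted under /var/lib/cdsw/var/lib/cdsw/. To restore them to their original location, use -C / instead.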
tar xvzf cdsw.tar.gz -C /var/lib/cdsw
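Where the extracted files land depends on how member names were stored in the archive, so it can help to check before unpacking. The following is a minimal sketch under that assumption: restore_root is a hypothetical helper that inspects the first member name and prints the directory to pass to -C so that files end up under the target directory.

```shell
# Hypothetical helper: print the -C directory needed to restore into a target.
restore_root() (
  archive="$1"
  target="${2:-/var/lib/cdsw}"
  rel="${target#/}"            # target without the leading slash, e.g. var/lib/cdsw

  # First member name as stored in the archive; empty if the archive is unreadable
  first="$(tar tzf "$archive" | head -n 1)"
  [ -n "$first" ] || exit 1

  case "${first#/}" in
    "$rel"/*) printf '/\n' ;;             # members already carry the full path
    *)        printf '%s\n' "$target" ;;  # members are relative to the target
  esac
)
```

A restore could then be run as `tar xvzf cdsw.tar.gz -C "$(restore_root cdsw.tar.gz)"`. Listing the archive first with `tar tzf cdsw.tar.gz` is a cheap way to confirm the result by eye.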