Creating a Backup for Disaster Recovery for Cloudera Data Science Workbench
Cloudera strongly recommends creating backups both on a regular schedule and before every upgrade. Cloudera is not responsible for any data loss.
- Stop Cloudera Data Science Workbench. Use one of the following procedures, depending on your Cloudera Data Science Workbench version.
- Cloudera Data Science Workbench 1.4.2 or lower
   Do not stop or restart Cloudera Data Science Workbench without using the cdsw_protect_stop_restart.sh script. This helps avoid the data loss issue detailed in TSB-346.
   Run the script on your master host, and stop Cloudera Data Science Workbench (instructions below) only when the script instructs you to do so. Then proceed with step 2 of this process.
 - Cloudera Data Science Workbench 1.4.3 or higher
   Depending on your deployment, use one of the following sets of instructions to stop the application.
 - CSD - Log in to Cloudera Manager. On the Home > Status tab, click the dropdown menu to the right of the CDSW service and select Stop. Wait for the action to complete. OR
 - RPM - Run the following command on the master host:
                    
cdsw stop 
 - After stopping CDSW, and before running the following tar command, wait 2-5 minutes (depending on your disk speed) to ensure that all data from CDSW is successfully written to the disks. Otherwise the tar command may not capture all recent changes.
- To create the backup, run the following command on the master host:
tar cvzf cdsw.tar.gz /var/lib/cdsw/*
- (Optional) If needed, use the following command to unpack the tar bundle:
tar xvzf cdsw.tar.gz -C /var/lib/cdsw 
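
The backup and unpack steps above can also be wrapped in a small script. The following is a minimal sketch, not part of the product: the helper names backup_dir and restore_dir and the date-stamped archive name are assumptions made for illustration. It stores paths relative to the data directory, so that unpacking with -C places the files directly into the target directory, and it lists the finished archive as a basic integrity check.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: archive everything under src_dir into a gzip tarball.
# Member paths are stored relative to src_dir (including dotfiles), so the
# archive can later be unpacked into any directory with -C.
backup_dir() {
  local src_dir="$1" archive="$2"
  tar -czf "$archive" -C "$src_dir" .
  # Listing the archive reads it end to end; a truncated or corrupt file fails here.
  tar -tzf "$archive" > /dev/null
}

# Hypothetical helper: unpack an archive created by backup_dir into dest_dir.
restore_dir() {
  local archive="$1" dest_dir="$2"
  mkdir -p "$dest_dir"
  tar -xzf "$archive" -C "$dest_dir"
}

# Example invocation on the master host, after CDSW has been stopped and the
# 2-5 minute wait has elapsed (write the archive outside /var/lib/cdsw):
#   backup_dir /var/lib/cdsw "/root/cdsw-$(date +%Y%m%d).tar.gz"
#   restore_dir /root/cdsw-<date>.tar.gz /var/lib/cdsw
```

Keeping the archive outside the directory being backed up avoids tar trying to read its own output while it is still being written.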
