Application Block Device or Mount Point

The Cloudera Data Science Workbench master host requires at least 1 TB of storage for the database and project files. The capacity you actually need depends on the expected number of users and projects on the cluster.

Although large data files should be stored on HDFS, individual projects commonly accumulate gigabytes of data or libraries. Running out of storage causes the application to fail. Cloudera recommends allocating at least 5 GB per project and at least 1 TB of storage in total. Continue to monitor disk space usage and I/O carefully using Cloudera Manager.
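The two sizing rules above combine into a simple lower bound: whichever is larger of 5 GB per project and 1 TB overall. The following minimal sketch computes that bound. The function name `min_cdsw_storage_gb` is hypothetical, and the sketch treats 1 TB as 1000 GB for simplicity. It yields only a floor derived from these two rules, not a full capacity plan.

```shell
#!/usr/bin/env bash
# Hypothetical helper: lower bound on master-host application storage (GB)
# from the two rules of thumb: at least 5 GB per project and at least 1 TB
# in total. Assumes 1 TB = 1000 GB; real needs depend on users and projects.
min_cdsw_storage_gb() {
  local projects="$1"          # expected number of projects
  local per_project_gb=5       # at least 5 GB per project
  local floor_gb=1000          # at least 1 TB overall (1000 GB assumed)
  local needed_gb=$(( projects * per_project_gb ))
  # Both rules must hold, so keep the larger of the two figures.
  if [ "$needed_gb" -lt "$floor_gb" ]; then
    needed_gb=$floor_gb
  fi
  echo "$needed_gb"
}
```

For example, `min_cdsw_storage_gb 50` prints 1000 because the 1 TB floor applies. `min_cdsw_storage_gb 400` prints 2000 because 400 projects at 5 GB each exceed the floor.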

Cloudera Data Science Workbench stores all application data at /var/lib/cdsw. On a CSD-based deployment, this location is not configurable. Format the Application Block Device before you install Cloudera Data Science Workbench. The installation assumes that the system administrator has already formatted one or more block devices and mounted them at /var/lib/cdsw on the master host. Application Block Device mounts are not required on worker hosts.
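The formatting and mounting steps might look like the following minimal sketch. The device name /dev/sdb, the ext4 filesystem, and the `run`, `prepare_cdsw_mount`, and `DRY_RUN` names are assumptions for illustration only. Because mkfs erases the device, the sketch only prints the commands unless `DRY_RUN=0` is set. In that mode it refuses to continue if the argument is not a block device.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: format a dedicated device and mount it at the CDSW
# application directory on the master host.
# WARNING: mkfs erases everything on the device. By default (DRY_RUN=1) the
# commands are only printed; set DRY_RUN=0 and run as root to execute them.

run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"          # dry run: show the command instead of running it
  else
    "$@"
  fi
}

prepare_cdsw_mount() {
  local device="$1"                      # e.g. /dev/sdb (hypothetical name)
  local mount_dir="${2:-/var/lib/cdsw}"
  # Only real runs are guarded; a dry run just prints the plan.
  if [ "${DRY_RUN:-1}" != "1" ] && [ ! -b "$device" ]; then
    echo "error: $device is not a block device" >&2
    return 1
  fi
  run mkfs.ext4 "$device"                # ext4 is an assumed choice
  run mkdir -p "$mount_dir"
  run mount "$device" "$mount_dir"
  # Suggested /etc/fstab line so the mount survives a reboot.
  echo "fstab entry: $device $mount_dir ext4 defaults 0 2"
}
```

Add the printed fstab line to /etc/fstab so that the mount persists across reboots. Run the sketch only on the master host, because worker hosts do not need this mount.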

Regardless of the application data storage configuration you choose, /var/lib/cdsw must be stored on a separate block device. Given typical database and user access patterns, an SSD is strongly recommended.
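One quick way to confirm that /var/lib/cdsw sits on its own device is to compare its device number with that of its parent directory, because a separately mounted filesystem reports a different device number. The sketch below assumes GNU coreutils `stat` (`-L -c %d`). The function `is_separate_mount` is hypothetical. It returns 0 when the directory is on a different device from its parent, 1 when it is not, and 2 when the path is missing.

```shell
#!/usr/bin/env bash
# Hypothetical preflight check (assumes GNU coreutils stat).
# A directory is on its own mounted filesystem when its device number
# differs from that of its parent directory. This does not check whether
# the device is an SSD.
is_separate_mount() {
  local dir="${1:-/var/lib/cdsw}"
  local dir_dev parent_dev
  [ -d "$dir" ] || return 2                       # missing or not a directory
  dir_dev=$(stat -L -c %d "$dir") || return 2
  parent_dev=$(stat -L -c %d "$dir/..") || return 2
  [ "$dir_dev" != "$parent_dev" ]                 # 0 = separate, 1 = same device
}
```

For example, `is_separate_mount /var/lib/cdsw && echo "separate device"` prints a confirmation only when the directory is mounted from its own device. The util-linux `mountpoint -q /var/lib/cdsw` command performs a similar check.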

The /var/lib/cdsw directory contains persistent information such as the database, configuration data, and image details. By default, data in /var/lib/cdsw is not backed up or replicated to HDFS or other hosts, so a reliable storage and backup strategy is critical for production installations. For more information, see Backup and Disaster Recovery for Cloudera Data Science Workbench. To migrate user-related information to a new host, transfer the /var/lib/cdsw directory to that host.
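Transferring /var/lib/cdsw can be as simple as archiving the directory with its permissions intact, copying the archive, and extracting it at the same location on the new host. The following sketch assumes GNU tar, root privileges, and that Cloudera Data Science Workbench has been stopped so that files do not change during the copy. The function names `backup_cdsw_dir` and `restore_cdsw_dir` are hypothetical. The sketch is not a substitute for the documented backup procedure.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: archive an application data directory and restore it
# elsewhere (GNU tar assumed; run as root with the application stopped).

backup_cdsw_dir() {
  local src="$1"         # e.g. /var/lib/cdsw
  local archive="$2"     # e.g. /tmp/cdsw.tar.gz
  # -C switches to the parent so the archive stores a relative path
  # (for example "cdsw/..."); -p records permissions as-is.
  tar -czpf "$archive" -C "$(dirname "$src")" "$(basename "$src")"
}

restore_cdsw_dir() {
  local archive="$1"
  local dest_parent="$2" # e.g. /var/lib, so files land in /var/lib/cdsw
  # -p restores permissions; GNU tar also restores ownership when run as root.
  tar -xzpf "$archive" -C "$dest_parent"
}
```

For example, run `backup_cdsw_dir /var/lib/cdsw /tmp/cdsw.tar.gz` on the old host, copy the archive with a tool such as scp, and then run `restore_cdsw_dir /tmp/cdsw.tar.gz /var/lib` on the new host.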

You cannot use a UUID in place of the device name to identify a block device in CDSW. To check whether a given path is a block device, use the shell test -b:
if [ -b "$FILE" ]; then echo "$FILE is a block device"; fi
The test succeeds (exit status 0) only if the file exists and is a block special file. In that case, the parameter is a path to a block device rather than a plain UUID.
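The -b test can be wrapped in a small helper that reports, for each argument, whether that argument names a block device. In this sketch the function names `is_block_device` and `check_devices` are hypothetical. A bare UUID string fails the check because it is not a path to a block special file.

```shell
#!/usr/bin/env bash
# Hypothetical helpers around the shell's -b file test.

# Succeeds (exit status 0) only for an existing block special file.
is_block_device() {
  [ -b "$1" ]
}

# Prints one line per argument describing whether it is a block device.
check_devices() {
  local dev
  for dev in "$@"; do
    if is_block_device "$dev"; then
      echo "$dev: block device"
    else
      echo "$dev: not a block device"
    fi
  done
}
```

For example, `check_devices /dev/sdb` prints `/dev/sdb: block device` on a host that has that disk (the device name is hypothetical). To find the device name that corresponds to a filesystem UUID, run `lsblk -o NAME,UUID`, which lists both columns.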