Application Block Device or Mount Point
The Cloudera Data Science Workbench master host requires at least 1 TB for database and project storage. The capacity you actually need depends on the expected number of users and projects on the cluster.
While large data files should be stored on HDFS, it is not uncommon to find gigabytes of data or libraries in individual projects. Running out of storage will cause the application to fail. Cloudera recommends allocating at least 5 GB per project and at least 1 TB of storage in total. Make sure you continue to carefully monitor disk space usage and I/O using Cloudera Manager.
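As a rough illustration of the sizing guidance above, the following Bash sketch takes the larger of 5 GB per project and a 1 TB floor. The function name recommended_capacity_gb is hypothetical, and treating 1 TB as 1024 GB is an assumption. It does not account for per-user data, libraries, or growth, so treat the result as a lower bound.

```shell
#!/usr/bin/env bash
# Sketch: minimum storage for /var/lib/cdsw from the sizing guidance
# (at least 5 GB per project, at least 1 TB overall).
# Assumption: 1 TB is treated as 1024 GB; adjust to your own convention.
# recommended_capacity_gb is a hypothetical helper name.

recommended_capacity_gb() {
  local projects=$1
  local per_project_gb=5
  local floor_gb=1024

  # Reject empty or non-numeric input.
  case $projects in
    ''|*[!0-9]*)
      echo "usage: recommended_capacity_gb <project-count>" >&2
      return 2
      ;;
  esac

  # 10# forces base 10 so inputs such as "08" are not read as octal.
  local needed=$(( 10#$projects * per_project_gb ))

  # Never recommend less than the 1 TB floor.
  if [ "$needed" -lt "$floor_gb" ]; then
    needed=$floor_gb
  fi
  echo "$needed"
}
```

For example, 100 projects need 500 GB by the per-project rule, which is below the floor, so the sketch prints 1024; 300 projects print 1500.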
Cloudera Data Science Workbench stores all application data at /var/lib/cdsw. On a CSD-based deployment, this location is not configurable. The Application Block Device must be formatted before you install Cloudera Data Science Workbench, which assumes that the system administrator has formatted and mounted one or more block devices to /var/lib/cdsw on the master host. Application Block Device mounts are not required on worker hosts.
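One way to confirm, before installation, that a filesystem is actually mounted at /var/lib/cdsw is to compare the directory's device number with that of its parent. The Bash sketch below assumes GNU coreutils stat (for the -c %d format), and is_mount_root is a hypothetical helper name. A bind mount of a directory from the same filesystem reports the same device number and is therefore treated as not separately mounted, which suits the separate-block-device requirement. Where util-linux is available, mountpoint -q or findmnt are alternatives.

```shell
#!/usr/bin/env bash
# Sketch: return 0 if a separate filesystem is mounted at the given
# directory, 1 if not, 2 on error.
# Assumes GNU coreutils stat; is_mount_root is a hypothetical name.

is_mount_root() {
  local dir=$1
  if [ ! -d "$dir" ]; then
    echo "not a directory: $dir" >&2
    return 2
  fi

  local dev parent_dev
  dev=$(stat -c %d "$dir") || return 2
  parent_dev=$(stat -c %d "$dir/..") || return 2

  # Different device numbers mean another filesystem is mounted at $dir.
  [ "$dev" != "$parent_dev" ]
}

# Example on the master host:
# is_mount_root /var/lib/cdsw || echo "/var/lib/cdsw is not separately mounted"
```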
Regardless of the application data storage configuration you choose, /var/lib/cdsw must be stored on a separate block device. Given typical database and user access patterns, an SSD is strongly recommended.
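To check whether the disk backing /var/lib/cdsw is solid-state, Linux exposes a per-disk flag at /sys/block/<disk>/queue/rotational (0 for non-rotational devices). The sketch below reads that flag. The helper name and the SYSFS_ROOT override (included only to make it testable) are hypothetical; it expects a whole-disk name such as sdb rather than a partition, and virtualized disks can report the flag inaccurately. lsblk -d -o NAME,ROTA shows the same information.

```shell
#!/usr/bin/env bash
# Sketch: return 0 if a whole disk reports itself as non-rotational (SSD-like),
# 1 if rotational, 2 if the sysfs flag cannot be read.
# SYSFS_ROOT is a hypothetical test override; it defaults to /sys.
# Partitions (for example sdb1) are not handled; pass the parent disk.

is_non_rotational() {
  local disk=${1#/dev/}
  local flag_file="${SYSFS_ROOT:-/sys}/block/$disk/queue/rotational"

  if [ ! -r "$flag_file" ]; then
    echo "cannot read $flag_file" >&2
    return 2
  fi

  # The kernel writes 0 for non-rotational and 1 for rotational devices.
  [ "$(cat "$flag_file")" = "0" ]
}

# Example: is_non_rotational sdb && echo "sdb is non-rotational"
```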
The /var/lib/cdsw directory contains persistent information such as the database, configuration files, and image details. By default, data in /var/lib/cdsw is not backed up or replicated to HDFS or other hosts. A reliable storage and backup strategy is critical for production installations. For more information, see Backup and Disaster Recovery for Cloudera Data Science Workbench. To migrate user-related information to a new host, you can transfer the /var/lib/cdsw directory to the new host.
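As an illustration of transferring the directory, the following Bash sketch packs /var/lib/cdsw into a timestamped tar archive. It assumes Cloudera Data Science Workbench has been stopped first so the database files are not changing, and that GNU tar is installed. backup_cdsw_dir is a hypothetical name, and this is not a substitute for the procedure in Backup and Disaster Recovery for Cloudera Data Science Workbench.

```shell
#!/usr/bin/env bash
# Sketch: archive the CDSW data directory and print the archive path.
# Assumptions: CDSW is stopped, GNU tar is available, and the destination
# has enough free space. backup_cdsw_dir is a hypothetical helper name.

backup_cdsw_dir() {
  local src=${1:-/var/lib/cdsw}
  local dest_dir=${2:-.}
  local archive="$dest_dir/cdsw-backup-$(date +%Y%m%d-%H%M%S).tar.gz"

  if [ ! -d "$src" ]; then
    echo "missing source directory: $src" >&2
    return 2
  fi

  # -C enters the parent directory so the archive stores a relative path
  # (cdsw/...); --numeric-owner keeps UIDs/GIDs instead of user names.
  tar --numeric-owner -czpf "$archive" \
    -C "$(dirname "$src")" "$(basename "$src")" || return 1

  echo "$archive"
}
```

On the new host, extracting as root with tar --numeric-owner -xzpf <archive> -C /var/lib would restore the directory with its original numeric owners and permissions.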
if [ -b "$FILE" ]; then echo "$FILE is a block special file"; fi
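If this test succeeds (exit status 0, which the shell treats as true), the file exists and is a block special file. This means that the parameter is a path to a block device; it cannot be a plain UUID. Always quote $FILE: if the variable is empty, an unquoted [ -b $FILE ] collapses to [ -b ], which only tests that the string "-b" is non-empty and therefore succeeds.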
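The same -b test can guard a script that receives the block-device parameter, rejecting a bare UUID before it is used. require_block_device is a hypothetical name; the hint relies on blkid -U from util-linux, which prints the device path for a filesystem UUID.

```shell
#!/usr/bin/env bash
# Sketch: accept a block-device parameter only if it is a path to a
# block special file. A bare UUID string fails the -b test.
# require_block_device is a hypothetical helper name.

require_block_device() {
  local dev=$1
  # Quoting "$dev" keeps an empty value from turning this into [ -b ].
  if [ -b "$dev" ]; then
    return 0
  fi
  echo "not a block device path: '$dev' (a plain UUID is not accepted)" >&2
  echo "hint: 'blkid -U <uuid>' prints the device path for a filesystem UUID" >&2
  return 1
}

# Example: require_block_device /dev/sdb || exit 1
```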