Using an External NFS Server

You can install an NFS server that is external to the cluster.

Currently, NFS version 4.1 is the recommended protocol to use for CML. The NFS client within CML must be able to mount the NFS storage with default options, and also assumes these export options:

Before creating a CML workspace, the storage administrator must create a directory that will be exported to the cluster for storing ML project files for that workspace. Either a dedicated NFS export path, or a subdirectory in an existing export must be specified for each workspace.

Each CML workspace needs a unique directory that does not have files in it from a different or previous workspace. For example, if 10 CML workspaces are expected, the storage administrator will need to create 10 unique directories. Either one NFS export and 10 subdirectories within it need to be created, or 10 unique exports need to be created.

For example, to use a dedicated NFS share for a workspace named “workspace1” from NFS server “nfs_server”, do the following:

  1. Create NFS export directory “/workspace1”.

  2. Change ownership for the exported directory
    1. CML accesses this directory as a user with a UID and GID of 8536. Therefore, run chown 8536:8536 /workspace1
    2. Make the export directory group-writeable and set the GID: 
chmod g+srwx /workspace1
  3. Provide the NFS export path nfs_server:/workspace1 when prompted by the CML Control Plane App while creating the workspace.
  4. To use a subdirectory in an existing NFS share, say nfs_server:/export, do the following:
    1. Create a subdirectory /export/workspace1
    2. Change ownership: chown 8536:8536 /export/workspace1
    3. Set GID and make directory group writeable: chmod g+srwx /export/workspace1
    4. Provide the export path nfs_server:/export/workspace1 when prompted by the CML Control Plane App.

NFS share sizing

CML workloads are sensitive to latency and IO/s instead of throughput.

The minimum recommended file share size is 100 GB. The file share must support online volume capacity expansion. It must provide at least the following performance characteristics:

IO / s 3100
Throughput rate 110.0 MiBytes/s