Cloudera Machine Learning hardware requirements

To launch the Cloudera Machine Learning service, the CDP Private Cloud Experiences host must meet several requirements. Review the following CML-specific NFS server and storage requirements.

Hardware requirements

Storage

The OpenShift cluster must have persistent storage classes defined for both block and filesystem volumeModes of storage. Ensure that a block storage class is set up. The exact amount of storage classified as block or filesystem storage depends on the specific workloads (Machine Learning or Data Warehouse) and how they are used (example claims for both volumeModes follow the table below):
  • A Data Warehousing workload, for instance, requires 128 GB of memory, 600 GB of locally attached SSD storage, and 100 GB of persistent volume storage on filesystem mounts, per executor. The per-node requirements scale proportionally with the number of executors run per physical node (for example, 3 executors per node require 384 GB of memory and 1.8 TB of locally attached storage).
  • Machine Learning storage requirements depend largely on the nature of your machine learning jobs. Each Machine Learning Workspace instance requires 4 TB of persistent volume block storage for storing metadata related to workspace configuration. Additionally, Machine Learning requires access to NFS storage routable from all pods running in the OpenShift cluster (see the NFS section below).
  • Monitoring uses a large Prometheus instance to scrape workloads. Disk usage depends on the scale of the workloads. The recommended volume size is 60 GB.
Component       Local Storage (for example, ext4)   Block PV (for example, Ceph or Portworx)   NFS (for ML user project files)
Control Plane   N/A                                 250 GB                                     N/A
CDW             600 GB per executor                 100 GB per executor                        N/A
CML             N/A                                 4 TB per workspace                         1 TB per workspace (dependent on size of ML user files)
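To verify that suitable classes exist, one approach is to create a small claim for each volumeMode and confirm that both bind. The following is a minimal sketch; the storage class names are placeholders for whatever block and filesystem classes your cluster defines, and the sizes are illustrative.

  apiVersion: v1
  kind: PersistentVolumeClaim
  metadata:
    name: example-block-claim
  spec:
    accessModes:
      - ReadWriteOnce
    volumeMode: Block                              # requires a block-capable storage class
    storageClassName: ocs-storagecluster-ceph-rbd  # placeholder; use your cluster's block class
    resources:
      requests:
        storage: 250Gi
  ---
  apiVersion: v1
  kind: PersistentVolumeClaim
  metadata:
    name: example-filesystem-claim
  spec:
    accessModes:
      - ReadWriteOnce
    volumeMode: Filesystem                         # the default volumeMode
    storageClassName: ocs-storagecluster-cephfs    # placeholder; use your cluster's filesystem class
    resources:
      requests:
        storage: 100Gi

Applying the manifests with oc apply -f and confirming that both claims reach the Bound state is a quick check that the classes are usable.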

NFS

Cloudera Machine Learning (CML) requires NFS for storing project files and folders. An internal user-space NFS server can be deployed into the cluster; it serves a block storage device (persistent volume) managed by the cluster's software-defined storage (SDS) system, such as Ceph or Portworx. This is the recommended option for CML in private cloud. Alternatively, the NFS server can be external to the cluster, for example, a NetApp filer that is accessible from the private cloud cluster nodes. NFS storage is to be used only for storing project files and folders, and not for any other CML data, such as the PostgreSQL database and LiveLog.
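When an external filer is used, the cluster typically reaches it through a statically defined PersistentVolume. The following is a minimal sketch under that assumption; the server address and export path are hypothetical.

  apiVersion: v1
  kind: PersistentVolume
  metadata:
    name: cml-projects-nfs
  spec:
    capacity:
      storage: 1Ti                       # matches the 1 TB per-workspace guidance above
    accessModes:
      - ReadWriteMany                    # NFS supports concurrent access from many pods
    persistentVolumeReclaimPolicy: Retain
    nfs:
      server: nfs.example.com            # hypothetical filer address
      path: /exports/cml-projects        # hypothetical export path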

CML does not currently support shared volumes, such as Portworx shared volumes, for storing project files. A read-write-once (RWO) persistent volume must be allocated to the internal NFS server (for example, an NFS server provisioner) as the persistence layer. The NFS server uses this volume to dynamically provision read-write-many (RWX) NFS volumes for the CML clients.
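The following sketch shows the two kinds of claims involved, assuming an internal NFS server provisioner that exposes a storage class named nfs; the class names and sizes are placeholders.

  # RWO volume, backed by the cluster's SDS, that persists the internal NFS server's data
  apiVersion: v1
  kind: PersistentVolumeClaim
  metadata:
    name: nfs-server-backing-volume
  spec:
    accessModes:
      - ReadWriteOnce                                # RWO, as required for the persistence layer
    storageClassName: ocs-storagecluster-ceph-rbd    # placeholder SDS-backed block class
    resources:
      requests:
        storage: 1Ti
  ---
  # RWX volume dynamically provisioned by the NFS server for CML project files
  apiVersion: v1
  kind: PersistentVolumeClaim
  metadata:
    name: cml-project-files
  spec:
    accessModes:
      - ReadWriteMany                  # RWX, so all workspace pods can mount it
    storageClassName: nfs              # placeholder class served by the NFS provisioner
    resources:
      requests:
        storage: 100Gi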