Storage

The OpenShift cluster must have persistent storage classes defined for both “Block” and “Filesystem” volumeModes of storage.
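
Workloads consume these classes through PersistentVolumeClaims that set the corresponding volumeMode. The following is a minimal sketch of what that looks like, assuming the Kubernetes Python client; the storage class names used here are placeholders, so substitute the classes actually defined on your cluster.

    from kubernetes import client, config

    # Placeholder storage class names (substitute your cluster's classes).
    BLOCK_CLASS = "ocs-storagecluster-ceph-rbd"
    FILESYSTEM_CLASS = "ocs-storagecluster-cephfs"

    def pvc_manifest(name, storage_class, volume_mode, size):
        """Build a PersistentVolumeClaim manifest; volume_mode is Block or Filesystem."""
        return {
            "apiVersion": "v1",
            "kind": "PersistentVolumeClaim",
            "metadata": {"name": name},
            "spec": {
                "accessModes": ["ReadWriteOnce"],
                "storageClassName": storage_class,
                "volumeMode": volume_mode,
                "resources": {"requests": {"storage": size}},
            },
        }

    if __name__ == "__main__":
        config.load_kube_config()
        core = client.CoreV1Api()
        # Create one small claim per volumeMode to confirm both classes provision.
        for name, cls, mode in [("probe-block", BLOCK_CLASS, "Block"),
                                ("probe-fs", FILESYSTEM_CLASS, "Filesystem")]:
            core.create_namespaced_persistent_volume_claim(
                namespace="default", body=pvc_manifest(name, cls, mode, "1Gi"))

Creating one small claim per volumeMode is a quick way to confirm that both classes can actually provision volumes before installing the workloads.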

The exact split between block and filesystem storage depends on the specific workloads (Machine Learning or Data Warehousing) and how they are used:

  • For example, Data Warehousing requires 128 GB of memory, 600 GB of locally attached SSD storage, and 100 GB of persistent volume storage on filesystem mounts, per executor. Per-node requirements scale proportionally with the number of executors run on each physical node: for example, 3 executors per node require 384 GB of memory and 1.8 TB of locally attached storage (see the sizing sketch after this list).
  • Machine Learning storage requirements depend largely on the nature of your machine learning jobs. 4 TB of persistent volume block storage is required per Machine Learning Workspace instance for storing different kinds of metadata related to workspace configuration. Additionally, Machine Learning requires access to NFS storage routable from all pods running in the OpenShift cluster (see below).
  • Monitoring uses a large Prometheus instance to scrape workloads. Disk usage depends on the scale of the workloads. Cloudera recommends 60 GB.
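
The Data Warehousing figures above scale linearly with executor count, so the per-node arithmetic can be captured in a few lines. This sketch is illustrative only; the function name and defaults are ours, with the per-executor values taken from the first bullet above.

    def cdw_per_node(executors, mem_gb=128, local_ssd_gb=600, filesystem_pv_gb=100):
        """Per-node CDW requirements for a given executor count per physical node."""
        return {
            "memory_gb": executors * mem_gb,
            "local_ssd_gb": executors * local_ssd_gb,
            "filesystem_pv_gb": executors * filesystem_pv_gb,
        }

    # Three executors per node: 384 GB memory, 1.8 TB local SSD, 300 GB filesystem PVs.
    print(cdw_per_node(3))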

OpenShift Storage Requirements Summary

Component        Local Storage (e.g. ext4)    Block PV (e.g. Ceph or Portworx)    NFS
Control Plane    N/A                          250 GB                              N/A
CDW              600 GB per executor          100 GB per executor                 N/A
CML              N/A                          4 TB per workspace                  1 TB per workspace (depending on ML user files)
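
For capacity planning, the table rows add linearly. Below is a small illustrative sketch that totals them for a given deployment size, using the values from the table above (1 TB taken as 1000 GB for simplicity; the 60 GB Monitoring allowance from the list above is not part of the table and is not included).

    def cluster_storage_totals_gb(cdw_executors, cml_workspaces):
        """Total storage implied by the summary table, in GB (1 TB = 1000 GB here)."""
        return {
            "local": 600 * cdw_executors,  # CDW locally attached SSD
            "block_pv": 250 + 100 * cdw_executors + 4000 * cml_workspaces,
            "nfs": 1000 * cml_workspaces,  # grows with ML user files
        }

    # Example: 3 CDW executors and 2 ML workspaces.
    print(cluster_storage_totals_gb(3, 2))
    # {'local': 1800, 'block_pv': 8550, 'nfs': 2000}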