CML Requirements for Private Cloud

To launch Cloudera Machine Learning, the Private Cloud host must meet several requirements. In addition to the specific software and hardware requirements in the CDP Private Cloud Base Installation Guide, check the following CML-specific requirements.

If necessary, contact your Administrator to make sure the following requirements are satisfied:
  1. The installed OpenShift Container Platform must be version 4.3.x.
  2. CML assumes it has cluster-admin privileges on the OpenShift cluster.
  3. Storage:
    1. 4 TB of persistent volume block storage per ML Workspace.
    2. 1 TB of NFS space recommended per Workspace (depending on user files).
    3. NFS storage must be accessible (routable) from all pods running in the OpenShift cluster.
    4. For monitoring, the recommended volume size is 60 GB.
  4. A block storage class must be marked as default in the OpenShift cluster. This may be rook-ceph-block, Portworx, or another storage system. Confirm by listing the storage classes in the OpenShift cluster and checking that exactly one of them is marked default (a verification sketch follows this list).
  5. If external NFS is used, the NFS directory must be exported with the ownership and permissions of the cdsw user. For details, see Using an External NFS Server in the Related information section at the bottom of this page.
  6. If CML needs access to a database on the CDP Private Cloud Base cluster, then the user must be authenticated using Kerberos and must have Ranger policies set up to allow read/write operations to the default (or other specified) database.
  7. Forward and reverse DNS must be working.
  8. DNS lookups to subdomains and the ML Workspace itself must work.
  9. In DNS, wildcard subdomains (such as *.cml.yourcompany.com) must be set to resolve to the master domain (such as cml.yourcompany.com). The TLS certificate (if TLS is used) must also include the wildcard subdomains. When a session or job is started, an engine is created for it and assigned to a random, unique subdomain (DNS spot checks are sketched after this list).
  10. The external load balancer server timeout must be set to 5 minutes. Without this, creating a project in an ML workspace with git clone or through the API can result in API timeout errors. For workarounds, see Known Issue DSE-11837 (a hedged load balancer example follows this list).
  11. If you intend to access a workspace over https, see Deploy an ML Workspace with Support for TLS.
  12. Due to a Red Hat issue with OpenShift Container Platform 4.3.x, the image registry cluster operator configuration must be set to Managed.
  13. Check that storage is set up in the cluster image registry operator (both registry checks are sketched after this list). See Known Issue DSE-12778 for further information.
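
For item 4 above, the default storage class can be confirmed and, if necessary, set from the command line. A minimal sketch using the standard oc client follows; the rook-ceph-block class name is only an example:

```
# List storage classes; exactly one should be annotated "(default)".
oc get storageclass

# If none is marked default, annotate the desired class (example name:
# rook-ceph-block) as the cluster default.
oc patch storageclass rook-ceph-block \
  -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}'
```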
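
The DNS requirements in items 7–9 can be spot-checked with dig. This sketch assumes the example domain cml.yourcompany.com and a placeholder load balancer IP (10.0.0.10); substitute your own values:

```
# Forward lookup: the workspace domain should resolve to the load balancer.
dig +short cml.yourcompany.com

# Reverse lookup: the IP should map back to the expected hostname.
dig +short -x 10.0.0.10

# Wildcard check: an arbitrary subdomain should resolve to the same address,
# since each engine is assigned a random, unique subdomain.
dig +short some-random-engine.cml.yourcompany.com
```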
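
The 5-minute timeout in item 10 is configured on the external load balancer itself, so the exact change depends on your balancer. As one hedged example for an HAProxy-based setup (the config path and backend name are hypothetical):

```
# Append a 5-minute (300 s) server-side timeout to a hypothetical HAProxy
# backend for the ML workspace, then reload the service.
cat >> /etc/haproxy/haproxy.cfg <<'EOF'
backend cml_workspace
    timeout server 300s
EOF
systemctl reload haproxy
```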
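
Items 12 and 13 can be checked with oc against the image registry cluster operator; setting managementState to Managed is the standard OpenShift mechanism:

```
# Inspect the current operator configuration, including its storage stanza.
oc get configs.imageregistry.operator.openshift.io cluster -o yaml

# Ensure the operator is Managed rather than Removed.
oc patch configs.imageregistry.operator.openshift.io cluster \
  --type merge -p '{"spec": {"managementState": "Managed"}}'
```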

For more information on requirements, see CDP Private Cloud Base Installation Guide.

Storage

We expect the OpenShift cluster to have persistent storage classes defined for both "block" and "filesystem" volumeModes, with a block storage class set up as the default. The exact amount of block and filesystem storage required depends on the specific workloads (Machine Learning or Data Warehouse) and how they are used:
  • Data Warehousing, for instance, requires 128 GB of memory and 600 GB of locally attached SSD storage, with 100 GB of persistent volume storage on filesystem mounts, per executor. The per-node requirements scale proportionally with the number of executors run per physical node (e.g., 3 executors per node require 384 GB of memory and 1.8 TB of locally attached storage; the arithmetic is sketched after this list).
  • Machine Learning storage requirements depend largely on the nature of your machine learning jobs. 4 TB of persistent volume block storage is required per Machine Learning Workspace instance for storing various kinds of metadata related to workspace configuration. Additionally, Machine Learning requires access to NFS storage routable from all pods running in the OpenShift cluster (see below).
  • Monitoring uses a large Prometheus instance to scrape workloads; disk usage depends on the scale of the workloads. The recommended volume size is 60 GB.
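
The CDW per-node figures above are simple multiples of the per-executor numbers; a trivial shell sketch of the arithmetic:

```
# Per-executor CDW requirements, scaled by executors per physical node.
EXECUTORS_PER_NODE=3
echo "memory:            $((128 * EXECUTORS_PER_NODE)) GB"   # 384 GB
echo "local SSD:         $((600 * EXECUTORS_PER_NODE)) GB"   # 1800 GB = 1.8 TB
echo "persistent volume: $((100 * EXECUTORS_PER_NODE)) GB"   # 300 GB
```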

| Workload | Local Storage (e.g., ext4) | Block PV (e.g., Ceph or Portworx) | NFS (for ML user project files) |
| --- | --- | --- | --- |
| Control Plane | N/A | 250 GB | N/A |
| CDW | 600 GB per executor | 100 GB per executor | N/A |
| CML | N/A | 4 TB per workspace | 1 TB per workspace (depends on ML user files) |

NFS

Cloudera Machine Learning (CML) requires NFS for storing project files and folders. An internal user-space NFS server can be deployed into the cluster; it serves a block storage device (persistent volume) managed by the cluster's software-defined storage (SDS) system, such as Ceph or Portworx. This is the recommended option for CML in Private Cloud. Alternatively, the NFS server can be external to the cluster, e.g., a NetApp filer that is accessible from the Private Cloud cluster nodes. Note that NFS storage is to be used only for storing project files and folders, and not for any other CML data, such as the PostgreSQL database and livelog.

CML does not currently support shared volumes, such as Portworx shared volumes, for storing project files. A read-write-once (RWO) persistent volume must be allocated to the internal NFS server (e.g., NFS server provisioner) as the persistence layer. The NFS server uses this volume to dynamically provision read-write-many (RWX) NFS volumes for the CML clients.
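
To make the preceding two paragraphs concrete, the sketch below first claims an RWO block volume (from the default block storage class) as the internal NFS server's persistence layer, then claims an RWX volume of the kind the NFS server would provision for CML clients. All names, the nfs storage class, and the sizes are illustrative assumptions, not CML's actual manifests:

```
cat <<'EOF' | oc apply -f -
# RWO backing volume for the internal NFS server; storageClassName is
# omitted so the cluster's default block storage class is used.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-server-backing-pvc        # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce                   # RWO: mounted by the NFS server only
  resources:
    requests:
      storage: 1Ti
---
# RWX claim satisfied by an NFS-backed storage class (assumed to be named
# "nfs"); CML project-file volumes are of this kind.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ml-project-files-pvc          # hypothetical name
spec:
  storageClassName: nfs               # assumed provisioner-created class
  accessModes:
    - ReadWriteMany                   # RWX: shared across engine pods
  resources:
    requests:
      storage: 100Gi
EOF
```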