Requirements for Cloudera AI on OpenShift Container Platform

To launch the Cloudera AI service, the OpenShift Container Platform (OCP) host must meet several requirements. Review the following Cloudera AI-specific software, NFS server, and storage requirements.

Requirements

If necessary, contact your Administrator to make sure the following requirements are satisfied:
  1. Check that the installed OpenShift Container Platform version is exactly as listed in the Software Support Matrix for OpenShift (see the verification commands after this list).

  2. Cloudera AI assumes it has cluster-admin privileges on the cluster.
  3. Storage:
    1. Persistent volume block storage per Cloudera AI Workbench: 600 GB minimum, 4.5 TB recommended.
    2. 1 TB of external NFS space is recommended per workbench (depending on user files). If you use embedded NFS, allocate 1 TB per workbench in addition to the 600 GB minimum (or 4.5 TB recommended) of block storage.
    3. NFS storage must be routable from all pods running in the cluster.
    4. For monitoring, recommended volume size is 60 GB.
  4. On OpenShift Container Platform, CephFS is used as the underlying storage provisioner for any new internal workbench on Cloudera on premises 1.5.x. A storage class named ocs-storagecluster-cephfs, with the CSI driver set to openshift-storage.cephfs.csi.ceph.com, must exist in the cluster for new internal workbenches to be provisioned (see the storage class checks after this list).
  5. A block storage class must be marked as default in the cluster. This may be rook-ceph-block, Portworx, or another storage system. Confirm by listing the storage classes (run oc get sc) in the cluster and checking that one of them is marked default (see the storage class checks after this list).
  6. If external NFS is used, the NFS directory must be owned by, and its permissions set for, the cdsw user (see the export sketch after this list). For details, see Using an External NFS Server in the Related information section.
  7. If Cloudera AI needs access to a database on the Cloudera Base on premises cluster, then the user must be authenticated using Kerberos and must have Ranger policies set up to allow read/write operations to the default (or other specified) database.
  8. Ensure that Kerberos is enabled for all services in the cluster. Custom Kerberos principals are not currently supported. For more information, see Enabling Kerberos for authentication.
  9. Forward and reverse DNS must be working.
  10. DNS lookups for subdomains and for the Cloudera AI Workbench itself must resolve properly.
  11. In DNS, wildcard subdomains (such as *.cml.yourcompany.com) must be set to resolve to the master domain (such as cml.yourcompany.com). The TLS certificate (if TLS is used) must also include the wildcard subdomains. When a session or job is started, an engine is created for it, and the engine is assigned to a random, unique subdomain. See the DNS and certificate checks after this list.
  12. The external load balancer server timeout must be set to 5 minutes. Without this, creating a project in a Cloudera AI Workbench with git clone or with the API may result in API timeout errors (see the load balancer example after this list). For workarounds, see Known Issue DSE-11837.
  13. If you intend to access a workbench over https, see Deploy a Cloudera AI Workbench with Support for TLS.
  14. For non-TLS Cloudera AI Workbenches, websockets must be allowed on port 80 on the external load balancer.
  15. Only a TLS-enabled custom Docker Registry is supported. Ensure that you use a TLS certificate to secure the custom Docker Registry. The TLS certificate can be self-signed, or signed by a private or public trusted Certificate Authority (CA).
  16. On OpenShift, due to a Red Hat issue with OpenShift Container Platform 4.3.x, the image registry cluster operator configuration must be set to Managed (see the registry commands after this list).
  17. Check that storage is set up in the cluster image registry operator. See Known Issues DSE-12778 for further information.
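
To spot-check items 1, 4, and 5, run the following commands against the cluster. This is a sketch: rook-ceph-block is only an example storage class name, so substitute the names used in your environment.

    # Confirm the installed OpenShift Container Platform version
    oc get clusterversion

    # Confirm the CephFS storage class exists with the expected CSI driver
    oc get sc ocs-storagecluster-cephfs -o jsonpath='{.provisioner}'
    # Expected output: openshift-storage.cephfs.csi.ceph.com

    # List storage classes; one of them should be marked "(default)"
    oc get sc

    # If no default is set, mark your block storage class as the default
    oc patch storageclass rook-ceph-block \
      -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'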
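
For item 6, the export on the external NFS server must be owned by the cdsw user. A minimal sketch follows; the export path and server name are placeholders, and the 8536 UID/GID is an assumption based on the conventional cdsw user ID, so verify both against Using an External NFS Server.

    # On the NFS server: create the export and assign it to the cdsw user
    # (8536:8536 is assumed here; confirm the actual UID/GID in the linked docs)
    mkdir -p /export/cml
    chown 8536:8536 /export/cml

    # From a cluster node: confirm the export is visible
    showmount -e nfs.yourcompany.com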
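
Items 9 through 11 can be verified from any host that uses the same DNS servers as the cluster. A sketch, with cml.yourcompany.com standing in for your domain and 10.0.0.10 for the address it resolves to:

    # Forward lookup of the workbench domain
    dig +short cml.yourcompany.com

    # Reverse lookup of the returned address
    dig +short -x 10.0.0.10

    # An arbitrary subdomain must resolve through the wildcard record
    dig +short some-random-engine.cml.yourcompany.com

    # If TLS is used, confirm the certificate covers the wildcard subdomains
    openssl s_client -connect cml.yourcompany.com:443 \
      -servername cml.yourcompany.com </dev/null 2>/dev/null \
      | openssl x509 -noout -ext subjectAltName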
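
The 5-minute server timeout in item 12 is configured on whatever external load balancer fronts the cluster. As an illustration only, assuming an HAProxy-based load balancer, the relevant fragment would look like the following (for nginx, the equivalent directive is proxy_read_timeout 300s):

    # /etc/haproxy/haproxy.cfg (illustrative fragment)
    defaults
        timeout client  300s
        timeout server  300s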
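
Items 16 and 17 can be checked, and the management state corrected if necessary, through the image registry operator configuration:

    # Confirm the management state of the image registry cluster operator
    oc get configs.imageregistry.operator.openshift.io cluster \
      -o jsonpath='{.spec.managementState}'

    # Set it to Managed if it is not
    oc patch configs.imageregistry.operator.openshift.io cluster \
      --type merge --patch '{"spec":{"managementState":"Managed"}}'

    # Verify that storage is configured for the registry (empty output means none)
    oc get configs.imageregistry.operator.openshift.io cluster \
      -o jsonpath='{.spec.storage}'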

For more information on requirements, see Cloudera Base on premises Installation Guide.

Hardware requirements

Storage

The cluster must have persistent storage classes defined for both block and filesystem volumeModes of storage. Ensure that a block storage class is set up. The exact amount of storage classified as block or filesystem storage depends on the specific workload used:
  • Cloudera AI workload requirements for storage largely depend on the nature of your machine learning jobs. Per Cloudera AI Workbench instance, a minimum of 600 GB (4.5 TB recommended) of persistent volume block storage is required for storing different kinds of metadata related to workbench configuration. Additionally, Cloudera AI requires access to NFS storage routable from all pods running in the cluster (see below).
  • Monitoring uses a large Prometheus instance to scrape workloads. Disk usage depends on scale of workloads. Recommended volume size is 60 GB.
  Component       Local Storage (for example, ext4)   Block PV (for example, Ceph or Portworx)   NFS (for Cloudera AI user project files)
  Control Plane   N/A                                 250 GB                                     N/A
  Cloudera AI     N/A                                 1.5 TB per workbench                       1 TB per workbench (dependent on the size of the Cloudera AI user files)

NFS

Cloudera AI requires NFS 4.0 for storing project files and folders. Use NFS storage only for storing project files and folders, and not for any other Cloudera AI data, such as the PostgreSQL database and LiveLog.
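
A quick way to confirm that an export actually supports NFS 4.0 is to mount it with the version pinned. This is a sketch; the server name, export path, and mount point are placeholders:

    # Mount the export with the NFS version pinned to 4.0
    mkdir -p /mnt/cml-test
    mount -t nfs4 -o vers=4.0 nfs.yourcompany.com:/export/cml /mnt/cml-test

    # Confirm the negotiated version (look for vers=4.0 in the output)
    nfsstat -m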

Cloudera Embedded Container Service requirements for NFS Storage

Cloudera managed Cloudera Embedded Container Service deploys and manages an internal NFS server based on Longhorn, which can be used for Cloudera AI. This is the recommended option for Cloudera AI on Cloudera Embedded Container Service clusters.

Cloudera AI requires the nfs-utils package in order to mount volumes provisioned by longhorn-nfs. The nfs-utils package is not available by default on every operating system. Check that nfs-utils is available, and ensure that it is present on all Cloudera Embedded Container Service cluster nodes.
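
A sketch for verifying the package on RHEL-family nodes (the package manager may differ on other operating systems):

    # Check whether nfs-utils is installed on this node
    rpm -q nfs-utils

    # Install it if it is missing
    sudo yum install -y nfs-utils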

Alternatively, the NFS server can be external to the cluster, such as a NetApp filer that is accessible from the on premises cluster nodes.

OpenShift requirements for NFS storage

An internal user-space NFS server can be deployed into the cluster, serving a block storage device (persistent volume) managed by the cluster's software-defined storage (SDS) system, such as Ceph or Portworx. This is the recommended option for Cloudera AI on OpenShift. Alternatively, the NFS server can be external to the cluster, such as a NetApp filer that is accessible from the on premises cluster nodes. Use NFS storage only for storing project files and folders, and not for any other Cloudera AI data, such as the PostgreSQL database and LiveLog.

Cloudera AI does not support shared volumes, such as Portworx shared volumes, for storing project files. A read-write-once (RWO) persistent volume must be allocated to the internal NFS server (for example, NFS server provisioner) as the persistence layer. The NFS server uses the volume to dynamically provision read-write-many (RWX) NFS volumes for the Cloudera AI clients.
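
To illustrate the relationship described above: the internal NFS server consumes a single RWO block volume as its persistence layer, and Cloudera AI clients then request RWX volumes from the storage class that the NFS server provisioner exposes. A minimal sketch of such a client claim, assuming the provisioner exposes a storage class named nfs (the claim name, class name, and size are all illustrative):

    # An RWX claim of the kind a Cloudera AI client would make against the
    # NFS provisioner; the provisioner itself sits on one RWO block volume
    kubectl apply -f - <<'EOF'
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: project-files-example    # illustrative name
    spec:
      accessModes:
        - ReadWriteMany              # RWX: shared across workbench pods
      storageClassName: nfs          # assumed class from the NFS server provisioner
      resources:
        requests:
          storage: 1Ti
    EOF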