CDE hardware requirements

Review the requirements needed to get started with the Cloudera Data Engineering (CDE) service on Red Hat OpenShift.

Requirements

  • CDE assumes it has cluster-admin privileges on the OpenShift cluster.
  • The OpenShift cluster must be configured with the route admission policy set to namespaceOwnership: InterNamespaceAllowed. This allows the OpenShift cluster to run applications in multiple namespaces that use the same domain name. For example:
    oc -n openshift-ingress-operator patch ingresscontroller/default \
      --patch '{"spec":{"routeAdmission":{"namespaceOwnership":"InterNamespaceAllowed"}}}' \
      --type=merge
  • Table 1. The following are the CDE Service requirements:
    Component                   vCPU     Memory    Block PV or NFS PV   Number of replicas
    Embedded DB                 4        8 GB      100 GB               1
    Config Manager              500 m    1 GB      --                   2
    Dex Downloads               250 m    512 MB    --                   1
    Knox                        250 m    1 GB      --                   1
    Management API              1        2 GB      --                   1
    NGINX Ingress Controller    100 m    90 MB     --                   1
    FluentD Forwarder           250 m    512 MB    --                   1
    Grafana                     250 m    512 MB    10 GB                1
    Data Connector              250 m    512 MB    --                   1
    Total                       7        15 GB     110 GB
  • CDE Service requirements: Overall, a CDE service requires 110 GB of Block PV or NFS PV storage, 7 CPU cores, and 15 GB of memory.
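    To check whether the OpenShift cluster has enough allocatable capacity for these totals, one option is to list the allocatable CPU and memory reported by each node:
    oc get nodes -o custom-columns='NAME:.metadata.name,CPU:.status.allocatable.cpu,MEMORY:.status.allocatable.memory'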
  • Table 2. The following are the CDE Virtual Cluster requirements for Spark 3:
    Component            vCPU      Memory     Block PV or NFS PV   Number of replicas
    Airflow API          350 m     612 MB     100 GB               1
    Airflow Scheduler    1         1 GB       100 GB               1
    Airflow Web          250 m     512 MB     --                   1
    Runtime API          250 m     512 MB     100 GB               1
    Livy                 3         12 GB      100 GB               1
    SHS                  250 m     1 GB       --                   1
    Pipelines            250 m     512 MB     --                   1
    Total                5350 m    15.6 GB    400 GB
  • CDE Virtual Cluster requirements:
    • For Spark 3: Each virtual cluster requires 400 GB of Block PV or Shared Storage PV, 5.35 CPU cores, and 15.6 GB of memory.
    • For Spark 2: Spark 2 requires an additional 500 m CPU, 4.5 GB of memory, and 100 GB of storage, that is, 500 GB of Block PV or Shared Storage PV, 5.85 CPU cores, and 20.1 GB of memory per virtual cluster.
  • Workloads: Configure resources according to your workload (see the example calculation after this list).
    • The Spark Driver container uses resources based on the configured driver cores and driver memory, plus an additional 40% memory overhead.
    • In addition, the Spark Driver uses 110 m CPU and 232 MB of memory for its sidecar container.
    • The Spark Executor container uses resources based on the configured executor cores and executor memory, plus an additional 40% memory overhead.
    • In addition, the Spark Executor uses 10 m CPU and 32 MB of memory for its sidecar container.
    • Minimal Airflow jobs need 100 m CPU and 200 MB of memory per Airflow worker.
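    The following shell sketch estimates the per-pod requests for a single Spark job from the figures above. The driver and executor sizes used here (2 cores/4 GB driver, 4 cores/8 GB executors) are hypothetical example values, not recommendations:
    # Hypothetical job sizing: driver = 2 cores / 4 GB, executors = 4 cores / 8 GB
    DRIVER_CORES=2;  DRIVER_MEM_GB=4
    EXEC_CORES=4;    EXEC_MEM_GB=8
    # Driver pod: cores + 110 m sidecar; memory x 1.4 (40% overhead) + 232 MB sidecar
    echo "driver pod:   $(echo "$DRIVER_CORES + 0.110" | bc) vCPU, $(echo "$DRIVER_MEM_GB * 1.4 + 0.232" | bc) GB"
    # Each executor pod: cores + 10 m sidecar; memory x 1.4 (40% overhead) + 32 MB sidecar
    echo "executor pod: $(echo "$EXEC_CORES + 0.010" | bc) vCPU, $(echo "$EXEC_MEM_GB * 1.4 + 0.032" | bc) GB"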