CDE hardware requirements
Review the requirements needed to get started with the Cloudera Data Engineering (CDE) service on Red Hat OpenShift.
Requirements
- CDE assumes it has cluster-admin privileges on the OpenShift cluster.
- The OpenShift cluster must be configured with the route admission policy set to namespaceOwnership: InterNamespaceAllowed. This allows the OpenShift cluster to run applications in multiple namespaces with the same domain name. You can set this policy with the following command:

oc -n openshift-ingress-operator patch ingresscontroller/default --patch '{"spec":{"routeAdmission": {"namespaceOwnership":"InterNamespaceAllowed"}}}' --type=merge
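To confirm that the policy was applied, you can read it back from the ingress controller spec (a quick check; this command is an addition, not part of the original requirements list):

oc -n openshift-ingress-operator get ingresscontroller/default -o jsonpath='{.spec.routeAdmission.namespaceOwnership}'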
- CDE Service requirements: Overall, a CDE service requires 110 GB Block PV or NFS PV, 7 CPU cores, and 15 GB memory.

  Table 1. CDE Service requirements

  | Component | vCPU | Memory | Block PV or NFS PV | Number of replicas |
  | --- | --- | --- | --- | --- |
  | Embedded DB | 4 | 8 GB | 100 GB | 1 |
  | Config Manager | 500 m | 1 GB | -- | 2 |
  | Dex Downloads | 250 m | 512 MB | -- | 1 |
  | Knox | 250 m | 1 GB | -- | 1 |
  | Management API | 1 | 2 GB | -- | 1 |
  | NGINX Ingress Controller | 100 m | 90 MB | -- | 1 |
  | FluentD Forwarder | 250 m | 512 MB | -- | 1 |
  | Grafana | 250 m | 512 MB | 10 GB | 1 |
  | Data Connector | 250 m | 512 MB | -- | 1 |
  | Total | 7 | 15 GB | 110 GB | - |
- CDE Virtual Cluster requirements:
  - For Spark 3: Overall storage of 400 GB Block PV or Shared Storage PV, 5.35 CPU cores, and 15.6 GB memory per virtual cluster.
  - For Spark 2: An additional 500 m CPU, 4.5 GB memory, and 100 GB storage are required; that is, an overall storage of 500 GB Block PV or Shared Storage PV, 5.85 CPU cores, and 20.1 GB memory per virtual cluster.

  Table 2. CDE Virtual Cluster requirements for Spark 3

  | Component | vCPU | Memory | Block PV or NFS PV | Number of replicas |
  | --- | --- | --- | --- | --- |
  | Airflow API | 350 m | 612 MB | 100 GB | 1 |
  | Airflow Scheduler | 1 | 1 GB | 100 GB | 1 |
  | Airflow Web | 250 m | 512 MB | -- | 1 |
  | Runtime API | 250 m | 512 MB | 100 GB | 1 |
  | Livy | 3 | 12 GB | 100 GB | 1 |
  | SHS | 250 m | 1 GB | -- | 1 |
  | Pipelines | 250 m | 512 MB | -- | 1 |
  | Total | 5350 m | 15.6 GB | 400 GB | - |
- Workloads: You must configure resources according to your workload (a sizing sketch follows this list).
  - The Spark Driver container uses resources based on the configured driver cores and driver memory, plus an additional 40% memory overhead.
  - In addition, the Spark Driver uses 110 m CPU and 232 MB memory for its sidecar container.
  - The Spark Executor container uses resources based on the configured executor cores and executor memory, plus an additional 40% memory overhead.
  - In addition, each Spark Executor uses 10 m CPU and 32 MB memory for its sidecar container.
  - Minimal Airflow jobs need 100 m CPU and 200 MB memory per Airflow worker.
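The per-container figures above determine how much headroom a job needs beyond the service and virtual cluster baselines. The following Python sketch shows one way to combine them when planning capacity; the function names and the example job at the bottom are hypothetical, while the constants are taken from the figures listed above.

```python
# Rough CDE capacity sketch. Function names and the example job are hypothetical;
# the constants come from the requirement figures listed above.

MEMORY_OVERHEAD = 0.40                 # 40% memory overhead on driver and executors
DRIVER_SIDECAR = (0.110, 232 / 1024)   # Spark Driver sidecar: 110 m CPU, 232 MB
EXECUTOR_SIDECAR = (0.010, 32 / 1024)  # Spark Executor sidecar: 10 m CPU, 32 MB

def spark_job_request(driver_cores, driver_mem_gb,
                      executor_cores, executor_mem_gb, num_executors):
    """Approximate (CPU cores, memory GB) requested by a single Spark job."""
    d_cpu, d_mem = DRIVER_SIDECAR
    e_cpu, e_mem = EXECUTOR_SIDECAR
    driver_cpu = driver_cores + d_cpu
    driver_mem = driver_mem_gb * (1 + MEMORY_OVERHEAD) + d_mem
    executor_cpu = executor_cores + e_cpu
    executor_mem = executor_mem_gb * (1 + MEMORY_OVERHEAD) + e_mem
    return (driver_cpu + num_executors * executor_cpu,
            driver_mem + num_executors * executor_mem)

def cluster_baseline(num_spark3_vcs):
    """Approximate (CPU cores, memory GB, storage GB) consumed before any job
    runs: one CDE service plus the given number of Spark 3 virtual clusters."""
    return (7 + 5.35 * num_spark3_vcs,
            15 + 15.6 * num_spark3_vcs,
            110 + 400 * num_spark3_vcs)

# Example: a job with a 1-core / 2 GB driver and four 2-core / 4 GB executors,
# running on a deployment with two Spark 3 virtual clusters.
job_cpu, job_mem = spark_job_request(1, 2, 2, 4, 4)
base_cpu, base_mem, base_storage = cluster_baseline(2)
print(f"Per-job request:  {job_cpu:.2f} cores, {job_mem:.1f} GB memory")
print(f"Cluster baseline: {base_cpu:.1f} cores, {base_mem:.1f} GB memory, {base_storage} GB storage")
```

Adjust the example job parameters to match your own driver and executor settings when estimating how many concurrent jobs your cluster can accommodate.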