Recommendations for scaling CDE deployments
Your business might experience a sudden increase or drop in demand that requires your Cloudera Data Engineering (CDE) deployment to autoscale. You can scale your CDE deployment either by adding new instances of a CDE service or Virtual Cluster, or by adding resources to the existing ones (a sketch of provisioning a new Virtual Cluster follows the list below).
- Vertically - More resources are provisioned within the same instance of a CDE service or Virtual Cluster.
- Horizontally - New instances of a CDE service or Virtual Cluster are provisioned.
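For example, a minimal way to scale horizontally is to provision an additional Virtual Cluster with its own CPU and memory quotas. The Python sketch below shells out to the CDP CLI; it assumes the CLI is installed and authenticated, and the service ID, Virtual Cluster name, and quota values are placeholders. The flag names mirror the CDP Data Engineering create-vc API but should be verified against your CLI version's help output.

```python
import subprocess

def create_virtual_cluster(cluster_id: str, name: str,
                           cpu_requests: str, memory_requests: str) -> None:
    """Provision an additional Virtual Cluster (horizontal scaling).

    Assumes the CDP CLI is installed and authenticated; verify the flag
    names against your CLI version.
    """
    subprocess.run(
        [
            "cdp", "de", "create-vc",
            "--cluster-id", cluster_id,            # ID of the existing CDE service
            "--name", name,                        # name for the new Virtual Cluster
            "--cpu-requests", cpu_requests,        # autoscale CPU quota, for example "20"
            "--memory-requests", memory_requests,  # autoscale memory quota, for example "80Gi"
        ],
        check=True,
    )

# Placeholder values for illustration only.
create_virtual_cluster("cluster-1234abcd", "vc-team-b", "20", "80Gi")
```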

Virtual Clusters provide isolated, autoscaling compute capacity for running Spark and Airflow jobs. You can use Virtual Clusters to isolate individual teams or lines of business through user-based access control lists (ACLs).
Guidelines for scaling Virtual Clusters
- Each Virtual Cluster requires infrastructure capacity to run services such as Airflow, the API server, and the Spark History Server (SHS).
Recommendation: Do not scale horizontally beyond 50 Virtual Clusters within the same CDE service.
- Virtual Clusters can actively run hundreds of jobs in parallel. In certain scenarios, you might need to submit multiple jobs simultaneously, either on a schedule or in response to a burst in demand. In these scenarios, the API server cannot accept more than 60 simultaneous job submissions; once jobs move from the submission state to the running state, more jobs can be submitted.
Recommendation: Distribute simultaneous job submissions over time or scale horizontally across multiple Virtual Clusters, as shown in the sketch below.
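The sketch below illustrates both recommendations, assuming a hypothetical submit_job helper in place of your actual submission mechanism (for example, the CDE API or CLI): jobs are round-robined across Virtual Clusters, and submission pauses between batches so that no single cluster receives more than 60 simultaneous submissions.

```python
import itertools
import time

MAX_SIMULTANEOUS = 60     # per-Virtual-Cluster API server submission limit
BATCH_PAUSE_SECONDS = 30  # rough time for jobs to leave the submission state

def submit_job(virtual_cluster: str, job_name: str) -> None:
    """Hypothetical helper; replace with your actual CDE submission call."""
    print(f"submitting {job_name} to {virtual_cluster}")

virtual_clusters = ["vc-team-a", "vc-team-b"]  # placeholder VC names
jobs = [f"etl-job-{i}" for i in range(200)]    # placeholder job names

# Round-robin jobs across Virtual Clusters (horizontal distribution) and
# pause between batches so each cluster sees at most MAX_SIMULTANEOUS
# submissions at a time (distribution over time).
vc_cycle = itertools.cycle(virtual_clusters)
batch_size = MAX_SIMULTANEOUS * len(virtual_clusters)
for start in range(0, len(jobs), batch_size):
    for job in jobs[start:start + batch_size]:
        submit_job(next(vc_cycle), job)
    if start + batch_size < len(jobs):
        time.sleep(BATCH_PAUSE_SECONDS)
```

A fixed pause is the simplest approach; in practice, polling job status and submitting the next batch once earlier jobs reach the running state is more robust.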