Recommendations for scaling CDE deployments
Your business might experience a sudden increase or drop in demand that requires your Cloudera Data Engineering (CDE) deployment to scale. You can scale a CDE deployment in two ways:
- Vertically - More resources are provisioned within the same instance of a CDE service or Virtual Cluster.
- Horizontally - New instances of a CDE service or Virtual Cluster are provisioned.
Virtual Clusters provide isolated, autoscaling compute capacity for running Spark and Airflow jobs. You can use Virtual Clusters to isolate individual teams or lines of business through user-based access control lists (ACLs).
Guidelines for scaling Virtual Clusters
Each Virtual Cluster requires infrastructure capacity to run various services such as Airflow, the API server, and the Spark History Server (SHS).
Recommendation: Do not scale horizontally beyond 50 Virtual Clusters within the same CDE service.
Virtual Clusters can actively run hundreds of jobs in parallel. In certain scenarios, you might need to submit multiple jobs simultaneously, either on a schedule or in response to a burst in demand. In these scenarios, the API server cannot accept more than 60 simultaneous job submissions. Once jobs move from the submission state to the running state, more jobs can be submitted.
Recommendation: Distribute job submissions over time, or scale horizontally across multiple Virtual Clusters.
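One way to distribute submissions over time is to batch them below the API server limit and pause between batches so earlier jobs can leave the submission state. The sketch below is a minimal illustration, not part of the CDE product: `submit_fn` is a hypothetical callable you would supply yourself, for example a wrapper around the CDE CLI or API, and the pause length is an assumption you should tune for your workload.

```python
import time

# Maximum simultaneous submissions the CDE API server accepts.
MAX_CONCURRENT_SUBMISSIONS = 60


def submit_in_batches(job_names, submit_fn,
                      batch_size=MAX_CONCURRENT_SUBMISSIONS,
                      pause_seconds=30):
    """Submit jobs in batches no larger than the API server limit.

    submit_fn is a user-supplied callable that submits one job by name
    (hypothetically, a wrapper around the CDE CLI or REST API). Between
    batches, sleep so earlier submissions can move to the running state.
    """
    submitted = []
    for start in range(0, len(job_names), batch_size):
        batch = job_names[start:start + batch_size]
        for name in batch:
            submit_fn(name)
            submitted.append(name)
        # Pause only if more batches remain.
        if start + batch_size < len(job_names):
            time.sleep(pause_seconds)
    return submitted
```

For example, calling `submit_in_batches(names, my_submit, batch_size=60, pause_seconds=30)` submits at most 60 jobs at a time and waits 30 seconds between batches. Scaling horizontally instead would mean routing each batch to a different Virtual Cluster.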