Application autoscaling
Understand how autoscaling works for applications in Cloudera AI Inference service.
Autoscaling for applications in the Cloudera AI Inference service is driven by a single metric: concurrency.
- The autoscaler uses an effective concurrency target of 70.
- The autoscaler automatically scales the number of application replicas up or down to match the incoming workload based on this concurrency target.
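
The following is a minimal sketch of how a concurrency-based scaling decision can be computed. It is illustrative only, not the service's actual implementation. It assumes that the target of 70 applies per replica and that concurrency means the number of in-flight requests. The function name `desired_replicas` and the `min_replicas` and `max_replicas` bounds are hypothetical and are not taken from the product.

```python
import math

# Effective concurrency target stated above. Treating it as a
# per-replica value is an assumption made for this sketch.
TARGET_CONCURRENCY = 70


def desired_replicas(
    in_flight_requests: float,
    target: float = TARGET_CONCURRENCY,
    min_replicas: int = 1,   # hypothetical lower bound
    max_replicas: int = 10,  # hypothetical upper bound
) -> int:
    """Return the replica count needed to keep each replica at or below the target."""
    if target <= 0:
        raise ValueError("target must be positive")
    # Round up so that no replica is asked to exceed the target.
    needed = math.ceil(in_flight_requests / target)
    # Clamp the result to the configured bounds.
    return max(min_replicas, min(max_replicas, needed))


if __name__ == "__main__":
    for load in (0, 70, 71, 350):
        print(load, "in-flight requests ->", desired_replicas(load), "replica(s)")
```

Under these assumptions, 70 in-flight requests fit on one replica, while 71 would require a second replica. When the load drops, the same calculation yields a smaller count, which corresponds to scaling down.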
