Application autoscaling

Understand how autoscaling works for applications in the Cloudera AI Inference service.

The autoscaling mechanism for applications in the Cloudera AI Inference service relies on a single metric: concurrency, the number of requests being handled at the same time.

  • The autoscaler uses an effective concurrency target of 70 concurrent requests per replica.
  • The autoscaler automatically scales the number of application replicas up or down to match the incoming workload based on this concurrency target.
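To show how a concurrency target translates into a replica count, here is a minimal sketch. It assumes that the desired number of replicas is the observed concurrency divided by the effective target of 70, rounded up. The function name `desired_replicas` and the `min_replicas`/`max_replicas` bounds are hypothetical and are not settings documented for the service. The real autoscaler also averages its measurements over time, which this sketch ignores.

```python
import math

# Effective concurrency target from the documentation (requests per replica).
EFFECTIVE_TARGET = 70


def desired_replicas(observed_concurrency: float,
                     min_replicas: int = 1,
                     max_replicas: int = 10) -> int:
    """Hypothetical sketch: return the number of replicas needed so that
    the average concurrency per replica stays at or below the target.

    min_replicas and max_replicas are illustrative bounds, not documented
    Cloudera AI Inference settings.
    """
    if observed_concurrency < 0:
        raise ValueError("observed_concurrency must be non-negative")
    # Round up so that no replica is assigned more than the target.
    raw = math.ceil(observed_concurrency / EFFECTIVE_TARGET)
    # Clamp the result to the hypothetical replica bounds.
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    for load in (0, 50, 70, 71, 350, 1000):
        print(load, "->", desired_replicas(load))
```

Under these assumptions, 71 concurrent requests need 2 replicas, 350 need 5, and 1000 would need 15 but are capped at the illustrative maximum of 10.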