Autoscaling Model Endpoints using the API
You can configure Model Endpoints deployed on the Cloudera AI Inference service to auto-scale down to zero instances when there is no load.
The following deployment specification shows an example model endpoint that auto-scales
between zero and four replicas:
# cat ./examples/mlflow/model-spec-cml-registry.json
{
  "namespace": "serving-default",
  "name": "mlflow-wine-test-from-registry-onnx",
  "source": {
    "registry_source": {
      "version": 1,
      "model_id": "yf0o-hrxq-l0xj-8tk9"
    }
  },
  "autoscaling": {
    "min_replicas": "0",
    "max_replicas": "4"
  }
}