Autoscaling Model Endpoints using API
You can configure Model Endpoints deployed on the Cloudera AI Inference service to automatically scale down to zero instances when there is no load, and to scale back up when requests arrive.
The following deployment specification shows an example model endpoint that auto-scales
between zero and four replicas:
# cat ./examples/mlflow/model-spec-cml-registry.json
{
  "namespace": "serving-default",
  "name": "mlflow-wine-test-from-registry-onnx",
  "source": {
    "registry_source": {
      "version": 1,
      "model_id": "yf0o-hrxq-l0xj-8tk9"
    }
  },
  "autoscaling": {
    "min_replicas": "0",
    "max_replicas": "4"
  }
}