Running Models on GPU
To run a Model Endpoint on a GPU, specify the number of GPUs to use per model replica in the resources section of the model specification, as in the following example. Note that the GPU count must be specified as a string.
# cat ./examples/mlflow/model-spec-cml-registry.json
{
  "namespace": "serving-default",
  "name": "mlflow-wine-test-from-registry-onnx",
  "source": {
    "registry_source": {
      "version": 1,
      "model_id": "yf0o-hrxq-l0xj-8tk9"
    }
  },
  "resources": {
    "num_gpus": "1"
  }
}
The resources field supports the following properties:
- req_cpu - the requested number of CPU cores, expressed as a string in Kubernetes quantity notation (for example, "2" or "500m").
- req_memory - the requested amount of memory, also expressed as a Kubernetes quantity string (for example, "4Gi").
- num_gpus - the requested number of GPUs per model replica, expressed as a string.
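As a sketch of how these properties combine, the spec above could be extended to request CPU and memory alongside the GPU. The "2" and "4Gi" values below are illustrative placeholders, not recommended sizes; both follow standard Kubernetes quantity syntax.

```json
{
  "namespace": "serving-default",
  "name": "mlflow-wine-test-from-registry-onnx",
  "source": {
    "registry_source": {
      "version": 1,
      "model_id": "yf0o-hrxq-l0xj-8tk9"
    }
  },
  "resources": {
    "req_cpu": "2",
    "req_memory": "4Gi",
    "num_gpus": "1"
  }
}
```

Because all three values are resource requests, the replica is scheduled only on a node that can satisfy every one of them.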