Running Models on GPU
Follow these guidelines to run models on GPU.
To run a Model Endpoint on GPU, specify the number of GPUs to use per model replica, as shown
in the following example. Note that the GPU count must be specified as a string.
# cat ./examples/mlflow/model-spec-cml-registry.json
{
  "namespace": "serving-default",
  "name": "mlflow-wine-test-from-registry-onnx",
  "source": {
    "registry_source": {
      "version": 1,
      "model_id": "yf0o-hrxq-l0xj-8tk9"
    }
  },
  "resources": {
    "num_gpus": "1"
  }
}
The resources field supports specifying the following properties:
- req_cpu - the requested number of CPU cores, specified as a string per Kubernetes conventions.
- req_memory - the requested amount of memory, also a string following Kubernetes conventions.
- num_gpus - the requested number of GPUs, specified as a string.
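As a sketch, a resources block that sets all three properties might look like the following; the specific values shown (2 cores, 4Gi of memory, 1 GPU) are illustrative assumptions, not recommendations:

  "resources": {
    "req_cpu": "2",
    "req_memory": "4Gi",
    "num_gpus": "1"
  }

The req_cpu and req_memory values use standard Kubernetes quantity notation, so suffixes such as "500m" for CPU or "Gi" and "Mi" for memory are valid.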