Running Models on GPU

Follow these guidelines to run models on GPU.

To run a Model Endpoint on GPU, specify the number of GPUs to use per model replica in the model spec, as shown below. Note that the GPU count must be specified as a string.
# cat ./examples/mlflow/model-spec-cml-registry.json
{
  "namespace": "serving-default",
  "name": "mlflow-wine-test-from-registry-onnx",
  "source": {
    "registry_source": {
      "version": 1, 
      "model_id": "yf0o-hrxq-l0xj-8tk9"
    }
  },
  "resources": {
    "num_gpus": "1"
  }
}
The resources field supports specifying the following properties:
  • req_cpu - the requested number of CPU cores, expressed as a string following Kubernetes resource-quantity conventions (for example, "2" or "500m").
  • req_memory - the requested amount of memory, also a Kubernetes quantity string (for example, "4Gi").
  • num_gpus - the requested number of GPUs, also expressed as a string.
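For instance, a resources block that sets all three properties might look like the following sketch. The values shown are illustrative placeholders, not recommended settings for any particular model:

```json
{
  "resources": {
    "req_cpu": "2",
    "req_memory": "4Gi",
    "num_gpus": "1"
  }
}
```

All three values are strings: req_cpu and req_memory follow the Kubernetes resource-quantity format, and num_gpus is a whole number quoted as a string.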