Running Models on GPU

Follow these guidelines to run models on GPU.

To run a Model Endpoint on GPU, specify the number of GPUs to use per model replica in the model spec, as shown below. Note that the GPU count must be specified as a string.
# cat ./examples/mlflow/model-spec-cml-registry.json
{
  "namespace": "serving-default",
  "name": "mlflow-wine-test-from-registry-onnx",
  "source": {
    "registry_source": {
      "version": 1, 
      "model_id": "yf0o-hrxq-l0xj-8tk9"
    }
  },
  "resources": {
    "num_gpus": "1"
  }
}
The resources field supports specifying the following properties:
  • req_cpu - the requested number of CPU cores, expressed as a string following Kubernetes resource-quantity conventions (for example, "2" or "500m").
  • req_memory - the requested amount of memory, also a Kubernetes quantity string (for example, "4Gi").
  • num_gpus - the requested number of GPUs, also expressed as a string.
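For instance, a resources block that sets all three properties might look like the following sketch. The values shown are illustrative placeholders, not recommended settings for any particular model:

```json
{
  "resources": {
    "req_cpu": "2",
    "req_memory": "4Gi",
    "num_gpus": "1"
  }
}
```

All three values are strings: req_cpu and req_memory follow the Kubernetes resource-quantity format, and num_gpus is a whole number quoted as a string.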