Cloudera AI Inference service

Deploying models with canary deployment using the API

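Cloudera AI Inference service lets you control the percentage of incoming traffic that is routed to a specific model deployment. You can use this to roll out a new model version gradually as a canary deployment.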

  1. Deploy a model. If you do not set the traffic field, its value defaults to 100. The following is an example payload:
    # cat ./examples/mlflow/model-spec-cml-registry.json
    {
      "namespace": "serving-default",
      "name": "mlflow-wine-test-from-registry-onnx",
      "source": {
        "registry_source": {
          "version": 1, 
          "model_id": "yf0o-hrxq-l0xj-8tk9"
        }
      }
    }
    
  2. After the model is deployed to the endpoint, direct a share of the traffic to the canary model by updating the model with the new version and the desired traffic value:
    # cat ./examples/mlflow/model-spec-cml-registry-canary.json
    {
      "namespace": "serving-default",
      "name": "mlflow-wine-test-from-registry-onnx",
      "source": {
        "registry_source": {
          "version": 2, 
          "model_id": "yf0o-hrxq-l0xj-8tk9"
        }
      },
      "traffic": "35"
    }
    

    The percentage set in the traffic field is routed to the newly deployed model version, and the remaining traffic continues to go to the previously deployed model version. In this example, 35 percent of requests go to version 2 and 65 percent go to version 1.

  3. When you want the canary model to serve all incoming requests, update the model endpoint with the traffic value set to 100:
    # cat ./examples/mlflow/model-spec-cml-registry-promote.json
    {
      "namespace": "serving-default",
      "name": "mlflow-wine-test-from-registry-onnx",
      "source": {
        "registry_source": {
          "version": 2, 
          "model_id": "yf0o-hrxq-l0xj-8tk9"
        }
      },
      "traffic": "100"
    }
    
  4. Use the following command to send an inference request to the model. The response follows the command:
    curl -H "Content-Type: application/json" \
      -H "Authorization: Bearer ${CDP_TOKEN}" \
      "https://${DOMAIN}/namespaces/serving-default/endpoints/mlflow-wine-test-from-registry-onnx/v2/models/yf0o-hrxq-l0xj-8tk9/infer" \
      -d @./examples/url/wine-input.json
    {"model_name":"yf0o-hrxq-l0xj-8tk9","model_version":"2","outputs":[{"name":"variable","datatype":"FP32","shape":[1,1],"data":[0.0]}]}
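The inference call in step 4 can also be issued from a script. The following is a minimal Python sketch, not an official client: it builds the same POST request as the curl command using only the standard library. The function names build_infer_request and served_version are hypothetical helpers introduced for illustration, and the contents of the input payload are left to the caller because the example input file is not shown here. Reading model_version from the response is one way to see which version answered a request while a canary is active, assuming the response has the shape shown in step 4.

```python
import json
import urllib.request


def build_infer_request(domain, token, namespace, endpoint, model_id, payload):
    """Build (but do not send) the POST request issued by the curl command.

    domain and token correspond to ${DOMAIN} and ${CDP_TOKEN} in the curl
    example; payload is the already-parsed JSON body (hypothetical helper).
    """
    url = (
        f"https://{domain}/namespaces/{namespace}/endpoints/{endpoint}"
        f"/v2/models/{model_id}/infer"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def served_version(response_body):
    """Return the model_version field from an inference response body.

    Assumes the response is JSON shaped like the example in step 4.
    """
    return json.loads(response_body)["model_version"]
```

To send the request, pass the returned object to urllib.request.urlopen and hand the decoded response body to served_version.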
    
    
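The three payloads in steps 1 to 3 differ only in the model version and the traffic value. The following Python sketch generates them from one function and then simulates a percentage-based split to illustrate what a traffic value of 35 means. It makes several assumptions: deployment_payload and simulate_split are hypothetical names, the 0 to 100 range check is an assumption about valid values, the traffic value is written as a string only because the example payloads show it that way, and the simulation is an illustration of proportional routing rather than a description of how the service routes requests internally.

```python
import json
import random

# Values copied from the example payloads in this procedure.
NAMESPACE = "serving-default"
NAME = "mlflow-wine-test-from-registry-onnx"
MODEL_ID = "yf0o-hrxq-l0xj-8tk9"


def deployment_payload(version, traffic=None):
    """Return a deployment payload; omit traffic to rely on the default of 100."""
    if traffic is not None and not 0 <= traffic <= 100:
        # Assumed valid range for a percentage.
        raise ValueError("traffic must be between 0 and 100")
    payload = {
        "namespace": NAMESPACE,
        "name": NAME,
        "source": {
            "registry_source": {"version": version, "model_id": MODEL_ID}
        },
    }
    if traffic is not None:
        # Serialized as a string to match the example payloads.
        payload["traffic"] = str(traffic)
    return payload


def simulate_split(traffic, requests, seed=0):
    """Illustrative only: count how many requests a percentage split
    would send to the canary version and to the previous version."""
    rng = random.Random(seed)
    canary = sum(1 for _ in range(requests) if rng.random() * 100 < traffic)
    return {"canary": canary, "previous": requests - canary}


initial = deployment_payload(version=1)               # step 1
canary = deployment_payload(version=2, traffic=35)    # step 2
promote = deployment_payload(version=2, traffic=100)  # step 3
print(json.dumps(canary, indent=2))
```

Calling simulate_split(35, 1000) returns counts close to a 35 to 65 split; the exact numbers depend on the seed.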
