Deploying a Predictive Deep Learning Model

The following example illustrates how to deploy the ResNet-18 image classification model from the ONNX Model Zoo.

You must first import the model into a session in a CML workspace that is in the same CDP environment as the Cloudera AI Inference service and the Cloudera Model Registry.

Upload the model artifact to your project’s file system.
  1. From the Project overview page, upload the model artifact from your local computer using the upload button. Then, in your session, log the model to the Cloudera Machine Learning Experiments page by using the following script:
    # Install the libraries needed to load and log the ONNX model.
    !pip install onnx mlflow onnxruntime

    import onnx
    import mlflow
    import onnxruntime

    # Load the model artifact uploaded to the project's file system.
    resnet18 = onnx.load("resnet18-v1-7.onnx")

    # Log the model under a new MLflow experiment and register it
    # under the name "model1".
    mlflow.set_experiment("resnet18")
    with mlflow.start_run() as run:
        mlflow.onnx.log_model(resnet18,
                              "resnet-demo",
                              registered_model_name="model1")
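
    Optionally, you can confirm the registration from the same session. The following is a minimal sketch that assumes the session's default MLflow tracking configuration and the registered model name model1 used above:
    # Optional check: list the versions registered under "model1".
    from mlflow import MlflowClient

    client = MlflowClient()
    for mv in client.search_model_versions("name='model1'"):
        print(mv.name, mv.version, mv.status)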
  2. Navigate to the Experiments page and locate your experiment.
  3. Select the desired experiment and open the artifact folder.
  4. Select Register model to upload the model to the Cloudera Model Registry.
  5. Navigate to the Cloudera AI Inference service and deploy the model, using the following payload to deploy the resnet18 model into your cluster. Replace the model_id and version values with the matching values from the workspace Model Registry page:
    curl -H "Content-Type: application/json" \
      -H "Authorization: Bearer ${CDP_TOKEN}" \
      "https://${DOMAIN}/api/v1alpha1/deployEndpoint" -d \
    '{
        "namespace": "serving-default",
        "name": "resnet18-onnx",
        "source": {
          "registry_source": {
            "version": 1,
            "model_id": "pr5z-mc4s-hrxq-5zg4"
          }
        }
    }'
    
  6. Track the status of the model endpoint through the describeEndpoint API:
    curl -H "Content-Type: application/json" \
      -H "Authorization: Bearer ${CDP_TOKEN}" \
      "https://${DOMAIN}/api/v1alpha1/describeEndpoint" -d \
    '{
        "namespace": "serving-default",
        "name": "resnet18-onnx"
    }'
    
    While the model is loading, the describe call returns output similar to the following:
    {
      "namespace": "serving-default",
      "name": "resnet18-onnx",
      "url": "",
      "conditions": [...]
      "status": {
        "failed_copies": 0,
        "total_copies": 0,
        "active_model_state": "",
        "target_model_state": "Loading",
        "transition_status": "InProgress"
      },
      "observed_generation": 2,
      "replica_count": 0,
      "created_by": "csso_user",
      "description": "",
      "created_at": "2024-05-10T19:22:21Z",
      "resources": {
        "req_cpu": "1",
        "req_memory": "2Gi"
        "num_gpus": "N/A"
      },
      "source": {
        "registry_source": {
          "model_id": "pr5z-mc4s-hrxq-5zg4",
          "version": 1
        }
      },
      "autoscaling": {...},
      "endpointmetadata": {
        "current_model": {
          "registry_source": {
            "model_id": "pr5z-mc4s-hrxq-5zg4",
            "version": 1
          }
        },
        "previous_model": null
      },
      "traffic": {
        "current_revision_traffic": "100",
        "previous_revision_traffic": "0"
      },
      "api_standard": "oip",
      "chat": false
    }
    
    From this output you can infer the following information:
    • The model comes from the Cloudera Model Registry; its ID is pr5z-mc4s-hrxq-5zg4 and its version is 1.
    • The model endpoint conforms to the Open Inference Protocol (OIP) API standard.
    • The chat attribute is false, so the model responds only on the /v2/models/[***model_name***]/infer endpoint.

    Loading is still in progress, so the URL is not yet present for this endpoint.
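
    Rather than re-running the curl command by hand, you can poll the describeEndpoint API until the model reports Loaded. The following is a minimal Python sketch that assumes the same CDP_TOKEN and DOMAIN environment variables used in the curl examples:
    import os
    import time
    import requests

    # Poll describeEndpoint until the endpoint reports Loaded and
    # publishes a URL. Field names follow the describe output above.
    def wait_until_loaded(name, namespace="serving-default", interval=15):
        url = f"https://{os.environ['DOMAIN']}/api/v1alpha1/describeEndpoint"
        headers = {"Authorization": f"Bearer {os.environ['CDP_TOKEN']}"}
        while True:
            desc = requests.post(url, headers=headers,
                                 json={"namespace": namespace, "name": name}).json()
            state = desc["status"]["active_model_state"]
            print("active_model_state:", state or "<empty>")
            if state == "Loaded" and desc["url"]:
                return desc["url"]
            time.sleep(interval)

    infer_url = wait_until_loaded("resnet18-onnx")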

    Eventually, the model becomes ready and the describe output appears as follows:
    {
      "namespace": "serving-default",
      "name": "resnet18-onnx",
      "url": "https://ml-1fcaa8cf-a94.eng-ml-i.svbr-nqvp.int.cldr.work/namespaces/serving-default/endpoints/resnet18-onnx/v2/models/pr5z-mc4s-hrxq-5zg4/infer",
      "conditions": [ ...
      ],
      "status": {
        "failed_copies": 0,
        "total_copies": 1,
        "active_model_state": "Loaded",
        "target_model_state": "Loaded",
        "transition_status": "UpToDate"
      },
      "observed_generation": 2,
      "replica_count": 1,
      "created_by": "csso_user",
      "description": "",
      "created_at": "2024-05-10T19:22:21Z",
      "resources": {
        "req_cpu": "1",
        "req_memory": "2Gi",
        "num_gpus": "N/A"
      },
      "source": {
        "registry_source": {
          "model_id": "pr5z-mc4s-hrxq-5zg4",
          "version": 1
        }
      },
      "autoscaling": {
        "min_replicas": "1",
        "max_replicas": "1",
        "autoscalingconfig": {
          "metric": "",
          "target": "",
          "target_utilization": "",
          "scale_to_zero_retention": "",
          "activation_scale": "",
          "scale_down_delay": "",
          "panic_window_percentage": "",
          "panic_threshold": "",
          "stable_window": ""
        }
      },
      "endpointmetadata": {
        "current_model": {
          "registry_source": {
            "model_id": "pr5z-mc4s-hrxq-5zg4",
            "version": 1
          }
        },
        "previous_model": null
      },
      "traffic": {
        "current_revision_traffic": "100",
        "previous_revision_traffic": "0"
      },
      "api_standard": "oip",
      "chat": false
    }
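
    With the endpoint Loaded and its url populated, you can send a request to the /v2/models/[***model_name***]/infer endpoint. The following is a minimal Python sketch: the INFER_URL environment variable is a hypothetical stand-in for the url value copied from the describe output, the input tensor name data and shape [1, 3, 224, 224] are assumed from the ONNX Model Zoo resnet18-v1-7 signature, and a random tensor stands in for a preprocessed image:
    import os
    import numpy as np
    import requests

    # Full infer URL copied from the describe output's "url" field
    # (hypothetical environment variable; set it before running).
    infer_url = os.environ["INFER_URL"]

    # Placeholder input: in practice, resize, normalize, and transpose
    # a real image into this [1, 3, 224, 224] float32 layout.
    image = np.random.rand(1, 3, 224, 224).astype(np.float32)

    # Open Inference Protocol (v2) request body. The input tensor name
    # "data" is assumed from the resnet18-v1-7 model signature.
    payload = {
        "inputs": [{
            "name": "data",
            "shape": list(image.shape),
            "datatype": "FP32",
            "data": image.flatten().tolist(),
        }]
    }

    response = requests.post(
        infer_url,
        json=payload,
        headers={"Authorization": "Bearer " + os.environ["CDP_TOKEN"]},
    )
    response.raise_for_status()
    # The "outputs" entry of the OIP response holds the class scores.
    print(response.json()["outputs"][0]["shape"])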
    

You have now deployed the resnet18 model to your Cloudera AI Inference service cluster.