Deploying the Cloudera AI Workbench model

Deploy a Cloudera AI Workbench model by following these steps.

  1. Go to the Project Overview page.
  2. Select New Model.
  3. Give the model a Name and a Description.
  4. In Deploy Model as, if the model is to be deployed under a service account, select Service Account and choose the account from the dropdown menu.
  5. Deploy a model by setting the following details; a minimal sketch of what launch_model.py might contain follows these steps.

    File: launch_model.py

    Function: api_wrapper

    Example input:
    {
      "prompt": "How are you?",
      "max_length": 64,
      "temperature": "0.7"
    }
    
    Figure 1. Deploy a model
  6. In ML Runtime: Use a GPU edition Runtime.
  7. In Resource Profile: Select at least 1 GPU and a CPU/memory profile suited to the size of your LLM. These resources are required to load the LLM into GPU VRAM.
  8. In Environment Variables: Add HF_TOKEN if the model you are using is gated on the Hugging Face Hub; the sketch after these steps shows one way the model code can read it.
  9. Click Deploy Model.
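
For reference, here is a minimal sketch of what launch_model.py and its api_wrapper function might look like. It is not the definitive implementation: it assumes the model is served with the Hugging Face transformers library, the model ID shown is a hypothetical placeholder, and the HF_TOKEN environment variable from step 8 is read so that gated weights can be downloaded.

  import os

  import torch
  from transformers import pipeline

  # Hypothetical placeholder; substitute the Hugging Face model you deploy.
  MODEL_ID = "meta-llama/Llama-2-7b-chat-hf"

  # HF_TOKEN (step 8) authenticates downloads of gated models from the Hugging Face Hub.
  generator = pipeline(
      "text-generation",
      model=MODEL_ID,
      token=os.environ.get("HF_TOKEN"),
      device_map="auto",          # place the weights on the available GPU
      torch_dtype=torch.float16,  # half precision reduces GPU VRAM usage
  )

  def api_wrapper(args):
      # Entry point named in the Function field; args is the request JSON as a dict.
      outputs = generator(
          args["prompt"],
          max_length=int(args.get("max_length", 64)),
          temperature=float(args.get("temperature", 0.7)),
          do_sample=True,
      )
      return {"response": outputs[0]["generated_text"]}

The function name api_wrapper must match the Function field, and the keys it reads from args must match the example input configured in step 5.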

After the Cloudera AI Workbench model deploys successfully, the Project Overview tab shows examples of how to access the model and run a test inference.
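
As one hedged example of such a test call, the snippet below posts the example input from step 5 to the model's REST endpoint using the requests library. The endpoint URL and access key are hypothetical placeholders; copy the real values from the sample code shown on the Project Overview tab.

  import requests

  # Hypothetical placeholders; use the values shown on the Project Overview tab.
  MODEL_ENDPOINT = "https://modelservice.your-workbench.example.com/model"
  ACCESS_KEY = "your-model-access-key"

  payload = {
      "accessKey": ACCESS_KEY,
      "request": {"prompt": "How are you?", "max_length": 64, "temperature": 0.7},
  }

  resp = requests.post(MODEL_ENDPOINT, json=payload)
  resp.raise_for_status()
  print(resp.json())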