Deploy a Cloudera AI Workbench model by following the steps below.
Go to the Project Overview page.
Select New Model.
Give the model a Name and a Description.
In Deploy Model as, if the model should run under a service account, select Service Account and choose the account from the dropdown menu.
Deploy the model by setting the following details. A sketch of what launch_model.py might contain follows these settings.

File: launch_model.py
Function: api_wrapper
Example input:

    {
      "prompt": "How are you?",
      "max_length": 64,
      "temperature": 0.7
    }
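
The contents of launch_model.py are not shown here; the following is a minimal sketch of what it might contain, assuming the Hugging Face transformers and accelerate libraries and a hypothetical model name, and assuming the function named in Function receives the request JSON as a Python dictionary. Adapt it to your own LLM.

    # Minimal sketch of launch_model.py; the model name is a placeholder and
    # the transformers/accelerate usage is an assumption, not Cloudera code.
    import os

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    MODEL_NAME = "mistralai/Mistral-7B-Instruct-v0.2"  # hypothetical; use your LLM
    HF_TOKEN = os.environ.get("HF_TOKEN")  # set in Environment Variables below

    # Load the weights once at startup so every request reuses them from VRAM.
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, token=HF_TOKEN)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_NAME,
        torch_dtype=torch.float16,
        device_map="auto",  # place the weights on the available GPU(s)
        token=HF_TOKEN,
    )

    def api_wrapper(args):
        """Entry point; receives the request JSON as a dictionary."""
        inputs = tokenizer(args["prompt"], return_tensors="pt").to(model.device)
        output_ids = model.generate(
            **inputs,
            max_length=int(args.get("max_length", 64)),
            temperature=float(args.get("temperature", 0.7)),
            do_sample=True,
        )
        return {"response": tokenizer.decode(output_ids[0], skip_special_tokens=True)}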
In ML Runtime, select a GPU edition Runtime.
In Resource Profile, select at least one GPU and a CPU/memory profile suited to the size of your LLM. Sufficient resources are required to load the LLM into GPU VRAM; a rough sizing sketch follows.
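
As a rough rule of thumb (an approximation, not an official sizing formula), a model with N parameters in 16-bit precision needs about 2 x N bytes of VRAM for the weights alone, plus headroom for activations and the KV cache; for a 7B-parameter model that is roughly 13 GiB. The snippet below computes this estimate and, when run in a GPU session, reports the available VRAM.

    # Rough VRAM sizing check; the 2-bytes-per-parameter rule assumes fp16/bf16
    # weights and ignores activation and KV-cache overhead.
    import torch

    def fp16_weight_gib(num_params: float) -> float:
        """Approximate GiB of VRAM consumed by the weights alone."""
        return num_params * 2 / 1024**3

    print(f"7B model weights: ~{fp16_weight_gib(7e9):.1f} GiB")
    if torch.cuda.is_available():
        free, total = torch.cuda.mem_get_info()
        print(f"GPU VRAM free/total: {free / 1024**3:.1f}/{total / 1024**3:.1f} GiB")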
In Environment Variables, add HF_TOKEN if the model you are using is gated on the Hugging Face Hub.
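
If your script passes the token explicitly, as in the launch_model.py sketch above, nothing more is needed. Otherwise, a minimal way to consume the HF_TOKEN variable inside the script, assuming the huggingface_hub library, is:

    # Reads the HF_TOKEN environment variable set on the model and registers it
    # for gated downloads; recent huggingface_hub versions also pick up HF_TOKEN
    # automatically.
    import os

    from huggingface_hub import login

    token = os.environ.get("HF_TOKEN")
    if token:
        login(token=token)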
Click Deploy Model.
After successfully deploying the Cloudera AI Workbench model, you can find examples of how to access it and run a test inference on the Project Overview tab.
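
For reference, a test call typically looks like the following sketch, which assumes the classic Workbench model REST shape with a placeholder endpoint URL and access key; copy the exact URL, key, and snippet from your Project Overview tab rather than using these placeholders.

    # Hedged test-inference sketch; MODEL_URL and ACCESS_KEY are placeholders.
    import requests

    MODEL_URL = "https://modelservice.your-workbench-domain/model"  # placeholder
    ACCESS_KEY = "your-model-access-key"                            # placeholder

    payload = {
        "accessKey": ACCESS_KEY,
        "request": {"prompt": "How are you?", "max_length": 64, "temperature": 0.7},
    }
    resp = requests.post(MODEL_URL, json=payload, timeout=120)
    print(resp.json())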