Using the OpenAI Python SDK client with Cloudera AI Inference service in a Cloudera AI Workbench Session
You can use the following template to interact with an instruction-tuned large language model endpoint hosted on Cloudera AI Inference service:
from openai import OpenAI
import json

# In a Cloudera AI Workbench Session, the session JWT is available at
# /tmp/jwt; its access_token field authenticates requests to the endpoint.
API_KEY = json.load(open("/tmp/jwt"))["access_token"]
MODEL_NAME = "[***MODEL_NAME***]"

client = OpenAI(
    base_url="[***BASE_URL***]",
    api_key=API_KEY,
)

# Request a streaming chat completion from the endpoint.
completion = client.chat.completions.create(
    model=MODEL_NAME,
    messages=[{"role": "user", "content": "Write a one-sentence definition of GenAI."}],
    temperature=0.2,
    top_p=0.7,
    max_tokens=1024,
    stream=True,
)

# Print the response token by token as the chunks arrive.
for chunk in completion:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")
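The template streams the response and prints it token by token. If you prefer a single response object instead, a minimal sketch of the same request without streaming (assuming the client and MODEL_NAME defined above) looks like this:

completion = client.chat.completions.create(
    model=MODEL_NAME,
    messages=[{"role": "user", "content": "Write a one-sentence definition of GenAI."}],
    temperature=0.2,
    top_p=0.7,
    max_tokens=1024,
    stream=False,  # return the full completion at once instead of chunks
)

# The full reply is on the message object rather than on streamed deltas.
print(completion.choices[0].message.content)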
Here, base_url is the model endpoint URL truncated at the API version segment, v1. To get the base_url, copy the model endpoint URL from the Model Endpoint Details UI and delete the last two path components.
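As an illustration, a minimal sketch of that truncation in Python; the endpoint URL below is hypothetical, and the real one comes from the Model Endpoint Details UI:

endpoint_url = (
    "https://example.cloudera.site/namespaces/serving-default/"
    "endpoints/my-model/v1/chat/completions"  # hypothetical endpoint URL
)

# Drop the last two path components (chat/completions), keeping the URL up to v1.
base_url = endpoint_url.rsplit("/", 2)[0]
print(base_url)
# https://example.cloudera.site/namespaces/serving-default/endpoints/my-model/v1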
MODEL_NAME is the name assigned to the model when it was registered in the Cloudera AI Registry. You can find it in the Model Endpoint Details UI.
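If you want to confirm the model name programmatically, and assuming the endpoint exposes the OpenAI-compatible /v1/models route (an assumption, not something every deployment guarantees), a short sketch using the client defined above:

# List the model IDs served behind this base_url; the ID to pass as
# MODEL_NAME should appear here if the /v1/models route is available.
for model in client.models.list():
    print(model.id)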