Using the OpenAI Python SDK client with Cloudera AI Inference service in a Cloudera AI Workbench Session
You can use the following template to interact with an instruction-tuned large language
model endpoint hosted on Cloudera AI Inference service:
from openai import OpenAI
import json

# In a Cloudera AI Workbench Session, the workload JWT is available at
# /tmp/jwt; its access_token field is used as the API key.
API_KEY = json.load(open("/tmp/jwt"))["access_token"]
MODEL_ID = "[***MODEL_ID***]"

client = OpenAI(
    base_url="[***BASE_URL***]",
    api_key=API_KEY,
)

completion = client.chat.completions.create(
    model=MODEL_ID,
    messages=[{"role": "user", "content": "Write a one-sentence definition of GenAI."}],
    temperature=0.2,
    top_p=0.7,
    max_tokens=1024,
    stream=True,
)

# Print the streamed response as the chunks arrive.
for chunk in completion:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")
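If you do not need token-by-token streaming, the following sketch shows a non-streaming variant of the same call, assuming the same client and MODEL_ID as above:

completion = client.chat.completions.create(
    model=MODEL_ID,
    messages=[{"role": "user", "content": "Write a one-sentence definition of GenAI."}],
    temperature=0.2,
    top_p=0.7,
    max_tokens=1024,
    # stream is omitted, so the complete response is returned in one object
)
print(completion.choices[0].message.content)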
Where base_url is the model endpoint URL up to and including the API version segment, v1. To get the base_url, copy the model endpoint URL from the Model Endpoint Details UI and delete the last two path components, so that it has the form:
https://[***DOMAIN***]/namespaces/serving-default/endpoints/[***ENDPOINT_NAME***]/v1
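For example, a quick way to trim a copied endpoint URL in Python (the URL below is a hypothetical placeholder, not a real endpoint):

# Hypothetical endpoint URL copied from the Model Endpoint Details UI.
endpoint_url = "https://example.cloudera.site/namespaces/serving-default/endpoints/my-llm/v1/chat/completions"

# Dropping the last two path components leaves the URL ending in /v1,
# which is the base_url expected by the OpenAI client.
base_url = endpoint_url.rsplit("/", 2)[0]
print(base_url)
# https://example.cloudera.site/namespaces/serving-default/endpoints/my-llm/v1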
MODEL_ID is the ID assigned to the model when it is registered to the AI Registry. You can find it in the Model Endpoint Details UI.
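If the endpoint also implements the OpenAI-compatible models listing route (an assumption, not something the Model Endpoint Details UI guarantees), you can confirm the model ID from the same client:

# List the models served behind this endpoint and print their IDs.
# Assumes the endpoint supports the /v1/models route.
for model in client.models.list():
    print(model.id)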