Using the OpenAI Python SDK client with Cloudera AI Inference service in a Cloudera AI Workbench Session

You can use the following template to interact with an instruction-tuned large language model endpoint hosted on Cloudera AI Inference service from a Cloudera AI Workbench Session:
from openai import OpenAI
import json

# The Workbench Session provides a JWT at /tmp/jwt; its access token is used as the API key.
API_KEY = json.load(open("/tmp/jwt"))["access_token"]
MODEL_ID = "[***MODEL_ID***]"

# Point the OpenAI client at the Cloudera AI Inference service model endpoint.
client = OpenAI(
  base_url="[***BASE_URL***]",
  api_key=API_KEY,
)

# Request a streaming chat completion from the instruction-tuned model.
completion = client.chat.completions.create(
  model=MODEL_ID,
  messages=[{"role": "user", "content": "Write a one-sentence definition of GenAI."}],
  temperature=0.2,
  top_p=0.7,
  max_tokens=1024,
  stream=True,
)

# Print the generated tokens as they arrive.
for chunk in completion:
  if chunk.choices[0].delta.content is not None:
    print(chunk.choices[0].delta.content, end="")

Where base_url is the model endpoint URL up to the API version segment, v1. To get the base_url, copy the model endpoint URL from the Model Endpoint Details UI and remove the last two path components, so that it has the following form:

https://[***DOMAIN***]/namespaces/serving-default/endpoints/[***ENDPOINT_NAME***]/v1
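If you prefer to derive base_url programmatically, the following sketch shows one way to do it, assuming the URL copied from the Model Endpoint Details UI ends in two extra path components (for example /chat/completions); the endpoint_url value here is a hypothetical placeholder:

# Hypothetical endpoint URL copied from the Model Endpoint Details UI.
endpoint_url = "https://[***DOMAIN***]/namespaces/serving-default/endpoints/[***ENDPOINT_NAME***]/v1/chat/completions"

# Drop the last two path components to keep everything up to the API version, v1.
base_url = endpoint_url.rsplit("/", 2)[0]
print(base_url)
# https://[***DOMAIN***]/namespaces/serving-default/endpoints/[***ENDPOINT_NAME***]/v1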

MODEL_ID is the ID assigned to the model when it is registered in the AI Registry. You can find it in the Model Endpoint Details UI.
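If the endpoint also exposes the OpenAI-compatible models API (an assumption that may not hold for every deployment), you may be able to confirm the model ID directly from the client defined in the template above:

# List the models served behind this endpoint; each returned id can be used as MODEL_ID.
for model in client.models.list():
  print(model.id)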