Using the OpenAI Python SDK client with Cloudera AI Inference service in a Cloudera AI Workbench Session
You can use the following template to interact with an instruction-tuned large language
model endpoint hosted on Cloudera AI Inference service:
from openai import OpenAI
import json

# In a Cloudera AI Workbench Session, the workload JWT is available at
# /tmp/jwt; its access_token field is used as the API key.
API_KEY = json.load(open("/tmp/jwt"))["access_token"]
MODEL_ID = "[***MODEL_ID***]"

client = OpenAI(
    base_url="[***BASE_URL***]",
    api_key=API_KEY,
)

completion = client.chat.completions.create(
    model=MODEL_ID,
    messages=[{"role": "user", "content": "Write a one-sentence definition of GenAI."}],
    temperature=0.2,
    top_p=0.7,
    max_tokens=1024,
    stream=True,
)

# Print the streamed response as the chunks arrive.
for chunk in completion:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")
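If you do not need token-by-token streaming, the following sketch shows a non-streaming variant of the same call, assuming the same client and MODEL_ID as above:

completion = client.chat.completions.create(
    model=MODEL_ID,
    messages=[{"role": "user", "content": "Write a one-sentence definition of GenAI."}],
    temperature=0.2,
    top_p=0.7,
    max_tokens=1024,
    # stream is omitted, so the complete response is returned in one object
)
print(completion.choices[0].message.content)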
Where base_url is the model endpoint URL up to and including the API version segment, v1. To get the base_url, copy the model endpoint URL from the Model Endpoint Details UI and delete the last two path components, so that it has the form:
https://[***DOMAIN***]/namespaces/serving-default/endpoints/[***ENDPOINT_NAME***]/v1
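For example, a quick way to trim a copied endpoint URL in Python (the URL below is a hypothetical placeholder, not a real endpoint):

# Hypothetical endpoint URL copied from the Model Endpoint Details UI.
endpoint_url = "https://example.cloudera.site/namespaces/serving-default/endpoints/my-llm/v1/chat/completions"

# Dropping the last two path components leaves the URL ending in /v1,
# which is the base_url expected by the OpenAI client.
base_url = endpoint_url.rsplit("/", 2)[0]
print(base_url)
# https://example.cloudera.site/namespaces/serving-default/endpoints/my-llm/v1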
MODEL_ID is the ID assigned to the model when it is registered to the AI Registry. You can find it in the Model Endpoint Details UI.
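If the endpoint also implements the OpenAI-compatible models listing route (an assumption, not something the Model Endpoint Details UI guarantees), you can confirm the model ID from the same client:

# List the models served behind this endpoint and print their IDs.
# Assumes the endpoint supports the /v1/models route.
for model in client.models.list():
    print(model.id)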