Deploying Agent Studio-supported LLMs using Cloudera AI Inference service
Configure and validate Model Hub Large Language Models (LLMs) to ensure compatibility
with Agent Studio and agentic workflows within Cloudera AI Inference service.
The Cloudera AI Inference service supports two specific models:
These models use helper script plugins to support tool invocation (function calling)
and advanced reasoning. Because standard model cards use local relative paths that are
not resolved in containerized environments, you must configure the required runtime
environment overrides by using the NIM_PASSTHROUGH_ARGS parameter.
When deploying either of these two models using Cloudera AI, you must configure an
additional environment variable during the final deployment step.
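As a rough illustration of the override described above, the following sketch assembles a value for NIM_PASSTHROUGH_ARGS, assuming the variable takes a single string of command-line style arguments. The flag names and the path are hypothetical placeholders, not the actual arguments for either model; use the values documented for your model and precision profile. The helper build_passthrough_value is likewise an illustrative name only.

```python
import shlex
from typing import Dict, Optional

ENV_VAR_NAME = "NIM_PASSTHROUGH_ARGS"


def build_passthrough_value(args: Dict[str, Optional[str]]) -> str:
    """Join flag/value pairs into one shell-quoted argument string.

    A value of None marks a flag that takes no argument.
    """
    tokens = []
    for flag, value in args.items():
        tokens.append(flag)
        if value is not None:
            tokens.append(value)
    # shlex.join quotes any token that contains spaces or shell metacharacters.
    return shlex.join(tokens)


# HYPOTHETICAL flags and path, for illustration only. An absolute path is
# used because relative paths are not resolved inside the container.
example_args = {
    "--example-plugin-dir": "/absolute/path/to/plugins",
    "--example-switch": None,
}

env = {ENV_VAR_NAME: build_passthrough_value(example_args)}
print(env[ENV_VAR_NAME])
```

Building the string from a mapping keeps each argument separate until the final join, which makes it harder to introduce stray spaces or unquoted paths when pasting the value into the Environment variables field.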
In the Cloudera console, click the Cloudera AI tile.
The Cloudera AI Workbenches page is displayed.
Click Model Endpoints in the left navigation pane.
The Endpoint Details page is displayed.
Open the Endpoint Details > Model Builder > Resource Profile > Advanced Options pages and enter the required details.
Figure 1. Configuring Advanced Options and Environment Variables for Model Endpoint Creation
Click + Add.
When deploying either of the Model Hub Large Language Models (LLMs) using Cloudera AI Inference service, you must configure an additional
environment variable.
Select the NIM_PASSTHROUGH_ARGS environment variable from the Environment variables drop-down list.
Enter the corresponding Value based on your chosen model and precision profile:
Nemotron Super 120B (nvidia/nemotron-3-super-120b-a12b) - The plugin directory is
precision-specific. Select the argument configuration based on the precision profile
you are launching: