Deploying Agent Studio-supported LLMs using Cloudera AI Inference service

Configure and validate Model Hub Large Language Models (LLMs) to ensure compatibility with Agent Studio and Agentic workflows within Cloudera AI Inference service.

The Cloudera AI Inference service supports the following Agentic workflow-tagged LLM models:

nemotron-3-super-120b-a12b
llama-3.3-nemotron-super-49b

The models use helper script plugins to support tool invocation (function calling) and advanced reasoning. As standard model cards use local relative paths that are not resolved in containerized environments, you must configure the required runtime environment overrides by using the NIM_PASSTHROUGH_ARGS parameter.

In the Cloudera console, click the Cloudera AI tile.
The Cloudera AI Workbenches page is displayed.
Click on Model Endpoints in the left navigation pane.
The Endpoint Details page is displayed.
Go to Endpoint Details > Model Builder > Resource Profile > Advanced Options and enter the required details.

Figure 1. Configuring advanced options and environment variables for Model Endpoint creation
Click on +Add.
Select the NIM_PASSTHROUGH_ARGS environment variable from the Environment variables drop-down list.

Enter the corresponding Value based on your chosen model and precision profile.

Nemotron Super 120B (nvidia/nemotron-3-super-120b-a12b)

The plugin directory is precision-specific. Select the argument configuration based on the precision profile you are launching.

For the NVFP4 precision profile:

--reasoning-parser-plugin 
/mnt/serving/ngc/hub/models--nim--nvidia--nemotron-3-super-120b-a12b/snapshots/rl-030326-nvfp4/super_v3_reasoning_parser.py 
--reasoning-parser super_v3 
--enable-auto-tool-choice 
--tool-call-parser qwen3_coder

For the FP8 precision profile:

--reasoning-parser-plugin 
/mnt/serving/ngc/hub/models--nim--nvidia--nemotron-3-super-120b-a12b/snapshots/rl-030326-fp8/super_v3_reasoning_parser.py 
--reasoning-parser super_v3 
--enable-auto-tool-choice 
--tool-call-parser qwen3_coder

For the BF16 precision profile:

--reasoning-parser-plugin 
/mnt/serving/ngc/hub/models--nim--nvidia--nemotron-3-super-120b-a12b/snapshots/rl-030326-bf16/super_v3_reasoning_parser.py 
--reasoning-parser super_v3 
--enable-auto-tool-choice
--tool-call-parser qwen3_coder

Nemotron Super 49B (llama-3.3-nemotron-super-49b)

Due to runtime compatibility differences, use the following JSON fallback configuration for all precision profiles:
```
--enable-auto-tool-choice --tool-call-parser llama3_json
```