Supported models and servers in Cloudera AI Inference service

This topic lists the model servers, runtimes, and model architectures supported in Cloudera AI Inference service.

Hugging Face Model Server

Table 1. Hugging Face Model Server Components
Component | Version
Hugging Face Model Server (KServe) | 0.17.0
- vLLM | 0.15.1

Triton Inference Server

Table 2. Triton Inference Server Components
Component | Version
Triton Inference Server | 25.08.06
- PyTorch | 2.9.0
- TensorFlow | 2.18.0
- scikit-learn | 1.8.0
- XGBoost | 3.1.2
- LightGBM | 4.6.0
- CatBoost | 1.2.7
- MLflow | 3.10.1
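
A model serialized with one framework version does not always load cleanly under another, so it can be useful to compare the environment where a model was trained against the versions in Table 2. The following is a minimal sketch, not part of the product: the pip distribution names used as dictionary keys (for example `torch` for PyTorch) and the helper name `compare_versions` are assumptions made for illustration.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from Table 2. The keys are assumed pip distribution names.
TRITON_BACKEND_VERSIONS = {
    "torch": "2.9.0",
    "tensorflow": "2.18.0",
    "scikit-learn": "1.8.0",
    "xgboost": "3.1.2",
    "lightgbm": "4.6.0",
    "catboost": "1.2.7",
    "mlflow": "3.10.1",
}


def compare_versions(pinned, lookup=version):
    """Return {package: (listed, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, listed in pinned.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            installed = None
        # Drop local version labels such as "+cu128" before comparing.
        if installed is None or installed.split("+")[0] != listed:
            mismatches[name] = (listed, installed)
    return mismatches


if __name__ == "__main__":
    for name, (listed, installed) in compare_versions(TRITON_BACKEND_VERSIONS).items():
        print(f"{name}: Table 2 lists {listed}, local environment has {installed}")
```

An exact match is the strictest possible check. Whether a nearby patch version is acceptable depends on the framework and is not stated in this topic.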

NVIDIA NIM Models

Table 3. Supported NVIDIA NIM Models
Vendor | Model | NGC Container | Versions
Baidu | NeMo Retriever PaddleOCR | nvcr.io/nim/baidu/paddleocr | 1.4.0, 1.5.0
BigCode | Starcoder2 7B | nvcr.io/nim/bigcode/starcoder2-7b | 1.8.1, 1.14.1
DeepSeek AI | DeepSeek R1 | nvcr.io/nim/deepseek-ai/deepseek-r1 | 1.7.3
DeepSeek AI | DeepSeek R1 Distill Llama 70B | nvcr.io/nim/deepseek-ai/deepseek-r1-distill-llama-70b | 1.5.2
DeepSeek AI | DeepSeek R1 Distill Llama 8B | nvcr.io/nim/deepseek-ai/deepseek-r1-distill-llama-8b | 1.5.2
Meta | Llama 3.1 405B Instruct | nvcr.io/nim/meta/llama-3.1-405b-instruct | 1.3.0
Meta | Llama 3.1 70B Instruct | nvcr.io/nim/meta/llama-3.1-70b-instruct | 1.2.1, 1.3.3, 1.8.5, 1.14.0
Meta | Llama 3.1 8B Instruct | nvcr.io/nim/meta/llama-3.1-8b-instruct | 1.2.2, 1.3.3, 1.8.6, 1.13.1
Meta | Llama 3.2 11B Vision | unknown | 1.1.1
Meta | Llama 3.2 1B Instruct | nvcr.io/nim/meta/llama-3.2-1b-instruct | 1.6.0, 1.8.6, 1.12.0
Meta | Llama 3.2 3B Instruct | nvcr.io/nim/meta/llama-3.2-3b-instruct | 1.6.0, 1.8.6, 1.10.1
Meta | Llama 3.2 90B Vision | unknown | 1.1.1
Meta | Llama 3.3 70B Instruct | nvcr.io/nim/meta/llama-3.3-70b-instruct | 1.8.2, 1.8.5, 1.14.0
MIT | MIT Boltz2 | nvcr.io/nim/mit/boltz2 | 1.1.0, 1.3.0
Mistral AI | Mistral 7B Instruct v0.3 | nvcr.io/nim/mistralai/mistral-7b-instruct-v0.3 | 1.1.2, 1.3.0, 1.12.0
Mistral AI | Mixtral 8x22B Instruct v0.1 | nvcr.io/nim/mistralai/mixtral-8x22b-instruct-v01 | 1.2.2
Mistral AI | Mixtral 8x7B Instruct v0.1 | nvcr.io/nim/mistralai/mixtral-8x7b-instruct-v0.1 | 1.2.1, 1.3.0, 1.8.4
NVIDIA | Llama 3.1 Nemotron Nano 4B V1.1 | nvcr.io/nim/nvidia/Llama-3.1-Nemotron-Nano-4B-v1.1 | 1.8.4, 1.8.5
NVIDIA | Llama 3.1 Nemotron Nano 8B | nvcr.io/nim/nvidia/llama-3.1-nemotron-nano-8b-v1 | 1.8.2, 1.8.4
NVIDIA | Llama 3.2 EmbedQA 1B | nvcr.io/nim/nvidia/llama-3.2-nv-embedqa-1b-v2 | 1.5.0, 1.10.0
NVIDIA | Llama 3.2 RerankQA 1B v2 | nvcr.io/nim/nvidia/llama-3.2-nv-rerankqa-1b-v2 | 1.3.1, 1.8.0
NVIDIA | Llama 3.3 Nemotron Super 49B | nvcr.io/nim/nvidia/llama-3.3-nemotron-super-49b-v1 | 1.8.5, 1.8.6, 1.10.1
NVIDIA | Llama 3.3 Nemotron Super 49B V1.5 | nvcr.io/nim/nvidia/llama-3.3-nemotron-super-49b-v1.5 | 1.14.0
NVIDIA | Llama 3.3 Nemotron Super 49B V1.5 | nvcr.io/nim/nvidia/llama-3.3-nemotron-super-49b-v1.5-pb25h2 | 1.14.0
NVIDIA | NVIDIA ASR Parakeet 1.1B CTC EN-US - Offline | nvcr.io/nim/nvidia/riva-asr/parakeet-1-1b-ctc-en-us-offline | 1.4.0
NVIDIA | NVIDIA Riva ASR Whisper Large v3 | nvcr.io/nim/nvidia/riva-asr/whisper | 1.3.0, 1.3.1, 1.4.0
NVIDIA | NVIDIA Riva TTS Magpie Multilingual | nvcr.io/nim/nvidia/riva-tts/magpie-tts-multilingual | 1.6.0
NVIDIA | NeMo Retriever Graphic Elements | nvcr.io/nim/nvidia/nemoretriever-graphic-elements-v1 | 1.3.0, 1.6.0
NVIDIA | NeMo Retriever Page Elements | nvcr.io/nim/nvidia/nemoretriever-page-elements-v2 | 1.3.0, 1.6.0
NVIDIA | NeMo Retriever Table Structure | nvcr.io/nim/nvidia/nemoretriever-table-structure-v1 | 1.3.0, 1.6.0
NVIDIA | NeMo Retriever-Parse | nvcr.io/nvstaging/nim/nemoretriever-parse | 1.2.0
NVIDIA | Nemotron 3 Nano 30B | nvcr.io/nim/nvidia/nemotron-3-nano | 1.7.0
NVIDIA | Nemotron Nano 12B V2 VL | nvcr.io/nim/nvidia/nemotron-nano-12b-v2-vl | 1.5.0
OpenAI | OpenAI GPT-OSS 120B | nvcr.io/nim/openai/gpt-oss-120b | 1.12.1, 1.12.3, 1.12.4
OpenAI | OpenAI GPT-OSS 20B | nvcr.io/nim/openai/gpt-oss-20b | 1.12.1, 1.12.3, 1.12.4
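
Each row of Table 3 pairs one NGC container path with one or more versions. The sketch below expands a row into one image reference per version. It assumes that each listed version is also the image tag, so that a reference takes the form `path:version`; this topic does not state that explicitly. The function names `image_reference` and `expand_row` are hypothetical helpers, not part of any Cloudera or NVIDIA tool.

```python
def image_reference(container: str, version: str) -> str:
    """Join an NGC container path and a version into `path:tag` form.

    Assumption: the version listed in Table 3 is used as the image tag.
    """
    container = container.strip()
    version = version.strip()
    if not container or container == "unknown":
        # Some rows in Table 3 list the container as "unknown".
        raise ValueError("Table 3 does not list a container path for this model")
    return f"{container}:{version}"


def expand_row(container: str, versions: str) -> list[str]:
    """Expand a comma-separated Versions cell into one reference per version."""
    return [image_reference(container, v) for v in versions.split(",") if v.strip()]


# Values copied from the Llama 3.1 8B Instruct row of Table 3.
refs = expand_row("nvcr.io/nim/meta/llama-3.1-8b-instruct", "1.2.2, 1.3.3, 1.8.6, 1.13.1")
for ref in refs:
    print(ref)
```

Rows whose container is listed as unknown raise an error instead of producing a guessed path.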

Hugging Face Supported Model Architectures

Text Generation Models (123)

afmoeforcausallm, apertusforcausallm, aquilaforcausallm, aquilamodel, arceeforcausallm, arcticforcausallm, baichuanforcausallm, bailingmoeforcausallm, bailingmoev2forcausallm, bambaforcausallm, bloomforcausallm, chatglmforconditionalgeneration, chatglmmodel, cohere2forcausallm, cohereforcausallm, cwmforcausallm, dbrxforcausallm, decilmforcausallm, deepseekforcausallm, deepseekv2forcausallm, deepseekv32forcausallm, deepseekv3forcausallm, dots1forcausallm, ernie4_5_moeforcausallm, ernie4_5forcausallm, exaone4forcausallm, exaoneforcausallm, exaonemoeforcausallm, fairseq2llamaforcausallm, falconforcausallm, falconh1forcausallm, falconmambaforcausallm, flexolmoforcausallm, gemma2forcausallm, gemma3forcausallm, gemma3nforcausallm, gemmaforcausallm, glm4forcausallm, glm4moeforcausallm, glm4moeliteforcausallm, glmforcausallm, gpt2lmheadmodel, gptbigcodeforcausallm, gptjforcausallm, gptneoxforcausallm, gptossforcausallm, graniteforcausallm, granitemoeforcausallm, granitemoehybridforcausallm, granitemoesharedforcausallm, gritlm, grok1forcausallm, grok1modelforcausallm, hcxvisionforcausallm, hunyuandensev1forcausallm, hunyuanmoev1forcausallm, internlm2forcausallm, internlm2veforcausallm, internlm3forcausallm, internlmforcausallm, iquestcoderforcausallm, iquestloopcoderforcausallm, jais2forcausallm, jaislmheadmodel, jambaforcausallm, kimilinearforcausallm, lfm2forcausallm, lfm2moeforcausallm, llama4forcausallm, llamaforcausallm, longcatflashforcausallm, mamba2forcausallm, mambaforcausallm, mimoforcausallm, mimov2flashforcausallm, minicpm3forcausallm, minicpmforcausallm, minimaxforcausallm, minimaxm1forcausallm, minimaxm2forcausallm, minimaxtext01forcausallm, mistralforcausallm, mistrallarge3forcausallm, mixtralforcausallm, mptforcausallm, nemotronforcausallm, nemotronhforcausallm, olmo2forcausallm, olmo3forcausallm, olmoeforcausallm, olmoforcausallm, optforcausallm, orionforcausallm, ouroforcausallm, panguembeddedforcausallm, pangupromoev2forcausallm, panguultramoeforcausallm, persimmonforcausallm, phi3forcausallm, phiforcausallm, phimoeforcausallm, plamo2forcausallm, plamo3forcausallm, qwen2forcausallm, qwen2moeforcausallm, qwen3forcausallm, qwen3moeforcausallm, qwen3nextforcausallm, qwenlmheadmodel, rwforcausallm, seedossforcausallm, solarforcausallm, stablelmepochforcausallm, stablelmforcausallm, starcoder2forcausallm, step1forcausallm, step3p5forcausallm, step3textforcausallm, telechat2forcausallm, telechatforcausallm, teleflmforcausallm, xverseforcausallm, zamba2forcausallm

Embedding Models (35)

bertmodel, bertspladesparseembeddingmodel, bgem3embeddingmodel, clipmodel, decilmforcausallm, gemma2model, gemma3textmodel, glmforcausallm, gpt2forsequenceclassification, gritlm, gtemodel, gtenewmodel, internlm2forrewardmodel, jambaforsequenceclassification, llamabidirectionalmodel, llamamodel, llavanextforconditionalgeneration, mistralmodel, modernbertmodel, nomicbertmodel, phi3forcausallm, phi3vforcausallm, prithvigeospatialmae, qwen2forcausallm, qwen2forprocessrewardmodel, qwen2forrewardmodel, qwen2model, qwen2vlforconditionalgeneration, robertaformaskedlm, robertamodel, siglipmodel, telechat2forcausallm, telechatforcausallm, terratorch, xlmrobertamodel

Cross-Encoder Models (9)

bertforsequenceclassification, bertfortokenclassification, gtenewforsequenceclassification, jinavlforranking, llamabidirectionalforsequenceclassification, modernbertforsequenceclassification, modernbertfortokenclassification, robertaforsequenceclassification, xlmrobertaforsequenceclassification

Multimodal Models (80)

ariaforconditionalgeneration, audioflamingo3forconditionalgeneration, ayavisionforconditionalgeneration, bagelforconditionalgeneration, beeforconditionalgeneration, blip2forconditionalgeneration, chameleonforconditionalgeneration, cohere2visionforconditionalgeneration, deepseekocrforcausallm, deepseekvlv2forcausallm, dotsocrforcausallm, eagle2_5_vlforconditionalgeneration, ernie4_5_vlmoeforconditionalgeneration, fuyuforcausallm, gemma3forconditionalgeneration, gemma3nforconditionalgeneration, glm4vforcausallm, glm4vforconditionalgeneration, glm4vmoeforconditionalgeneration, glmasrforconditionalgeneration, granitespeechforconditionalgeneration, h2ovlchatmodel, hunyuanvlforconditionalgeneration, idefics3forconditionalgeneration, interns1forconditionalgeneration, internvlchatmodel, internvlforconditionalgeneration, isaacforconditionalgeneration, kananavforconditionalgeneration, keyeforconditionalgeneration, keyevl1_5forconditionalgeneration, kimik25forconditionalgeneration, kimivlforconditionalgeneration, lfm2vlforconditionalgeneration, lightonocrforconditionalgeneration, llama4forconditionalgeneration, llama_nemotron_nano_vl, llavaforconditionalgeneration, llavanextforconditionalgeneration, llavanextvideoforconditionalgeneration, llavaonevisionforconditionalgeneration, mantisforconditionalgeneration, midashenglmmodel, minicpmo, minicpmv, minimaxvl01forconditionalgeneration, mistral3forconditionalgeneration, molmo2forconditionalgeneration, molmoforcausallm, nemotronh_nano_vl_v2, nemotronparseforconditionalgeneration, nvlm_d, opencuaforconditionalgeneration, ovis, ovis2_5, paddleocrvlforconditionalgeneration, paligemmaforconditionalgeneration, phi3vforcausallm, phi4mmforcausallm, pixtralforconditionalgeneration, qwen2_5_vlforconditionalgeneration, qwen2_5omniforconditionalgeneration, qwen2_5omnimodel, qwen2audioforconditionalgeneration, qwen2vlforconditionalgeneration, qwen3omnimoeforconditionalgeneration, qwen3vlforconditionalgeneration, qwen3vlmoeforconditionalgeneration, qwenvlforconditionalgeneration, rforconditionalgeneration, skyworkr1vchatmodel, smolvlmforconditionalgeneration, step3vlforconditionalgeneration, stepvlforconditionalgeneration, tarsier2forconditionalgeneration, tarsierforconditionalgeneration, ultravoxmodel, voxtralforconditionalgeneration, voxtralstreaminggeneration, whisperforconditionalgeneration

Speculative Decoding (20)

deepseekmtpmodel, eagle3llamaforcausallm, eagle3qwen2_5vlforcausallm, eagle3qwen3vlforcausallm, eagledeepseekmtpmodel, eaglellama4forcausallm, eaglellamaforcausallm, eagleminicpmforcausallm, eaglemistrallarge3forcausallm, erniemtpmodel, exaonemoemtp, glm4moelitemtpmodel, glm4moemtpmodel, llamaforcausallmeagle3, longcatflashmtpmodel, medusamodel, mimomtpmodel, openpangumtpmodel, qwen3nextmtp, step3p5mtp

Transformers Supported Models (2)

emu3forconditionalgeneration, smollm3forcausallm

Transformers Backend Models (10)

transformersembeddingmodel, transformersforcausallm, transformersforsequenceclassification, transformersmoeembeddingmodel, transformersmoeforcausallm, transformersmoeforsequenceclassification, transformersmultimodalembeddingmodel, transformersmultimodalforcausallm, transformersmultimodalforsequenceclassification, transformersmultimodalmoeforcausallm
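
The architecture names in this section look like lowercased forms of the class names that Hugging Face models declare in the `architectures` field of `config.json` (for example `LlamaForCausalLM` becomes `llamaforcausallm`). Under that assumption, the sketch below checks a model's declared architectures against the lists. The `SUPPORTED` dictionary holds only a small excerpt copied from this section, and both its category keys and the function name `supported_categories` are chosen for illustration.

```python
import json

# Excerpt of the lists in this section; extend each set with the full list as needed.
# The category keys are illustrative labels, not identifiers used by the service.
SUPPORTED = {
    "text-generation": {"llamaforcausallm", "mistralforcausallm", "qwen2forcausallm"},
    "embedding": {"bertmodel", "qwen2forcausallm", "xlmrobertamodel"},
    "cross-encoder": {"bertforsequenceclassification"},
    "multimodal": {"llavaforconditionalgeneration", "whisperforconditionalgeneration"},
}


def supported_categories(config_json: str) -> dict[str, list[str]]:
    """Map each declared architecture to the categories that list it.

    Assumption: matching is done on the lowercased class name.
    """
    config = json.loads(config_json)
    result = {}
    for arch in config.get("architectures", []):
        key = arch.lower()
        result[arch] = sorted(cat for cat, names in SUPPORTED.items() if key in names)
    return result


example = '{"architectures": ["Qwen2ForCausalLM"], "model_type": "qwen2"}'
print(supported_categories(example))
```

An architecture can appear in more than one list, as `qwen2forcausallm` does, so the helper returns every matching category. An empty list means the name is absent from the excerpt, not necessarily that the model is unsupported.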