Supported models and servers in Cloudera AI Inference service

This topic provides a comprehensive inventory of model servers, runtimes, and supported model architectures available in Cloudera AI Inference service.

Hugging Face Model Server

Table 1. Hugging Face Model Server Components
Component | Version
Hugging Face Model Server (KServe) | 0.18.0
- vLLM | 0.20.0

Triton Inference Server

Table 2. Triton Inference Server Components
Component | Version
Triton Inference Server | 25.08.06
- PyTorch | 2.11
- TensorFlow | 2.18
- scikit-learn | 1.18
- XGBoost | 3.1.2
- LightGBM | 4.6.0
- CatBoost | 1.2.7
- MLflow | 3.12
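Table 2 lists the framework versions bundled with Triton Inference Server, so it can help to compare a training environment against them before exporting a model. For example, scikit-learn does not guarantee that a pickled estimator loads under a different scikit-learn version. The following is a minimal sketch, not a Cloudera tool: it assumes the PyPI distribution names used as keys below and treats a matching major.minor version as compatible, which is a heuristic rather than a guarantee. `TRITON_RUNTIME_VERSIONS`, `major_minor`, `mismatches`, and `installed_versions` are hypothetical names.

```python
import re
from importlib import metadata

# Versions copied from Table 2. The PyPI distribution names used as keys
# are an assumption of this sketch.
TRITON_RUNTIME_VERSIONS = {
    "torch": "2.11",
    "tensorflow": "2.18",
    "scikit-learn": "1.18",
    "xgboost": "3.1.2",
    "lightgbm": "4.6.0",
    "catboost": "1.2.7",
    "mlflow": "3.12",
}


def major_minor(version: str) -> tuple[int, int]:
    """Extract (major, minor) from strings such as '2.11.0+cu128' or '3.12rc1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def mismatches(local: dict[str, str],
               expected: dict[str, str] = TRITON_RUNTIME_VERSIONS) -> dict[str, tuple[str, str]]:
    """Return {package: (local, runtime)} for packages whose major.minor differs.

    Packages missing from `local` are skipped, not reported.
    """
    result = {}
    for name, want in expected.items():
        have = local.get(name)
        if have is not None and major_minor(have) != major_minor(want):
            result[name] = (have, want)
    return result


def installed_versions(names) -> dict[str, str]:
    """Look up installed distribution versions, skipping ones that are not installed."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            continue
    return found


if __name__ == "__main__":
    local = installed_versions(TRITON_RUNTIME_VERSIONS)
    for name, (have, want) in mismatches(local).items():
        print(f"{name}: local {have}, Triton runtime {want}")
```

Comparing only major.minor keeps patch releases (for example XGBoost 3.1.0 against 3.1.2) from being flagged; a stricter check could compare full version strings instead.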

NVIDIA NIM Models

Table 3. Supported NVIDIA NIM Models
Vendor | Model | NGC Container | Versions
Baidu | NeMo Retriever PaddleOCR | nvcr.io/nim/baidu/paddleocr | 1.4.0, 1.5.0
BigCode | StarCoder2 7B | nvcr.io/nim/bigcode/starcoder2-7b | 1.8.1, 1.14.1, 1.15.3
DeepSeek AI | DeepSeek R1 | nvcr.io/nim/deepseek-ai/deepseek-r1 | 1.7.3
DeepSeek AI | DeepSeek R1 Distill Llama 70B | unknown | 1.5.2
DeepSeek AI | DeepSeek R1 Distill Llama 8B | unknown | 1.5.2
Meta | Llama 3.1 405B Instruct | nvcr.io/nim/meta/llama-3.1-405b-instruct | 1.3.0
Meta | Llama 3.1 70B Instruct | nvcr.io/nim/meta/llama-3.1-70b-instruct | 1.2.1, 1.3.3, 1.8.5, 1.14.0, 1.14.0-pb5.2-stig-fips-x86-64, 1.14.0-pb5.5-stig-fips-x86-64
Meta | Llama 3.1 8B Instruct | nvcr.io/nim/meta/llama-3.1-8b-instruct | 1.2.2, 1.3.3, 1.8.6, 1.13.1, 1.14.0-pb5.2-stig-fips-x86-64, 1.14.0-pb5.5-stig-fips-x86-64
Meta | Llama 3.2 11B Vision | unknown | 1.1.1
Meta | Llama 3.2 1B Instruct | nvcr.io/nim/meta/llama-3.2-1b-instruct | 1.6.0, 1.8.6, 1.12.0
Meta | Llama 3.2 3B Instruct | nvcr.io/nim/meta/llama-3.2-3b-instruct | 1.6.0, 1.8.6, 1.10.1
Meta | Llama 3.2 90B Vision | unknown | 1.1.1
Meta | Llama 3.3 70B Instruct | nvcr.io/nim/meta/llama-3.3-70b-instruct | 1.8.2, 1.8.5, 1.14.0, 1.15.1, 2.0.3
MiniMax AI | MiniMax M2.5 | nvcr.io/nim/minimax-ai/minimax-m25 | 1.7.1
MIT | MIT Boltz2 | nvcr.io/nim/mit/boltz2 | 1.1.0, 1.3.0, 1.5.0
Mistral AI | Mistral 7B Instruct v0.3 | nvcr.io/nim/mistralai/mistral-7b-instruct-v0.3 | 1.1.2, 1.3.0, 1.12.0
Mistral AI | Mixtral 8x22B Instruct v0.1 | unknown | 1.2.2
Mistral AI | Mixtral 8x7B Instruct v0.1 | nvcr.io/nim/mistralai/mixtral-8x7b-instruct-v0.1 | 1.2.1, 1.3.0, 1.8.4, 1.12.0
NVIDIA | Cosmos Reason2 8B | nvcr.io/nim/nvidia/cosmos-reason2-8b | 1.7.0
NVIDIA | Llama 3.1 Nemotron Nano 4B V1.1 | nvcr.io/nim/nvidia/llama-3.1-nemotron-nano-4b-v1.1 | 1.8.4, 1.8.5
NVIDIA | Llama 3.1 Nemotron Nano 8B | nvcr.io/nim/nvidia/llama-3.1-nemotron-nano-8b-v1 | 1.8.2, 1.8.4
NVIDIA | Llama 3.2 EmbedQA 1B | nvcr.io/nim/nvidia/llama-3.2-nv-embedqa-1b-v2 | 1.5.0, 1.10.0, 1.11.3-stig-fips-x86
NVIDIA | Llama 3.2 RerankQA 1B v2 | nvcr.io/nim/nvidia/llama-3.2-nv-rerankqa-1b-v2 | 1.3.1, 1.8.0, 1.9.3-stig-fips-x86
NVIDIA | Llama 3.3 Nemotron Super 49B | nvcr.io/nim/nvidia/llama-3.3-nemotron-super-49b-v1 | 1.8.5, 1.8.6, 1.10.1
NVIDIA | Llama 3.3 Nemotron Super 49B V1.5 | nvcr.io/nim/nvidia/llama-3.3-nemotron-super-49b-v1.5 | 1.14.0, 1.14.0-pb5.3, 2.0.3
NVIDIA | NVIDIA ASR Parakeet 1.1B CTC EN-US - Offline | nvcr.io/nim/nvidia/riva-asr/parakeet-1-1b-ctc-en-us-offline | 1.4.0
NVIDIA | NVIDIA Riva ASR Whisper Large v3 | nvcr.io/nim/nvidia/riva-asr/whisper | 1.3.0, 1.3.1, 1.4.0
NVIDIA | NVIDIA Riva TTS Magpie Multilingual | nvcr.io/nim/nvidia/riva-tts/magpie-tts-multilingual | 1.6.0
NVIDIA | NeMo Retriever Graphic Elements | nvcr.io/nim/nvidia/nemoretriever-graphic-elements-v1 | 1.3.0, 1.6.0
NVIDIA | NeMo Retriever Page Elements | unknown | 1.3.0, 1.6.0
NVIDIA | NeMo Retriever Page Elements v3 | nvcr.io/nim/nvidia/nemoretriever-page-elements-v3 | 1.7.0
NVIDIA | NeMo Retriever Table Structure | nvcr.io/nim/nvidia/nemoretriever-table-structure-v1 | 1.3.0, 1.6.0
NVIDIA | NeMo Retriever-Parse | unknown | 1.2.0
NVIDIA | Nemotron 3 Nano 30B | nvcr.io/nim/nvidia/nemotron-3-nano | 1.7.0, 2.0.3
NVIDIA | Nemotron 3 Super 120B A12B | nvcr.io/nim/nvidia/nemotron-3-super-120b-a12b | 1.8.1, 2.0.3
NVIDIA | Nemotron Nano 12B V2 VL | nvcr.io/nim/nvidia/nemotron-nano-12b-v2-vl | 1.5.0
NVIDIA | Nemotron Parse | nvcr.io/nim/nvidia/nemotron-parse | 1.5.0
OpenAI | OpenAI GPT-OSS 120B | nvcr.io/nim/openai/gpt-oss-120b | 1.12.1, 1.12.3, 1.12.4, 2.0.3
OpenAI | OpenAI GPT-OSS 20B | nvcr.io/nim/openai/gpt-oss-20b | 1.12.1, 1.12.3, 1.12.4, 2.0.3
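Assuming each value in the Versions column of Table 3 is used verbatim as the image tag, a supported NIM image is an NGC container from the table combined with one of its listed versions in the usual `repository:tag` form, for example `nvcr.io/nim/meta/llama-3.1-8b-instruct:1.8.6`. The sketch below checks an image reference against a small, hand-copied subset of the table. `SUPPORTED_NIM_TAGS`, `split_image_reference`, and `is_supported` are hypothetical names. Digest references (`@sha256:...`) are rejected rather than resolved.

```python
# Hypothetical subset of Table 3: NGC container -> supported versions.
# Assumes each listed version is used verbatim as the image tag.
SUPPORTED_NIM_TAGS = {
    "nvcr.io/nim/meta/llama-3.1-8b-instruct": {
        "1.2.2", "1.3.3", "1.8.6", "1.13.1",
        "1.14.0-pb5.2-stig-fips-x86-64", "1.14.0-pb5.5-stig-fips-x86-64",
    },
    "nvcr.io/nim/meta/llama-3.3-70b-instruct": {"1.8.2", "1.8.5", "1.14.0", "1.15.1", "2.0.3"},
    "nvcr.io/nim/openai/gpt-oss-20b": {"1.12.1", "1.12.3", "1.12.4", "2.0.3"},
}


def split_image_reference(image: str) -> tuple[str, str]:
    """Split 'repository:tag' into (repository, tag).

    Only a ':' after the last '/' starts a tag, so a registry port such as
    'localhost:5000/...' is not mistaken for one.
    """
    if "@" in image:
        raise ValueError(f"digest references are not handled: {image!r}")
    slash = image.rfind("/")
    colon = image.rfind(":")
    if colon <= slash:
        raise ValueError(f"image reference has no tag: {image!r}")
    return image[:colon], image[colon + 1:]


def is_supported(image: str) -> bool:
    """Return True if the repository is listed and the tag is one of its versions."""
    repository, tag = split_image_reference(image)
    return tag in SUPPORTED_NIM_TAGS.get(repository, set())


if __name__ == "__main__":
    for ref in (
        "nvcr.io/nim/meta/llama-3.1-8b-instruct:1.8.6",
        "nvcr.io/nim/meta/llama-3.1-8b-instruct:1.9.0",
    ):
        print(ref, is_supported(ref))
```

An exact string match is used on purpose: a tag such as `1.14.0-pb5.5-stig-fips-x86-64` is a distinct build, not a variant of `1.14.0`.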

Hugging Face Supported Model Architectures

Text Generation Models (139)

afmoeforcausallm, apertusforcausallm, aquilaforcausallm, aquilamodel, arceeforcausallm, arcticforcausallm, axk1forcausallm, baichuanforcausallm, bailingmoeforcausallm, bailingmoev2_5forcausallm, bailingmoev2forcausallm, bambaforcausallm, bloomforcausallm, chatglmforconditionalgeneration, chatglmmodel, cohere2forcausallm, cohereforcausallm, cwmforcausallm, dbrxforcausallm, decilmforcausallm, deepseekforcausallm, deepseekv2forcausallm, deepseekv32forcausallm, deepseekv3forcausallm, deepseekv4forcausallm, dots1forcausallm, ernie4_5_moeforcausallm, ernie4_5forcausallm, exaone4forcausallm, exaoneforcausallm, exaonemoeforcausallm, fairseq2llamaforcausallm, falconforcausallm, falconh1forcausallm, falconmambaforcausallm, flexolmoforcausallm, gemma2forcausallm, gemma3forcausallm, gemma3nforcausallm, gemma4forcausallm, gemmaforcausallm, glm4forcausallm, glm4moeforcausallm, glm4moeliteforcausallm, glmforcausallm, glmmoedsaforcausallm, gpt2lmheadmodel, gptbigcodeforcausallm, gptjforcausallm, gptneoxforcausallm, gptossforcausallm, graniteforcausallm, granitemoeforcausallm, granitemoehybridforcausallm, granitemoesharedforcausallm, gritlm, grok1forcausallm, grok1modelforcausallm, hcxvisionforcausallm, hcxvisionv2forcausallm, hunyuandensev1forcausallm, hunyuanmoev1forcausallm, hyperclovaxforcausallm, hyv3forcausallm, internlm2forcausallm, internlm2veforcausallm, internlm3forcausallm, internlmforcausallm, iquestcoderforcausallm, iquestloopcoderforcausallm, jais2forcausallm, jaislmheadmodel, jambaforcausallm, kimilinearforcausallm, lfm2forcausallm, lfm2moeforcausallm, llama4forcausallm, llamaforcausallm, longcatflashforcausallm, mamba2forcausallm, mambaforcausallm, mimoforcausallm, mimov2flashforcausallm, minicpm3forcausallm, minicpmforcausallm, minimaxforcausallm, minimaxm1forcausallm, minimaxm2forcausallm, minimaxtext01forcausallm, ministral3forcausallm, mistralforcausallm, mistrallarge3forcausallm, mixtralforcausallm, mptforcausallm, nemotronforcausallm, nemotronhforcausallm, nemotronhpuzzleforcausallm, olmo2forcausallm, olmo3forcausallm, olmoeforcausallm, olmoforcausallm, olmohybridforcausallm, optforcausallm, orionforcausallm, ouroforcausallm, panguembeddedforcausallm, pangupromoev2forcausallm, panguultramoeforcausallm, param2moeforcausallm, persimmonforcausallm, phi3forcausallm, phiforcausallm, phimoeforcausallm, plamo2forcausallm, plamo3forcausallm, qwen2forcausallm, qwen2moeforcausallm, qwen3forcausallm, qwen3moeforcausallm, qwen3nextforcausallm, qwenlmheadmodel, rnj1forcausallm, rwforcausallm, sarvammlaforcausallm, sarvammoeforcausallm, seedossforcausallm, solarforcausallm, stablelmepochforcausallm, stablelmforcausallm, starcoder2forcausallm, step1forcausallm, step3p5forcausallm, step3textforcausallm, telechat2forcausallm, telechat3forcausallm, telechatforcausallm, teleflmforcausallm, xverseforcausallm, zamba2forcausallm

Embedding Models (11)

bertmodel, bertspladesparseembeddingmodel, bgem3embeddingmodel, erniemodel, gemma2model, gemma3textmodel, gtemodel, gtenewmodel, jinaembeddingsv5model, llamabidirectionalmodel, llamamodel

Late Interaction Models (11)

colbertjinarobertamodel, colbertlfm2model, colbertmodernbertmodel, colmodernvbertforretrieval, colpaliforretrieval, colqwen3, colqwen3_5, hf_colbert, jinaforranking, opscolqwen3model, qwen3vlnemotronembedmodel

Reward Models (3)

internlm2forrewardmodel, qwen2forprocessrewardmodel, qwen2forrewardmodel

Token Classification Models (4)

bertfortokenclassification, erniefortokenclassification, modernbertfortokenclassification, qwen3asrforcedalignerfortokenclassification

Sequence Classification Models (11)

bertforsequenceclassification, ernieforsequenceclassification, gpt2forsequenceclassification, gtenewforsequenceclassification, jambaforsequenceclassification, jinavlforranking, llamabidirectionalforsequenceclassification, llamanemotronvlforsequenceclassification, modernbertforsequenceclassification, robertaforsequenceclassification, xlmrobertaforsequenceclassification

Multimodal Models (106)

ariaforconditionalgeneration, audioflamingo3forconditionalgeneration, ayavisionforconditionalgeneration, bagelforconditionalgeneration, beeforconditionalgeneration, blip2forconditionalgeneration, chameleonforconditionalgeneration, cheers, cheersforconditionalgeneration, cohere2visionforconditionalgeneration, cohereasrforconditionalgeneration, deepseekocr2forcausallm, deepseekocrforcausallm, deepseekvlv2forcausallm, dotsocrforcausallm, eagle2_5_vlforconditionalgeneration, ernie4_5_vlmoeforconditionalgeneration, exaone4_5_forconditionalgeneration, fireredasr2forconditionalgeneration, fireredlidforconditionalgeneration, funasrforconditionalgeneration, funaudiochatforconditionalgeneration, fuyuforcausallm, gemma3forconditionalgeneration, gemma3nforconditionalgeneration, gemma4forconditionalgeneration, glm4vforcausallm, glm4vforconditionalgeneration, glm4vmoeforconditionalgeneration, glmasrforconditionalgeneration, glmocrforconditionalgeneration, granite4visionforconditionalgeneration, granitespeechforconditionalgeneration, h2ovlchatmodel, hunyuanvlforconditionalgeneration, idefics3forconditionalgeneration, interns1forconditionalgeneration, interns1proforconditionalgeneration, internvlchatmodel, internvlforconditionalgeneration, isaacforconditionalgeneration, kananavforconditionalgeneration, keyeforconditionalgeneration, keyevl1_5forconditionalgeneration, kimiaudioforconditionalgeneration, kimik25forconditionalgeneration, kimivlforconditionalgeneration, lfm2vlforconditionalgeneration, lightonocrforconditionalgeneration, llama4forconditionalgeneration, llama_nemotron_nano_vl, llavaforconditionalgeneration, llavanextforconditionalgeneration, llavanextvideoforconditionalgeneration, llavaonevisionforconditionalgeneration, mantisforconditionalgeneration, midashenglmmodel, minicpmo, minicpmv, minimaxvl01forconditionalgeneration, mistral3forconditionalgeneration, molmo2forconditionalgeneration, molmoforcausallm, moonshotkimiaforcausallm, musicflamingoforconditionalgeneration, nemotronh_nano_omni_reasoning_v3, nemotronh_nano_vl_v2, nemotronh_super_omni_reasoning_v3, nemotronparseforconditionalgeneration, nvlm_d, opencuaforconditionalgeneration, openpanguvlforconditionalgeneration, ovis, ovis2_5, ovis2_6_moeforcausallm, ovis2_6forcausallm, paddleocrvlforconditionalgeneration, paligemmaforconditionalgeneration, phi3vforcausallm, phi4forcausallmv, phi4mmforcausallm, pixtralforconditionalgeneration, qwen2_5_vlforconditionalgeneration, qwen2_5omniforconditionalgeneration, qwen2_5omnimodel, qwen2audioforconditionalgeneration, qwen2vlforconditionalgeneration, qwen3_5forconditionalgeneration, qwen3_5moeforconditionalgeneration, qwen3asrforconditionalgeneration, qwen3asrrealtimegeneration, qwen3omnimoeforconditionalgeneration, qwen3vlforconditionalgeneration, qwen3vlmoeforconditionalgeneration, qwenvlforconditionalgeneration, rforconditionalgeneration, skyworkr1vchatmodel, smolvlmforconditionalgeneration, step3vlforconditionalgeneration, stepvlforconditionalgeneration, tarsier2forconditionalgeneration, tarsierforconditionalgeneration, ultravoxmodel, voxtralforconditionalgeneration, voxtralrealtimegeneration, whisperforconditionalgeneration

Speculative Decoding (33)

deepseekmtpmodel, deepseekv4mtpmodel, dflashdraftmodel, eagle3deepseekv2forcausallm, eagle3deepseekv3forcausallm, eagle3llamaforcausallm, eagle3minimaxm2forcausallm, eagle3qwen2_5vlforcausallm, eagle3qwen3vlforcausallm, eagledeepseekmtpmodel, eaglellama4forcausallm, eaglellamaforcausallm, eagleminicpmforcausallm, eaglemistrallarge3forcausallm, erniemtpmodel, exaone4_5_mtp, exaonemoemtp, extracthiddenstatesmodel, glm4moelitemtpmodel, glm4moemtpmodel, glmocrmtpmodel, hyv3mtpmodel, llamaforcausallmeagle3, longcatflashmtpmodel, medusamodel, mimomtpmodel, mlpspeculatorpretrainedmodel, nemotronhmtpmodel, openpangumtpmodel, qwen3_5moemtp, qwen3_5mtp, qwen3nextmtp, step3p5mtp

Transformers Supported Models (2)

emu3forconditionalgeneration, smollm3forcausallm

Transformers Backend Models (10)

transformersembeddingmodel, transformersforcausallm, transformersforsequenceclassification, transformersmoeembeddingmodel, transformersmoeforcausallm, transformersmoeforsequenceclassification, transformersmultimodalembeddingmodel, transformersmultimodalforcausallm, transformersmultimodalforsequenceclassification, transformersmultimodalmoeforcausallm
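Hugging Face model repositories declare their model class in the `architectures` field of `config.json`, for example `"architectures": ["LlamaForCausalLM"]`. The names in the lists above appear to be lowercase forms of those class names, so the sketch below assumes a case-insensitive match. It copies only a few entries into a hypothetical `SUPPORTED_ARCHITECTURES` table. `categories_for` and `load_config` are likewise illustrative helpers and are not part of Cloudera AI Inference service.

```python
import json
from pathlib import Path

# Hypothetical subset of the architecture lists in this topic, keyed by category.
# Names are stored in lowercase, matching how the lists are written.
SUPPORTED_ARCHITECTURES = {
    "Text Generation": {"llamaforcausallm", "mistralforcausallm", "qwen3forcausallm"},
    "Embedding": {"bertmodel", "gtemodel"},
    "Multimodal": {"llavaforconditionalgeneration", "whisperforconditionalgeneration"},
}


def categories_for(config: dict) -> dict[str, list[str]]:
    """Map each name in config['architectures'] to the categories that list it.

    Matching is case-insensitive (an assumption); an empty list means the
    architecture was not found in this subset.
    """
    result = {}
    for arch in config.get("architectures", []):
        key = arch.lower()
        result[arch] = [cat for cat, names in SUPPORTED_ARCHITECTURES.items() if key in names]
    return result


def load_config(model_dir: str) -> dict:
    """Read config.json from a local copy of a Hugging Face model repository."""
    return json.loads((Path(model_dir) / "config.json").read_text(encoding="utf-8"))
```

To check a downloaded model, pass the result of `load_config` to `categories_for`. An empty category list only means the name is absent from this illustrative subset, so extend the sets with the full lists before relying on the result.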