Supported models and servers in Cloudera AI Inference service
This topic lists the model servers, runtimes, and model architectures that Cloudera AI Inference service supports.
Hugging Face Model Server
| Component | Version |
|---|---|
| Hugging Face Model Server (KServe) | 0.17.0 |
| - vLLM | 0.15.1 |
Triton Inference Server
| Component | Version |
|---|---|
| Triton Inference Server | 25.08.06 |
| - PyTorch | 2.9.0 |
| - TensorFlow | 2.18.0 |
| - scikit-learn | 1.8.0 |
| - XGBoost | 3.1.2 |
| - LightGBM | 4.6.0 |
| - CatBoost | 1.2.7 |
| - MLflow | 3.10.1 |
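
When preparing a model for Triton Inference Server, it can be useful to compare the framework versions in a local training environment against the versions listed above. The following is a minimal sketch, not part of the product: `EXPECTED`, `installed_version`, and `find_mismatches` are hypothetical names, and the keys of `EXPECTED` assume the usual PyPI distribution names (for example `torch` for PyTorch).

```python
from importlib import metadata

# Versions copied from the Triton Inference Server table above.
# Keys are assumed PyPI distribution names, not names used by the service.
EXPECTED = {
    "torch": "2.9.0",
    "tensorflow": "2.18.0",
    "scikit-learn": "1.8.0",
    "xgboost": "3.1.2",
    "lightgbm": "4.6.0",
    "catboost": "1.2.7",
    "mlflow": "3.10.1",
}


def installed_version(package):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(expected, lookup=installed_version):
    """Map each package whose local version differs to (wanted, found)."""
    mismatches = {}
    for package, wanted in expected.items():
        found = lookup(package)
        # Ignore a local build suffix such as '+cu128' when comparing.
        if found is None or found.split("+")[0] != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for package, (wanted, found) in find_mismatches(EXPECTED).items():
        print(f"{package}: listed {wanted}, installed {found}")
```

The `lookup` parameter exists only so that the comparison can be exercised without the packages installed.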
NVIDIA NIM Models
| Vendor | Model | NGC Container | Versions |
|---|---|---|---|
| Baidu | NeMo Retriever PaddleOCR | nvcr.io/nim/baidu/paddleocr | 1.4.0, 1.5.0 |
| BigCode | Starcoder2 7B | nvcr.io/nim/bigcode/starcoder2-7b | 1.8.1, 1.14.1 |
| DeepSeek AI | DeepSeek R1 | nvcr.io/nim/deepseek-ai/deepseek-r1 | 1.7.3 |
| DeepSeek AI | DeepSeek R1 Distill Llama 70B | nvcr.io/nim/deepseek-ai/deepseek-r1-distill-llama-70b | 1.5.2 |
| DeepSeek AI | DeepSeek R1 Distill Llama 8B | nvcr.io/nim/deepseek-ai/deepseek-r1-distill-llama-8b | 1.5.2 |
| Meta | Llama 3.1 405B Instruct | nvcr.io/nim/meta/llama-3.1-405b-instruct | 1.3.0 |
| Meta | Llama 3.1 70B Instruct | nvcr.io/nim/meta/llama-3.1-70b-instruct | 1.2.1, 1.3.3, 1.8.5, 1.14.0 |
| Meta | Llama 3.1 8B Instruct | nvcr.io/nim/meta/llama-3.1-8b-instruct | 1.2.2, 1.3.3, 1.8.6, 1.13.1 |
| Meta | Llama 3.2 11B Vision | unknown | 1.1.1 |
| Meta | Llama 3.2 1B Instruct | nvcr.io/nim/meta/llama-3.2-1b-instruct | 1.6.0, 1.8.6, 1.12.0 |
| Meta | Llama 3.2 3B Instruct | nvcr.io/nim/meta/llama-3.2-3b-instruct | 1.6.0, 1.8.6, 1.10.1 |
| Meta | Llama 3.2 90B Vision | unknown | 1.1.1 |
| Meta | Llama 3.3 70B Instruct | nvcr.io/nim/meta/llama-3.3-70b-instruct | 1.8.2, 1.8.5, 1.14.0 |
| MIT | MIT Boltz2 | nvcr.io/nim/mit/boltz2 | 1.1.0, 1.3.0 |
| Mistral AI | Mistral 7B Instruct v0.3 | nvcr.io/nim/mistralai/mistral-7b-instruct-v0.3 | 1.1.2, 1.3.0, 1.12.0 |
| Mistral AI | Mixtral 8x22B Instruct v0.1 | nvcr.io/nim/mistralai/mixtral-8x22b-instruct-v01 | 1.2.2 |
| Mistral AI | Mixtral 8x7B Instruct v0.1 | nvcr.io/nim/mistralai/mixtral-8x7b-instruct-v0.1 | 1.2.1, 1.3.0, 1.8.4 |
| NVIDIA | Llama 3.1 Nemotron Nano 4B V1.1 | nvcr.io/nim/nvidia/Llama-3.1-Nemotron-Nano-4B-v1.1 | 1.8.4, 1.8.5 |
| NVIDIA | Llama 3.1 Nemotron Nano 8B | nvcr.io/nim/nvidia/llama-3.1-nemotron-nano-8b-v1 | 1.8.2, 1.8.4 |
| NVIDIA | Llama 3.2 EmbedQA 1B | nvcr.io/nim/nvidia/llama-3.2-nv-embedqa-1b-v2 | 1.5.0, 1.10.0 |
| NVIDIA | Llama 3.2 RerankQA 1B v2 | nvcr.io/nim/nvidia/llama-3.2-nv-rerankqa-1b-v2 | 1.3.1, 1.8.0 |
| NVIDIA | Llama 3.3 Nemotron Super 49B | nvcr.io/nim/nvidia/llama-3.3-nemotron-super-49b-v1 | 1.8.5, 1.8.6, 1.10.1 |
| NVIDIA | Llama 3.3 Nemotron Super 49B V1.5 | nvcr.io/nim/nvidia/llama-3.3-nemotron-super-49b-v1.5 | 1.14.0 |
| NVIDIA | Llama 3.3 Nemotron Super 49B V1.5 | nvcr.io/nim/nvidia/llama-3.3-nemotron-super-49b-v1.5-pb25h2 | 1.14.0 |
| NVIDIA | NVIDIA ASR Parakeet 1.1B CTC EN-US - Offline | nvcr.io/nim/nvidia/riva-asr/parakeet-1-1b-ctc-en-us-offline | 1.4.0 |
| NVIDIA | NVIDIA Riva ASR Whisper Large v3 | nvcr.io/nim/nvidia/riva-asr/whisper | 1.3.0, 1.3.1, 1.4.0 |
| NVIDIA | NVIDIA Riva TTS Magpie Multilingual | nvcr.io/nim/nvidia/riva-tts/magpie-tts-multilingual | 1.6.0 |
| NVIDIA | NeMo Retriever Graphic Elements | nvcr.io/nim/nvidia/nemoretriever-graphic-elements-v1 | 1.3.0, 1.6.0 |
| NVIDIA | NeMo Retriever Page Elements | nvcr.io/nim/nvidia/nemoretriever-page-elements-v2 | 1.3.0, 1.6.0 |
| NVIDIA | NeMo Retriever Table Structure | nvcr.io/nim/nvidia/nemoretriever-table-structure-v1 | 1.3.0, 1.6.0 |
| NVIDIA | NeMo Retriever-Parse | nvcr.io/nvstaging/nim/nemoretriever-parse | 1.2.0 |
| NVIDIA | Nemotron 3 Nano 30B | nvcr.io/nim/nvidia/nemotron-3-nano | 1.7.0 |
| NVIDIA | Nemotron Nano 12B V2 VL | nvcr.io/nim/nvidia/nemotron-nano-12b-v2-vl | 1.5.0 |
| OpenAI | OpenAI GPT-OSS 120B | nvcr.io/nim/openai/gpt-oss-120b | 1.12.1, 1.12.3, 1.12.4 |
| OpenAI | OpenAI GPT-OSS 20B | nvcr.io/nim/openai/gpt-oss-20b | 1.12.1, 1.12.3, 1.12.4 |
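
As a rough illustration of how a row in the NIM table might be turned into an image reference, the sketch below splits a Versions cell, picks the highest version by numeric comparison, and joins it to the NGC container path. It assumes that each listed version is also the image tag, which this topic does not state. The helper names `parse_versions`, `version_key`, `latest_version`, and `image_reference` are hypothetical.

```python
def parse_versions(cell: str) -> list[str]:
    """Split a Versions cell such as '1.2.2, 1.3.3' into individual versions."""
    return [v.strip() for v in cell.split(",") if v.strip()]


def version_key(version: str) -> tuple[int, ...]:
    """Turn '1.13.1' into (1, 13, 1) so that versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def latest_version(cell: str) -> str:
    """Return the highest version in a Versions cell."""
    return max(parse_versions(cell), key=version_key)


def image_reference(container: str, version: str) -> str:
    # Assumption: the listed version doubles as the container image tag.
    return f"{container}:{version}"


versions = "1.2.2, 1.3.3, 1.8.6, 1.13.1"
container = "nvcr.io/nim/meta/llama-3.1-8b-instruct"
print(image_reference(container, latest_version(versions)))
# prints nvcr.io/nim/meta/llama-3.1-8b-instruct:1.13.1
```

The numeric key matters because a plain string comparison would rank 1.8.6 above 1.13.1.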
Hugging Face Supported Model Architectures
Text Generation Models (123)
afmoeforcausallm, apertusforcausallm, aquilaforcausallm, aquilamodel, arceeforcausallm, arcticforcausallm, baichuanforcausallm, bailingmoeforcausallm, bailingmoev2forcausallm, bambaforcausallm, bloomforcausallm, chatglmforconditionalgeneration, chatglmmodel, cohere2forcausallm, cohereforcausallm, cwmforcausallm, dbrxforcausallm, decilmforcausallm, deepseekforcausallm, deepseekv2forcausallm, deepseekv32forcausallm, deepseekv3forcausallm, dots1forcausallm, ernie4_5_moeforcausallm, ernie4_5forcausallm, exaone4forcausallm, exaoneforcausallm, exaonemoeforcausallm, fairseq2llamaforcausallm, falconforcausallm, falconh1forcausallm, falconmambaforcausallm, flexolmoforcausallm, gemma2forcausallm, gemma3forcausallm, gemma3nforcausallm, gemmaforcausallm, glm4forcausallm, glm4moeforcausallm, glm4moeliteforcausallm, glmforcausallm, gpt2lmheadmodel, gptbigcodeforcausallm, gptjforcausallm, gptneoxforcausallm, gptossforcausallm, graniteforcausallm, granitemoeforcausallm, granitemoehybridforcausallm, granitemoesharedforcausallm, gritlm, grok1forcausallm, grok1modelforcausallm, hcxvisionforcausallm, hunyuandensev1forcausallm, hunyuanmoev1forcausallm, internlm2forcausallm, internlm2veforcausallm, internlm3forcausallm, internlmforcausallm, iquestcoderforcausallm, iquestloopcoderforcausallm, jais2forcausallm, jaislmheadmodel, jambaforcausallm, kimilinearforcausallm, lfm2forcausallm, lfm2moeforcausallm, llama4forcausallm, llamaforcausallm, longcatflashforcausallm, mamba2forcausallm, mambaforcausallm, mimoforcausallm, mimov2flashforcausallm, minicpm3forcausallm, minicpmforcausallm, minimaxforcausallm, minimaxm1forcausallm, minimaxm2forcausallm, minimaxtext01forcausallm, mistralforcausallm, mistrallarge3forcausallm, mixtralforcausallm, mptforcausallm, nemotronforcausallm, nemotronhforcausallm, olmo2forcausallm, olmo3forcausallm, olmoeforcausallm, olmoforcausallm, optforcausallm, orionforcausallm, ouroforcausallm, panguembeddedforcausallm, pangupromoev2forcausallm, panguultramoeforcausallm, 
persimmonforcausallm, phi3forcausallm, phiforcausallm, phimoeforcausallm, plamo2forcausallm, plamo3forcausallm, qwen2forcausallm, qwen2moeforcausallm, qwen3forcausallm, qwen3moeforcausallm, qwen3nextforcausallm, qwenlmheadmodel, rwforcausallm, seedossforcausallm, solarforcausallm, stablelmepochforcausallm, stablelmforcausallm, starcoder2forcausallm, step1forcausallm, step3p5forcausallm, step3textforcausallm, telechat2forcausallm, telechatforcausallm, teleflmforcausallm, xverseforcausallm, zamba2forcausallm
Embedding Models (35)
bertmodel, bertspladesparseembeddingmodel, bgem3embeddingmodel, clipmodel, decilmforcausallm, gemma2model, gemma3textmodel, glmforcausallm, gpt2forsequenceclassification, gritlm, gtemodel, gtenewmodel, internlm2forrewardmodel, jambaforsequenceclassification, llamabidirectionalmodel, llamamodel, llavanextforconditionalgeneration, mistralmodel, modernbertmodel, nomicbertmodel, phi3forcausallm, phi3vforcausallm, prithvigeospatialmae, qwen2forcausallm, qwen2forprocessrewardmodel, qwen2forrewardmodel, qwen2model, qwen2vlforconditionalgeneration, robertaformaskedlm, robertamodel, siglipmodel, telechat2forcausallm, telechatforcausallm, terratorch, xlmrobertamodel
Cross-Encoder Models (9)
bertforsequenceclassification, bertfortokenclassification, gtenewforsequenceclassification, jinavlforranking, llamabidirectionalforsequenceclassification, modernbertforsequenceclassification, modernbertfortokenclassification, robertaforsequenceclassification, xlmrobertaforsequenceclassification
Multimodal Models (80)
ariaforconditionalgeneration, audioflamingo3forconditionalgeneration, ayavisionforconditionalgeneration, bagelforconditionalgeneration, beeforconditionalgeneration, blip2forconditionalgeneration, chameleonforconditionalgeneration, cohere2visionforconditionalgeneration, deepseekocrforcausallm, deepseekvlv2forcausallm, dotsocrforcausallm, eagle2_5_vlforconditionalgeneration, ernie4_5_vlmoeforconditionalgeneration, fuyuforcausallm, gemma3forconditionalgeneration, gemma3nforconditionalgeneration, glm4vforcausallm, glm4vforconditionalgeneration, glm4vmoeforconditionalgeneration, glmasrforconditionalgeneration, granitespeechforconditionalgeneration, h2ovlchatmodel, hunyuanvlforconditionalgeneration, idefics3forconditionalgeneration, interns1forconditionalgeneration, internvlchatmodel, internvlforconditionalgeneration, isaacforconditionalgeneration, kananavforconditionalgeneration, keyeforconditionalgeneration, keyevl1_5forconditionalgeneration, kimik25forconditionalgeneration, kimivlforconditionalgeneration, lfm2vlforconditionalgeneration, lightonocrforconditionalgeneration, llama4forconditionalgeneration, llama_nemotron_nano_vl, llavaforconditionalgeneration, llavanextforconditionalgeneration, llavanextvideoforconditionalgeneration, llavaonevisionforconditionalgeneration, mantisforconditionalgeneration, midashenglmmodel, minicpmo, minicpmv, minimaxvl01forconditionalgeneration, mistral3forconditionalgeneration, molmo2forconditionalgeneration, molmoforcausallm, nemotronh_nano_vl_v2, nemotronparseforconditionalgeneration, nvlm_d, opencuaforconditionalgeneration, ovis, ovis2_5, paddleocrvlforconditionalgeneration, paligemmaforconditionalgeneration, phi3vforcausallm, phi4mmforcausallm, pixtralforconditionalgeneration, qwen2_5_vlforconditionalgeneration, qwen2_5omniforconditionalgeneration, qwen2_5omnimodel, qwen2audioforconditionalgeneration, qwen2vlforconditionalgeneration, qwen3omnimoeforconditionalgeneration, qwen3vlforconditionalgeneration, 
qwen3vlmoeforconditionalgeneration, qwenvlforconditionalgeneration, rforconditionalgeneration, skyworkr1vchatmodel, smolvlmforconditionalgeneration, step3vlforconditionalgeneration, stepvlforconditionalgeneration, tarsier2forconditionalgeneration, tarsierforconditionalgeneration, ultravoxmodel, voxtralforconditionalgeneration, voxtralstreaminggeneration, whisperforconditionalgeneration
Speculative Decoding (20)
deepseekmtpmodel, eagle3llamaforcausallm, eagle3qwen2_5vlforcausallm, eagle3qwen3vlforcausallm, eagledeepseekmtpmodel, eaglellama4forcausallm, eaglellamaforcausallm, eagleminicpmforcausallm, eaglemistrallarge3forcausallm, erniemtpmodel, exaonemoemtp, glm4moelitemtpmodel, glm4moemtpmodel, llamaforcausallmeagle3, longcatflashmtpmodel, medusamodel, mimomtpmodel, openpangumtpmodel, qwen3nextmtp, step3p5mtp
Transformers Supported Models (2)
emu3forconditionalgeneration, smollm3forcausallm
Transformers Backend Models (10)
transformersembeddingmodel, transformersforcausallm, transformersforsequenceclassification, transformersmoeembeddingmodel, transformersmoeforcausallm, transformersmoeforsequenceclassification, transformersmultimodalembeddingmodel, transformersmultimodalforcausallm, transformersmultimodalforsequenceclassification, transformersmultimodalmoeforcausallm
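
To check whether a Hugging Face model falls under one of the architecture lists above, one approach is to compare the `architectures` field of the model's `config.json` with the names in this topic. The sketch below assumes that the names listed here are lowercased forms of those `architectures` values, which the topic does not state explicitly. `SUPPORTED_TEXT_GENERATION` is a small hand-copied subset of the text generation list, and `is_supported` is a hypothetical helper.

```python
import json

# A small subset of the Text Generation Models list, copied for illustration.
SUPPORTED_TEXT_GENERATION = {
    "llamaforcausallm",
    "mistralforcausallm",
    "qwen3forcausallm",
    "gptossforcausallm",
}


def is_supported(config_json: str, supported: set[str]) -> bool:
    """Return True if any architecture named in the config is in `supported`."""
    config = json.loads(config_json)
    architectures = config.get("architectures") or []
    # Assumption: the documented names are case-folded architecture class names.
    return any(name.lower() in supported for name in architectures)


example_config = '{"architectures": ["LlamaForCausalLM"], "model_type": "llama"}'
print(is_supported(example_config, SUPPORTED_TEXT_GENERATION))  # prints True
```

In practice the set would hold the full list for the relevant category, and the config would be read from the model repository rather than from an inline string.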
