Using Hugging Face models

Deploying a Hugging Face model might require additional configuration. Table 1 lists the vLLM parameters, GPU, and test payload for each model.

Table 1. Using Hugging Face models
Model name	vLLM parameters	GPU	Test payload
google/gemma-4-31B-it	--tensor-parallel-size 2 --quantization fp8 --max-model-len 8192 --max-num-batched-tokens 4096	2 x NVIDIA L40S	`{ "model": "google/gemma-4-31B-it", "messages": [ { "role": "user", "content": "What is Generative AI?" } ], "stream": true }`
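The test payload in Table 1 can be sent to the deployed model with a short client. The following is a minimal sketch, not a definitive implementation. It assumes the deployment exposes an OpenAI-compatible chat endpoint at `/v1/chat/completions` and streams responses as `data: ...` lines. The base URL `http://localhost:8000` and the helper names `build_payload`, `build_request`, `extract_delta`, and `stream_chat` are hypothetical and not part of the original documentation.

```python
import json
import urllib.request

MODEL = "google/gemma-4-31B-it"


def build_payload(prompt: str, stream: bool = True) -> dict:
    """Build a chat payload in the same shape as the Table 1 test payload."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }


def build_request(base_url: str, payload: dict) -> urllib.request.Request:
    """Prepare a POST request; the /v1/chat/completions path is an assumption."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_delta(line: str):
    """Return the text fragment from one streamed 'data: ...' line, or None."""
    if not line.startswith("data:"):
        return None
    body = line[len("data:"):].strip()
    if body == "[DONE]":
        # End-of-stream marker used by OpenAI-compatible servers.
        return None
    chunk = json.loads(body)
    choices = chunk.get("choices") or []
    if not choices:
        return None
    return choices[0].get("delta", {}).get("content")


def stream_chat(base_url: str, prompt: str) -> str:
    """Send the payload and print streamed text as it arrives (needs a live endpoint)."""
    request = build_request(base_url, build_payload(prompt))
    parts = []
    with urllib.request.urlopen(request) as response:
        for raw_line in response:
            text = extract_delta(raw_line.decode("utf-8").strip())
            if text:
                print(text, end="", flush=True)
                parts.append(text)
    print()
    return "".join(parts)
```

With a running deployment, calling `stream_chat("http://localhost:8000", "What is Generative AI?")` sends the same request body as the test payload in Table 1. Replace the base URL with the address of your own deployment, and add an authorization header in `build_request` if your endpoint requires one.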