Using Hugging Face models
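Deploying Hugging Face models might require additional configuration, such as specific vLLM parameters and GPU resources. The following table lists the configuration for each model.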
| Model name | vLLM parameters | GPU | Test payload |
|---|---|---|---|
| google/gemma-4-31B-it | `--tensor-parallel-size 2 --quantization fp8 --max-model-len 8192 --max-num-batched-tokens 4096` | 2 x NVIDIA L40S | |
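
As a rough illustration of how the vLLM parameters in the table fit together, the following Python sketch assembles them into a `vllm serve` command line. It assumes the vLLM CLI is installed and that the model is launched directly with `vllm serve`; your deployment platform might pass these parameters differently. The helper names `build_vllm_command` and `parse_flags` are hypothetical and used only for this example.

```python
import shlex

# Values copied from the table row above.
MODEL = "google/gemma-4-31B-it"
VLLM_PARAMETERS = (
    "--tensor-parallel-size 2 --quantization fp8 "
    "--max-model-len 8192 --max-num-batched-tokens 4096"
)
GPU_COUNT = 2  # "2 x NVIDIA L40S" in the GPU column


def build_vllm_command(model: str, parameters: str) -> list[str]:
    """Return an argv list for `vllm serve` (assumes the vLLM CLI is installed)."""
    return ["vllm", "serve", model, *shlex.split(parameters)]


def parse_flags(parameters: str) -> dict[str, str]:
    """Pair each `--flag value` into a dict (assumes every flag takes one value)."""
    tokens = shlex.split(parameters)
    return dict(zip(tokens[0::2], tokens[1::2]))


flags = parse_flags(VLLM_PARAMETERS)

# Tensor parallelism shards the model across GPUs, so the shard count
# in this row matches the number of GPUs listed for it.
if int(flags["--tensor-parallel-size"]) != GPU_COUNT:
    raise ValueError("tensor-parallel-size does not match the GPU count")

command = build_vllm_command(MODEL, VLLM_PARAMETERS)
print(shlex.join(command))
```

The script only prints the command; to start the server you could pass `command` to `subprocess.run`. Note that `--tensor-parallel-size 2` lines up with the two GPUs in the GPU column.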
