Supported vLLM command line arguments
Cloudera AI on cloud uses vLLM 0.8.4. The full set of command line arguments available in vLLM 0.8.4 is described in vLLM arguments. Cloudera AI supports the following subset of these arguments:
- --block-size
- --calculate-kv-scales
- --cpu-offload-gb
- --disable-cascade-attn
- --disable-chunked-mm-input
- --disable-sliding-window
- --dtype
- --enable-auto-tool-choice
- --enable-chunked-prefill
- --enable-prefix-caching
- --enforce-eager
- --gpu-memory-utilization
- --kv-cache-dtype
- --load-format
- --logprobs-mode
- --long-prefill-token-threshold
- --max-logprobs
- --max-long-partial-prefills
- --max-model-len
- --max-num-batched-tokens
- --max-num-partial-prefills
- --max-num-seqs
- --max-seq-len-to-capture
- --multi-step-stream-outputs
- --no-enable-prefix-caching
- --num-lookahead-slots
- --num-scheduler-steps
- --pipeline-parallel-size, -pp
- --preemption-mode
- --prefix-caching-hash-algo
- --quantization
- --rope-scaling
- --rope-theta
- --scheduling-policy
- --seed
- --tensor-parallel-size, -tp
- --tool-call-parser
- --trust-remote-code
For details on the command line arguments listed above, see vLLM arguments.
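
As an illustration, the sketch below shows how several of the supported arguments might be combined in a `vllm serve` invocation. The model name and all values are placeholders chosen for the example, not recommendations for any particular workload:

```bash
# Hypothetical example: the model name and values are placeholders.
vllm serve meta-llama/Llama-3.1-8B-Instruct \
  --dtype bfloat16 \
  --max-model-len 8192 \
  --gpu-memory-utilization 0.90 \
  --tensor-parallel-size 2 \
  --enable-prefix-caching \
  --seed 42
```

Boolean flags such as --enable-prefix-caching are passed without a value; the remaining arguments take a value as shown.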
