Supported vLLM command line arguments
Cloudera AI on cloud uses vLLM 0.15.1. For details about the command line arguments available in vLLM, see the vLLM documentation.
The following command line arguments are supported with vLLM 0.15.1:
- --trust-remote-code
- --dtype
- --seed
- --rope-scaling
- --rope-theta
- --max-model-len
- --quantization
- --enforce-eager
- --max-seq-len-to-capture
- --max-logprobs
- --logprobs-mode
- --disable-sliding-window
- --disable-cascade-attn
- --load-format
- --pipeline-parallel-size, -pp
- --tensor-parallel-size, -tp
- --block-size
- --gpu-memory-utilization
- --kv-cache-dtype
- --enable-prefix-caching
- --no-enable-prefix-caching
- --prefix-caching-hash-algo
- --cpu-offload-gb
- --calculate-kv-scales
- --max-num-batched-tokens
- --max-num-seqs
- --max-num-partial-prefills
- --max-long-partial-prefills
- --long-prefill-token-threshold
- --num-lookahead-slots
- --preemption-mode
- --num-scheduler-steps
- --multi-step-stream-outputs
- --scheduling-policy
- --enable-chunked-prefill
- --kv-transfer-config
- --disable-chunked-mm-input
- --enable-auto-tool-choice
- --tool-call-parser
For details on each of the command line arguments listed above, see vLLM arguments.
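
Because only the arguments in this list are supported, it can help to check a planned argument list against it before deploying. The following Python snippet is a minimal sketch of such a check, not part of Cloudera AI or vLLM. The names SUPPORTED_VLLM_ARGS and find_unsupported_args are hypothetical helpers introduced here for illustration, and the allow-list is copied by hand from the list above.

```python
# Allow-list copied from the supported-arguments list in this section.
# The short aliases -pp and -tp are included alongside their long forms.
SUPPORTED_VLLM_ARGS = frozenset({
    "--trust-remote-code", "--dtype", "--seed", "--rope-scaling",
    "--rope-theta", "--max-model-len", "--quantization", "--enforce-eager",
    "--max-seq-len-to-capture", "--max-logprobs", "--logprobs-mode",
    "--disable-sliding-window", "--disable-cascade-attn", "--load-format",
    "--pipeline-parallel-size", "-pp", "--tensor-parallel-size", "-tp",
    "--block-size", "--gpu-memory-utilization", "--kv-cache-dtype",
    "--enable-prefix-caching", "--no-enable-prefix-caching",
    "--prefix-caching-hash-algo", "--cpu-offload-gb", "--calculate-kv-scales",
    "--max-num-batched-tokens", "--max-num-seqs", "--max-num-partial-prefills",
    "--max-long-partial-prefills", "--long-prefill-token-threshold",
    "--num-lookahead-slots", "--preemption-mode", "--num-scheduler-steps",
    "--multi-step-stream-outputs", "--scheduling-policy",
    "--enable-chunked-prefill", "--kv-transfer-config",
    "--disable-chunked-mm-input", "--enable-auto-tool-choice",
    "--tool-call-parser",
})


def _is_number(token):
    """Return True if the token parses as a number (for example "-1")."""
    try:
        float(token)
        return True
    except ValueError:
        return False


def find_unsupported_args(args):
    """Return the flags in `args` that are not in the allow-list.

    Accepts both "--flag value" and "--flag=value" forms. Tokens that do not
    start with "-" and tokens that parse as numbers are treated as values
    and skipped.
    """
    unsupported = []
    for token in args:
        if not token.startswith("-") or _is_number(token):
            continue
        flag = token.split("=", 1)[0]  # strip "=value" if present
        if flag not in SUPPORTED_VLLM_ARGS:
            unsupported.append(flag)
    return unsupported


if __name__ == "__main__":
    # "--some-other-flag" is a made-up flag used only to show a rejection.
    planned = ["--dtype", "bfloat16", "--max-model-len=8192", "-tp", "2",
               "--some-other-flag", "x"]
    print(find_unsupported_args(planned))
```

The check is purely a string comparison against the list above. It does not validate argument values, so the vLLM documentation remains the reference for accepted values and defaults.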
