Quantization#
SGLang-Diffusion supports quantized transformer checkpoints. In most cases, keep the base model and the quantized transformer override separate.
Quick Reference#
Use these paths:
- `--model-path`: the base or original model
- `--transformer-path`: a quantized transformers-style transformer component directory that already contains its own `config.json`
- `--transformer-weights-path`: quantized transformer weights provided as a single safetensors file, a sharded safetensors directory, a local path, or a Hugging Face repo ID
Recommended example:
sglang generate \
--model-path black-forest-labs/FLUX.2-dev \
--transformer-weights-path black-forest-labs/FLUX.2-dev-NVFP4 \
--prompt "a curious pikachu"
For quantized transformers-style transformer component folders:
sglang generate \
--model-path /path/to/base-model \
--transformer-path /path/to/quantized-transformer \
--prompt "A Logo With Bold Large Text: SGL Diffusion"
NOTE: Some model-specific integrations also accept a quantized repo or local
directory directly as --model-path, but that is a compatibility path. If a
repo contains multiple candidate checkpoints, pass
--transformer-weights-path explicitly.
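The precedence described above can be sketched as a small resolver. This is an illustrative sketch only: the function name and return shape are assumptions, not SGLang's actual API, and the relative order of `--transformer-path` versus `--transformer-weights-path` is likewise an assumption; what the docs state is that an explicit `--transformer-weights-path` wins over the compatibility flow of passing a quantized repo as `--model-path`.

```python
# Hedged sketch of the documented flag precedence; not SGLang internals.
# An explicit transformer override always wins over treating --model-path
# itself as a quantized checkpoint (the compatibility path).
def pick_transformer_source(model_path, transformer_path=None,
                            transformer_weights_path=None):
    if transformer_weights_path:      # canonical: explicit weights override
        return ("weights", transformer_weights_path)
    if transformer_path:              # component folder with its own config.json
        return ("component", transformer_path)
    return ("model", model_path)      # may itself be a quantized repo

print(pick_transformer_source("black-forest-labs/FLUX.2-dev",
                              transformer_weights_path="black-forest-labs/FLUX.2-dev-NVFP4"))
```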
Quant Families#
Here, quant_family means a checkpoint and loading family with shared CLI
usage and loader behavior. It is not just the numeric precision or a kernel
backend.
| quant_family | checkpoint form | canonical CLI | supported models | extra dependency | platform / notes |
|---|---|---|---|---|---|
| Generic quantized transformer | Quantized transformer component folder, or quantized safetensors weights | `--transformer-path` / `--transformer-weights-path` | ALL | None | Component-folder and single-file flows are both supported |
| NVFP4 | NVFP4 safetensors file, sharded directory, or repo providing transformer weights | `--transformer-weights-path` | FLUX.2 | Optional `comfy-kitchen` on Blackwell | Blackwell can use a best-performance kit when available; otherwise SGLang falls back to the generic ModelOpt FP4 path |
| Nunchaku (SVDQuant) | Pre-quantized Nunchaku transformer weights, usually named `svdq-(int4\|fp4)_r{rank}-*.safetensors` | `--transformer-weights-path` | Model-specific support such as Qwen-Image, FLUX, and Z-Image | `nunchaku` | SGLang can infer precision and rank from the filename and supports both `int4` and `nvfp4` |
NVFP4#
Usage Examples#
Recommended usage keeps the base model and quantized transformer override separate:
sglang generate \
--model-path black-forest-labs/FLUX.2-dev \
--transformer-weights-path black-forest-labs/FLUX.2-dev-NVFP4 \
--prompt "A Logo With Bold Large Text: SGL Diffusion" \
--save-output
SGLang also supports passing the NVFP4 repo or local directory directly as
--model-path:
sglang generate \
--model-path black-forest-labs/FLUX.2-dev-NVFP4 \
--prompt "A Logo With Bold Large Text: SGL Diffusion" \
--save-output
Notes#
- `--transformer-weights-path` is still the canonical CLI for NVFP4 transformer checkpoints.
- Direct `--model-path` loading is a compatibility path for FLUX.2 NVFP4-style repos or local directories.
- If `--transformer-weights-path` is provided explicitly, it takes precedence over the compatibility `--model-path` flow.
- For local directories, SGLang first looks for `*-mixed.safetensors`, then falls back to loading from the directory.
- On Blackwell, `comfy-kitchen` can provide the best-performance path when available; otherwise SGLang falls back to the generic ModelOpt FP4 path.
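The `*-mixed.safetensors` lookup for local directories can be sketched as follows. This is an illustrative helper under stated assumptions (`resolve_nvfp4_weights` is a hypothetical name, not SGLang's real loader):

```python
from pathlib import Path

# Illustrative sketch of the documented lookup order for local NVFP4
# directories: prefer a *-mixed.safetensors file, otherwise fall back
# to loading from the directory itself.
def resolve_nvfp4_weights(directory: str) -> str:
    root = Path(directory)
    mixed = sorted(root.glob("*-mixed.safetensors"))
    if mixed:
        return str(mixed[0])  # first match wins
    return str(root)          # fall back to the directory
```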
Nunchaku (SVDQuant)#
Install#
Install the runtime dependency first:
pip install nunchaku
For platform-specific installation methods and troubleshooting, see the Nunchaku installation guide.
File Naming and Auto-Detection#
For Nunchaku checkpoints, --model-path should still point to the original
base model, while --transformer-weights-path points to the quantized
transformer weights.
If the basename of --transformer-weights-path contains the pattern
svdq-(int4|fp4)_r{rank}, SGLang will automatically:
- enable SVDQuant
- infer `--quantization-precision`
- infer `--quantization-rank`
Examples:
| checkpoint name fragment | inferred precision | inferred rank | notes |
|---|---|---|---|
| `svdq-int4_r32` | `int4` | 32 | Standard INT4 checkpoint |
| `svdq-int4_r128` | `int4` | 128 | Higher-quality INT4 checkpoint |
| `svdq-fp4_r32` | `nvfp4` | 32 | |
| `svdq-fp4_r128` | `nvfp4` | 128 | Higher-quality NVFP4 checkpoint |
Common filenames:
| filename | precision | rank | typical use |
|---|---|---|---|
| `svdq-int4_r32-<model>.safetensors` | `int4` | 32 | Balanced default |
| `svdq-int4_r128-<model>.safetensors` | `int4` | 128 | Quality-focused |
| `svdq-fp4_r32-<model>.safetensors` | `nvfp4` | 32 | RTX 50-series / NVFP4 path |
| `svdq-fp4_r128-<model>.safetensors` | `nvfp4` | 128 | Quality-focused NVFP4 |
| `svdq-int4_r32-<model>-lightning-4steps.safetensors` | `int4` | 32 | Lightning 4-step |
| `svdq-int4_r32-<model>-lightning-8steps.safetensors` | `int4` | 32 | Lightning 8-step |
If your checkpoint name does not follow this convention, pass
--enable-svdquant, --quantization-precision, and --quantization-rank
explicitly.
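The naming convention above can be expressed as a regular expression. The sketch below is illustrative (the function name is an assumption, not SGLang's actual implementation); it mirrors the documented `svdq-(int4|fp4)_r{rank}` pattern, including the mapping of `fp4` in filenames to `nvfp4` on the CLI:

```python
import os
import re

# Hedged sketch of the documented auto-detection convention; not
# SGLang's actual code. Matches svdq-(int4|fp4)_r{rank} in the basename.
_SVDQ_PATTERN = re.compile(r"svdq-(int4|fp4)_r(\d+)")

def infer_svdquant_settings(path):
    """Return (enable_svdquant, precision, rank) inferred from a checkpoint name."""
    m = _SVDQ_PATTERN.search(os.path.basename(path))
    if m is None:
        # No match: caller must pass --enable-svdquant and friends explicitly.
        return (False, None, None)
    # Filenames write "fp4", while the CLI value is "nvfp4".
    precision = "nvfp4" if m.group(1) == "fp4" else "int4"
    return (True, precision, int(m.group(2)))

print(infer_svdquant_settings("svdq-int4_r32-qwen-image.safetensors"))
# -> (True, 'int4', 32)
```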
Usage Examples#
Recommended auto-detected flow:
sglang generate \
--model-path Qwen/Qwen-Image \
--transformer-weights-path /path/to/svdq-int4_r32-qwen-image.safetensors \
--prompt "a beautiful sunset" \
--save-output
Manual override when the filename does not encode the quant settings:
sglang generate \
--model-path Qwen/Qwen-Image \
--transformer-weights-path /path/to/custom_nunchaku_checkpoint.safetensors \
--enable-svdquant \
--quantization-precision int4 \
--quantization-rank 128 \
--prompt "a beautiful sunset" \
--save-output
Notes#
- `--transformer-weights-path` is the canonical flag for Nunchaku checkpoints. Older config names such as `quantized_model_path` are treated as compatibility aliases.
- Auto-detection only happens when the checkpoint basename matches `svdq-(int4|fp4)_r{rank}`.
- The CLI values are `int4` and `nvfp4`. In filenames, the NVFP4 variant is written as `fp4`.
- Lightning checkpoints usually expect a matching `--num-inference-steps`, such as `4` or `8`.
- Current runtime validation only allows Nunchaku on NVIDIA CUDA Ampere (SM8x) or SM12x GPUs. Hopper (SM90) is currently rejected.