Quantization

SGLang-Diffusion supports quantized transformer checkpoints. In most cases, pass the base model and the quantized transformer override through separate flags.

Quick Reference

Use these flags:

  • --model-path: the base or original model

  • --transformer-path: a quantized transformers-style transformer component directory that already contains its own config.json

  • --transformer-weights-path: quantized transformer weights provided as a single safetensors file, a sharded safetensors directory, a local path, or a Hugging Face repo ID
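The three flags accept different on-disk forms. Before launching, a wrapper script could classify a path along the lines of the list above. The following is a minimal sketch, assuming bash. `classify_override` is a hypothetical helper, not part of SGLang, and its repo-ID test is only a name-shape heuristic.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helper (not part of SGLang): report which kind of
# transformer override a path looks like, following the forms listed above.
classify_override() {
  local p="$1"
  local repo_re='^[A-Za-z0-9._-]+/[A-Za-z0-9._-]+$'
  if [[ -d "$p" && -f "$p/config.json" ]]; then
    echo "component-dir"   # has its own config.json: fits --transformer-path
  elif [[ -f "$p" && "$p" == *.safetensors ]]; then
    echo "single-file"     # one safetensors file: fits --transformer-weights-path
  elif [[ -d "$p" ]] && compgen -G "$p/*.safetensors" > /dev/null; then
    echo "sharded-dir"     # directory of safetensors shards: fits --transformer-weights-path
  elif [[ ! -e "$p" && "$p" =~ $repo_re ]]; then
    echo "repo-id"         # not on disk and shaped like org/name: assumed Hugging Face repo ID
  else
    echo "unknown"
  fi
}
```

For example, `classify_override /path/to/quantized-transformer` should print `component-dir` before you pass that directory to --transformer-path.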

Recommended example:

sglang generate \
  --model-path black-forest-labs/FLUX.2-dev \
  --transformer-weights-path black-forest-labs/FLUX.2-dev-NVFP4 \
  --prompt "a curious pikachu"

For quantized transformers-style transformer component folders:

sglang generate \
  --model-path /path/to/base-model \
  --transformer-path /path/to/quantized-transformer \
  --prompt "A Logo With Bold Large Text: SGL Diffusion"

NOTE: Some model-specific integrations also accept a quantized repo or local directory directly as --model-path, but that is a compatibility path. If a repo contains multiple candidate checkpoints, pass --transformer-weights-path explicitly.

Quant Families

Here, quant_family denotes a family of checkpoints that share the same CLI usage and loader behavior; it is not just a numeric precision or a kernel backend.

| quant_family | checkpoint form | canonical CLI | supported models | extra dependency | platform / notes |
|---|---|---|---|---|---|
| fp8 | Quantized transformer component folder, or safetensors with quantization_config metadata | --transformer-path or --transformer-weights-path | All | None | Both the component-folder and single-file flows are supported |
| nvfp4-modelopt | NVFP4 safetensors file, sharded directory, or repo providing transformer weights | --transformer-weights-path | FLUX.2 | comfy-kitchen (optional, Blackwell) | On Blackwell, comfy-kitchen provides the best-performance path when available; otherwise SGLang falls back to the generic ModelOpt FP4 path |
| nunchaku-svdq | Pre-quantized Nunchaku transformer weights, usually named svdq-{int4\|fp4}_r{rank}-... | --transformer-weights-path | Model-specific, such as Qwen-Image, FLUX, and Z-Image | nunchaku | SGLang can infer precision and rank from the filename; both int4 and nvfp4 are supported |

NVFP4

Usage Examples

Recommended usage keeps the base model and quantized transformer override separate:

sglang generate \
  --model-path black-forest-labs/FLUX.2-dev \
  --transformer-weights-path black-forest-labs/FLUX.2-dev-NVFP4 \
  --prompt "A Logo With Bold Large Text: SGL Diffusion" \
  --save-output

SGLang also supports passing the NVFP4 repo or local directory directly as --model-path:

sglang generate \
  --model-path black-forest-labs/FLUX.2-dev-NVFP4 \
  --prompt "A Logo With Bold Large Text: SGL Diffusion" \
  --save-output

Notes

  • --transformer-weights-path is still the canonical CLI for NVFP4 transformer checkpoints.

  • Direct --model-path loading is a compatibility path for FLUX.2 NVFP4-style repos or local directories.

  • If --transformer-weights-path is provided explicitly, it takes precedence over the compatibility --model-path flow.

  • For local directories, SGLang first looks for *-mixed.safetensors, then falls back to loading from the directory.

  • On Blackwell, comfy-kitchen can provide the best-performance path when available; otherwise SGLang falls back to the generic ModelOpt FP4 path.
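The precedence and local-directory lookup order in these notes can be sketched as a small bash function. This is an illustrative approximation of the stated order only: `resolve_nvfp4_source` is hypothetical and does not reproduce SGLang's loader.

```bash
#!/usr/bin/env bash
# Illustrative only: mirrors the documented NVFP4 resolution order,
# not SGLang's actual loader code.
resolve_nvfp4_source() {
  local model_path="$1" weights_path="${2:-}"
  # 1. An explicit --transformer-weights-path always takes precedence.
  if [[ -n "$weights_path" ]]; then
    echo "$weights_path"
    return
  fi
  # 2. Compatibility flow with a local directory as --model-path:
  #    prefer the first *-mixed.safetensors file (glob results are sorted).
  if [[ -d "$model_path" ]]; then
    local mixed
    for mixed in "$model_path"/*-mixed.safetensors; do
      if [[ -f "$mixed" ]]; then
        echo "$mixed"
        return
      fi
    done
  fi
  # 3. Otherwise load from the directory itself, or treat the value as a repo ID.
  echo "$model_path"
}
```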

Nunchaku (SVDQuant)

Install

Install the runtime dependency first:

pip install nunchaku

For platform-specific installation methods and troubleshooting, see the Nunchaku installation guide.
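After installing, you can confirm that the interpreter that runs SGLang can find the package. A minimal sketch, assuming bash and that `python3` is that interpreter; `has_python_module` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Hypothetical post-install check: succeed only if python3 can locate the
# named top-level module (importlib.util.find_spec returns None if it is missing).
has_python_module() {
  python3 -c 'import importlib.util, sys; sys.exit(0 if importlib.util.find_spec(sys.argv[1]) else 1)' "$1"
}
```

Running `has_python_module nunchaku || echo "nunchaku not found"` then reports a missing install before a generation run fails.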

File Naming and Auto-Detection

For Nunchaku checkpoints, --model-path should still point to the original base model, while --transformer-weights-path points to the quantized transformer weights.

If the basename of --transformer-weights-path contains the pattern svdq-(int4|fp4)_r{rank}, SGLang will automatically:

  • enable SVDQuant

  • infer --quantization-precision

  • infer --quantization-rank
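The detection rule amounts to a regular-expression match on the basename. The bash sketch below applies the same rule so you can check a filename before a run. `infer_svdq` is hypothetical and only mirrors the documented pattern and the fp4 to nvfp4 mapping.

```bash
#!/usr/bin/env bash
# Hypothetical helper mirroring the documented svdq-(int4|fp4)_r{rank} rule;
# prints "<precision> <rank>" using the CLI spelling, or fails on no match.
infer_svdq() {
  local name re='svdq-(int4|fp4)_r([0-9]+)'
  name="$(basename "$1")"   # only the basename is inspected
  if [[ "$name" =~ $re ]]; then
    local precision="${BASH_REMATCH[1]}"
    # Filenames spell the NVFP4 variant as "fp4"; the CLI value is "nvfp4".
    [[ "$precision" == "fp4" ]] && precision="nvfp4"
    echo "$precision ${BASH_REMATCH[2]}"
  else
    return 1   # no match: pass the quantization flags explicitly
  fi
}
```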

Examples:

| checkpoint name fragment | inferred precision | inferred rank | notes |
|---|---|---|---|
| svdq-int4_r32 | int4 | 32 | Standard INT4 checkpoint |
| svdq-int4_r128 | int4 | 128 | Higher-quality INT4 checkpoint |
| svdq-fp4_r32 | nvfp4 | 32 | fp4 in the filename maps to the CLI value nvfp4 |
| svdq-fp4_r128 | nvfp4 | 128 | Higher-quality NVFP4 checkpoint |

Common filenames:

| filename | precision | rank | typical use |
|---|---|---|---|
| svdq-int4_r32-qwen-image.safetensors | int4 | 32 | Balanced default |
| svdq-int4_r128-qwen-image.safetensors | int4 | 128 | Quality-focused |
| svdq-fp4_r32-qwen-image.safetensors | nvfp4 | 32 | RTX 50-series / NVFP4 path |
| svdq-fp4_r128-qwen-image.safetensors | nvfp4 | 128 | Quality-focused NVFP4 |
| svdq-int4_r32-qwen-image-lightningv1.0-4steps.safetensors | int4 | 32 | Lightning 4-step |
| svdq-int4_r128-qwen-image-lightningv1.1-8steps.safetensors | int4 | 128 | Lightning 8-step |

If your checkpoint name does not follow this convention, pass --enable-svdquant, --quantization-precision, and --quantization-rank explicitly.
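The Lightning filenames in the table also carry a step count (`-4steps`, `-8steps`). Assuming that naming pattern holds, a wrapper could read the count back so that --num-inference-steps matches the checkpoint. `lightning_steps` is a hypothetical helper; SGLang is not documented here to do this itself.

```bash
#!/usr/bin/env bash
# Hypothetical helper: extract N from a Lightning filename ending in "-Nsteps",
# e.g. ...-lightningv1.0-4steps.safetensors -> 4. Fails if there is no match.
lightning_steps() {
  local re='lightning.*-([0-9]+)steps'
  if [[ "$(basename "$1")" =~ $re ]]; then
    echo "${BASH_REMATCH[1]}"
  else
    return 1
  fi
}
```

A command line could then include `--num-inference-steps "$(lightning_steps "$ckpt")"`.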

Usage Examples

Recommended auto-detected flow:

sglang generate \
  --model-path Qwen/Qwen-Image \
  --transformer-weights-path /path/to/svdq-int4_r32-qwen-image.safetensors \
  --prompt "a beautiful sunset" \
  --save-output

Manual override when the filename does not encode the quant settings:

sglang generate \
  --model-path Qwen/Qwen-Image \
  --transformer-weights-path /path/to/custom_nunchaku_checkpoint.safetensors \
  --enable-svdquant \
  --quantization-precision int4 \
  --quantization-rank 128 \
  --prompt "a beautiful sunset" \
  --save-output
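When a script loops over many checkpoints, the choice between the two commands above can be automated by adding the explicit flags only when the basename does not already encode the settings. A sketch under that assumption follows. `nunchaku_quant_flags` is hypothetical, and you supply the fallback precision and rank.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not an SGLang API): print the extra quantization flags a
# Nunchaku checkpoint needs. Prints nothing when the basename already matches
# svdq-(int4|fp4)_r{rank}, since SGLang infers the settings in that case.
nunchaku_quant_flags() {
  local ckpt="$1" precision="$2" rank="$3"
  local re='svdq-(int4|fp4)_r[0-9]+'
  if [[ "$(basename "$ckpt")" =~ $re ]]; then
    return 0
  fi
  printf '%s\n' "--enable-svdquant --quantization-precision $precision --quantization-rank $rank"
}
```

The output is meant to be spliced unquoted into the command line, for example `sglang generate --model-path Qwen/Qwen-Image --transformer-weights-path "$ckpt" $(nunchaku_quant_flags "$ckpt" int4 128) --prompt "a beautiful sunset"`.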

Notes

  • --transformer-weights-path is the canonical flag for Nunchaku checkpoints. Older config names such as quantized_model_path are treated as compatibility aliases.

  • Auto-detection only happens when the checkpoint basename matches svdq-(int4|fp4)_r{rank}.

  • The CLI values are int4 and nvfp4. In filenames, the NVFP4 variant is written as fp4.

  • Lightning checkpoints usually expect matching --num-inference-steps, such as 4 or 8.

  • Current runtime validation only allows Nunchaku on NVIDIA CUDA Ampere (SM8x) or SM12x GPUs. Hopper (SM90) is currently rejected.
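The GPU restriction can be checked up front from the compute capability. Below is a minimal sketch of the documented rule: 8.x and 12.x are allowed, while 9.0 and everything else is rejected. `nunchaku_gpu_ok` is hypothetical, and SGLang's own validation remains authoritative.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check mirroring the documented rule:
# compute capability 8.x (Ampere, SM8x) or 12.x (SM12x) passes; Hopper (9.0) fails.
nunchaku_gpu_ok() {
  local major="${1%%.*}"   # "8.6" -> "8", "12.0" -> "12"
  case "$major" in
    8|12) return 0 ;;
    *)    return 1 ;;
  esac
}
```

If your driver's nvidia-smi supports the compute_cap query field, `nunchaku_gpu_ok "$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n 1)"` checks the first GPU.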