Supported Models and Optimization Compatibility

This page tracks supported SGLang Diffusion model families and their optimization compatibility. It also covers long-tail models that do not yet have dedicated cookbook recipes. For model-specific usage recipes, start from the Diffusion Cookbook. Cookbook pages cover the primary models with examples; this page keeps the compact support and compatibility inventory.

Supported model inventory

Pass the Hugging Face Model ID to --model-path for sglang generate or sglang serve. Python API users can pass the same ID to SGLang Diffusion model-loading helpers. Missing checkpoint aliases do not imply that a model family is unsupported. The runtime registry may also accept detector-based aliases or local model directories that match the same family. Rows are grouped when a family shares the same runtime path or optimization support. Use the detailed matrix below when you need per-optimization compatibility.
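
As a minimal sketch of the paragraph above, the snippet below assembles one sglang generate call and one sglang serve call for an ID from the inventory and prints them instead of running them. Only --model-path is taken from this page; the --prompt flag and the prompt text are assumptions for illustration, so check sglang generate --help for the options in your install.

```shell
# Pick any Hugging Face Model ID from the inventory tables.
MODEL_ID="Tongyi-MAI/Z-Image-Turbo"

# Assumption: --prompt is the flag that carries the text prompt.
PROMPT="a lighthouse at dusk"

# Dry run: hold each command in the positional parameters, then flatten it
# to a string for printing. Quoting is not preserved in the printed form.
set -- sglang generate --model-path "$MODEL_ID" --prompt "$PROMPT"
GENERATE_CMD="$*"

set -- sglang serve --model-path "$MODEL_ID"
SERVE_CMD="$*"

printf '%s\n' "$GENERATE_CMD" "$SERVE_CMD"
```

To execute a command for real, run "$@" right after the matching set -- line instead of printing it.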

Image

Model family	Model IDs
FLUX	`black-forest-labs/FLUX.1-dev` `black-forest-labs/FLUX.2-dev` `black-forest-labs/FLUX.2-dev-NVFP4` `black-forest-labs/FLUX.2-klein-4B` `black-forest-labs/FLUX.2-klein-9B` `black-forest-labs/FLUX.2-klein-base-4B` `black-forest-labs/FLUX.2-klein-base-9B`
Z-Image	`Tongyi-MAI/Z-Image` `Tongyi-MAI/Z-Image-Turbo`
Qwen-Image	`Qwen/Qwen-Image` `Qwen/Qwen-Image-2512` `Qwen/Qwen-Image-Edit` `Qwen/Qwen-Image-Edit-2509` `Qwen/Qwen-Image-Edit-2511` `Qwen/Qwen-Image-Layered`
SD3 / SD3.5	`stabilityai/stable-diffusion-3-medium` `stabilityai/stable-diffusion-3-medium-diffusers` `stabilityai/stable-diffusion-3.5-medium` `stabilityai/stable-diffusion-3.5-medium-diffusers` `stabilityai/stable-diffusion-3.5-large` `stabilityai/stable-diffusion-3.5-large-diffusers`
SANA	`Efficient-Large-Model/SANA1.5_1.6B_1024px_diffusers` `Efficient-Large-Model/SANA1.5_4.8B_1024px_diffusers` `Efficient-Large-Model/Sana_1600M_1024px_diffusers` `Efficient-Large-Model/Sana_600M_1024px_diffusers` `Efficient-Large-Model/Sana_1600M_512px_diffusers` `Efficient-Large-Model/Sana_600M_512px_diffusers`
FireRed-Image	`FireRedTeam/FireRed-Image-Edit-1.0` `FireRedTeam/FireRed-Image-Edit-1.1`
JoyAI-Image	`jdopensource/JoyAI-Image-Edit-Diffusers`
Other image pipelines	`zai-org/GLM-Image` `tencent/Hunyuan3D-2` `baidu/ERNIE-Image` `baidu/ERNIE-Image-Turbo` `ideogram-ai/ideogram-4-fp8` `ideogram-ai/ideogram-4-nf4` `Comfy-Org/Ideogram-4` `fal/ideogram-v4-fast` `fal/ideogram-v4-instant`

Video

Model family	Model IDs	Resolution / mode	Optimization support
FastWan	`FastVideo/FastWan2.1-T2V-1.3B-Diffusers` `FastVideo/FastWan2.2-TI2V-5B-FullAttn-Diffusers` `FastVideo/FastWan2.2-TI2V-5B-Diffusers`	480p / 720p	VSA
Wan2.2	`Wan-AI/Wan2.2-TI2V-5B-Diffusers` `Wan-AI/Wan2.2-T2V-A14B-Diffusers` `nvidia/Wan2.2-T2V-A14B-Diffusers-NVFP4` `Wan-AI/Wan2.2-I2V-A14B-Diffusers`	TI2V / T2V / I2V, 480p / 720p	Sage, Laser, BSA, Rain Fusion
LongLive 2.0	`Rabinovich/LongLive-2.0-5B-Diffusers`	T2V / I2V, 480p / 720p	No dedicated optimization listed
HunyuanVideo	`hunyuanvideo-community/HunyuanVideo` `FastVideo/FastHunyuan-diffusers`	720×1280 / 544×960	Tile, Sage, SVG2
Wan2.1	`Wan-AI/Wan2.1-T2V-1.3B-Diffusers` `Wan-AI/Wan2.1-T2V-14B-Diffusers` `Wan-AI/Wan2.1-I2V-14B-480P-Diffusers` `Wan-AI/Wan2.1-I2V-14B-720P-Diffusers`	T2V / I2V, 480p / 720p	TeaCache, Tile, Sage, SVG2, Laser, BSA, Rain Fusion
TurboWan	`IPostYellow/TurboWan2.1-T2V-1.3B-Diffusers` `IPostYellow/TurboWan2.1-T2V-14B-Diffusers` `IPostYellow/TurboWan2.1-T2V-14B-720P-Diffusers` `IPostYellow/TurboWan2.2-I2V-A14B-Diffusers`	480p / 720p	TeaCache, SLA, SageSLA
MOVA	`OpenMOSS-Team/MOVA-360p` `OpenMOSS-Team/MOVA-720p`	Video-audio, 360p / 720p; local MOVA detector aliases are also supported.	No dedicated optimization listed
Wan2.1 Fun	`weizhou03/Wan2.1-Fun-1.3B-InP-Diffusers`	480p inpainting	TeaCache, Tile, Sage, SVG2
Helios	`BestWishYsh/Helios-Base` `BestWishYsh/Helios-Mid` `BestWishYsh/Helios-Distilled`	720p	No dedicated optimization listed
LTX-2	`Lightricks/LTX-2` `Lightricks/LTX-2.3`	One-stage, two-stage, TI2V, HQ	No dedicated optimization listed
Cosmos3	`nvidia/Cosmos3-Nano` `nvidia/Cosmos3-Super` `nvidia/Cosmos3-Super-Text2Image` `nvidia/Cosmos3-Super-Image2Video`	T2V / I2V / T2I	No dedicated optimization listed

Realtime / World

Model family	Model IDs / detector	Notes
LingBotWorld	`robbyant/lingbot-world-fast-diffusers`	Realtime world model with causal state and control tokens.
SANA-WM	`Efficient-Large-Model/SANA-WM_bidirectional` `Efficient-Large-Model/SANA-WM_streaming`	World-model pipeline with bidirectional and streaming checkpoints.

Wan2.2 TI2V 5B currently has known quality issues when used for I2V generation.

Optimization compatibility

The detailed video matrix uses these symbols:

✅ = Full compatibility
❌ = No compatibility
⭕ = Does not apply to this model

Detailed video optimization matrix

Video Generation Models

Optimization columns are abbreviated to keep the matrix readable:

Tea = TeaCache
Tile = Sliding Tile Attention
Sage = Sage Attention
VSA = Video Sparse Attention
SLA = Sparse Linear Attention
SageSLA = Sage Sparse Linear Attention
SVG2 = Sparse Video Gen 2
LA = Laser Attention
BSA = Block Sparse Attention
RF = Rain Fusion Attention

Model Name	Hugging Face Model ID	Resolution	Tea	Tile	Sage	VSA	SLA	SageSLA	SVG2	LA	BSA	RF
FastWan2.1 T2V 1.3B	`FastVideo/FastWan2.1-T2V-1.3B-Diffusers`	480p	⭕	⭕	⭕	✅	❌	❌	❌	❌	❌	❌
FastWan2.2 TI2V 5B	`FastVideo/FastWan2.2-TI2V-5B-FullAttn-Diffusers` `FastVideo/FastWan2.2-TI2V-5B-Diffusers`	720p	⭕	⭕	⭕	✅	❌	❌	❌	❌	❌	❌
Wan2.2 TI2V 5B	`Wan-AI/Wan2.2-TI2V-5B-Diffusers`	720p	⭕	⭕	✅	⭕	❌	❌	❌	✅	✅	✅
LongLive 2.0 5B	`Rabinovich/LongLive-2.0-5B-Diffusers`	480p 720p	❌	❌	❌	❌	❌	❌	❌	❌	❌	❌
Wan2.2 T2V A14B	`Wan-AI/Wan2.2-T2V-A14B-Diffusers` `nvidia/Wan2.2-T2V-A14B-Diffusers-NVFP4`	480p 720p	❌	❌	✅	⭕	❌	❌	❌	✅	✅	✅
Wan2.2 I2V A14B	`Wan-AI/Wan2.2-I2V-A14B-Diffusers`	480p 720p	❌	❌	✅	⭕	❌	❌	❌	✅	✅	✅
HunyuanVideo	`hunyuanvideo-community/HunyuanVideo`	720×1280 544×960	❌	✅	✅	⭕	❌	❌	✅	❌	❌	❌
FastHunyuan	`FastVideo/FastHunyuan-diffusers`	720×1280 544×960	❌	✅	✅	⭕	❌	❌	✅	❌	❌	❌
Wan2.1 T2V 1.3B	`Wan-AI/Wan2.1-T2V-1.3B-Diffusers`	480p	✅	✅	✅	⭕	❌	❌	✅	✅	✅	✅
Wan2.1 T2V 14B	`Wan-AI/Wan2.1-T2V-14B-Diffusers`	480p 720p	✅	✅	✅	⭕	❌	❌	✅	✅	✅	✅
Wan2.1 I2V 480P	`Wan-AI/Wan2.1-I2V-14B-480P-Diffusers`	480p	✅	✅	✅	⭕	❌	❌	✅	✅	✅	✅
Wan2.1 I2V 720P	`Wan-AI/Wan2.1-I2V-14B-720P-Diffusers`	720p	✅	✅	✅	⭕	❌	❌	✅	✅	✅	✅
TurboWan2.1 T2V 1.3B	`IPostYellow/TurboWan2.1-T2V-1.3B-Diffusers`	480p	✅	❌	❌	❌	✅	✅	⭕	❌	❌	❌
TurboWan2.1 T2V 14B	`IPostYellow/TurboWan2.1-T2V-14B-Diffusers`	480p	✅	❌	❌	❌	✅	✅	⭕	❌	❌	❌
TurboWan2.1 T2V 14B 720P	`IPostYellow/TurboWan2.1-T2V-14B-720P-Diffusers`	720p	✅	❌	❌	❌	✅	✅	⭕	❌	❌	❌
TurboWan2.2 I2V A14B	`IPostYellow/TurboWan2.2-I2V-A14B-Diffusers`	720p	✅	❌	❌	❌	✅	✅	⭕	❌	❌	❌
Wan2.1 Fun 1.3B InP	`weizhou03/Wan2.1-Fun-1.3B-InP-Diffusers`	480p	✅	✅	✅	⭕	❌	❌	✅	❌	❌	❌
Helios Base	`BestWishYsh/Helios-Base`	720p	❌	❌	❌	❌	❌	❌	❌	❌	❌	❌
Helios Mid	`BestWishYsh/Helios-Mid`	720p	❌	❌	❌	❌	❌	❌	❌	❌	❌	❌
Helios Distilled	`BestWishYsh/Helios-Distilled`	720p	❌	❌	❌	❌	❌	❌	❌	❌	❌	❌
LTX-2 (one/two-stage/TI2V)	`Lightricks/LTX-2`	768×512 1536×1024	❌	❌	❌	❌	❌	❌	❌	❌	❌	❌
LTX-2.3 (one/two-stage/TI2V/HQ)	`Lightricks/LTX-2.3`	768×512 1536×1024 1920×1088 (HQ default)	❌	❌	❌	❌	❌	❌	❌	❌	❌	❌
Cosmos3-Nano (T2V / I2V / T2I)	`nvidia/Cosmos3-Nano`	720p · 480p 1024×1024 (T2I)	❌	❌	❌	❌	❌	❌	❌	❌	❌	❌
Cosmos3-Super (T2V / I2V / T2I)	`nvidia/Cosmos3-Super` `nvidia/Cosmos3-Super-Text2Image` `nvidia/Cosmos3-Super-Image2Video`	720p · 480p 1024×1024 (T2I)	❌	❌	❌	❌	❌	❌	❌	❌	❌	❌

Note:

Wan2.2 TI2V 5B has some quality issues when performing I2V generation. We are working on fixing this issue.
SageSLA is based on SpargeAttn. Install it first with pip install git+https://github.com/thu-ml/SpargeAttn.git --no-build-isolation
LTX pipeline selection:
- One-stage: --pipeline-class-name LTX2Pipeline
- Two-stage: --pipeline-class-name LTX2TwoStagePipeline
- Two-stage HQ: --pipeline-class-name LTX2TwoStageHQPipeline (HQ defaults to 1920×1088; you can still override --width/--height)
- LTX-2 and LTX-2.3 support both T2V and TI2V (--image-path) on one-stage and two-stage pipelines (including HQ).
- The spatial upsampler and distilled LoRA are auto-resolved from the model snapshot by default, and can still be overridden with --spatial-upsampler-path and --distilled-lora-path.
- For LTX models, the Resolution column in the matrix above uses output video width×height semantics, matching sglang generate --width ... --height ....
LTX-2 / LTX-2.3 two-stage also supports --ltx2-two-stage-device-mode {original,resident}:
- original keeps official two-stage semantics without the premerged stage-2 transformer path.
- resident usually provides the best latency/throughput but uses much more VRAM.
- Default is auto: resident on H200/high-memory CUDA GPUs, otherwise original.
- Deprecated compatibility: snapshot is accepted as an alias for original and may be removed after two release cycles.
Cosmos3 ships in two sizes — nvidia/Cosmos3-Nano (8B) and nvidia/Cosmos3-Super (32B). Both share the same pipeline; the only difference is transformer depth and width, picked up from transformer/config.json at load time. A single checkpoint serves T2V, I2V (--image-path), and T2I (--num-frames 1).
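
The LTX and Cosmos3 notes above can be combined into concrete invocations. The sketch below builds two commands and prints them rather than executing them: an LTX-2.3 two-stage TI2V run with an explicit device mode, and a Cosmos3 text-to-image run. All flags except --prompt appear in the notes; --prompt, the prompt strings, and the input.png path are assumptions for illustration.

```shell
# Assumption: --prompt carries the text prompt; input.png is a placeholder path.

# LTX-2.3 two-stage pipeline, TI2V via --image-path, at a resolution listed
# in the matrix, with the lower-VRAM "original" device mode selected.
set -- sglang generate \
  --model-path Lightricks/LTX-2.3 \
  --pipeline-class-name LTX2TwoStagePipeline \
  --ltx2-two-stage-device-mode original \
  --image-path input.png \
  --width 1536 --height 1024 \
  --prompt "a paper boat drifting down a stream"
LTX_CMD="$*"

# Cosmos3 text-to-image: the same checkpoint, restricted to a single frame.
set -- sglang generate \
  --model-path nvidia/Cosmos3-Nano \
  --num-frames 1 \
  --prompt "a red fox in fresh snow"
COSMOS_CMD="$*"

# Printed for inspection only; quoting is not preserved in this form.
printf '%s\n' "$LTX_CMD" "$COSMOS_CMD"
```

Swapping original for resident trades VRAM for latency, as described in the device-mode note; omitting the flag leaves the choice to the auto default.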

Supported Components

SGLang Diffusion supports overriding individual pipeline components with --<component>-path. The value can be either a Hugging Face repo ID or a local component directory. The same overrides can also be provided in config files through component_paths.<component>.

Common Syntax

CLI:

sglang generate \
  --model-path black-forest-labs/FLUX.2-dev \
  --vae-path black-forest-labs/FLUX.2-small-decoder \
  --transformer-path /models/flux2/transformer

Config file:

model_path: black-forest-labs/FLUX.2-dev
component_paths:
  vae: black-forest-labs/FLUX.2-small-decoder
  transformer: /models/flux2/transformer

Use the component name from the pipeline’s model_index.json or the native pipeline’s registered module name:

Component Type	Supported Keys	Notes
VAE	`vae`, `video_vae`, `audio_vae`	`vae` is the common image-generation override
Transformer / DiT	`transformer`, `video_dit`, `audio_dit`	`transformer` is the standard override for the main denoiser
Text / Preprocess	`text_encoder`, `text_encoder_2`, `tokenizer`, `processor`, `image_processor`	Replacement encoders often need matching preprocessing assets
Auxiliary	`scheduler`, `spatial_upsampler`, `vocoder`, `connectors`, `dual_tower_bridge`, `image_encoder`, `vision_language_encoder`	Only valid for pipelines that expose these components

Known Component Repos

The table below lists concrete Hugging Face component repos that are already used in SGLang Diffusion docs or tests. It is not an exhaustive catalog of all compatible component repos.

Base Model	Override Key	Example Repo	Notes
`black-forest-labs/FLUX.2-dev`	`vae`	`black-forest-labs/FLUX.2-small-decoder`	Decoder-only FLUX.2 VAE override
`black-forest-labs/FLUX.2-dev`	`vae`	`fal/FLUX.2-Tiny-AutoEncoder`	Existing tested custom VAE path

VAE

--vae-path is the common image-generation override.
--video-vae-path and --audio-vae-path are only relevant for pipelines with separate video or audio VAEs.

Transformer / DiT

--transformer-path is the standard override for the main denoising transformer.
For quantized transformers, prefer --transformer-path or --transformer-weights-path; see quantization.md.
--video-dit-path and --audio-dit-path are only for pipelines that split denoisers by modality.

Text Encoders and Preprocessors

--text-encoder-path and --text-encoder-2-path override primary and secondary text encoders.
--tokenizer-path, --processor-path, and --image-processor-path are useful when the replacement encoder requires matching preprocessing assets.

Auxiliary Components

--scheduler-path is only relevant when the pipeline exposes a scheduler component.
--spatial-upsampler-path is mainly for two-stage pipelines such as LTX2TwoStagePipeline.
--vocoder-path, --connectors-path, --dual-tower-bridge-path, --image-encoder-path, and --vision-language-encoder-path are only valid for pipelines that expose those components.

Notes

Component overrides are only valid when the target pipeline actually uses that component.
The override key should match the component name in the pipeline’s model_index.json or the native pipeline’s registered module name.
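
One way to find valid override keys is to read them from model_index.json. The sketch below assumes the Diffusers layout, where each component is a key whose value is a [library, class] list and metadata keys hold plain strings. The sample file and its class names are hypothetical. The key-to-flag rule (underscores become hyphens, plus a -path suffix) is inferred from the flag names listed on this page, and native pipelines that use registered module names are not covered.

```shell
# Hypothetical sample file; a real one sits at the root of the model snapshot.
INDEX_FILE="$(mktemp)"
cat > "$INDEX_FILE" <<'EOF'
{
  "_class_name": "ExamplePipeline",
  "scheduler": ["diffusers", "ExampleScheduler"],
  "text_encoder_2": ["transformers", "ExampleEncoder"],
  "vae": ["diffusers", "ExampleVAE"]
}
EOF

# Keep only keys whose value opens a list; string-valued metadata is skipped.
COMPONENT_KEYS="$(grep -o '"[A-Za-z0-9_]*": *\[' "$INDEX_FILE" | cut -d'"' -f2)"

# Derive the CLI flag for each key: underscores to hyphens, then add "-path".
OVERRIDE_FLAGS="$(printf '%s\n' "$COMPONENT_KEYS" | tr '_' '-' | sed 's/^/--/; s/$/-path/')"
printf '%s\n' "$OVERRIDE_FLAGS"

rm -f "$INDEX_FILE"
```

A derived flag is only a candidate: it is valid only if the target pipeline actually uses that component.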

Verified LoRA Examples

This section lists example LoRAs that have been explicitly tested and verified with each base model in the SGLang Diffusion pipeline.

LoRAs that are not listed here are not necessarily incompatible. In practice, most standard LoRAs are expected to work, especially those following common Diffusers or SD-style conventions. The entries below simply reflect configurations that have been manually validated by the SGLang team.

Verified LoRAs by Base Model

Base Model	Supported LoRAs
Wan2.2	`lightx2v/Wan2.2-Distill-Loras` `Cseti/wan2.2-14B-Arcane_Jinx-lora-v1`
Wan2.1	`lightx2v/Wan2.1-Distill-Loras`
Z-Image-Turbo	`tarn59/pixel_art_style_lora_z_image_turbo` `wcde/Z-Image-Turbo-DeJPEG-Lora`
Qwen-Image	`lightx2v/Qwen-Image-Lightning` `flymy-ai/qwen-image-realism-lora` `prithivMLmods/Qwen-Image-HeadshotX` `starsfriday/Qwen-Image-EVA-LoRA`
Qwen-Image-Edit	`ostris/qwen_image_edit_inpainting` `lightx2v/Qwen-Image-Edit-2511-Lightning`
FLUX	`dvyio/flux-lora-simple-illustration` `XLabs-AI/flux-furry-lora` `XLabs-AI/flux-RealismLora`

Special requirements

Sliding Tile Attention

Currently, only Hopper GPUs (H100s) are supported.
