SGLang Diffusion

SGLang Diffusion is a high-performance inference framework for image and video generation. It provides native SGLang pipelines, diffusers backend support, an OpenAI-compatible server, and an optimized kernel stack built on both precompiled sgl-kernel operators and JIT kernels for key inference paths.

Key Features

Broad model support across Wan, Hunyuan, Qwen-Image, FLUX, Z-Image, GLM-Image, and more
Fast inference with sgl-kernel, JIT kernels, scheduler improvements, and caching acceleration
Multiple interfaces: sglang generate, sglang serve, and an OpenAI-compatible API
Multi-platform support for NVIDIA, AMD, Intel XPU, Ascend, Apple Silicon, and Moore Threads

Quick Start

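Install the package:

uv pip install "sglang[diffusion]" --prerelease=allow

Run a one-off generation job from the command line:

sglang generate --model-path Qwen/Qwen-Image \
  --prompt "A beautiful sunset over the mountains" \
  --save-output

Launch a persistent server on port 30010:
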
sglang serve --model-path Qwen/Qwen-Image --port 30010
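
Once the server is running, a client can talk to it over HTTP. The following is a minimal sketch rather than a reference client: it assumes the server listens on localhost at the port given above and exposes an image-generation route at /v1/images/generations, mirroring the OpenAI Images API. That route, the payload fields, and the helper names build_image_request and send are assumptions for illustration and are not confirmed by this page; check the OpenAI-Compatible API page for the routes and fields actually supported.

```python
import json
import urllib.request

# Assumptions (not confirmed by this page): the server started with
# `sglang serve ... --port 30010` is reachable on localhost, and it exposes
# an image route that mirrors the OpenAI Images API.
BASE_URL = "http://localhost:30010"
IMAGES_ROUTE = "/v1/images/generations"  # assumed route


def build_image_request(prompt, model="Qwen/Qwen-Image", base_url=BASE_URL):
    """Build (but do not send) a JSON POST request asking for one image."""
    payload = {"model": model, "prompt": prompt, "n": 1}
    return urllib.request.Request(
        url=base_url + IMAGES_ROUTE,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=600):
    """Send a prepared request and return the decoded JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request needs no running server, so it is safe to inspect.
req = build_image_request("A beautiful sunset over the mountains")
print(req.get_method(), req.full_url)
```

With the server up, calling send(req) would submit the request and return the parsed JSON response. Keeping request construction separate from sending makes it easy to inspect or log the payload before any GPU time is spent.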

Start Here

Installation: install SGLang Diffusion and platform dependencies
Supported Models and Optimization Compatibility: check supported model families, long-tail coverage, and optimization support
CLI: run one-off generation jobs or launch a persistent server
OpenAI-Compatible API: send image and video requests to the HTTP server
Performance Overview: choose speed, memory, parallelism, caching, and quality-tradeoff levers
Caching Acceleration: use Cache-DiT or TeaCache to reduce denoising cost
Quantization: load quantized transformer checkpoints
Contributing: contribution workflow, adding new models, and CI perf baselines

Additional Documentation

Post-Processing: frame interpolation and upscaling
Deployment and Performance Modes: choose --performance-mode, offload, FSDP, CFG parallelism, SP, and TP
Attention Backends: choose the best backend for your model and hardware
Sequence Parallelism: configure SP, Ulysses, and ring-based splitting for long sequences
Inference Batching: batch compatible native diffusion requests during serving
Progressive Resolution Generation: run early denoising steps at lower latent resolution for selected pipelines
Environment Variables: platform, caching, storage, and debugging configuration

Developer Documentation

Support New Models: implementation guide for new diffusion pipelines
CI Performance Baselines: generate and update performance baselines used in CI
