# SGLang Diffusion
SGLang Diffusion is a high-performance inference framework for image and video generation. It provides native SGLang pipelines, diffusers backend support, an OpenAI-compatible server, and an optimized kernel stack built on both precompiled sgl-kernel operators and JIT kernels for key inference paths.
## Key Features
- Broad model support across Wan, Hunyuan, Qwen-Image, FLUX, Z-Image, GLM-Image, and more
- Fast inference with sgl-kernel, JIT kernels, scheduler improvements, and caching acceleration
- Multiple interfaces: `sglang generate`, `sglang serve`, and an OpenAI-compatible API
- Multi-platform support for NVIDIA, AMD, Ascend, Apple Silicon, and Moore Threads
## Quick Start
```bash
# Install SGLang Diffusion
uv pip install "sglang[diffusion]" --prerelease=allow

# Run a one-off generation from the CLI
sglang generate --model-path Qwen/Qwen-Image \
    --prompt "A beautiful sunset over the mountains" \
    --save-output

# Launch a persistent OpenAI-compatible server
sglang serve --model-path Qwen/Qwen-Image --port 30010
```
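Once the server is up, clients can talk to it over HTTP. The sketch below builds a request body in the shape of the OpenAI Images API (`POST /v1/images/generations`); the field names follow the OpenAI convention, and it is an assumption that the SGLang Diffusion server accepts this exact set of fields. The `build_image_request` helper is hypothetical, introduced here only for illustration.

```python
import json

def build_image_request(prompt, model="Qwen/Qwen-Image",
                        size="1024x1024", n=1):
    # Assemble an OpenAI-style image generation payload.
    # Field names follow the OpenAI Images API; server-side
    # support for each field is an assumption, not a guarantee.
    return {"model": model, "prompt": prompt, "size": size, "n": n}

body = build_image_request("A beautiful sunset over the mountains")
print(json.dumps(body))

# To send it to a server started with `sglang serve ... --port 30010`
# (needs the `requests` package and a running server):
#
#   import requests
#   resp = requests.post(
#       "http://localhost:30010/v1/images/generations",
#       json=body, timeout=300,
#   )
#   print(resp.json())
```

The network call is left as a comment so the snippet stays runnable without a live server; see the OpenAI-Compatible API page for the endpoints the server actually exposes.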
## Start Here
- Installation: install SGLang Diffusion and platform dependencies
- Compatibility Matrix: check model and optimization support
- CLI: run one-off generation jobs or launch a persistent server
- OpenAI-Compatible API: send image and video requests to the HTTP server
- Attention Backends: choose the best backend for your model and hardware
- Caching Acceleration: use Cache-DiT or TeaCache to reduce denoising cost
- Quantization: load quantized transformer checkpoints
- Contributing: contribution workflow, adding new models, and CI perf baselines
## Additional Documentation
- Post-Processing: frame interpolation and upscaling
- Performance Overview: overview of attention, caching, and profiling
- Environment Variables: platform, caching, storage, and debugging configuration
- Support New Models: implementation guide for new diffusion pipelines
- CI Performance: performance baseline generation