SGLang Diffusion accelerates diffusion model inference with sgl-kernel operators and JIT kernels for key inference paths.
## Key Features
- Broad model support across Wan, Hunyuan, Qwen-Image, FLUX, Z-Image, GLM-Image, and more
- Fast inference with sgl-kernel, JIT kernels, scheduler improvements, and caching acceleration
- Multiple interfaces: `sglang generate`, `sglang serve`, and an OpenAI-compatible API
- Multi-platform support for NVIDIA, AMD, Intel XPU, Ascend, Apple Silicon, and Moore Threads
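The two CLI entry points above can be sketched as follows. This is an illustrative sketch only: the flag names and placeholders below are assumptions, not confirmed SGLang Diffusion options, so check the CLI documentation for the exact arguments.

```shell
# One-off generation job (flag names are illustrative assumptions):
sglang generate --model-path <model> --prompt "a red bicycle on a beach"

# Persistent server exposing the OpenAI-compatible HTTP API:
sglang serve --model-path <model>
```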
## Quick Start

### Start Here
- Installation: install SGLang Diffusion and platform dependencies
- Compatibility Matrix: check model, optimization, and component override support
- CLI: run one-off generation jobs or launch a persistent server
- OpenAI-Compatible API: send image and video requests to the HTTP server
- Attention Backends: choose the best backend for your model and hardware
- Caching Acceleration: use Cache-DiT or TeaCache to reduce denoising cost
- Quantization: load quantized transformer checkpoints
- Contributing: contribution workflow, adding new models, and CI perf baselines
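For the OpenAI-compatible API mentioned above, a client request can be sketched by building a payload in the shape of the OpenAI Images API. The base URL, port, endpoint path, and model name below are assumptions for illustration; consult the OpenAI-Compatible API page for the server's actual routes and supported fields.

```python
import json

# Assumed local server address; SGLang Diffusion's actual default port may differ.
BASE_URL = "http://localhost:30000/v1"

def build_image_request(prompt, model="FLUX.1-dev", size="1024x1024", n=1):
    """Build (url, json_body) for an OpenAI-style image generation request.

    Hypothetical helper: the /images/generations path follows the OpenAI
    Images API convention, which the server is described as compatible with.
    """
    url = f"{BASE_URL}/images/generations"
    payload = {"model": model, "prompt": prompt, "size": size, "n": n}
    return url, json.dumps(payload)

url, body = build_image_request("a watercolor fox in a snowy forest")
# Send with any HTTP client, e.g.:
#   requests.post(url, data=body, headers={"Content-Type": "application/json"})
```

The same request shape lets existing OpenAI client libraries talk to the local server by overriding their base URL.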
## Additional Documentation
- Post-Processing: frame interpolation and upscaling
- Performance Overview: overview of attention, caching, and profiling
- Environment Variables: platform, caching, storage, and debugging configuration
- Support New Models: implementation guide for new diffusion pipelines
- CI Performance: performance baseline generation
