sgl-kernel operators and JIT kernels for key inference paths.
Key Features
- Broad model support across Wan, Hunyuan, Qwen-Image, FLUX, Z-Image, GLM-Image, and more
- Fast inference with sgl-kernel, JIT kernels, scheduler improvements, and caching acceleration
- Multiple interfaces: sglang generate, sglang serve, and an OpenAI-compatible API
- Multi-platform support for NVIDIA, AMD, Intel XPU, Ascend, Apple Silicon, and Moore Threads
Quick Start
Start Here
- Installation: install SGLang Diffusion and platform dependencies
- Supported Models and Optimization Compatibility: check supported model families, long-tail coverage, and optimization support
- CLI: run one-off generation jobs or launch a persistent server
- OpenAI-Compatible API: send image and video requests to the HTTP server
- Performance Overview: choose speed, memory, parallelism, caching, and quality-tradeoff levers
- Caching Acceleration: use Cache-DiT or TeaCache to reduce denoising cost
- Quantization: load quantized transformer checkpoints
- Contributing: contribution workflow, adding new models, and CI perf baselines
Additional Documentation
- Post-Processing: frame interpolation and upscaling
- Deployment and Performance Modes: choose --performance-mode, offload, FSDP, CFG parallelism, SP, and TP
- Attention Backends: choose the best backend for your model and hardware
- Sequence Parallelism: configure SP, Ulysses, and ring-based splitting for long sequences
- Inference Batching: batch compatible native diffusion requests during serving
- Progressive Resolution Generation: run early denoising steps at lower latent resolution for selected pipelines
- Environment Variables: platform, caching, storage, and debugging configuration
Developer Documentation
- Support New Models: implementation guide for new diffusion pipelines
- CI Performance Baselines: generate and update performance baselines used in CI
