# SGLang Diffusion

> Accelerated image and video generation with diffusion models.

SGLang Diffusion is a high-performance inference framework for image and video generation. It provides native SGLang pipelines, Diffusers backend support, an OpenAI-compatible server, and an optimized kernel stack that combines precompiled `sgl-kernel` operators with JIT kernels on key inference paths.

## Key Features

* Broad model support across Wan, Hunyuan, Qwen-Image, FLUX, Z-Image, GLM-Image, and more
* Fast inference with `sgl-kernel`, JIT kernels, scheduler improvements, and caching acceleration
* Multiple interfaces: `sglang generate`, `sglang serve`, and an OpenAI-compatible API
* Multi-platform support for NVIDIA, AMD, Intel XPU, Ascend, Apple Silicon, and Moore Threads

## Quick Start

Install the package:

```bash
uv pip install "sglang[diffusion]" --prerelease=allow
```

Generate an image from the command line and save the result:

```bash
sglang generate --model-path Qwen/Qwen-Image \
  --prompt "A beautiful sunset over the mountains" \
  --save-output
```

Or launch a persistent server that accepts OpenAI-compatible requests:

```bash
sglang serve --model-path Qwen/Qwen-Image --port 30010
```
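Once the server is running on port 30010, you can send it a text-to-image request over HTTP. The sketch below is a minimal example under stated assumptions: it assumes the server exposes the standard OpenAI `/v1/images/generations` route and accepts the OpenAI-style `model`, `prompt`, `size`, and `n` fields. The helper names `build_image_request` and `send_image_request`, the `SERVER_URL` variable, and the default `1024x1024` size are hypothetical and chosen for illustration. See the OpenAI-Compatible API page for the authoritative request format.

```bash
#!/usr/bin/env bash
# Minimal sketch, not the official client.
# Assumption: the server exposes the OpenAI-style route /v1/images/generations
# and accepts the fields "model", "prompt", "size", and "n".
set -euo pipefail

# Hypothetical variable; defaults to the port used in the `sglang serve` example.
SERVER_URL="${SERVER_URL:-http://localhost:30010}"

# Hypothetical helper: print the JSON request body.
# The prompt is assumed to contain no double quotes or backslashes (no JSON escaping is done).
build_image_request() {
  local prompt="$1" size="${2:-1024x1024}"
  printf '{"model": "Qwen/Qwen-Image", "prompt": "%s", "size": "%s", "n": 1}' "$prompt" "$size"
}

# Hypothetical helper: POST the request and print the raw JSON response.
send_image_request() {
  curl -sS "${SERVER_URL}/v1/images/generations" \
    -H "Content-Type: application/json" \
    -d "$(build_image_request "$1" "${2:-1024x1024}")"
}
```

Usage: after sourcing the script, run `send_image_request "A beautiful sunset over the mountains"`. Building the body in a separate function keeps the payload easy to inspect (for example, by piping it through `jq`) before any request reaches the server.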

## Start Here

* [Installation](/docs/sglang-diffusion/installation): install SGLang Diffusion and platform dependencies
* [Supported Models and Optimization Compatibility](/docs/sglang-diffusion/compatibility_matrix): check supported model families, long-tail coverage, and optimization support
* [CLI](/docs/sglang-diffusion/api/cli): run one-off generation jobs or launch a persistent server
* [OpenAI-Compatible API](/docs/sglang-diffusion/api/openai_api): send image and video requests to the HTTP server
* [Performance Overview](/docs/sglang-diffusion/performance-optimization): choose among levers for speed, memory, parallelism, caching, and quality tradeoffs
* [Caching Acceleration](/docs/sglang-diffusion/caching-acceleration): use Cache-DiT or TeaCache to reduce denoising cost
* [Quantization](/docs/sglang-diffusion/quantization): load quantized transformer checkpoints
* [Contributing](/docs/sglang-diffusion/contributing): contribution workflow, adding new models, and CI perf baselines

## Additional Documentation

* [Post-Processing](/docs/sglang-diffusion/api/post_processing): frame interpolation and upscaling
* [Deployment and Performance Modes](/docs/sglang-diffusion/deployment_cookbook): choose a `--performance-mode` and configure offload, FSDP, CFG parallelism, sequence parallelism (SP), and tensor parallelism (TP)
* [Attention Backends](/docs/sglang-diffusion/attention_backends): choose the best backend for your model and hardware
* [Sequence Parallelism](/docs/sglang-diffusion/ring_sp_performance): configure SP, Ulysses, and ring-based splitting for long sequences
* [Inference Batching](/docs/sglang-diffusion/dynamic_batching): batch compatible native diffusion requests during serving
* [Progressive Resolution Generation](/docs/sglang-diffusion/progressive_resolution): run early denoising steps at lower latent resolution for selected pipelines
* [Environment Variables](/docs/sglang-diffusion/environment_variables): platform, caching, storage, and debugging configuration

## Developer Documentation

* [Support New Models](/docs/sglang-diffusion/support_new_models): implementation guide for new diffusion pipelines
* [CI Performance Baselines](/docs/sglang-diffusion/ci_perf): generate and update performance baselines used in CI

## References

* [SGLang GitHub](https://github.com/sgl-project/sglang)
* [Cache-DiT](https://github.com/vipshop/cache-dit)
* [FastVideo](https://github.com/hao-ai-lab/FastVideo)
* [xDiT](https://github.com/xdit-project/xDiT)
* [Diffusers](https://github.com/huggingface/diffusers)