SGLang Diffusion#

SGLang Diffusion is an inference framework for accelerated image and video generation with diffusion models. It provides a unified, end-to-end pipeline with optimized kernels and an efficient scheduler loop.

Key Features#

  • Broad Model Support: Wan series, FastWan series, Hunyuan, Qwen-Image, Qwen-Image-Edit, Flux, Z-Image, GLM-Image, and more

  • Fast Inference: Optimized kernels, efficient scheduler loop, and Cache-DiT acceleration

  • Ease of Use: OpenAI-compatible API, CLI, and Python SDK

  • Multi-Platform: NVIDIA GPUs (H100, H200, A100, B200, 4090) and AMD GPUs (MI300X, MI325X)


Quick Start#

Installation#

uv pip install "sglang[diffusion]" --prerelease=allow

See Installation Guide for more installation methods and ROCm-specific instructions.

Basic Usage#

Generate an image with the CLI:

sglang generate --model-path Qwen/Qwen-Image \
    --prompt "A beautiful sunset over the mountains" \
    --save-output

Or start a server with the OpenAI-compatible API:

sglang serve --model-path Qwen/Qwen-Image --port 30010
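
Once the server is up, you can send requests over plain HTTP. The snippet below is a minimal sketch that assumes the server exposes an OpenAI-style `/v1/images/generations` route on port 30010; the exact path and accepted request fields are described on the OpenAI API documentation page.

```shell
# Hypothetical request; assumes a server started with:
#   sglang serve --model-path Qwen/Qwen-Image --port 30010
# and an OpenAI-style images endpoint (see the OpenAI API docs page
# for the exact route and request fields).
PAYLOAD='{
  "model": "Qwen/Qwen-Image",
  "prompt": "A beautiful sunset over the mountains",
  "n": 1,
  "size": "1024x1024"
}'

curl -s http://localhost:30010/v1/images/generations \
  -H "Content-Type: application/json" \
  -d "$PAYLOAD" \
  || echo "request failed (is the server running on :30010?)"
```

Any OpenAI-compatible client can be pointed at the same base URL instead of using curl directly.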

Documentation#

Getting Started#

Usage#

  • CLI Documentation - Command-line interface for sglang generate and sglang serve

  • OpenAI API - OpenAI-compatible API for image/video generation and LoRA management

Performance Optimization#

Reference#


CLI Quick Reference#

Generate (one-off generation)#

sglang generate --model-path <MODEL> --prompt "<PROMPT>" --save-output

Serve (HTTP server)#

sglang serve --model-path <MODEL> --port 30010

Enable Cache-DiT acceleration#

SGLANG_CACHE_DIT_ENABLED=true sglang generate --model-path <MODEL> --prompt "<PROMPT>"
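
The same environment variable can presumably be set when launching the server, so that all requests it handles use Cache-DiT acceleration (assumption: sglang serve reads the variable at startup just as sglang generate does):

```shell
# Assumption: the server honors the same environment variable at startup.
SGLANG_CACHE_DIT_ENABLED=true sglang serve --model-path <MODEL> --port 30010
```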
