SGLang Diffusion

SGLang Diffusion is a high-performance inference framework for image and video generation. It provides native SGLang pipelines, diffusers backend support, an OpenAI-compatible server, and an optimized kernel stack built on both precompiled sgl-kernel operators and JIT kernels for key inference paths.

Key Features

  • Broad model support across Wan, Hunyuan, Qwen-Image, FLUX, Z-Image, GLM-Image, and more

  • Fast inference with sgl-kernel, JIT kernels, scheduler improvements, and caching acceleration

  • Multiple interfaces: sglang generate, sglang serve, and an OpenAI-compatible API

  • Multi-platform support for NVIDIA, AMD, Ascend, Apple Silicon, and Moore Threads

Quick Start

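# Install SGLang with the diffusion extras
uv pip install "sglang[diffusion]" --prerelease=allow
# Generate an image from a prompt and save the output
sglang generate --model-path Qwen/Qwen-Image \
  --prompt "A beautiful sunset over the mountains" \
  --save-output
# Or start the OpenAI-compatible server on port 30010
sglang serve --model-path Qwen/Qwen-Image --port 30010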
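
Once the server from the last command is running, a client can talk to it over HTTP. The sketch below uses only the Python standard library. It rests on assumptions that the text above does not confirm: that the server is reachable at localhost, that it exposes the OpenAI Images API route `/v1/images/generations`, and that the response carries base64 image data under `data[0].b64_json`. The helper names `build_request` and `generate_image` are hypothetical; check the server's API reference before relying on any of this.

```python
import base64
import json
import urllib.request

# Assumptions (not confirmed by the docs above): host, route, and response
# shape follow the OpenAI Images API convention.
BASE_URL = "http://localhost:30010"
ENDPOINT = "/v1/images/generations"


def build_request(prompt: str, model: str = "Qwen/Qwen-Image") -> urllib.request.Request:
    """Build a POST request in the OpenAI Images API shape (hypothetical helper)."""
    payload = {
        "model": model,
        "prompt": prompt,
        "n": 1,
        "response_format": "b64_json",
    }
    return urllib.request.Request(
        BASE_URL + ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate_image(prompt: str, out_path: str = "output.png") -> str:
    """Send the request and write the first returned image to out_path."""
    # Generation can be slow, so allow a generous timeout.
    with urllib.request.urlopen(build_request(prompt), timeout=600) as resp:
        body = json.load(resp)
    # Assumed response layout: {"data": [{"b64_json": "..."}]}
    image_b64 = body["data"][0]["b64_json"]
    with open(out_path, "wb") as f:
        f.write(base64.b64decode(image_b64))
    return out_path
```

With the server up, calling `generate_image("A beautiful sunset over the mountains")` would write the result to `output.png` if the assumptions above hold. Keeping request construction separate from the network call makes the payload easy to inspect or adjust without a running server.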
