1. Model Introduction
Wan2.1 series is an open and advanced suite of large-scale video generative models from Wan-AI. Key characteristics:
- State-of-the-art video quality: Consistently outperforms many open-source and commercial video models on internal and public benchmarks, especially in motion richness and temporal consistency.
- Consumer GPU friendly: The T2V-1.3B variant can generate 5-second 480P videos on consumer GPUs with modest VRAM requirements.
- Multi-capability suite: Supports Text-to-Video (T2V), Image-to-Video (I2V), video editing, text-to-image, and video-to-audio generation.
- Robust text rendering: First-generation Wan model capable of generating both Chinese and English text in videos with strong readability.
- Powerful Wan-VAE: A 3D causal VAE that encodes/decodes long 1080P videos while preserving temporal information, enabling efficient high-resolution video generation.
- GitHub: Wan-Video/Wan2.1
- Hugging Face collection: Wan-AI Wan2.1
2. SGLang-diffusion Installation
SGLang-diffusion offers multiple installation methods; choose the one that best fits your hardware platform and requirements. Please refer to the official SGLang-diffusion installation guide for instructions.
3. Model Deployment
This section provides deployment configurations optimized for different hardware platforms and use cases.
3.1 Basic Configuration
The Wan2.1 series offers models in multiple sizes and resolutions, optimized for different hardware platforms. The recommended launch configuration varies by hardware and model size.
3.2 Configuration Tips
Currently supported optimization options are listed in the SGLang-diffusion support matrix.
- --vae-path: Path to a custom VAE model or Hugging Face model ID. If not specified, the VAE is loaded from the main model path.
- --num-gpus {NUM_GPUS}: Number of GPUs to use.
- --tp-size {TP_SIZE}: Tensor parallelism size for the encoder/DiT; keep it ≤ 1 if relying heavily on CPU offload.
- --sp-degree {SP_SIZE}: Sequence parallelism degree.
- --ulysses-degree {ULYSSES_DEGREE}: Degree of DeepSpeed-Ulysses-style sequence parallelism in USP.
- --ring-degree {RING_DEGREE}: Degree of ring-attention-style sequence parallelism in USP.
- --text-encoder-cpu-offload, --dit-cpu-offload, --vae-cpu-offload: Use CPU offload to reduce peak GPU memory when needed.
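To illustrate how these flags combine, a launch command might look like the sketch below. Only the flags themselves come from the list above; the `python -m sglang.launch_server` entry point and the model ID are assumptions — substitute the entry point and model from your own installation.

```shell
# Hedged sketch: flags are from the configuration tips above; the entry
# point and model path are assumptions -- adjust for your install.
python -m sglang.launch_server \
  --model-path Wan-AI/Wan2.1-T2V-1.3B-Diffusers \
  --num-gpus 2 \
  --ulysses-degree 2 \
  --dit-cpu-offload
```

Here sequence parallelism (`--ulysses-degree 2`) is used across both GPUs while the DiT is offloaded to CPU between steps to keep peak VRAM low.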
4. Model Invocation
4.1 Basic Usage
For more API usage and request examples, please refer to the SGLang Diffusion OpenAI API documentation.
4.1.1 Launch a server and then send requests
Command
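As a hedged sketch of this flow — the server entry point, port, and request endpoint here are assumptions; the linked OpenAI API documentation is authoritative — launching a server and then sending a request might look like:

```shell
# Assumed entry point and endpoint -- verify against the SGLang
# Diffusion OpenAI API docs before use.
python -m sglang.launch_server \
  --model-path Wan-AI/Wan2.1-T2V-14B-Diffusers \
  --port 30000 &

# Once the server is ready, send an OpenAI-style generation request.
curl -s http://localhost:30000/v1/videos/generations \
  -H "Content-Type: application/json" \
  -d '{"model": "Wan-AI/Wan2.1-T2V-14B-Diffusers",
       "prompt": "A cat surfing a wave at sunset"}'
```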
4.1.2 Generate a video without launching a server
Command
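A minimal offline sketch, assuming a one-shot CLI generation entry point — the actual command is in the SGLang-diffusion docs, and everything below except the documented `--model-path` style of flag is an assumption:

```shell
# Hypothetical one-shot generation without a standing server; the
# `sglang.generate` entry point and output flag are assumptions.
python -m sglang.generate \
  --model-path Wan-AI/Wan2.1-T2V-1.3B-Diffusers \
  --prompt "A corgi running on a beach" \
  --output-path ./out.mp4
```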
4.2 Advanced Usage
4.2.1 Cache-DiT Acceleration
SGLang integrates Cache-DiT, a caching acceleration engine for Diffusion Transformers (DiT), to achieve significant inference speedups with minimal quality loss. You can set SGLANG_CACHE_DIT_ENABLED=True to enable it. For more details, please refer to the SGLang Cache-DiT documentation.
Basic Usage
Command
Command
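The SGLANG_CACHE_DIT_ENABLED environment variable is the documented switch; a launch with Cache-DiT enabled might look like the following sketch (the entry point and model ID are assumptions):

```shell
# SGLANG_CACHE_DIT_ENABLED=True is the documented toggle; the launch
# entry point and model path are assumptions.
SGLANG_CACHE_DIT_ENABLED=True python -m sglang.launch_server \
  --model-path Wan-AI/Wan2.1-T2V-14B-Diffusers
```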
4.2.2 GPU Optimization
- --dit-cpu-offload: Use CPU offload for DiT inference. Enable this if you run out of memory with FSDP.
- --text-encoder-cpu-offload: Use CPU offload for text encoder inference.
- --image-encoder-cpu-offload: Use CPU offload for image encoder inference.
- --vae-cpu-offload: Use CPU offload for the VAE.
- --pin-cpu-memory: Pin memory for CPU offload. Use as a workaround if you see "CUDA error: invalid argument".
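For a memory-constrained setup, the offload flags above can be stacked; a sketch (the entry point is an assumption, the flags and model ID are from this document):

```shell
# Memory-constrained launch sketch: every component is offloaded to CPU
# and pinned memory is enabled. Entry point is assumed.
python -m sglang.launch_server \
  --model-path Wan-AI/Wan2.1-I2V-14B-720P \
  --dit-cpu-offload \
  --text-encoder-cpu-offload \
  --vae-cpu-offload \
  --pin-cpu-memory
```

Offloading trades step latency for peak VRAM, so enable only the flags you need, starting with `--dit-cpu-offload`.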
4.2.3 Supported LoRA Registry
SGLang supports applying Wan2.1 LoRA adapters on top of base models:

| Origin model | Supported LoRA |
|---|---|
| Wan-AI/Wan2.1-T2V-14B | NIVEDAN/wan2.1-lora |
| Wan-AI/Wan2.1-I2V-14B-720P | valiantcat/Wan2.1-Fight-LoRA |
Command
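As an illustration only — the `--lora-path` flag name here is a hypothetical stand-in for the LoRA option in the SGLang-diffusion support matrix — applying one of the adapters from the table above might look like:

```shell
# Hypothetical: --lora-path is an assumed flag name; the base model and
# adapter IDs come from the registry table above.
python -m sglang.launch_server \
  --model-path Wan-AI/Wan2.1-T2V-14B \
  --lora-path NIVEDAN/wan2.1-lora
```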
5. Benchmark
Test Environment:
- Hardware: AMD MI300X GPU (1x)
- Model: Wan-AI/Wan2.1-T2V-14B-Diffusers
- SGLang Docker Image Version: 0.5.9
5.1 How to Run Benchmarks with SGLang
You can use the built-in SGLang diffusion benchmark script to evaluate Wan2.1 performance on your hardware.
5.1.1 Generate a single video
Server Command:
Command
Command
Output
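As a rough sketch only — the benchmark script's real name and arguments live in the SGLang-diffusion repository, and the module path, flags, and prompt below are assumptions (the model ID is from the test environment above):

```shell
# Hypothetical invocation of the built-in benchmark script; module path
# and flags are assumptions -- check the repository for the real script.
python -m sglang.bench_diffusion \
  --model-path Wan-AI/Wan2.1-T2V-14B-Diffusers \
  --prompt "A panda eating bamboo in a forest" \
  --num-videos 1
```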
5.1.2 Generate videos with Cache-DiT acceleration
Server Command:
Command
Command
Output
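A Cache-DiT benchmark run differs from the baseline only by the documented environment variable; sketched below (the benchmark script name and flags are assumptions, as only SGLANG_CACHE_DIT_ENABLED is documented):

```shell
# Only SGLANG_CACHE_DIT_ENABLED is documented; the script name and
# flags are assumptions mirroring a baseline single-video run.
SGLANG_CACHE_DIT_ENABLED=True python -m sglang.bench_diffusion \
  --model-path Wan-AI/Wan2.1-T2V-14B-Diffusers \
  --prompt "A panda eating bamboo in a forest" \
  --num-videos 1
```

Comparing the reported latency of this run against the baseline run gives the effective Cache-DiT speedup on your hardware.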
