# Performance
This section covers the main performance levers for SGLang Diffusion: attention backends, caching acceleration, and profiling.
## Overview
| Optimization | Type | Description |
|---|---|---|
| Cache-DiT | Caching | Block-level caching with DBCache, TaylorSeer, and SCM |
| TeaCache | Caching | Timestep-level caching based on temporal similarity |
| Attention Backends | Kernel | Optimized attention implementations (FlashAttention, SageAttention, etc.) |
| Profiling | Diagnostics | PyTorch Profiler and Nsight Systems guidance |
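To illustrate the idea behind timestep-level caching (the TeaCache row above), here is a minimal toy sketch, not SGLang's implementation: when consecutive timesteps are sufficiently similar, the previous model output is reused instead of recomputed. The names `model_fn` and `rel_threshold` are illustrative, and the update rule is deliberately simplified.

```python
def denoise_with_timestep_cache(model_fn, latents, timesteps, rel_threshold=0.05):
    """Toy timestep-level cache (TeaCache-style idea, heavily simplified).

    Reuses the previous model output whenever the relative change between
    consecutive timesteps is below `rel_threshold`, skipping a model call.
    Returns the final latents and the number of real model evaluations.
    """
    cached_output = None
    prev_t = None
    calls = 0
    for t in timesteps:
        # Relative timestep change is used here as a cheap similarity proxy;
        # real implementations compare intermediate activations instead.
        if cached_output is not None and abs(t - prev_t) / max(prev_t, 1e-8) < rel_threshold:
            output = cached_output  # cache hit: skip the expensive model call
        else:
            output = model_fn(latents, t)  # cache miss: run the model
            calls += 1
            cached_output = output
        prev_t = t
        latents = latents - output  # placeholder for the real scheduler step
    return latents, calls
```

With a toy `model_fn` and timesteps `[1.0, 0.99, 0.5, 0.49]`, the two small steps hit the cache, so only two of four iterations invoke the model. Raising `rel_threshold` trades quality for fewer model calls, which is the same trade-off the real caching methods expose.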
## Start Here

- Use Attention Backends to choose the best backend for your model and hardware.
- Use Caching Acceleration to reduce denoising cost with Cache-DiT or TeaCache.
- Use Profiling when you need to diagnose a bottleneck rather than guess.
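As a starting point for the profiling route, the standard PyTorch Profiler can be wrapped around any suspect region; this is a generic `torch.profiler` sketch on a CPU matmul, not an SGLang-specific recipe (swap in `ProfilerActivity.CUDA` and your own workload on GPU).

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Profile a small region of work; replace the body with the denoising
# step (or other suspect code) you actually want to measure.
with profile(activities=[ProfilerActivity.CPU]) as prof:
    x = torch.randn(64, 64)
    y = x @ x

# Aggregate per-operator stats; the table shows which ops dominate runtime.
table = prof.key_averages().table(sort_by="cpu_time_total", row_limit=5)
print(table)
```

For GPU workloads, add `ProfilerActivity.CUDA` to `activities` so kernel time is attributed alongside host-side overhead; Nsight Systems is the better tool once you need kernel-level timelines.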
## Caching at a Glance
## Current Baseline Snapshot
For Ring SP benchmark details, see: