Performance#

This section covers the main performance levers for SGLang Diffusion: attention backends, caching acceleration, and profiling.

Overview#

Optimization        Type          Description
------------------  ------------  -------------------------------------------------------------------------
Cache-DiT           Caching       Block-level caching with DBCache, TaylorSeer, and SCM
TeaCache            Caching       Timestep-level caching based on temporal similarity
Attention Backends  Kernel        Optimized attention implementations (FlashAttention, SageAttention, etc.)
Profiling           Diagnostics   PyTorch Profiler and Nsight Systems guidance

Start Here#

Caching at a Glance#

  • Cache-DiT provides block-level caching for diffusers pipelines and is the option to reach for when tuning for higher speedups.

  • TeaCache is timestep-level caching built into SGLang model families.
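
To make "timestep-level caching based on temporal similarity" concrete, here is a toy sketch in plain Python. It is not SGLang's or TeaCache's actual implementation: the function names (`rel_l1`, `run_with_timestep_cache`), the `threshold` parameter, and the choice to reuse a cached residual are all illustrative assumptions. The idea shown is that when the model input changes little between consecutive denoising steps, the expensive forward pass is skipped and the previous step's residual (output minus input) is reused.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def rel_l1(curr: Sequence[float], prev: Sequence[float]) -> float:
    """Relative L1 distance between two equally sized vectors."""
    num = sum(abs(c - p) for c, p in zip(curr, prev))
    den = sum(abs(p) for p in prev) or 1e-12  # guard against division by zero
    return num / den


def run_with_timestep_cache(
    inputs: Sequence[Vector],
    model: Callable[[Vector], Vector],
    threshold: float,
) -> Tuple[List[Vector], List[bool]]:
    """Run `model` over a sequence of per-timestep inputs, skipping steps
    whose accumulated input change stays below `threshold` (hypothetical knob).

    Returns the per-step outputs and a flag per step saying whether the
    model was actually executed.
    """
    outputs: List[Vector] = []
    computed: List[bool] = []
    prev_input = None
    cached_residual = None
    accumulated = 0.0

    for x in inputs:
        if prev_input is not None:
            accumulated += rel_l1(x, prev_input)

        if cached_residual is None or accumulated >= threshold:
            # Inputs drifted too far (or first step): run the real model.
            out = model(x)
            cached_residual = [o - v for o, v in zip(out, x)]
            accumulated = 0.0
            computed.append(True)
        else:
            # Inputs are similar enough: reuse the cached residual.
            out = [v + r for v, r in zip(x, cached_residual)]
            computed.append(False)

        outputs.append(out)
        prev_input = x

    return outputs, computed


# Toy "model" that doubles its input; steps 1 and 2 barely change the input.
demo_inputs = [[1.0, 1.0], [1.01, 1.0], [1.02, 1.0], [2.0, 2.0]]
demo_outputs, demo_computed = run_with_timestep_cache(
    demo_inputs, lambda x: [2.0 * v for v in x], threshold=0.1
)
print(demo_computed)  # [True, False, False, True]
```

A higher threshold skips more steps and trades output fidelity for speed; a threshold of zero recomputes every step. Block-level caching such as Cache-DiT applies a comparable reuse decision at the granularity of transformer blocks rather than whole timesteps.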

Current Baseline Snapshot#

For Ring SP benchmark details, see:

References#