This section covers the main performance levers for SGLang Diffusion: attention backends, caching acceleration, and profiling.

Overview

| Optimization | Type | Description |
| --- | --- | --- |
| Cache-DiT | Caching | Block-level caching with DBCache, TaylorSeer, and SCM |
| TeaCache | Caching | Timestep-level caching based on temporal similarity |
| Attention Backends | Kernel | Optimized attention implementations (FlashAttention, SageAttention, etc.) |
| Profiling | Diagnostics | PyTorch Profiler and Nsight Systems guidance |

Start Here

Caching at a Glance

  • Cache-DiT is block-level caching for diffusers pipelines, aimed at tuning for higher speedups.
  • TeaCache is timestep-level caching built into SGLang model families.
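
To make the idea of timestep-level caching concrete, here is a toy sketch in plain Python. It is not the TeaCache or SGLang implementation: the `ToyTimestepCache` class, the `rel_l1` helper, and the threshold value are all hypothetical names chosen for illustration. It assumes the cache tracks the accumulated relative change of the model input across denoising steps and reuses the last output while that change stays below a threshold.

```python
from typing import Callable, List, Optional


def rel_l1(curr: List[float], prev: List[float]) -> float:
    """Relative L1 distance between two equally sized vectors."""
    num = sum(abs(c - p) for c, p in zip(curr, prev))
    den = sum(abs(p) for p in prev) or 1e-8  # avoid division by zero
    return num / den


class ToyTimestepCache:
    """Hypothetical helper: reuse the last output while inputs barely change."""

    def __init__(self, model: Callable[[List[float]], List[float]], threshold: float):
        self.model = model
        self.threshold = threshold
        self.prev_input: Optional[List[float]] = None
        self.cached_output: Optional[List[float]] = None
        self.accumulated = 0.0   # change accumulated since the last real forward pass
        self.computed_steps = 0  # number of steps that actually ran the model

    def step(self, x: List[float]) -> List[float]:
        if self.prev_input is not None:
            self.accumulated += rel_l1(x, self.prev_input)
        self.prev_input = list(x)

        # Cache hit: inputs are still similar enough, so skip the forward pass.
        if self.cached_output is not None and self.accumulated < self.threshold:
            return self.cached_output

        # Cache miss: run the model and reset the accumulated change.
        self.cached_output = self.model(x)
        self.accumulated = 0.0
        self.computed_steps += 1
        return self.cached_output


# Stand-in "model" and a sequence of inputs, one per denoising step.
cache = ToyTimestepCache(model=lambda x: [2.0 * v for v in x], threshold=0.1)
inputs = [[1.0, 1.0], [1.01, 1.01], [1.02, 1.02], [2.0, 2.0]]
outputs = [cache.step(x) for x in inputs]
```

In this run the second and third steps reuse the first step's output, and only the large jump at the last step triggers a new forward pass. A block-level scheme such as Cache-DiT applies a comparable reuse decision inside the transformer rather than once per timestep; this sketch does not model it.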

Current Baseline Snapshot

For Ring SP benchmark details, see:

References