## Overview
| Optimization | Type | Description |
|---|---|---|
| Cache-DiT | Caching | Block-level caching with DBCache, TaylorSeer, and SCM |
| TeaCache | Caching | Timestep-level caching based on temporal similarity |
| Attention Backends | Kernel | Optimized attention implementations (FlashAttention, SageAttention, etc.) |
| Profiling | Diagnostics | PyTorch Profiler and Nsight Systems guidance |
## Start Here
- Use Attention Backends to choose the best backend for your model and hardware.
- Use Caching Acceleration to reduce denoising cost with Cache-DiT or TeaCache.
- Use Profiling when you need to diagnose a bottleneck rather than guess.
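The effect of picking an attention backend can be previewed at the PyTorch level. The sketch below uses `torch.nn.attention.sdpa_kernel` (PyTorch >= 2.3) to pin scaled dot-product attention to the reference math backend; on suitable hardware you could pin `SDPBackend.FLASH_ATTENTION` instead. This is a generic PyTorch illustration of swapping attention implementations, not this project's own backend-selection flag.

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import sdpa_kernel, SDPBackend

# Toy attention inputs: (batch, heads, seq_len, head_dim)
q = torch.randn(1, 4, 16, 8)
k = torch.randn(1, 4, 16, 8)
v = torch.randn(1, 4, 16, 8)

# Force the reference (math) backend; swap in SDPBackend.FLASH_ATTENTION
# or SDPBackend.EFFICIENT_ATTENTION where the hardware supports them.
with sdpa_kernel(SDPBackend.MATH):
    out = F.scaled_dot_product_attention(q, k, v)

print(out.shape)  # output keeps the query's shape
```

Backends differ in speed and numerics, not in the attention they compute, so the output shape and (up to floating-point error) values are backend-independent.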
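For the diagnose-don't-guess workflow, a minimal PyTorch Profiler run looks like the sketch below. It profiles a stand-in CPU workload (a matrix multiply) and prints the operator table sorted by total CPU time; in a real run you would wrap your pipeline's forward pass and add `ProfilerActivity.CUDA`.

```python
import torch
from torch.profiler import profile, ProfilerActivity

def step():
    # Stand-in for one denoising step / forward pass.
    a = torch.randn(256, 256)
    return a @ a

# Profile a few iterations; add ProfilerActivity.CUDA for GPU workloads.
with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as prof:
    for _ in range(3):
        step()

# Top operators by total CPU time; the matmul shows up as aten::mm.
table = prof.key_averages().table(sort_by="cpu_time_total", row_limit=5)
print(table)
```

The same `prof` object can also export a Chrome trace via `prof.export_chrome_trace("trace.json")` for timeline inspection.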
## Caching at a Glance
- Cache-DiT provides block-level caching for diffusers pipelines, with tuning knobs aimed at higher speedups.
- TeaCache provides timestep-level caching built into supported SGLang model families.
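The core idea behind timestep-level caching can be shown with a toy sketch: when the input to an expensive block barely changes between consecutive timesteps, reuse the previous output instead of recomputing. This is only an illustration of the similarity heuristic that methods like TeaCache are built on; the class, threshold value, and distance metric below are invented for the example, not the library's API.

```python
import math

def denoise_block(x):
    # Stand-in for an expensive transformer block.
    return [math.tanh(v) for v in x]

class SimilarityCache:
    """Reuse the previous output when the input changed less than a
    threshold since the last computed timestep (toy version of
    timestep-level caching; the 0.05 threshold is illustrative)."""

    def __init__(self, threshold=0.05):
        self.threshold = threshold
        self.prev_in = None
        self.prev_out = None
        self.hits = 0

    def __call__(self, x):
        if self.prev_in is not None:
            dist = max(abs(a - b) for a, b in zip(x, self.prev_in))
            if dist < self.threshold:
                self.hits += 1
                return self.prev_out  # inputs are similar: skip recompute
        self.prev_in = list(x)
        self.prev_out = denoise_block(x)
        return self.prev_out

cache = SimilarityCache()
x = [0.5, -0.2]
for t in range(10):
    x_t = [v + 0.001 * t for v in x]  # inputs drift slowly across timesteps
    out = cache(x_t)

print(cache.hits)  # most timesteps reuse the cached output
```

Real implementations refine this with learned or calibrated similarity indicators and per-block decisions, but the trade-off is the same: cache hits save compute at a small cost in fidelity.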
