# TeaCache Acceleration

Note: This is one of two caching strategies available in SGLang. For an overview of all caching options, see the caching documentation.

TeaCache (Timestep Embedding Aware Cache) accelerates diffusion inference by detecting when consecutive denoising steps are similar enough that the expensive transformer computation can be skipped and a cached residual reused instead.

## Overview

TeaCache works by:

1. Tracking the relative L1 distance between modulated inputs across consecutive timesteps
2. Rescaling that distance with model-specific polynomial coefficients and accumulating it over steps
3. Reusing the cached residual (skipping computation) whenever the accumulated distance stays below a threshold
4. Supporting CFG (Classifier-Free Guidance) with separate positive/negative caches

## How It Works

### L1 Distance Tracking

At each denoising step, TeaCache computes the relative L1 distance between the current and previous modulated inputs:

```
rel_l1 = |current - previous|.mean() / |previous|.mean()
```

This distance is then rescaled using polynomial coefficients and accumulated:

```
accumulated += poly(coefficients)(rel_l1)
```
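
As a rough illustration of the two formulas above, here is a minimal sketch of the distance computation and polynomial rescaling, assuming torch tensors for the modulated inputs and `numpy.polyval` for the polynomial evaluation. The helper name and the coefficient ordering are assumptions for illustration, not SGLang's actual internals.

```python
import numpy as np
import torch


def rescaled_l1_delta(current: torch.Tensor,
                      previous: torch.Tensor,
                      coefficients: list[float]) -> float:
    """Relative L1 distance between consecutive modulated inputs,
    rescaled by model-specific polynomial coefficients.

    Illustrative sketch; the real implementation lives in SGLang's TeaCache code.
    """
    # rel_l1 = |current - previous|.mean() / |previous|.mean()
    rel_l1 = ((current - previous).abs().mean() / previous.abs().mean()).item()
    # Evaluate the rescaling polynomial at rel_l1 (highest-degree coefficient first).
    return float(np.polyval(coefficients, rel_l1))


# Accumulated across denoising steps:
#   accumulated += rescaled_l1_delta(current, previous, coefficients)
```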

### Cache Decision

- If `accumulated >= threshold`: force computation and reset the accumulator
- If `accumulated < threshold`: skip computation and reuse the cached residual (see the sketch below)
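
A minimal sketch of this decision rule, assuming `cache.accumulated` has already been updated for the current step (as in the sketch above) and that `cache` exposes the residual field described in the CFG section below. `compute_block` stands in for the transformer forward pass; all names here are illustrative assumptions rather than SGLang's actual internals.

```python
import torch


def cache_decision(cache, hidden_states: torch.Tensor, compute_block,
                   teacache_thresh: float) -> torch.Tensor:
    """Either recompute and refresh the cached residual, or reuse it.

    Illustrative sketch; `cache` is assumed to carry `accumulated`
    and `previous_residual`.
    """
    if cache.previous_residual is None or cache.accumulated >= teacache_thresh:
        # Accumulated distance crossed the threshold: force computation
        # and reset the accumulator.
        output = compute_block(hidden_states)
        cache.previous_residual = output - hidden_states
        cache.accumulated = 0.0
    else:
        # Below threshold: skip computation and reuse the cached residual.
        output = hidden_states + cache.previous_residual
    return output
```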

### CFG Support

For models that support CFG cache separation (Wan, Hunyuan, Z-Image), TeaCache maintains separate caches for positive and negative branches:

- `previous_modulated_input` / `previous_residual` for the positive branch
- `previous_modulated_input_negative` / `previous_residual_negative` for the negative branch

For models that don’t support CFG separation (Flux, Qwen), TeaCache is automatically disabled when CFG is enabled.
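
For illustration, the per-branch state could be organized as below, using the field names listed above. The container class and the `accumulated*` fields are assumptions, not SGLang's actual data structure.

```python
from dataclasses import dataclass
from typing import Optional

import torch


@dataclass
class TeaCacheState:
    """Illustrative per-model TeaCache state with CFG cache separation."""
    # Positive (conditional) branch
    accumulated: float = 0.0
    previous_modulated_input: Optional[torch.Tensor] = None
    previous_residual: Optional[torch.Tensor] = None
    # Negative (unconditional) branch, used only when the model supports
    # CFG cache separation (Wan, Hunyuan, Z-Image)
    accumulated_negative: float = 0.0
    previous_modulated_input_negative: Optional[torch.Tensor] = None
    previous_residual_negative: Optional[torch.Tensor] = None
```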

## Configuration

TeaCache is configured via `TeaCacheParams` in the sampling parameters:

```python
from sglang.multimodal_gen.configs.sample.teacache import TeaCacheParams

params = TeaCacheParams(
    teacache_thresh=0.1,           # Threshold for accumulated L1 distance
    coefficients=[1.0, 0.0, 0.0],  # Polynomial coefficients for L1 rescaling
)
```

### Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| `teacache_thresh` | `float` | Threshold for the accumulated L1 distance. A higher threshold lets more steps reuse the cache (faster inference) at a potential cost in quality. |
| `coefficients` | `list[float]` | Polynomial coefficients for rescaling the L1 distance; typically tuned per model. |

### Model-Specific Configurations

Different models may have different optimal configurations. The coefficients are typically tuned per-model to balance speed and quality.
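
For example, a speed/quality trade-off might look like the following. The threshold values here are purely illustrative and are not tuned settings for any particular model.

```python
from sglang.multimodal_gen.configs.sample.teacache import TeaCacheParams

# Conservative: computation is forced more often, so output stays closer to the
# uncached result.
conservative = TeaCacheParams(
    teacache_thresh=0.05,
    coefficients=[1.0, 0.0, 0.0],
)

# Aggressive: the higher threshold lets more steps reuse the cached residual,
# trading some quality for speed.
aggressive = TeaCacheParams(
    teacache_thresh=0.2,
    coefficients=[1.0, 0.0, 0.0],
)
```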

## Supported Models

TeaCache is built into the following model families:

| Model Family | CFG Cache Separation | Notes |
| --- | --- | --- |
| Wan (wan2.1, wan2.2) | Yes | Full support |
| Hunyuan (HunyuanVideo) | Yes | To be supported |
| Z-Image | Yes | To be supported |
| Flux | No | To be supported |
| Qwen | No | To be supported |

## References