Frame Interpolation (video only)
Frame interpolation synthesizes new frames between each pair of consecutive generated frames, producing smoother motion without re-running the diffusion model. The --frame-interpolation-exp flag controls how many rounds of interpolation
to apply: each round inserts one new frame into every gap between adjacent
frames, so the output frame count follows the formula:
(N − 1) × 2^exp + 1. For example, 5 original frames with exp=1 → 4 gaps × 1 new frame + 5 originals = 9 frames; with exp=2 → 17 frames.
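The frame-count arithmetic can be sketched as follows (a minimal illustration of the formula above, not part of the tool itself):

```python
def interpolated_frame_count(n_frames: int, exp: int) -> int:
    """Frames after `exp` interpolation rounds, where each round
    inserts one new frame into every gap between adjacent frames."""
    for _ in range(exp):
        n_frames = (n_frames - 1) * 2 + 1  # one new frame per gap
    return n_frames  # closed form: (N - 1) * 2**exp + 1

print(interpolated_frame_count(5, 1))  # 9
print(interpolated_frame_count(5, 2))  # 17
```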
CLI Arguments
| Argument | Description |
|---|---|
| --enable-frame-interpolation | Enable frame interpolation. Model weights are downloaded automatically on first use. |
| --frame-interpolation-exp {EXP} | Interpolation exponent: 1 = 2× temporal resolution, 2 = 4×, etc. (default: 1) |
| --frame-interpolation-scale {SCALE} | RIFE inference scale; use 0.5 for high-resolution inputs to save memory (default: 1.0) |
| --frame-interpolation-model-path {PATH} | Local directory or HuggingFace repo ID containing RIFE flownet.pkl weights (default: elfgum/RIFE-4.22.lite, downloaded automatically) |
Supported Models
Frame interpolation uses the RIFE (Real-Time Intermediate Flow Estimation) architecture. Only RIFE 4.22.lite (IFNet with 4-scale IFBlock backbone) is supported. The network topology is
hard-coded, so custom weights provided via --frame-interpolation-model-path
must be a flownet.pkl checkpoint that is compatible with this architecture.
Other RIFE versions (e.g., older v4.x variants with different block counts)
or entirely different frame interpolation methods (FILM, AMT, etc.) are not
supported.
| Weight | HuggingFace Repo | Description |
|---|---|---|
| RIFE 4.22.lite (default) | elfgum/RIFE-4.22.lite | Lightweight model, downloaded automatically on first use |
Example
Generate a 5-frame video and interpolate to 9 frames ((5 − 1) × 2¹ + 1 = 9).

Upscaling (image and video)
Upscaling increases the spatial resolution of generated images or video frames using Real-ESRGAN. The model weights are downloaded automatically on first use and cached for subsequent runs.

CLI Arguments
| Argument | Description |
|---|---|
| --enable-upscaling | Enable post-generation upscaling using Real-ESRGAN. |
| --upscaling-scale {SCALE} | Desired upscaling factor (default: 4). The 4× model is used internally; if a different scale is requested, a bicubic resize is applied after the network output. |
| --upscaling-model-path {PATH} | Local .pth file, HuggingFace repo ID, or repo_id:filename for Real-ESRGAN weights (default: ai-forever/Real-ESRGAN with RealESRGAN_x4.pth, downloaded automatically). Use the repo_id:filename format to specify a custom weight file from a HuggingFace repo (e.g. my-org/my-esrgan:weights.pth). |
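The two-stage scale handling described for --upscaling-scale can be illustrated with a small sketch (the function name and logic below are assumptions derived from the flag description, not the tool's actual code):

```python
def output_size(width: int, height: int, scale: float, native_scale: int = 4):
    """Compute the intermediate (network) and final output sizes.

    The 4x Real-ESRGAN network always produces native_scale * input size;
    any other requested scale is then reached with a bicubic resize.
    """
    network_size = (width * native_scale, height * native_scale)
    final_size = (round(width * scale), round(height * scale))
    needs_resize = scale != native_scale
    return network_size, final_size, needs_resize

# 512x512 input with scale 2: the network outputs 2048x2048,
# then a bicubic resize brings it down to the requested 1024x1024.
print(output_size(512, 512, 2))  # ((2048, 2048), (1024, 1024), True)
```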
Supported Models
Upscaling supports two Real-ESRGAN network architectures. The correct architecture is auto-detected from the checkpoint keys, so you only need to point --upscaling-model-path at a valid .pth file:
| Architecture | Example Weights | Description |
|---|---|---|
| RRDBNet | RealESRGAN_x4plus.pth | Heavier model with higher quality; best for photos |
| SRVGGNetCompact | RealESRGAN_x4.pth (default), realesr-animevideov3.pth, realesr-general-x4v3.pth | Lightweight model; faster inference, good for video |
The default weights are ai-forever/Real-ESRGAN with
RealESRGAN_x4.pth (SRVGGNetCompact, 4× native scale).
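Architecture auto-detection from checkpoint keys could work along these lines. This is a sketch under the assumption that RRDBNet checkpoints contain residual-dense-block keys (e.g. body.0.rdb1.conv1.weight) while SRVGGNetCompact uses a flat body.N convolution stack; the actual key patterns and detection logic in the tool may differ:

```python
def detect_arch(state_dict_keys):
    """Guess the Real-ESRGAN architecture from checkpoint key names.

    Assumed convention: RRDBNet state dicts contain residual-dense-block
    keys such as 'body.0.rdb1.conv1.weight', while SRVGGNetCompact uses
    a flat 'body.<n>.weight' layout.
    """
    if any(".rdb" in key for key in state_dict_keys):
        return "RRDBNet"
    return "SRVGGNetCompact"

print(detect_arch(["conv_first.weight", "body.0.rdb1.conv1.weight"]))  # RRDBNet
print(detect_arch(["body.0.weight", "body.0.bias", "body.2.weight"]))  # SRVGGNetCompact
```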
Other super-resolution models (e.g., SwinIR, HAT, BSRGAN) are not supported
— only Real-ESRGAN checkpoints using the two architectures above are
compatible.
