sglang generate or to start a persistent HTTP server with sglang serve.
Overlay repos for non-diffusers models
If --model-path points to a supported non-diffusers source repo, SGLang can resolve it
through a self-hosted overlay repo.
SGLang first checks a built-in overlay registry. Concrete built-in mappings can be added over time without changing the CLI surface.
Override example:
Command
--model-path if it contains _overlay/overlay_manifest.json.
Notes:
- SGLANG_DIFFUSION_MODEL_OVERLAY_REGISTRY is only an optional override for development and debugging. It accepts either a JSON object or a path to a JSON file, and can extend or replace built-in entries for the current process.
- On the first load, SGLang will:
  - download overlay metadata from the overlay repo
  - download the required files from the original source repo
  - materialize a local standard component repo under ~/.cache/sgl_diffusion/materialized_models/
- Later loads reuse the materialized local repo. The materialized repo is what the runtime loads as a normal componentized model directory.
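The two accepted forms of the registry override (an inline JSON object or a path to a JSON file) can be illustrated with a minimal Python sketch. This is not SGLang's implementation: the helper names `load_registry_override`, `merge_registry`, and `registry_from_env` are hypothetical, the registry is treated as an opaque mapping because its schema is not described here, and key-level merging is an assumption about what "extend or replace" means.

```python
import json
import os
from pathlib import Path

ENV_VAR = "SGLANG_DIFFUSION_MODEL_OVERLAY_REGISTRY"


def load_registry_override(raw):
    """Parse an override given as inline JSON or as a path to a JSON file.

    Hypothetical helper: returns an empty dict when no override is set.
    """
    if raw is None or not raw.strip():
        return {}
    text = raw.strip()
    if not text.startswith("{"):
        # Not an inline JSON object, so treat the value as a file path.
        text = Path(text).expanduser().read_text(encoding="utf-8")
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError(f"{ENV_VAR} must contain a JSON object")
    return data


def merge_registry(builtin, override):
    """Assumed semantics: new keys extend, matching keys replace."""
    merged = dict(builtin)
    merged.update(override)
    return merged


def registry_from_env(builtin):
    """Apply the override from the environment for the current process only."""
    return merge_registry(builtin, load_registry_override(os.environ.get(ENV_VAR)))
```

Because the override is read from the environment, it only affects the process that sets it, which matches its intended use for development and debugging.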
Quick Start
Generate
Command
Serve
Command
Common Options
Model and runtime
- --model-path {MODEL}: model path or Hugging Face model ID
- --lora-path {PATH} and --lora-nickname {NAME}: load a LoRA adapter
- --num-gpus {N}: number of GPUs to use
- --tp-size {N}: tensor parallelism size, mainly for encoders
- --sp-degree {N}: sequence parallelism size
- --ulysses-degree {N} and --ring-degree {N}: USP parallelism controls
- --attention-backend {BACKEND}: attention backend for native SGLang pipelines
- --attention-backend-config {CONFIG}: attention backend configuration
Sampling and output
- --prompt {PROMPT} and --negative-prompt {PROMPT}
- --image-path {PATH} [{PATH} ...]: input image(s) for image-to-video or image-to-image generation
- --num-inference-steps {STEPS} and --seed {SEED}
- --height {HEIGHT}, --width {WIDTH}, --num-frames {N}, --fps {FPS}
- --output-path {PATH}, --output-file-name {NAME}, --save-output, --return-frames
Quantized transformers
For quantized transformer checkpoints, prefer:
- --model-path for the base pipeline
- --transformer-path for a quantized transformers transformer component folder
- --transformer-weights-path for a quantized safetensors file, directory, or repo
Configuration Files
Use --config to load JSON or YAML configuration. Command-line flags override values from the config file.
Command
Config
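The precedence rule for --config (command-line flags win over config-file values) can be sketched in a few lines of Python. This only illustrates the rule and is not SGLang's own loader: `merge_config` is a hypothetical helper, and the key names `num_inference_steps` and `seed` are assumed spellings of the corresponding flags.

```python
import json


def merge_config(file_values, cli_values):
    """Overlay explicitly passed CLI values on top of config-file values."""
    merged = dict(file_values)
    for key, value in cli_values.items():
        # None stands for "flag not passed", so the file value is kept.
        if value is not None:
            merged[key] = value
    return merged


# Values as they might appear in a JSON config file (hypothetical keys).
file_values = json.loads('{"num_inference_steps": 30, "seed": 1}')

# Only --seed was given on the command line in this example.
cli_values = {"num_inference_steps": None, "seed": 42}

print(merge_config(file_values, cli_values))
# {'num_inference_steps': 30, 'seed': 42}
```

A practical consequence is that a shared config file can hold stable defaults while per-run values such as the seed are changed on the command line.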
Generate
sglang generate runs a single generation job and exits when the job finishes.
Command
HTTP server-only arguments are ignored by sglang generate.
Cache-DiT can be enabled with SGLANG_CACHE_DIT_ENABLED=true or --cache-dit-config. See Cache-DiT.
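When a one-off generation job is launched from a script, the argument list can be assembled programmatically. The following is a minimal sketch that uses only flags listed under Common Options; `build_generate_command` is a hypothetical helper, and the model path and prompt are placeholders rather than tested values.

```python
import shlex


def build_generate_command(model, prompt, steps=None, seed=None, output_path=None):
    """Build an argument list for `sglang generate` (hypothetical helper)."""
    cmd = ["sglang", "generate", "--model-path", model, "--prompt", prompt]
    # Optional flags are appended only when a value was supplied.
    if steps is not None:
        cmd += ["--num-inference-steps", str(steps)]
    if seed is not None:
        cmd += ["--seed", str(seed)]
    if output_path is not None:
        cmd += ["--output-path", output_path]
    return cmd


cmd = build_generate_command("path/to/model", "a red fox in snow", steps=30, seed=42)

# shlex.join quotes arguments so the printed line can be pasted into a shell.
print(shlex.join(cmd))
```

Keeping the command as a list makes it suitable for passing directly to `subprocess.run(cmd, check=True)` without shell quoting concerns.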
Serve
sglang serve starts the HTTP server and keeps the model loaded for repeated requests.
Command
Cloud Storage
SGLang Diffusion can upload generated images and videos to S3-compatible object storage after generation.
Command
Component Path Overrides
Override individual pipeline components such as vae, transformer, or text_encoder with --<component>-path.
Command
model_index.json, and the path must be either a Hugging Face repo ID or a complete component directory.
Diffusers Backend
Use --backend diffusers to force vanilla diffusers pipelines when no native SGLang implementation exists or when a model requires a custom pipeline class.
Key Options
| Argument | Values | Description |
|---|---|---|
| --backend | auto, sglang, diffusers | Choose native SGLang, force native, or force diffusers |
| --diffusers-attention-backend | flash, _flash_3_hub, sage, xformers, native | Attention backend for diffusers pipelines |
| --trust-remote-code | flag | Required for models with custom pipeline classes |
| --vae-tiling and --vae-slicing | flag | Lower memory usage for VAE decode |
| --dit-precision and --vae-precision | fp16, bf16, fp32 | Precision controls |
| --enable-torch-compile | flag | Enable torch.compile |
| --cache-dit-config | {PATH} | Cache-DiT config for diffusers pipelines |
Example
diffusers_kwargs in a config file.