1. Model Introduction

NVIDIA Cosmos3 is a world-generation model family for text-to-image, text-to-video, and image-to-video generation. SGLang Diffusion serves the public generator checkpoints with the native Cosmos3OmniDiffusersPipeline.

| Model | Status | Notes |
| --- | --- | --- |
| nvidia/Cosmos3-Nano | Supported | T2I, T2V, I2V |
| nvidia/Cosmos3-Super | Supported | T2I, T2V, I2V; use multi-GPU for the 64B checkpoint |
| nvidia/Cosmos3-Super-Text2Image | Supported | T2I-specialized checkpoint |
| nvidia/Cosmos3-Super-Image2Video | Supported | I2V-specialized checkpoint |
| nvidia/Cosmos3-Nano-Policy-DROID | Not supported yet | Action/policy model; planned separately from visual generation |

Cosmos3 video-with-sound, video-to-video conditioning, and action generation are not supported yet. Requests that set generate_sound, action_mode, or video-to-video conditioning fields return a clear error instead of being silently ignored.

2. Installation

Install SGLang with the diffusion dependencies:
Command
pip install -e "python[diffusion]"
Cosmos3 guardrails are enabled by default when the package is available:
Command
pip install "cosmos-guardrail==0.3.1"
cosmos-guardrail downloads gated NVIDIA guardrail weights, so pass a Hugging Face token if your environment needs one. If the package is not installed, SGLang skips Cosmos3 guardrails and logs a warning. To disable Cosmos3 guardrails for local experiments, set SGLANG_DISABLE_COSMOS3_GUARDRAILS=1 before starting the server.
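
The rules above (guardrails on when the package is present, off when it is missing or when SGLANG_DISABLE_COSMOS3_GUARDRAILS=1 is set) can be checked before starting the server. The following is a minimal sketch, not SGLang's own logic. The importable module name `cosmos_guardrail` is an assumption derived from the pip package name, and both helper functions are hypothetical.

```python
import importlib.util
import os

# Assumption: the pip distribution "cosmos-guardrail" installs an importable
# module named "cosmos_guardrail". Adjust this if the real name differs.
GUARDRAIL_MODULE = "cosmos_guardrail"


def module_available(name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(name) is not None


def guardrails_expected(env=None, module: str = GUARDRAIL_MODULE) -> bool:
    """Estimate whether guardrails would be active, following the text above."""
    env = os.environ if env is None else env
    # The documented opt-out switch takes priority over package presence.
    if env.get("SGLANG_DISABLE_COSMOS3_GUARDRAILS") == "1":
        return False
    # Otherwise guardrails are on only when the package is installed.
    return module_available(module)


if __name__ == "__main__":
    print("Guardrails expected:", guardrails_expected())
```

This only predicts the server's behavior from the outside; the warning in the server log remains the authoritative signal.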

3. Serve Cosmos3

Serve Cosmos3-Nano directly from the Hugging Face model ID:
Command
sglang serve \
  --model-type diffusion \
  --model-path nvidia/Cosmos3-Nano \
  --num-gpus 1 \
  --host 0.0.0.0 \
  --port 30010 \
  --output-path /tmp/sglang-cosmos3
For Cosmos3-Super, split the model across multiple GPUs:
Command
sglang serve \
  --model-type diffusion \
  --model-path nvidia/Cosmos3-Super \
  --num-gpus 4 \
  --host 0.0.0.0 \
  --port 30010 \
  --output-path /tmp/sglang-cosmos3
The server also accepts the specialized nvidia/Cosmos3-Super-Text2Image and nvidia/Cosmos3-Super-Image2Video checkpoint IDs.
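
If you launch the server from a script, the same commands can be assembled programmatically. This sketch uses only the flags shown above; `build_serve_command` and `build_env` are hypothetical helper names, and the GPU count you need for a given checkpoint is something to confirm on your own hardware.

```python
import os
import shlex


def build_serve_command(model_path, num_gpus=1, host="0.0.0.0", port=30010,
                        output_path="/tmp/sglang-cosmos3"):
    """Build the argv list for `sglang serve` using the flags from this page."""
    return [
        "sglang", "serve",
        "--model-type", "diffusion",
        "--model-path", model_path,
        "--num-gpus", str(num_gpus),
        "--host", host,
        "--port", str(port),
        "--output-path", output_path,
    ]


def build_env(disable_guardrails=False, base_env=None):
    """Copy the environment, optionally setting the guardrail opt-out variable."""
    env = dict(os.environ if base_env is None else base_env)
    if disable_guardrails:
        env["SGLANG_DISABLE_COSMOS3_GUARDRAILS"] = "1"
    return env


cmd = build_serve_command("nvidia/Cosmos3-Super", num_gpus=4)
print(shlex.join(cmd))  # shell-quoted form, handy for logging
```

To start the server, pass the result to `subprocess.Popen(cmd, env=build_env())`.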

4. OpenAI-Compatible Requests

Text to image

Cosmos3 text-to-image uses the /v1/images/generations endpoint. By default, Cosmos3 returns images as b64_json, matching vLLM-Omni’s examples.
Command
curl -sS -X POST http://127.0.0.1:30010/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "A warehouse robot folds a blue cloth on a clean workbench.",
    "size": "1280x720",
    "n": 1,
    "num_inference_steps": 35,
    "guidance_scale": 6.0,
    "flow_shift": 10.0,
    "seed": 0,
    "extra_args": {
      "use_resolution_template": false,
      "guardrails": true
    }
  }'
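
Because the response carries base64 data rather than a URL, the client has to decode it before saving. The sketch below sends the same kind of request from Python and writes each image to disk. It assumes the response follows the usual OpenAI images shape, `{"data": [{"b64_json": "..."}]}`, and that the bytes are PNG; neither detail is stated on this page, so verify both against a real response. The function names are hypothetical.

```python
import base64
import json
import urllib.request
from pathlib import Path

BASE_URL = "http://127.0.0.1:30010"


def request_image(payload, base_url=BASE_URL, timeout=600):
    """POST a JSON body to /v1/images/generations and return the parsed reply."""
    req = urllib.request.Request(
        f"{base_url}/v1/images/generations",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def save_b64_images(response, out_dir=".", stem="cosmos3_t2i", suffix=".png"):
    """Decode every b64_json entry and write it to out_dir; return the paths.

    Assumption: response["data"] is a list of {"b64_json": ...} objects and
    the decoded bytes are PNG (hence the default suffix).
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, item in enumerate(response["data"]):
        path = out_dir / f"{stem}_{index}{suffix}"
        path.write_bytes(base64.b64decode(item["b64_json"]))
        paths.append(path)
    return paths
```

A typical call is `save_b64_images(request_image({"prompt": "...", "size": "1280x720", "n": 1}))`.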

Text to video

Use /v1/videos to create an asynchronous job, then poll the job and download the completed MP4.
Command
job_id=$(curl -sS -X POST http://127.0.0.1:30010/v1/videos \
  --form-string "prompt=A small warehouse robot moves a blue box across a clean floor." \
  --form-string "negative_prompt=blurry, distorted, low quality" \
  --form-string "size=1280x720" \
  --form-string "num_frames=81" \
  --form-string "fps=24" \
  --form-string "num_inference_steps=35" \
  --form-string "guidance_scale=4.0" \
  --form-string "flow_shift=10.0" \
  --form-string "seed=42" \
  --form-string 'extra_params={"guardrails":true,"use_resolution_template":false,"use_duration_template":false}' \
  | python -c 'import json, sys; print(json.load(sys.stdin)["id"])')

while true; do
  status=$(curl -sS "http://127.0.0.1:30010/v1/videos/${job_id}" \
    | python -c 'import json, sys; print(json.load(sys.stdin)["status"])')
  [ "$status" = "completed" ] && break
  [ "$status" = "failed" ] && exit 1
  sleep 1
done

curl -sS -L "http://127.0.0.1:30010/v1/videos/${job_id}/content" \
  -o cosmos3_t2v.mp4
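
The shell loop above polls forever if a job never reaches a terminal state. Here is a sketch of the same polling pattern in Python with an overall deadline. Only the `completed` and `failed` statuses come from this page; every other status is treated as "still running". `poll_until_done` and its parameters are hypothetical, and the 30-minute default timeout is an arbitrary choice.

```python
import time


def poll_until_done(fetch_status, interval=1.0, timeout=1800.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() until it returns "completed" or "failed".

    fetch_status: zero-argument callable returning the job's status string.
    sleep / clock: injectable so the loop can be exercised without waiting.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status == "completed":
            return status
        if status == "failed":
            raise RuntimeError("Video generation failed")
        # Any other status means the job is still in progress.
        if clock() >= deadline:
            raise TimeoutError(f"Job still {status!r} after {timeout} s")
        sleep(interval)
```

With `requests` installed, `fetch_status` can be `lambda: requests.get(f"{base_url}/v1/videos/{job_id}", timeout=30).json()["status"]`.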

Image to video

This mirrors the official nvidia/Cosmos3-Nano Hugging Face image-to-video example:
Python
import json
import time
from pathlib import Path

import requests
from huggingface_hub import snapshot_download

base_url = "http://127.0.0.1:30010"
model_dir = Path(snapshot_download("nvidia/Cosmos3-Nano"))
asset_dir = model_dir / "assets"

prompt = json.dumps(json.loads((asset_dir / "example_i2v_prompt.json").read_text()))
negative_prompt = json.dumps(
    json.loads((asset_dir / "negative_prompt.json").read_text())
)

data = {
    "prompt": prompt,
    "negative_prompt": negative_prompt,
    "size": "1280x720",
    "num_frames": "189",
    "fps": "24",
    "num_inference_steps": "35",
    "guidance_scale": "6.0",
    "max_sequence_length": "4096",
    "flow_shift": "10.0",
    "seed": "1111",
    "extra_params": json.dumps(
        {
            "use_resolution_template": False,
            "use_duration_template": False,
            "guardrails": True,
        }
    ),
}

with (asset_dir / "example_i2v_input.jpg").open("rb") as image:
    response = requests.post(
        f"{base_url}/v1/videos",
        data=data,
        files={"input_reference": ("example_i2v_input.jpg", image, "image/jpeg")},
        timeout=60,
    )
response.raise_for_status()
video_id = response.json()["id"]

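# Poll the job once per second until it completes or fails.
while True:
    poll = requests.get(f"{base_url}/v1/videos/{video_id}", timeout=30)
    poll.raise_for_status()  # surface HTTP errors instead of parsing an error body
    job = poll.json()
    if job["status"] == "completed":
        break
    if job["status"] == "failed":
        raise RuntimeError(job.get("error") or "Video generation failed")
    time.sleep(1)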

response = requests.get(f"{base_url}/v1/videos/{video_id}/content", timeout=300)
response.raise_for_status()
Path("cosmos3_i2v.mp4").write_bytes(response.content)

5. Cosmos3 Parameters

Cosmos3 supports the standard SGLang image and video fields, such as size, num_frames, fps, num_inference_steps, guidance_scale, negative_prompt, and seed. It also accepts these top-level Cosmos3 request fields:
  • max_sequence_length: maximum text token length used by the Cosmos3 tokenizer.
  • flow_shift: per-request scheduler flow shift. If omitted, SGLang uses --flow-shift, then the checkpoint scheduler default.
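
The flow_shift fallback order described above (request value, then --flow-shift, then the checkpoint scheduler default) amounts to a first-non-None lookup. The function below only illustrates that precedence; it is not taken from SGLang's source, and its name and signature are hypothetical.

```python
from typing import Optional


def resolve_flow_shift(request_value: Optional[float],
                       cli_value: Optional[float],
                       scheduler_default: float) -> float:
    """Pick the flow shift using the precedence documented above."""
    # 1. A value sent with the request wins.
    if request_value is not None:
        return request_value
    # 2. Otherwise fall back to the server's --flow-shift flag, if given.
    if cli_value is not None:
        return cli_value
    # 3. Otherwise use the default from the checkpoint's scheduler config.
    return scheduler_default
```

Comparing against `None` rather than relying on truthiness keeps an explicit `0.0` from being mistaken for "omitted".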
Put model-specific compatibility knobs in extra_params for video requests, or extra_args for image requests:
  • use_duration_template: whether to append SGLang’s generated duration suffix to video prompts.
  • use_resolution_template: accepted for vLLM-Omni request compatibility.
  • use_system_prompt: whether to add the Cosmos3 system prompt to the chat template.
  • guardrails or use_guardrails: per-request guardrail toggle; it applies only when the server was started with guardrails enabled.
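
The examples on this page encode model-specific options differently for the two endpoints: video requests send form fields with extra_params as a JSON string, while image requests send a JSON body with extra_args as a nested object. A small sketch of helpers that keep the two apart follows. `build_video_form` and `build_image_json` are hypothetical names, and stringifying every form value is a convention copied from the image-to-video example, not a documented requirement.

```python
import json


def build_video_form(prompt, extra_params=None, **fields):
    """Form fields for POST /v1/videos: strings, with extra_params JSON-encoded."""
    form = {"prompt": prompt}
    for key, value in fields.items():
        if value is not None:  # omit unset fields so server defaults apply
            form[key] = str(value)
    if extra_params:
        form["extra_params"] = json.dumps(extra_params)
    return form


def build_image_json(prompt, extra_args=None, **fields):
    """JSON body for POST /v1/images/generations, with extra_args nested."""
    body = {"prompt": prompt}
    body.update({k: v for k, v in fields.items() if v is not None})
    if extra_args:
        body["extra_args"] = dict(extra_args)
    return body


video_form = build_video_form(
    "A small warehouse robot moves a blue box across a clean floor.",
    extra_params={"guardrails": True, "use_duration_template": False},
    size="1280x720", num_frames=81, fps=24, flow_shift=10.0,
)
image_body = build_image_json(
    "A warehouse robot folds a blue cloth on a clean workbench.",
    extra_args={"guardrails": True, "use_resolution_template": False},
    size="1280x720", n=1, flow_shift=10.0,
)
```

`video_form` can be passed as `data=` to `requests.post`, and `image_body` as `json=`.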