1. Model Introduction
NVIDIA Cosmos3 is a world-generation model family for text-to-image, text-to-video, and image-to-video generation. SGLang Diffusion serves the public generator checkpoints with the native Cosmos3OmniDiffusersPipeline.
| Model | Status | Notes |
|---|---|---|
| nvidia/Cosmos3-Nano | Supported | T2I, T2V, I2V |
| nvidia/Cosmos3-Super | Supported | T2I, T2V, I2V; use multi-GPU for the 64B checkpoint |
| nvidia/Cosmos3-Super-Text2Image | Supported | T2I-specialized checkpoint |
| nvidia/Cosmos3-Super-Image2Video | Supported | I2V-specialized checkpoint |
| nvidia/Cosmos3-Nano-Policy-DROID | Not supported yet | Action/policy model; planned separately from visual generation |
Cosmos3 video-with-sound, video-to-video conditioning, and action generation are not supported yet. Requests that set generate_sound, action_mode, or video-to-video conditioning fields return a clear error instead of being silently ignored.
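A client can catch these fields before sending a request. The sketch below is a hypothetical client-side helper, not part of SGLang. It checks only the two field names given above, because the video-to-video conditioning fields are not named here, and it assumes a field counts as "set" when its value is truthy.

```python
# Field names taken from the note above. Video-to-video conditioning fields
# are not named in this document, so they are not checked here.
UNSUPPORTED_FIELDS = ("generate_sound", "action_mode")


def find_unsupported_fields(payload: dict) -> list:
    """Return the unsupported Cosmos3 fields that are set in the payload.

    Assumption: a field is treated as set when its value is truthy.
    """
    return [name for name in UNSUPPORTED_FIELDS if payload.get(name)]


request = {"prompt": "A robot arm stacks boxes.", "generate_sound": True}
unsupported = find_unsupported_fields(request)
if unsupported:
    print("Remove unsupported fields before sending:", unsupported)
```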
2. Installation
Install SGLang with the diffusion dependencies:
pip install -e "python[diffusion]"
Cosmos3 guardrails are enabled by default when the cosmos-guardrail package is installed. Install it with:
pip install "cosmos-guardrail==0.3.1"
cosmos-guardrail downloads gated NVIDIA guardrail weights, so pass a Hugging Face token if your environment needs one. If the package is not installed, SGLang skips Cosmos3 guardrails and logs a warning. To disable Cosmos3 guardrails for local experiments, set SGLANG_DISABLE_COSMOS3_GUARDRAILS=1 before starting the server.
3. Serve Cosmos3
Serve Cosmos3-Nano directly from the Hugging Face model ID:
sglang serve \
  --model-type diffusion \
  --model-path nvidia/Cosmos3-Nano \
  --num-gpus 1 \
  --host 0.0.0.0 \
  --port 30010 \
  --output-path /tmp/sglang-cosmos3
For Cosmos3-Super, split the model across multiple GPUs:
sglang serve \
  --model-type diffusion \
  --model-path nvidia/Cosmos3-Super \
  --num-gpus 4 \
  --host 0.0.0.0 \
  --port 30010 \
  --output-path /tmp/sglang-cosmos3
The server also accepts the specialized nvidia/Cosmos3-Super-Text2Image and nvidia/Cosmos3-Super-Image2Video checkpoint IDs.
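If the server is launched from a script, the command line can be assembled in Python. This is a minimal sketch: `build_serve_command` and `build_env` are hypothetical helpers, and they use only the flags and the environment variable that appear in this guide.

```python
import os
import shlex


def build_serve_command(model_path, num_gpus=1, host="0.0.0.0", port=30010,
                        output_path="/tmp/sglang-cosmos3"):
    """Assemble the `sglang serve` argument list shown in this section."""
    return [
        "sglang", "serve",
        "--model-type", "diffusion",
        "--model-path", model_path,
        "--num-gpus", str(num_gpus),
        "--host", host,
        "--port", str(port),
        "--output-path", output_path,
    ]


def build_env(disable_guardrails=False, base_env=None):
    """Copy the environment, optionally disabling Cosmos3 guardrails."""
    env = dict(os.environ if base_env is None else base_env)
    if disable_guardrails:
        env["SGLANG_DISABLE_COSMOS3_GUARDRAILS"] = "1"
    return env


cmd = build_serve_command("nvidia/Cosmos3-Super", num_gpus=4)
print(shlex.join(cmd))
# To launch the server from Python:
#   subprocess.run(cmd, env=build_env(disable_guardrails=True), check=True)
```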
4. OpenAI-Compatible Requests
Text to image
Cosmos3 text-to-image uses the /v1/images/generations endpoint. By default, the response returns the image as b64_json, matching vLLM-Omni’s examples.
curl -sS -X POST http://127.0.0.1:30010/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "A warehouse robot folds a blue cloth on a clean workbench.",
    "size": "1280x720",
    "n": 1,
    "num_inference_steps": 35,
    "guidance_scale": 6.0,
    "flow_shift": 10.0,
    "seed": 0,
    "extra_args": {
      "use_resolution_template": false,
      "guardrails": true
    }
  }'
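Because the image comes back base64-encoded, the client has to decode it before writing a file. The sketch below assumes the response follows the OpenAI Images layout, `{"data": [{"b64_json": "..."}]}`, which this guide does not spell out. `save_b64_images` is a hypothetical helper, and the `.png` extension is also an assumption about the encoded format.

```python
import base64
from pathlib import Path


def save_b64_images(response_json: dict, out_dir: str, stem: str = "cosmos3_t2i") -> list:
    """Decode every b64_json entry in the response and write it to out_dir.

    Assumptions: OpenAI-style {"data": [{"b64_json": ...}]} layout and PNG
    output; adjust the extension if the server returns another format.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, item in enumerate(response_json["data"]):
        path = out / f"{stem}_{index}.png"
        path.write_bytes(base64.b64decode(item["b64_json"]))
        paths.append(path)
    return paths


# Usage with the request above (requires a running server):
#   body = requests.post(f"{base_url}/v1/images/generations", json=payload).json()
#   save_b64_images(body, "outputs")
```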
Text to video
Use /v1/videos to create an asynchronous job, then poll the job and download the completed MP4.
# Submit the generation job and capture its ID.
job_id=$(curl -sS -X POST http://127.0.0.1:30010/v1/videos \
  --form-string "prompt=A small warehouse robot moves a blue box across a clean floor." \
  --form-string "negative_prompt=blurry, distorted, low quality" \
  --form-string "size=1280x720" \
  --form-string "num_frames=81" \
  --form-string "fps=24" \
  --form-string "num_inference_steps=35" \
  --form-string "guidance_scale=4.0" \
  --form-string "flow_shift=10.0" \
  --form-string "seed=42" \
  --form-string 'extra_params={"guardrails":true,"use_resolution_template":false,"use_duration_template":false}' \
  | python -c 'import json, sys; print(json.load(sys.stdin)["id"])')

# Poll once per second until the job completes or fails.
while true; do
  status=$(curl -sS "http://127.0.0.1:30010/v1/videos/${job_id}" \
    | python -c 'import json, sys; print(json.load(sys.stdin)["status"])')
  [ "$status" = "completed" ] && break
  [ "$status" = "failed" ] && exit 1
  sleep 1
done

# Download the finished MP4.
curl -sS -L "http://127.0.0.1:30010/v1/videos/${job_id}/content" \
  -o cosmos3_t2v.mp4
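The polling loop above runs until the job reaches a terminal state, so a stuck job would block the client indefinitely. One way to bound the wait is sketched below. `wait_for_video` is a hypothetical helper, and the only status values it relies on are the "completed" and "failed" strings used above; any other status is treated as still in progress.

```python
import time
from typing import Callable


def wait_for_video(fetch_status: Callable[[], str],
                   poll_interval: float = 1.0,
                   timeout: float = 600.0,
                   sleep: Callable[[float], None] = time.sleep,
                   clock: Callable[[], float] = time.monotonic) -> str:
    """Poll fetch_status() until the job completes, fails, or times out."""
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status == "completed":
            return status
        if status == "failed":
            raise RuntimeError("Video generation failed")
        if clock() >= deadline:
            raise TimeoutError(f"Job still '{status}' after {timeout} seconds")
        sleep(poll_interval)


# Usage against the server (requires the requests package and a running job):
#   wait_for_video(
#       lambda: requests.get(f"{base_url}/v1/videos/{job_id}", timeout=30).json()["status"]
#   )
```

Passing `sleep` and `clock` as parameters keeps the helper testable without real delays.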
Image to video
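The following Python client mirrors the official nvidia/Cosmos3-Nano Hugging Face image-to-video example. It uploads the input image as input_reference, polls the job, and downloads the MP4: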
import json
import time
from pathlib import Path

import requests
from huggingface_hub import snapshot_download

base_url = "http://127.0.0.1:30010"

# Download the model repository to reuse its bundled example assets.
model_dir = Path(snapshot_download("nvidia/Cosmos3-Nano"))
asset_dir = model_dir / "assets"
prompt = json.dumps(json.loads((asset_dir / "example_i2v_prompt.json").read_text()))
negative_prompt = json.dumps(
    json.loads((asset_dir / "negative_prompt.json").read_text())
)

# Form fields are sent as strings; extra_params is a JSON-encoded string.
data = {
    "prompt": prompt,
    "negative_prompt": negative_prompt,
    "size": "1280x720",
    "num_frames": "189",
    "fps": "24",
    "num_inference_steps": "35",
    "guidance_scale": "6.0",
    "max_sequence_length": "4096",
    "flow_shift": "10.0",
    "seed": "1111",
    "extra_params": json.dumps(
        {
            "use_resolution_template": False,
            "use_duration_template": False,
            "guardrails": True,
        }
    ),
}

# Upload the conditioning image and create the asynchronous job.
with (asset_dir / "example_i2v_input.jpg").open("rb") as image:
    response = requests.post(
        f"{base_url}/v1/videos",
        data=data,
        files={"input_reference": ("example_i2v_input.jpg", image, "image/jpeg")},
        timeout=60,
    )
response.raise_for_status()
video_id = response.json()["id"]

# Poll once per second until the job completes or fails.
while True:
    job = requests.get(f"{base_url}/v1/videos/{video_id}", timeout=30).json()
    if job["status"] == "completed":
        break
    if job["status"] == "failed":
        raise RuntimeError(job.get("error") or "Video generation failed")
    time.sleep(1)

# Download the completed MP4.
response = requests.get(f"{base_url}/v1/videos/{video_id}/content", timeout=300)
response.raise_for_status()
Path("cosmos3_i2v.mp4").write_bytes(response.content)
5. Cosmos3 Parameters
Cosmos3 supports the standard SGLang video and image fields such as size, num_frames, fps, num_inference_steps, guidance_scale, negative_prompt, and seed.
Top-level Cosmos3 request fields:
max_sequence_length: maximum text token length used by the Cosmos3 tokenizer.
flow_shift: per-request scheduler flow shift. If omitted, SGLang falls back to the server's --flow-shift value and, if that is also unset, to the checkpoint scheduler default.
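The fallback order for flow_shift can be sketched as a small function. This only illustrates the precedence described above and is not SGLang's actual implementation; `resolve_flow_shift` and its parameter names are hypothetical.

```python
from typing import Optional


def resolve_flow_shift(request_value: Optional[float],
                       server_value: Optional[float],
                       scheduler_default: float) -> float:
    """Pick the flow shift: request field, then --flow-shift, then scheduler default."""
    if request_value is not None:
        return request_value
    if server_value is not None:
        return server_value
    return scheduler_default


# The per-request value takes priority over the server flag.
print(resolve_flow_shift(10.0, 5.0, 1.0))
# With no request value and no server flag, the scheduler default applies.
print(resolve_flow_shift(None, None, 1.0))
```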
Put model-specific compatibility knobs in extra_params for video requests, or extra_args for image requests:
use_duration_template: whether to append SGLang’s generated duration suffix to video prompts.
use_resolution_template: accepted for vLLM-Omni request compatibility.
use_system_prompt: whether to add the Cosmos3 system prompt to the chat template.
guardrails or use_guardrails: per-request guardrail toggle. It takes effect when the server was started with guardrails enabled.
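The two containers are encoded differently in the examples in this guide: video requests send extra_params as a JSON-encoded string inside the form fields, while image requests nest extra_args as an object in the JSON body. A minimal sketch of both shapes follows; `build_video_form` and `build_image_body` are hypothetical helpers.

```python
import json


def build_video_form(prompt, extra=None, **fields):
    """Build form fields for /v1/videos: strings only, extra_params JSON-encoded."""
    form = {"prompt": prompt}
    form.update({name: str(value) for name, value in fields.items()})
    if extra:
        form["extra_params"] = json.dumps(extra)
    return form


def build_image_body(prompt, extra=None, **fields):
    """Build a JSON body for /v1/images/generations with nested extra_args."""
    body = {"prompt": prompt, **fields}
    if extra:
        body["extra_args"] = dict(extra)
    return body


knobs = {"guardrails": True, "use_resolution_template": False}
video_form = build_video_form("A robot moves a box.", extra=knobs, num_frames=81, fps=24)
image_body = build_image_body("A robot folds a cloth.", extra=knobs, size="1280x720")
print(video_form["extra_params"])
print(image_body["extra_args"])
```

The form dict can be passed as `data=` alongside `files=` as in the image-to-video example, and the image body as `json=` in `requests.post`.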