# Cosmos3

## 1. Model Introduction

[NVIDIA Cosmos3](https://huggingface.co/collections/nvidia/cosmos3) is a world-generation model family for text-to-image, text-to-video, and image-to-video generation. SGLang Diffusion serves the public generator checkpoints with the native `Cosmos3OmniDiffusersPipeline`.

| Model                              | Status            | Notes                                                          |
| ---------------------------------- | ----------------- | -------------------------------------------------------------- |
| `nvidia/Cosmos3-Nano`              | Supported         | T2I, T2V, I2V                                                  |
| `nvidia/Cosmos3-Super`             | Supported         | T2I, T2V, I2V; use multi-GPU for the 64B checkpoint            |
| `nvidia/Cosmos3-Super-Text2Image`  | Supported         | T2I-specialized checkpoint                                     |
| `nvidia/Cosmos3-Super-Image2Video` | Supported         | I2V-specialized checkpoint                                     |
| `nvidia/Cosmos3-Nano-Policy-DROID` | Not supported yet | Action/policy model; planned separately from visual generation |

Cosmos3 video-with-sound, video-to-video conditioning, and action generation are not supported yet. Requests that set `generate_sound`, `action_mode`, or video-to-video conditioning fields are rejected with an explicit error rather than silently ignored.
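
Because the server rejects these fields, a client can check a payload before sending it. The following is a minimal sketch, not part of SGLang: `find_unsupported_fields` is a hypothetical helper, it only covers the two field names listed above, and it assumes a field counts as "set" whenever it is present with a non-null value.

```python
# Field names come from the paragraph above; the helper itself is hypothetical.
UNSUPPORTED_FIELDS = ("generate_sound", "action_mode")


def find_unsupported_fields(payload: dict) -> list:
    """Return the unsupported Cosmos3 fields that are present in ``payload``.

    Assumption: any non-None value counts as "set", including ``False``.
    """
    return [name for name in UNSUPPORTED_FIELDS if payload.get(name) is not None]


request = {"prompt": "A robot arm stacks boxes.", "generate_sound": True}
rejected = find_unsupported_fields(request)
if rejected:
    print("Remove these fields before sending:", ", ".join(rejected))
```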

## 2. Installation

Install SGLang with the diffusion dependencies:

```bash Command theme={null}
pip install -e "python[diffusion]"
```

Cosmos3 guardrails are enabled by default whenever the `cosmos-guardrail` package is installed. To install it:

```bash Command theme={null}
pip install "cosmos-guardrail==0.3.1"
```

The `cosmos-guardrail` package downloads gated NVIDIA guardrail weights, so provide a Hugging Face token with access to them if your environment is not already authenticated. If the package is not installed, SGLang skips Cosmos3 guardrails and logs a warning. To disable Cosmos3 guardrails for local experiments, set `SGLANG_DISABLE_COSMOS3_GUARDRAILS=1` before starting the server.

## 3. Serve Cosmos3

Serve `Cosmos3-Nano` directly from the Hugging Face model ID:

```bash Command theme={null}
sglang serve \
  --model-path nvidia/Cosmos3-Nano \
  --num-gpus 1
```

For `Cosmos3-Super`, split the model across multiple GPUs:

```bash Command theme={null}
sglang serve \
  --model-path nvidia/Cosmos3-Super \
  --num-gpus 4
```

The server also accepts the specialized `nvidia/Cosmos3-Super-Text2Image` and `nvidia/Cosmos3-Super-Image2Video` checkpoint IDs.
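
When the server is launched from a script, the command and environment can be assembled in Python. This is a sketch under stated assumptions: `build_serve_command` and `build_serve_env` are hypothetical helpers, and they use only the `--model-path` and `--num-gpus` flags and the `SGLANG_DISABLE_COSMOS3_GUARDRAILS` variable shown on this page.

```python
import os
import shlex


def build_serve_command(model_path: str, num_gpus: int = 1) -> list:
    """Assemble the `sglang serve` argument list used in the examples above."""
    return ["sglang", "serve", "--model-path", model_path, "--num-gpus", str(num_gpus)]


def build_serve_env(disable_guardrails: bool = False, base_env=None) -> dict:
    """Copy the environment, optionally disabling Cosmos3 guardrails."""
    env = dict(os.environ if base_env is None else base_env)
    if disable_guardrails:
        env["SGLANG_DISABLE_COSMOS3_GUARDRAILS"] = "1"
    return env


cmd = build_serve_command("nvidia/Cosmos3-Super", num_gpus=4)
print(shlex.join(cmd))
# prints: sglang serve --model-path nvidia/Cosmos3-Super --num-gpus 4

# To actually start the server (not executed here):
# import subprocess
# server = subprocess.Popen(cmd, env=build_serve_env(disable_guardrails=False))
```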

## 4. OpenAI-Compatible Requests

The examples in this section assume the server is reachable at `http://127.0.0.1:30010`. Replace the host and port with the address your server actually listens on.

### Text to image

Cosmos3 text-to-image uses the `/v1/images/generations` endpoint. By default, Cosmos3 image responses use the `b64_json` format, matching vLLM-Omni's examples.

```bash Command theme={null}
curl -sS -X POST http://127.0.0.1:30010/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "A warehouse robot folds a blue cloth on a clean workbench.",
    "size": "1280x720",
    "n": 1,
    "num_inference_steps": 35,
    "guidance_scale": 6.0,
    "flow_shift": 10.0,
    "seed": 0,
    "extra_args": {
      "use_resolution_template": false,
      "guardrails": true
    }
  }'
```
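
Since the response carries base64 data rather than a URL, the client has to decode it before writing a file. The sketch below assumes the response follows the OpenAI images layout (`data[i].b64_json`). `save_b64_images` is a hypothetical helper, and the `.png` extension is a guess about the encoded format, so adjust it to what your server returns.

```python
import base64
from pathlib import Path


def save_b64_images(response_json: dict, out_dir: str = ".", stem: str = "cosmos3_t2i") -> list:
    """Decode every `b64_json` entry in an images response and write it to disk."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, item in enumerate(response_json.get("data", [])):
        # Assumption: the image bytes are PNG; rename the suffix if they are not.
        path = out / f"{stem}_{index}.png"
        path.write_bytes(base64.b64decode(item["b64_json"]))
        paths.append(path)
    return paths


# Usage with the request above (not executed here):
# import requests
# response = requests.post("http://127.0.0.1:30010/v1/images/generations", json=payload, timeout=600)
# response.raise_for_status()
# save_b64_images(response.json())
```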

### Text to video

Use `/v1/videos` to create an asynchronous job, then poll the job and download the completed MP4.

```bash Command theme={null}
job_id=$(curl -sS -X POST http://127.0.0.1:30010/v1/videos \
  --form-string "prompt=A small warehouse robot moves a blue box across a clean floor." \
  --form-string "negative_prompt=blurry, distorted, low quality" \
  --form-string "size=1280x720" \
  --form-string "num_frames=81" \
  --form-string "fps=24" \
  --form-string "num_inference_steps=35" \
  --form-string "guidance_scale=4.0" \
  --form-string "flow_shift=10.0" \
  --form-string "seed=42" \
  --form-string 'extra_params={"guardrails":true,"use_resolution_template":false,"use_duration_template":false}' \
  | python -c 'import json, sys; print(json.load(sys.stdin)["id"])')

while true; do
  status=$(curl -sS "http://127.0.0.1:30010/v1/videos/${job_id}" \
    | python -c 'import json, sys; print(json.load(sys.stdin)["status"])')
  [ "$status" = "completed" ] && break
  [ "$status" = "failed" ] && exit 1
  sleep 1
done

curl -sS -L "http://127.0.0.1:30010/v1/videos/${job_id}/content" \
  -o cosmos3_t2v.mp4
```
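
The same job can be submitted from Python. The `curl` command sends every field as a multipart form part, and `requests` produces the same encoding when each value is passed through `files` as a `(None, value)` tuple. This is a sketch, not an official client: `build_t2v_form` and `submit_t2v` are hypothetical helpers, and the field names are copied from the `curl` example.

```python
import json


def build_t2v_form(prompt, *, size="1280x720", num_frames=81, fps=24,
                   extra_params=None, **fields):
    """Build multipart form parts for POST /v1/videos.

    `extra_params` is JSON-encoded into a single form field, as in the curl example.
    """
    form = {"prompt": prompt, "size": size, "num_frames": num_frames, "fps": fps, **fields}
    if extra_params is not None:
        form["extra_params"] = json.dumps(extra_params)
    # requests encodes (None, value) tuples as plain multipart fields without a filename.
    return {name: (None, str(value)) for name, value in form.items()}


def submit_t2v(base_url, form):
    """Create the video job and return its id."""
    import requests  # third-party; imported lazily so the builder works without it

    response = requests.post(f"{base_url}/v1/videos", files=form, timeout=60)
    response.raise_for_status()
    return response.json()["id"]


form = build_t2v_form(
    "A small warehouse robot moves a blue box across a clean floor.",
    negative_prompt="blurry, distorted, low quality",
    num_inference_steps=35,
    guidance_scale=4.0,
    flow_shift=10.0,
    seed=42,
    extra_params={"guardrails": True, "use_resolution_template": False,
                  "use_duration_template": False},
)
# job_id = submit_t2v("http://127.0.0.1:30010", form)
```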

### Image to video

This mirrors the official `nvidia/Cosmos3-Nano` Hugging Face image-to-video example:

```python Python theme={null}
import json
import time
from pathlib import Path

import requests
from huggingface_hub import snapshot_download

base_url = "http://127.0.0.1:30010"
model_dir = Path(snapshot_download("nvidia/Cosmos3-Nano"))
asset_dir = model_dir / "assets"

prompt = json.dumps(json.loads((asset_dir / "example_i2v_prompt.json").read_text()))
negative_prompt = json.dumps(
    json.loads((asset_dir / "negative_prompt.json").read_text())
)

data = {
    "prompt": prompt,
    "negative_prompt": negative_prompt,
    "size": "1280x720",
    "num_frames": "189",
    "fps": "24",
    "num_inference_steps": "35",
    "guidance_scale": "6.0",
    "max_sequence_length": "4096",
    "flow_shift": "10.0",
    "seed": "1111",
    "extra_params": json.dumps(
        {
            "use_resolution_template": False,
            "use_duration_template": False,
            "guardrails": True,
        }
    ),
}

with (asset_dir / "example_i2v_input.jpg").open("rb") as image:
    response = requests.post(
        f"{base_url}/v1/videos",
        data=data,
        files={"input_reference": ("example_i2v_input.jpg", image, "image/jpeg")},
        timeout=60,
    )
response.raise_for_status()
video_id = response.json()["id"]

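# Poll until the job reaches a terminal state.
while True:
    poll = requests.get(f"{base_url}/v1/videos/{video_id}", timeout=30)
    poll.raise_for_status()  # surface HTTP errors instead of failing on a missing "status"
    job = poll.json()
    if job["status"] == "completed":
        break
    if job["status"] == "failed":
        raise RuntimeError(job.get("error") or "Video generation failed")
    time.sleep(1)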

response = requests.get(f"{base_url}/v1/videos/{video_id}/content", timeout=300)
response.raise_for_status()
Path("cosmos3_i2v.mp4").write_bytes(response.content)
```
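
The polling loop above runs until the job reports `completed` or `failed`, so a stalled job would block the script indefinitely. A deadline-bounded variant might look like the sketch below. `wait_for_video` is a hypothetical helper; it relies only on the `completed` and `failed` status values shown in this page and treats any other status as "still running".

```python
import time


def wait_for_video(fetch_job, timeout_s=600.0, poll_interval_s=1.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll `fetch_job()` until the job completes, fails, or the deadline passes.

    `fetch_job` is any callable returning the job JSON as a dict.
    """
    deadline = clock() + timeout_s
    while True:
        job = fetch_job()
        status = job.get("status")
        if status == "completed":
            return job
        if status == "failed":
            raise RuntimeError(job.get("error") or "Video generation failed")
        if clock() >= deadline:
            raise TimeoutError(f"Job still {status!r} after {timeout_s} s")
        sleep(poll_interval_s)


# Usage with the example above (not executed here):
# job = wait_for_video(
#     lambda: requests.get(f"{base_url}/v1/videos/{video_id}", timeout=30).json(),
#     timeout_s=1800,
# )
```

Passing `sleep` and `clock` as parameters keeps the helper easy to test without real delays.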

## 5. Cosmos3 Parameters

Cosmos3 supports the standard SGLang video and image fields such as `size`, `num_frames`, `fps`, `num_inference_steps`, `guidance_scale`, `negative_prompt`, and `seed`.

Top-level Cosmos3 request fields:

* `max_sequence_length`: maximum text token length used by the Cosmos3 tokenizer.
* `flow_shift`: per-request scheduler flow shift. If omitted, SGLang falls back to the server's `--flow-shift` value, and then to the checkpoint's scheduler default.
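
The `flow_shift` fallback order can be pictured as a first-non-null lookup. The function below only illustrates that precedence. It is not SGLang's implementation, and `resolve_flow_shift` and its argument names are hypothetical.

```python
def resolve_flow_shift(request_value=None, server_flag=None, scheduler_default=None):
    """Return the first value that is set: request, then --flow-shift, then scheduler default."""
    for candidate in (request_value, server_flag, scheduler_default):
        if candidate is not None:
            return candidate
    return None


# Illustrative values only.
print(resolve_flow_shift(request_value=10.0, server_flag=5.0, scheduler_default=3.0))  # 10.0
print(resolve_flow_shift(server_flag=5.0, scheduler_default=3.0))                      # 5.0
print(resolve_flow_shift(scheduler_default=3.0))                                       # 3.0
```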

Put model-specific compatibility knobs in `extra_params` for video requests, or `extra_args` for image requests:

* `use_duration_template`: whether to append SGLang's generated duration suffix to video prompts.
* `use_resolution_template`: accepted for vLLM-Omni request compatibility.
* `use_system_prompt`: whether to add the Cosmos3 system prompt to the chat template.
* `guardrails` or `use_guardrails`: per-request guardrail toggle. It takes effect only when the server was started with guardrails enabled.
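
A small helper can route these knobs into the right container for each endpoint. This is a sketch that follows the request examples earlier on this page: image requests carry `extra_args` as a nested JSON object, while video requests carry `extra_params` as a JSON-encoded form string. `split_cosmos3_fields` is a hypothetical name, not an SGLang API.

```python
import json

# Knob names taken from the list above.
COMPAT_KNOBS = {
    "use_duration_template",
    "use_resolution_template",
    "use_system_prompt",
    "guardrails",
    "use_guardrails",
}


def split_cosmos3_fields(fields: dict, kind: str) -> dict:
    """Move compatibility knobs into `extra_args` (image) or `extra_params` (video)."""
    if kind not in ("image", "video"):
        raise ValueError("kind must be 'image' or 'video'")
    payload = {k: v for k, v in fields.items() if k not in COMPAT_KNOBS}
    knobs = {k: v for k, v in fields.items() if k in COMPAT_KNOBS}
    if knobs:
        if kind == "image":
            payload["extra_args"] = knobs  # nested object in the JSON body
        else:
            payload["extra_params"] = json.dumps(knobs)  # JSON string in the form
    return payload


image_payload = split_cosmos3_fields(
    {"prompt": "A robot folds a cloth.", "flow_shift": 10.0, "guardrails": True}, "image"
)
video_payload = split_cosmos3_fields(
    {"prompt": "A robot moves a box.", "use_duration_template": False}, "video"
)
```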
