
Inherited base from Deployment: {baseSummary}

{Object.entries(AXIS_HANDLERS).map(([axisId, handler]) => {
  const fc = pgFeatures[axisId];
  if (!fc) return null;
  const setValue = next => setDeltas(d => ({ ...d, [axisId]: next }));
  return handler.render({
    axisId,
    value: deltas[axisId],
    setValue,
    fc,
    base: constraintBase,
    s,
    h: helpers,
    renderChip,
    renderSelect,
    derived: derivedMap[axisId] || null
  });
})}

Playground Command (compare with base)

{playgroundVerified ? "Verified" : "Not Verified"}

{matchedSiblingCell && matches


                    {matchedSiblingCell.match.strategy}

}

setRunMode("python")} role="tab" aria-selected={runMode === "python"}> Python setRunMode("docker")} role="tab" aria-selected={runMode === "docker"}> Docker

{!playgroundVerified && baseCell && }

            {baseCell ? diffLines.map((d, i) => 
                {d.kind === "added" ? "+ " : d.kind === "removed" ? "- " : "  "}
                {d.line}{"\n"}
              ) : "# No verified base cell at the current Deployment selection.\n# Pick a supported hardware/variant in the Deployment panel to populate the playground base."}

{pgMtpHint &&

⚠️ Speculative decoding (MTP) is on — SGLang resets --max-running-requests to 48 when it isn't set. Add --max-running-requests <N> sized for your target concurrency.

}

{pdRouter && routerText &&

Router (SGLang Model Gateway)

Run after both roles are up. Substitute {""} /{" "} {""} with reachable hosts (both 127.0.0.1{" "} on a same-host deployment). Client traffic (cURL) targets this router.

port {pdRouter.port}

{routerText}

} {modal === "curl" && } {modal === "env" && } {modal === "submit" && setModal(null)} onClick={onDialogClick}>

Submit verified cell

You've put together a combination that isn't in the verified catalog yet. After you've run the command end-to-end on the target hardware, this submits a pre-filled GitHub Issue that a maintainer can convert into a PR.

Combination


              {base.hw} / {base.variant} / {base.quant} / {base.strategy} / {base.nodes}

{(() => { const adds = diffLines.filter(d => d.kind === "added"); const rems = diffLines.filter(d => d.kind === "removed"); if (adds.length === 0 && rems.length === 0) return null; return <>

Overrides vs base ({adds.length} added · {rems.length} removed)

                    {[...rems, ...adds].map((d, i) => 
                        {d.kind === "added" ? "+ " : "- "}
                        {d.line.replace(/^\s*/, "")}
                      )}

; })()}

Attestation (all required)

setSubmitAttest({ ...submitAttest, ranCommand: e.target.checked })} /> I ran this exact command on the listed hardware. setSubmitAttest({ ...submitAttest, reachedReady: e.target.checked })} /> The server reached READY and answered a cURL request successfully. setSubmitAttest({ ...submitAttest, outputCorrect: e.target.checked })} /> Output looked correct on at least one prompt.

SGLang version (required)

setSubmitDraft({ ...submitDraft, sglangVersion: e.target.value })} />

Benchmark result (optional)

setSubmitDraft({ ...submitDraft, benchResult: e.target.value })} />

Notes / caveats (optional)

setSubmitDraft({
    ...submitDraft,
    notes: e.target.value
  })} />

<div style={{
    display: "flex",
    justifyContent: "flex-end",
    gap: 8,
    marginTop: 16,
    alignItems: "center"
  }}>
              {!submitReady && <span style={{
    fontSize: 11,
    opacity: 0.7,
    marginRight: "auto"
  }}>
                  Tick every attestation and fill in the SGLang version to enable submission.
                </span>}
              <button style={{
    ...s.iconButton,
    padding: "6px 14px"
  }} onClick={() => setModal(null)}>Cancel</button>
              <a href={submitReady ? submitUrl : undefined} target="_blank" rel="noopener noreferrer" onClick={e => {
    if (!submitReady) e.preventDefault(); else setModal(null);
  }} style={{
    ...s.primaryBtn,
    textDecoration: "none",
    display: "inline-flex",
    alignItems: "center",
    opacity: submitReady ? 1 : 0.4,
    cursor: submitReady ? "pointer" : "not-allowed"
  }}>
                Open submission on GitHub →
              </a>
            </div>
          <p style={{
    fontSize: 11,
    opacity: 0.7,
    marginTop: 10
  }}>
            The "Open submission on GitHub" link opens a pre-filled GitHub
            Issue using the
            <code> 3-playground-verified-cell.yml</code> template. A
            maintainer with the listed hardware will review it and convert it
            into a cookbook PR.
          </p>
        </dialog>}
    </div>;
};
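The playground view and the submit dialog both render `diffLines`, a list of `{ kind, line }` entries where `kind` is `"added"`, `"removed"`, or neither. How the component computes that list is not part of this excerpt. One plausible approach, sketched here under that assumption, compares the base and playground commands line by line as multisets (not an LCS-based diff), so a changed flag shows up as one removed line plus one added line. `diffCommandLines` and the `"same"` kind are hypothetical names.

```javascript
// Hypothetical sketch: not the component's actual implementation.
// Compares two multi-line commands (one flag per line) and tags each line
// as "same", "removed" (only in base), or "added" (only in playground).
function diffCommandLines(baseText, nextText) {
  const baseLines = baseText.split("\n");
  const nextLines = nextText.split("\n");

  // Count base occurrences so duplicate lines are matched one-for-one.
  const remaining = new Map();
  for (const line of baseLines) {
    remaining.set(line, (remaining.get(line) || 0) + 1);
  }

  const out = [];
  for (const line of nextLines) {
    const n = remaining.get(line) || 0;
    if (n > 0) {
      remaining.set(line, n - 1);
      out.push({ kind: "same", line });
    } else {
      out.push({ kind: "added", line });
    }
  }

  // Whatever is still unmatched in the base was dropped by the overrides.
  for (const line of baseLines) {
    const n = remaining.get(line) || 0;
    if (n > 0) {
      remaining.set(line, n - 1);
      out.push({ kind: "removed", line });
    }
  }
  return out;
}

const base = ["python3 -m sglang.launch_server \\", "  --tp 1 \\", "  --port 30000"].join("\n");
const next = ["python3 -m sglang.launch_server \\", "  --tp 2 \\", "  --port 30000"].join("\n");
const diff = diffCommandLines(base, next);
console.log(diff.filter(d => d.kind !== "same").map(d => d.kind + " " + d.line.trim()));
```

Because the submit dialog regroups entries as `[...rems, ...adds]` on its own, the order in which this sketch emits removed lines only matters for the inline playground view.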

export const benchmarks = [{
  match: {
    hw: "h100",
    variant: "8b-a1b",
    quant: "bf16",
    strategy: "default",
    nodes: "single"
  },
  sglang_version: "0.0.0.dev1+g631db6c75",
  speed: [{
    workload: {
      dataset: "random",
      isl: 1024,
      osl: 1024,
      max_concurrency: 1,
      num_prompts: 10
    },
    ttft_ms: 287.24,
    tpot_ms: 2.4,
    tokens_per_sec_per_gpu: 325.11
  }, {
    workload: {
      dataset: "random",
      isl: 1024,
      osl: 1024,
      max_concurrency: 100,
      num_prompts: 1000
    },
    ttft_ms: 171.72,
    tpot_ms: 11.87,
    tokens_per_sec_per_gpu: 7875.37
  }]
}, {
  match: {
    hw: "h100",
    variant: "instruct",
    quant: "bf16",
    strategy: "default",
    nodes: "single"
  },
  sglang_version: "0.0.0.dev1+g631db6c75",
  speed: [{
    workload: {
      dataset: "random",
      isl: 1024,
      osl: 1024,
      max_concurrency: 1,
      num_prompts: 10
    },
    ttft_ms: 18.9,
    tpot_ms: 2.08,
    tokens_per_sec_per_gpu: 471.61
  }, {
    workload: {
      dataset: "random",
      isl: 1024,
      osl: 1024,
      max_concurrency: 100,
      num_prompts: 1000
    },
    ttft_ms: 180.55,
    tpot_ms: 7,
    tokens_per_sec_per_gpu: 13049.7
  }]
}, {
  match: {
    hw: "h100",
    variant: "thinking",
    quant: "bf16",
    strategy: "default",
    nodes: "single"
  },
  sglang_version: "0.0.0.dev1+g631db6c75",
  speed: [{
    workload: {
      dataset: "random",
      isl: 1024,
      osl: 1024,
      max_concurrency: 1,
      num_prompts: 10
    },
    ttft_ms: 16.23,
    tpot_ms: 2.19,
    tokens_per_sec_per_gpu: 449.06
  }, {
    workload: {
      dataset: "random",
      isl: 1024,
      osl: 1024,
      max_concurrency: 100,
      num_prompts: 1000
    },
    ttft_ms: 127.45,
    tpot_ms: 5.31,
    tokens_per_sec_per_gpu: 17430.9
  }]
}, {
  match: {
    hw: "h100",
    variant: "350m",
    quant: "bf16",
    strategy: "default",
    nodes: "single"
  },
  sglang_version: "0.0.0.dev1+g631db6c75",
  speed: [{
    workload: {
      dataset: "random",
      isl: 1024,
      osl: 1024,
      max_concurrency: 1,
      num_prompts: 10
    },
    ttft_ms: 18.8,
    tpot_ms: 1.65,
    tokens_per_sec_per_gpu: 590.39
  }, {
    workload: {
      dataset: "random",
      isl: 1024,
      osl: 1024,
      max_concurrency: 100,
      num_prompts: 1000
    },
    ttft_ms: 476.85,
    tpot_ms: 4.26,
    tokens_per_sec_per_gpu: 18745.3
  }]
}, {
  match: {
    hw: "h100",
    variant: "jp",
    quant: "bf16",
    strategy: "default",
    nodes: "single"
  },
  sglang_version: "0.0.0.dev1+g631db6c75",
  speed: [{
    workload: {
      dataset: "random",
      isl: 1024,
      osl: 1024,
      max_concurrency: 1,
      num_prompts: 10
    },
    ttft_ms: 17.04,
    tpot_ms: 2.1,
    tokens_per_sec_per_gpu: 468.74
  }, {
    workload: {
      dataset: "random",
      isl: 1024,
      osl: 1024,
      max_concurrency: 100,
      num_prompts: 1000
    },
    ttft_ms: 195.67,
    tpot_ms: 5.05,
    tokens_per_sec_per_gpu: 17694.7
  }]
}, {
  match: {
    hw: "h100",
    variant: "vl",
    quant: "bf16",
    strategy: "default",
    nodes: "single"
  },
  sglang_version: "0.0.0.dev1+g631db6c75",
  speed: [{
    workload: {
      dataset: "random",
      isl: 1024,
      osl: 1024,
      max_concurrency: 1,
      num_prompts: 10
    },
    ttft_ms: 22.01,
    tpot_ms: 1.54,
    tokens_per_sec_per_gpu: 630.21
  }, {
    workload: {
      dataset: "random",
      isl: 1024,
      osl: 1024,
      max_concurrency: 100,
      num_prompts: 1000
    },
    ttft_ms: 1676.37,
    tpot_ms: 3.38,
    tokens_per_sec_per_gpu: 14483.4
  }]
}, {
  match: {
    hw: "h100",
    variant: "vl-450m",
    quant: "bf16",
    strategy: "default",
    nodes: "single"
  },
  sglang_version: "0.0.0.dev1+g631db6c75",
  speed: [{
    workload: {
      dataset: "random",
      isl: 1024,
      osl: 1024,
      max_concurrency: 1,
      num_prompts: 10
    },
    ttft_ms: 26.01,
    tpot_ms: 1.34,
    tokens_per_sec_per_gpu: 713.36
  }, {
    workload: {
      dataset: "random",
      isl: 1024,
      osl: 1024,
      max_concurrency: 100,
      num_prompts: 1000
    },
    ttft_ms: 1604.2,
    tpot_ms: 3.38,
    tokens_per_sec_per_gpu: 14852.1
  }]
}, {
  match: {
    hw: "h200",
    variant: "8b-a1b",
    quant: "bf16",
    strategy: "default",
    nodes: "single"
  },
  sglang_version: "0.0.0.dev1+g631db6c75",
  speed: [{
    workload: {
      dataset: "random",
      isl: 1024,
      osl: 1024,
      max_concurrency: 1,
      num_prompts: 10
    },
    ttft_ms: 48.8,
    tpot_ms: 2.23,
    tokens_per_sec_per_gpu: 426.61
  }, {
    workload: {
      dataset: "random",
      isl: 1024,
      osl: 1024,
      max_concurrency: 100,
      num_prompts: 1000
    },
    ttft_ms: 119.9,
    tpot_ms: 11.96,
    tokens_per_sec_per_gpu: 7913.04
  }]
}, {
  match: {
    hw: "h200",
    variant: "instruct",
    quant: "bf16",
    strategy: "default",
    nodes: "single"
  },
  sglang_version: "0.0.0.dev1+g631db6c75",
  speed: [{
    workload: {
      dataset: "random",
      isl: 1024,
      osl: 1024,
      max_concurrency: 1,
      num_prompts: 10
    },
    ttft_ms: 20.97,
    tpot_ms: 2.2,
    tokens_per_sec_per_gpu: 445.43
  }, {
    workload: {
      dataset: "random",
      isl: 1024,
      osl: 1024,
      max_concurrency: 100,
      num_prompts: 1000
    },
    ttft_ms: 601.53,
    tpot_ms: 5.37,
    tokens_per_sec_per_gpu: 14874
  }]
}, {
  match: {
    hw: "h200",
    variant: "thinking",
    quant: "bf16",
    strategy: "default",
    nodes: "single"
  },
  sglang_version: "0.0.0.dev1+g631db6c75",
  speed: [{
    workload: {
      dataset: "random",
      isl: 1024,
      osl: 1024,
      max_concurrency: 1,
      num_prompts: 10
    },
    ttft_ms: 21.39,
    tpot_ms: 2.22,
    tokens_per_sec_per_gpu: 440.08
  }, {
    workload: {
      dataset: "random",
      isl: 1024,
      osl: 1024,
      max_concurrency: 100,
      num_prompts: 1000
    },
    ttft_ms: 398.87,
    tpot_ms: 5.58,
    tokens_per_sec_per_gpu: 15212.9
  }]
}, {
  match: {
    hw: "h200",
    variant: "350m",
    quant: "bf16",
    strategy: "default",
    nodes: "single"
  },
  sglang_version: "0.0.0.dev1+g631db6c75",
  speed: [{
    workload: {
      dataset: "random",
      isl: 1024,
      osl: 1024,
      max_concurrency: 1,
      num_prompts: 10
    },
    ttft_ms: 22.51,
    tpot_ms: 1.72,
    tokens_per_sec_per_gpu: 564.53
  }, {
    workload: {
      dataset: "random",
      isl: 1024,
      osl: 1024,
      max_concurrency: 100,
      num_prompts: 1000
    },
    ttft_ms: 880.23,
    tpot_ms: 4.37,
    tokens_per_sec_per_gpu: 15765.2
  }]
}, {
  match: {
    hw: "h200",
    variant: "jp",
    quant: "bf16",
    strategy: "default",
    nodes: "single"
  },
  sglang_version: "0.0.0.dev1+g631db6c75",
  speed: [{
    workload: {
      dataset: "random",
      isl: 1024,
      osl: 1024,
      max_concurrency: 1,
      num_prompts: 10
    },
    ttft_ms: 20.85,
    tpot_ms: 2.09,
    tokens_per_sec_per_gpu: 468.85
  }, {
    workload: {
      dataset: "random",
      isl: 1024,
      osl: 1024,
      max_concurrency: 100,
      num_prompts: 1000
    },
    ttft_ms: 781.82,
    tpot_ms: 5.23,
    tokens_per_sec_per_gpu: 14492.3
  }]
}, {
  match: {
    hw: "h200",
    variant: "vl",
    quant: "bf16",
    strategy: "default",
    nodes: "single"
  },
  sglang_version: "0.0.0.dev1+g631db6c75",
  speed: [{
    workload: {
      dataset: "random",
      isl: 1024,
      osl: 1024,
      max_concurrency: 1,
      num_prompts: 10
    },
    ttft_ms: 20.88,
    tpot_ms: 1.32,
    tokens_per_sec_per_gpu: 732.43
  }, {
    workload: {
      dataset: "random",
      isl: 1024,
      osl: 1024,
      max_concurrency: 100,
      num_prompts: 1000
    },
    ttft_ms: 1550.27,
    tpot_ms: 3.26,
    tokens_per_sec_per_gpu: 15472
  }]
}, {
  match: {
    hw: "h200",
    variant: "vl-450m",
    quant: "bf16",
    strategy: "default",
    nodes: "single"
  },
  sglang_version: "0.0.0.dev1+g631db6c75",
  speed: [{
    workload: {
      dataset: "random",
      isl: 1024,
      osl: 1024,
      max_concurrency: 1,
      num_prompts: 10
    },
    ttft_ms: 23.48,
    tpot_ms: 1.2,
    tokens_per_sec_per_gpu: 798.74
  }, {
    workload: {
      dataset: "random",
      isl: 1024,
      osl: 1024,
      max_concurrency: 100,
      num_prompts: 1000
    },
    ttft_ms: 1544.41,
    tpot_ms: 3.14,
    tokens_per_sec_per_gpu: 15617.7
  }]
}, {
  match: {
    hw: "b200",
    variant: "8b-a1b",
    quant: "bf16",
    strategy: "default",
    nodes: "single"
  },
  sglang_version: "0.0.0.dev1+g631db6c75",
  speed: [{
    workload: {
      dataset: "random",
      isl: 1024,
      osl: 1024,
      max_concurrency: 1,
      num_prompts: 10
    },
    ttft_ms: 124.36,
    tpot_ms: 2,
    tokens_per_sec_per_gpu: 436.42
  }, {
    workload: {
      dataset: "random",
      isl: 1024,
      osl: 1024,
      max_concurrency: 100,
      num_prompts: 1000
    },
    ttft_ms: 154.77,
    tpot_ms: 7.54,
    tokens_per_sec_per_gpu: 12343.9
  }]
}, {
  match: {
    hw: "b200",
    variant: "instruct",
    quant: "bf16",
    strategy: "default",
    nodes: "single"
  },
  sglang_version: "0.0.0.dev1+g631db6c75",
  speed: [{
    workload: {
      dataset: "random",
      isl: 1024,
      osl: 1024,
      max_concurrency: 1,
      num_prompts: 10
    },
    ttft_ms: 11.22,
    tpot_ms: 1.19,
    tokens_per_sec_per_gpu: 818.26
  }, {
    workload: {
      dataset: "random",
      isl: 1024,
      osl: 1024,
      max_concurrency: 100,
      num_prompts: 1000
    },
    ttft_ms: 1223.9,
    tpot_ms: 2.19,
    tokens_per_sec_per_gpu: 21137.2
  }]
}, {
  match: {
    hw: "b200",
    variant: "thinking",
    quant: "bf16",
    strategy: "default",
    nodes: "single"
  },
  sglang_version: "0.0.0.dev1+g631db6c75",
  speed: [{
    workload: {
      dataset: "random",
      isl: 1024,
      osl: 1024,
      max_concurrency: 1,
      num_prompts: 10
    },
    ttft_ms: 11.12,
    tpot_ms: 1.19,
    tokens_per_sec_per_gpu: 818.62
  }, {
    workload: {
      dataset: "random",
      isl: 1024,
      osl: 1024,
      max_concurrency: 100,
      num_prompts: 1000
    },
    ttft_ms: 1230.34,
    tpot_ms: 2.18,
    tokens_per_sec_per_gpu: 21121.5
  }]
}, {
  match: {
    hw: "b200",
    variant: "350m",
    quant: "bf16",
    strategy: "default",
    nodes: "single"
  },
  sglang_version: "0.0.0.dev1+g631db6c75",
  speed: [{
    workload: {
      dataset: "random",
      isl: 1024,
      osl: 1024,
      max_concurrency: 1,
      num_prompts: 10
    },
    ttft_ms: 12.18,
    tpot_ms: 0.91,
    tokens_per_sec_per_gpu: 1065.73
  }, {
    workload: {
      dataset: "random",
      isl: 1024,
      osl: 1024,
      max_concurrency: 100,
      num_prompts: 1000
    },
    ttft_ms: 1177.6,
    tpot_ms: 1.92,
    tokens_per_sec_per_gpu: 22636.7
  }]
}, {
  match: {
    hw: "b200",
    variant: "jp",
    quant: "bf16",
    strategy: "default",
    nodes: "single"
  },
  sglang_version: "0.0.0.dev1+g631db6c75",
  speed: [{
    workload: {
      dataset: "random",
      isl: 1024,
      osl: 1024,
      max_concurrency: 1,
      num_prompts: 10
    },
    ttft_ms: 11.98,
    tpot_ms: 1.19,
    tokens_per_sec_per_gpu: 817.6
  }, {
    workload: {
      dataset: "random",
      isl: 1024,
      osl: 1024,
      max_concurrency: 100,
      num_prompts: 1000
    },
    ttft_ms: 1367.79,
    tpot_ms: 2.27,
    tokens_per_sec_per_gpu: 19794.3
  }]
}, {
  match: {
    hw: "b200",
    variant: "vl",
    quant: "bf16",
    strategy: "default",
    nodes: "single"
  },
  sglang_version: "0.0.0.dev1+g631db6c75",
  speed: [{
    workload: {
      dataset: "random",
      isl: 1024,
      osl: 1024,
      max_concurrency: 1,
      num_prompts: 10
    },
    ttft_ms: 11.55,
    tpot_ms: 1.22,
    tokens_per_sec_per_gpu: 807.13
  }, {
    workload: {
      dataset: "random",
      isl: 1024,
      osl: 1024,
      max_concurrency: 100,
      num_prompts: 1000
    },
    ttft_ms: 935.24,
    tpot_ms: 2.34,
    tokens_per_sec_per_gpu: 23135.8
  }]
}, {
  match: {
    hw: "b200",
    variant: "vl-450m",
    quant: "bf16",
    strategy: "default",
    nodes: "single"
  },
  sglang_version: "0.0.0.dev1+g631db6c75",
  speed: [{
    workload: {
      dataset: "random",
      isl: 1024,
      osl: 1024,
      max_concurrency: 1,
      num_prompts: 10
    },
    ttft_ms: 12.09,
    tpot_ms: 0.92,
    tokens_per_sec_per_gpu: 1053.2
  }, {
    workload: {
      dataset: "random",
      isl: 1024,
      osl: 1024,
      max_concurrency: 100,
      num_prompts: 1000
    },
    ttft_ms: 939.41,
    tpot_ms: 2.25,
    tokens_per_sec_per_gpu: 23880.7
  }]
}];
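Every `benchmarks` entry reuses the five-field `match` key of the verified cells and holds one `speed` row per workload. A minimal sketch of picking the row for a Deployment selection and a given concurrency, assuming an exact match on all five fields; `findSpeedRow` is a hypothetical helper, and the inlined data is the H100 / 1.2B Instruct entry copied from the table above.

```javascript
// Hypothetical helper: exact-match lookup over an array shaped like
// `benchmarks` (match + sglang_version + speed[]).
const MATCH_KEYS = ["hw", "variant", "quant", "strategy", "nodes"];

function findSpeedRow(benchmarks, selection, maxConcurrency) {
  const entry = benchmarks.find(b =>
    MATCH_KEYS.every(k => b.match[k] === selection[k])
  );
  if (!entry) return null; // no benchmark recorded for this combination
  const row = entry.speed.find(r => r.workload.max_concurrency === maxConcurrency);
  return row ? { sglangVersion: entry.sglang_version, ...row } : null;
}

// One entry copied from the table above (H100, 1.2B Instruct, BF16).
const sample = [{
  match: { hw: "h100", variant: "instruct", quant: "bf16", strategy: "default", nodes: "single" },
  sglang_version: "0.0.0.dev1+g631db6c75",
  speed: [
    { workload: { dataset: "random", isl: 1024, osl: 1024, max_concurrency: 1, num_prompts: 10 },
      ttft_ms: 18.9, tpot_ms: 2.08, tokens_per_sec_per_gpu: 471.61 },
    { workload: { dataset: "random", isl: 1024, osl: 1024, max_concurrency: 100, num_prompts: 1000 },
      ttft_ms: 180.55, tpot_ms: 7, tokens_per_sec_per_gpu: 13049.7 }
  ]
}];

const sel = { hw: "h100", variant: "instruct", quant: "bf16", strategy: "default", nodes: "single" };
const row = findSpeedRow(sample, sel, 100);
console.log(row.tokens_per_sec_per_gpu); // 13049.7
```

Returning `null` instead of throwing lets a caller simply show no benchmark for combinations that have not been measured yet.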

export const config = {
  modelName: "LFM2.5",
  supportedHardware: ["h100", "h200", "b200"],
  variants: [{
    id: "8b-a1b",
    label: "8B-A1B",
    subtitle: "8.3B MoE · reasoning"
  }, {
    id: "instruct",
    label: "1.2B Instruct",
    subtitle: "1.17B dense"
  }, {
    id: "thinking",
    label: "1.2B Thinking",
    subtitle: "1.17B · reasoning"
  }, {
    id: "350m",
    label: "350M",
    subtitle: "dense"
  }, {
    id: "jp",
    label: "1.2B JP",
    subtitle: "Japanese"
  }, {
    id: "vl",
    label: "VL 1.6B",
    subtitle: "vision"
  }, {
    id: "vl-450m",
    label: "VL 450M",
    subtitle: "vision · compact"
  }],
  quantizations: [{
    id: "bf16",
    label: "BF16"
  }],
  strategies: [{
    id: "default",
    label: "Default"
  }],
  nodesOptions: [{
    id: "single",
    label: "Single Node"
  }],
  modelNames: {
    "8b-a1b|bf16": "LiquidAI/LFM2.5-8B-A1B",
    "instruct|bf16": "LiquidAI/LFM2.5-1.2B-Instruct",
    "thinking|bf16": "LiquidAI/LFM2.5-1.2B-Thinking",
    "350m|bf16": "LiquidAI/LFM2.5-350M",
    "jp|bf16": "LiquidAI/LFM2.5-1.2B-JP-202606",
    "vl|bf16": "LiquidAI/LFM2.5-VL-1.6B",
    "vl-450m|bf16": "LiquidAI/LFM2.5-VL-450M"
  },
  placeholders: {
    HOST_IP: {
      target: "command",
      label: "Bind host",
      default: "0.0.0.0"
    },
    PORT: {
      target: "command",
      label: "Bind port",
      default: "30000"
    },
    HF_TOKEN: {
      target: "command",
      label: "HF token (Docker)",
      default: "<your-hf-token>"
    },
    CURL_HOST: {
      target: "curl",
      label: "Server host",
      default: "localhost"
    },
    CURL_PORT: {
      target: "curl",
      label: "Server port",
      default: "30000"
    }
  },
  curl: `curl http://{{CURL_HOST}}:{{CURL_PORT}}/v1/chat/completions \\
-H 'Content-Type: application/json' \\
-d '{ "model": "{{MODEL_NAME}}", "messages": [{"role":"user","content":"Hello"}] }'`,
  benchmarkCommands: {
    speed: `python3 -m sglang.bench_serving \\
  --backend sglang \\
  --host {{CURL_HOST}} --port {{CURL_PORT}} \\
  --model {{MODEL_NAME}} \\
  --dataset-name {{DATASET}} \\
  --random-input-len {{ISL}} --random-output-len {{OSL}} \\
  --num-prompts {{NUM_PROMPTS}} --max-concurrency {{MAX_CONCURRENCY}}`,
    accuracy: {
      gsm8k_pct: `# To install sgl-eval: pip install git+https://github.com/sgl-project/sgl-eval
sgl-eval run gsm8k \\
  --base-url http://{{CURL_HOST}}:{{CURL_PORT}}/v1 \\
  --model {{MODEL_NAME}} \\
  --num-threads 128`,
      gpqa_pct: `# To install sgl-eval: pip install git+https://github.com/sgl-project/sgl-eval
# GPQA's HF dataset (Idavidrein/gpqa) is gated — accept its terms with your HF account first.
sgl-eval run gpqa \\
  --base-url http://{{CURL_HOST}}:{{CURL_PORT}}/v1 \\
  --model {{MODEL_NAME}} \\
  --num-threads 128`,
      mmlu_pct: `# To install sgl-eval: pip install git+https://github.com/sgl-project/sgl-eval
sgl-eval run mmlu \\
  --base-url http://{{CURL_HOST}}:{{CURL_PORT}}/v1 \\
  --model {{MODEL_NAME}} \\
  --num-threads 128`,
      aime25_pct: `# To install sgl-eval: pip install git+https://github.com/sgl-project/sgl-eval
sgl-eval run aime25 \\
  --base-url http://{{CURL_HOST}}:{{CURL_PORT}}/v1 \\
  --model {{MODEL_NAME}} \\
  --num-threads 128`,
      mmmu_pct: `python3 -m sglang.test.run_eval --eval-name mmmu \\
  --host {{CURL_HOST}} --port {{CURL_PORT}} \\
  --model {{MODEL_NAME}} \\
  --num-examples 900 --num-threads 128 --max-tokens 2048 \\
  --temperature 0.1 --min-p 0.15`
    },
    numPromptsByConc: {
      1: 10,
      16: 32,
      64: 128,
      100: 1000,
      256: 512
    }
  },
  accuracyLabels: [["gpqa_pct", "GPQA Diamond", "%"], ["aime25_pct", "AIME25", "%"], ["gsm8k_pct", "GSM8K (1-shot)", "%"], ["mmlu_pct", "MMLU", "%"], ["mmmu_pct", "MMMU (val)", "%"]],
  defaultAccuracy: {
    "8b-a1b": {
      mmlu_pct: 76.61,
      gsm8k_pct: 91.96,
      gpqa_pct: 52.27,
      aime25_pct: 45.21
    },
    thinking: {
      mmlu_pct: 63.2,
      gsm8k_pct: 86.35,
      gpqa_pct: 39.08,
      aime25_pct: 27.08
    },
    instruct: {
      mmlu_pct: 60.33,
      gsm8k_pct: 75.13,
      gpqa_pct: 34.41,
      aime25_pct: 9.58
    },
    "350m": {
      mmlu_pct: 40.69,
      gsm8k_pct: 30.63,
      gpqa_pct: 28.35
    },
    vl: {
      mmmu_pct: 39.12
    },
    "vl-450m": {
      mmmu_pct: 30.56
    }
  },
  dockerImages: {
    h100: "lmsysorg/sglang:dev-cu13",
    h200: "lmsysorg/sglang:dev-cu13",
    b200: "lmsysorg/sglang:dev-cu13"
  },
  github: {
    cookbookModel: "LiquidAI/lfm2.5"
  },
  playgroundFeatures: {
    attention: {
      knobs: [{
        id: "tp",
        label: "TP",
        values: [null, 1, 2]
      }]
    }
  },
  cells: [{
    match: {
      hw: "h100",
      variant: "8b-a1b",
      quant: "bf16",
      strategy: "default",
      nodes: "single"
    },
    verified: true,
    env: [],
    flags: ["--trust-remote-code", "--model-path {{MODEL_NAME}}", "--tp 1", "--reasoning-parser qwen3", "--tool-call-parser lfm2", "--host {{HOST_IP}}", "--port {{PORT}}"]
  }, {
    match: {
      hw: "h100",
      variant: "instruct",
      quant: "bf16",
      strategy: "default",
      nodes: "single"
    },
    verified: true,
    env: [],
    flags: ["--trust-remote-code", "--model-path {{MODEL_NAME}}", "--tp 1", "--tool-call-parser lfm2", "--host {{HOST_IP}}", "--port {{PORT}}"]
  }, {
    match: {
      hw: "h100",
      variant: "thinking",
      quant: "bf16",
      strategy: "default",
      nodes: "single"
    },
    verified: true,
    env: [],
    flags: ["--trust-remote-code", "--model-path {{MODEL_NAME}}", "--tp 1", "--reasoning-parser qwen3-thinking", "--tool-call-parser lfm2", "--host {{HOST_IP}}", "--port {{PORT}}"]
  }, {
    match: {
      hw: "h100",
      variant: "350m",
      quant: "bf16",
      strategy: "default",
      nodes: "single"
    },
    verified: true,
    env: [],
    flags: ["--trust-remote-code", "--model-path {{MODEL_NAME}}", "--tp 1", "--tool-call-parser lfm2", "--host {{HOST_IP}}", "--port {{PORT}}"]
  }, {
    match: {
      hw: "h100",
      variant: "jp",
      quant: "bf16",
      strategy: "default",
      nodes: "single"
    },
    verified: true,
    env: [],
    flags: ["--trust-remote-code", "--model-path {{MODEL_NAME}}", "--tp 1", "--tool-call-parser lfm2", "--host {{HOST_IP}}", "--port {{PORT}}"]
  }, {
    match: {
      hw: "h100",
      variant: "vl",
      quant: "bf16",
      strategy: "default",
      nodes: "single"
    },
    verified: true,
    env: ["SGLANG_USE_CUDA_IPC_TRANSPORT=1", "SGLANG_USE_IPC_POOL_HANDLE_CACHE=1"],
    flags: ["--trust-remote-code", "--model-path {{MODEL_NAME}}", "--tp 1", "--tool-call-parser lfm2", "--host {{HOST_IP}}", "--port {{PORT}}"]
  }, {
    match: {
      hw: "h100",
      variant: "vl-450m",
      quant: "bf16",
      strategy: "default",
      nodes: "single"
    },
    verified: true,
    env: ["SGLANG_USE_CUDA_IPC_TRANSPORT=1", "SGLANG_USE_IPC_POOL_HANDLE_CACHE=1"],
    flags: ["--trust-remote-code", "--model-path {{MODEL_NAME}}", "--tp 1", "--tool-call-parser lfm2", "--mem-fraction-static 0.8", "--host {{HOST_IP}}", "--port {{PORT}}"]
  }, {
    match: {
      hw: "h200",
      variant: "8b-a1b",
      quant: "bf16",
      strategy: "default",
      nodes: "single"
    },
    verified: true,
    env: [],
    flags: ["--trust-remote-code", "--model-path {{MODEL_NAME}}", "--tp 1", "--reasoning-parser qwen3", "--tool-call-parser lfm2", "--host {{HOST_IP}}", "--port {{PORT}}"]
  }, {
    match: {
      hw: "h200",
      variant: "instruct",
      quant: "bf16",
      strategy: "default",
      nodes: "single"
    },
    verified: true,
    env: [],
    flags: ["--trust-remote-code", "--model-path {{MODEL_NAME}}", "--tp 1", "--tool-call-parser lfm2", "--host {{HOST_IP}}", "--port {{PORT}}"]
  }, {
    match: {
      hw: "h200",
      variant: "thinking",
      quant: "bf16",
      strategy: "default",
      nodes: "single"
    },
    verified: true,
    env: [],
    flags: ["--trust-remote-code", "--model-path {{MODEL_NAME}}", "--tp 1", "--reasoning-parser qwen3-thinking", "--tool-call-parser lfm2", "--host {{HOST_IP}}", "--port {{PORT}}"]
  }, {
    match: {
      hw: "h200",
      variant: "350m",
      quant: "bf16",
      strategy: "default",
      nodes: "single"
    },
    verified: true,
    env: [],
    flags: ["--trust-remote-code", "--model-path {{MODEL_NAME}}", "--tp 1", "--tool-call-parser lfm2", "--host {{HOST_IP}}", "--port {{PORT}}"]
  }, {
    match: {
      hw: "h200",
      variant: "jp",
      quant: "bf16",
      strategy: "default",
      nodes: "single"
    },
    verified: true,
    env: [],
    flags: ["--trust-remote-code", "--model-path {{MODEL_NAME}}", "--tp 1", "--tool-call-parser lfm2", "--host {{HOST_IP}}", "--port {{PORT}}"]
  }, {
    match: {
      hw: "h200",
      variant: "vl",
      quant: "bf16",
      strategy: "default",
      nodes: "single"
    },
    verified: true,
    env: ["SGLANG_USE_CUDA_IPC_TRANSPORT=1", "SGLANG_USE_IPC_POOL_HANDLE_CACHE=1"],
    flags: ["--trust-remote-code", "--model-path {{MODEL_NAME}}", "--tp 1", "--tool-call-parser lfm2", "--host {{HOST_IP}}", "--port {{PORT}}"]
  }, {
    match: {
      hw: "h200",
      variant: "vl-450m",
      quant: "bf16",
      strategy: "default",
      nodes: "single"
    },
    verified: true,
    env: ["SGLANG_USE_CUDA_IPC_TRANSPORT=1", "SGLANG_USE_IPC_POOL_HANDLE_CACHE=1"],
    flags: ["--trust-remote-code", "--model-path {{MODEL_NAME}}", "--tp 1", "--tool-call-parser lfm2", "--mem-fraction-static 0.8", "--host {{HOST_IP}}", "--port {{PORT}}"]
  }, {
    match: {
      hw: "b200",
      variant: "8b-a1b",
      quant: "bf16",
      strategy: "default",
      nodes: "single"
    },
    verified: true,
    env: [],
    flags: ["--trust-remote-code", "--model-path {{MODEL_NAME}}", "--tp 1", "--attention-backend flashinfer", "--reasoning-parser qwen3", "--tool-call-parser lfm2", "--host {{HOST_IP}}", "--port {{PORT}}"]
  }, {
    match: {
      hw: "b200",
      variant: "instruct",
      quant: "bf16",
      strategy: "default",
      nodes: "single"
    },
    verified: true,
    env: [],
    flags: ["--trust-remote-code", "--model-path {{MODEL_NAME}}", "--tp 1", "--attention-backend trtllm_mha", "--tool-call-parser lfm2", "--host {{HOST_IP}}", "--port {{PORT}}"]
  }, {
    match: {
      hw: "b200",
      variant: "thinking",
      quant: "bf16",
      strategy: "default",
      nodes: "single"
    },
    verified: true,
    env: [],
    flags: ["--trust-remote-code", "--model-path {{MODEL_NAME}}", "--tp 1", "--attention-backend trtllm_mha", "--reasoning-parser qwen3-thinking", "--tool-call-parser lfm2", "--host {{HOST_IP}}", "--port {{PORT}}"]
  }, {
    match: {
      hw: "b200",
      variant: "350m",
      quant: "bf16",
      strategy: "default",
      nodes: "single"
    },
    verified: true,
    env: [],
    flags: ["--trust-remote-code", "--model-path {{MODEL_NAME}}", "--tp 1", "--attention-backend trtllm_mha", "--tool-call-parser lfm2", "--host {{HOST_IP}}", "--port {{PORT}}"]
  }, {
    match: {
      hw: "b200",
      variant: "jp",
      quant: "bf16",
      strategy: "default",
      nodes: "single"
    },
    verified: true,
    env: [],
    flags: ["--trust-remote-code", "--model-path {{MODEL_NAME}}", "--tp 1", "--attention-backend trtllm_mha", "--tool-call-parser lfm2", "--host {{HOST_IP}}", "--port {{PORT}}"]
  }, {
    match: {
      hw: "b200",
      variant: "vl",
      quant: "bf16",
      strategy: "default",
      nodes: "single"
    },
    verified: true,
    env: ["SGLANG_USE_CUDA_IPC_TRANSPORT=1", "SGLANG_USE_IPC_POOL_HANDLE_CACHE=1"],
    flags: ["--trust-remote-code", "--model-path {{MODEL_NAME}}", "--tp 1", "--attention-backend flashinfer", "--mm-attention-backend fa4", "--tool-call-parser lfm2", "--host {{HOST_IP}}", "--port {{PORT}}"]
  }, {
    match: {
      hw: "b200",
      variant: "vl-450m",
      quant: "bf16",
      strategy: "default",
      nodes: "single"
    },
    verified: true,
    env: ["SGLANG_USE_CUDA_IPC_TRANSPORT=1", "SGLANG_USE_IPC_POOL_HANDLE_CACHE=1"],
    flags: ["--trust-remote-code", "--model-path {{MODEL_NAME}}", "--tp 1", "--attention-backend flashinfer", "--mm-attention-backend fa4", "--tool-call-parser lfm2", "--mem-fraction-static 0.8", "--host {{HOST_IP}}", "--port {{PORT}}"]
  }]
};
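A verified cell is only `env` plus `flags`; the command a user copies is those pieces with the `{{…}}` placeholders filled in. A minimal sketch of that rendering, assuming `{{MODEL_NAME}}` is resolved from `modelNames` under the key `` `${variant}|${quant}` `` and the other placeholders fall back to their `default` values. `renderLaunchCommand` is hypothetical, and prefixing the flags with `python3 -m sglang.launch_server` is an assumption about how the Python run mode is formatted; the Docker mode is not modeled.

```javascript
// Hypothetical sketch, not the Deployment component's actual code: turn a
// verified cell into a multi-line shell command for the Python run mode.
function renderLaunchCommand(config, cell, overrides = {}) {
  const { variant, quant } = cell.match;

  // Placeholder values: MODEL_NAME from modelNames, then the defaults of
  // command-targeted placeholders, then caller overrides (highest priority).
  const values = { MODEL_NAME: config.modelNames[`${variant}|${quant}`] };
  for (const [name, spec] of Object.entries(config.placeholders)) {
    if (spec.target === "command") values[name] = spec.default;
  }
  Object.assign(values, overrides);

  // Replace each {{NAME}}; names without a value are left as-is.
  const fill = text =>
    text.replace(/\{\{(\w+)\}\}/g, (match, name) =>
      values[name] !== undefined ? values[name] : match);

  // Env assignments prefix the command; backslash continuations keep it one shell line.
  const lines = [
    ...cell.env.map(fill),
    "python3 -m sglang.launch_server",
    ...cell.flags.map(flag => "  " + fill(flag))
  ];
  return lines.join(" \\\n");
}

// Trimmed-down copies of the config and the H100 VL 1.6B cell above.
const miniConfig = {
  modelNames: { "vl|bf16": "LiquidAI/LFM2.5-VL-1.6B" },
  placeholders: {
    HOST_IP: { target: "command", label: "Bind host", default: "0.0.0.0" },
    PORT: { target: "command", label: "Bind port", default: "30000" }
  }
};
const vlCell = {
  match: { hw: "h100", variant: "vl", quant: "bf16", strategy: "default", nodes: "single" },
  verified: true,
  env: ["SGLANG_USE_CUDA_IPC_TRANSPORT=1", "SGLANG_USE_IPC_POOL_HANDLE_CACHE=1"],
  flags: ["--trust-remote-code", "--model-path {{MODEL_NAME}}", "--tp 1", "--tool-call-parser lfm2", "--host {{HOST_IP}}", "--port {{PORT}}"]
};

console.log(renderLaunchCommand(miniConfig, vlCell, { PORT: "8000" }));
```

Leaving unknown placeholders untouched, rather than blanking them, makes a missing `modelNames` entry visible in the rendered command instead of silently producing `--model-path` with no value.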

export const Deployment = ({config, benchmarks}) => {
  if (!config) {
    return <div style={{
      padding: 12,
      color: "#b91c1c"
    }}>Deployment: missing <code>config</code> prop</div>;
  }
  const HARDWARE_CATALOG = {
    nvidia: [{
      id: "h100",
      label: "H100",
      vram: "80GB"
    }, {
      id: "h200",
      label: "H200",
      vram: "141GB"
    }, {
      id: "b200",
      label: "B200",
      vram: "192GB"
    }, {
      id: "b300",
      label: "B300",
      vram: "288GB"
    }, {
      id: "gb200",
      label: "GB200",
      vram: "192GB"
    }, {
      id: "gb300",
      label: "GB300",
      vram: "288GB"
    }],
    amd: [{
      id: "mi300x",
      label: "MI300X",
      vram: "192GB"
    }, {
      id: "mi325x",
      label: "MI325X",
      vram: "256GB"
    }, {
      id: "mi350x",
      label: "MI350X",
      vram: "288GB"
    }, {
      id: "mi355x",
      label: "MI355X",
      vram: "288GB"
    }]
  };
  const makeStyles = isDark => ({
    container: {
      maxWidth: "900px",
      margin: "0 auto",
      display: "flex",
      flexDirection: "column",
      gap: "3px"
    },
    card: {
      padding: "5px 10px",
      border: `1px solid ${isDark ? "#374151" : "#e5e7eb"}`,
      borderLeft: `3px solid ${isDark ? "#E85D4D" : "#D45D44"}`,
      borderRadius: "4px",
      display: "flex",
      alignItems: "center",
      gap: "10px",
      background: isDark ? "#1f2937" : "#fff"
    },
    cardColumn: {
      padding: "5px 10px",
      border: `1px solid ${isDark ? "#374151" : "#e5e7eb"}`,
      borderLeft: `3px solid ${isDark ? "#E85D4D" : "#D45D44"}`,
      borderRadius: "4px",
      display: "flex",
      flexDirection: "column",
      gap: "4px",
      background: isDark ? "#1f2937" : "#fff"
    },
    title: {
      fontSize: "12px",
      fontWeight: "600",
      minWidth: "108px",
      flexShrink: 0,
      color: isDark ? "#e5e7eb" : "inherit"
    },
    vendorRow: {
      display: "flex",
      alignItems: "center",
      gap: "6px"
    },
    vendorLabel: {
      fontSize: "10px",
      fontWeight: "600",
      color: isDark ? "#9ca3af" : "#6b7280",
      minWidth: "38px",
      textTransform: "uppercase",
      letterSpacing: "0.04em"
    },
    itemsGrid: () => ({
      display: "grid",
      gridTemplateColumns: "repeat(auto-fit, minmax(72px, 1fr))",
      gap: "4px",
      flex: 1
    }),
    labelBase: {
      padding: "2px 8px",
      border: `1px solid ${isDark ? "#9ca3af" : "#d1d5db"}`,
      borderRadius: "3px",
      cursor: "pointer",
      display: "inline-flex",
      flexDirection: "column",
      alignItems: "center",
      justifyContent: "center",
      fontWeight: "500",
      fontSize: "12px",
      transition: "all 0.2s",
      userSelect: "none",
      minHeight: "26px",
      textAlign: "center",
      background: isDark ? "#374151" : "#fff",
      color: isDark ? "#e5e7eb" : "inherit"
    },
    checked: {
      background: "#D45D44",
      color: "white",
      borderColor: "#D45D44"
    },
    disabled: {
      cursor: "not-allowed",
      opacity: 0.4
    },
    subtitle: {
      display: "block",
      fontSize: "9px",
      marginTop: "1px",
      lineHeight: "1.1",
      opacity: 0.7
    },
    commandWrap: {
      position: "relative",
      flex: 1,
      background: isDark ? "#111827" : "#f5f5f5",
      borderRadius: "6px",
      border: `1px solid ${isDark ? "#374151" : "#e5e7eb"}`,
      overflow: "hidden"
    },
    commandHeader: {
      display: "flex",
      flexWrap: "wrap",
      justifyContent: "space-between",
      alignItems: "center",
      gap: "6px 10px",
      padding: "6px 10px",
      borderBottom: `1px solid ${isDark ? "#374151" : "#e5e7eb"}`,
      background: isDark ? "#1f2937" : "#fafafa"
    },
    commandPre: {
      padding: "12px 16px",
      fontFamily: "'Menlo', 'Monaco', 'Courier New', monospace",
      fontSize: "12px",
      lineHeight: "1.5",
      color: isDark ? "#e5e7eb" : "#374151",
      whiteSpace: "pre-wrap",
      overflowX: "auto",
      margin: 0
    },
    mtpWarn: {
      margin: "8px 0 0",
      padding: "8px 12px",
      borderRadius: "8px",
      fontSize: "12px",
      lineHeight: "1.45",
      background: isDark ? "#78350f" : "#fef3c7",
      color: isDark ? "#fde68a" : "#92400e",
      border: `1px solid ${isDark ? "#92400e" : "#fcd34d"}`
    },
    badge: verified => ({
      display: "inline-flex",
      alignItems: "center",
      gap: "6px",
      padding: "2px 8px",
      borderRadius: "10px",
      background: verified ? isDark ? "#064e3b" : "#d1fae5" : isDark ? "#78350f" : "#fef3c7",
      color: verified ? isDark ? "#a7f3d0" : "#065f46" : isDark ? "#fde68a" : "#92400e",
      fontSize: "11px",
      fontWeight: 600
    }),
    badgeDot: verified => ({
      width: "8px",
      height: "8px",
      borderRadius: "50%",
      background: verified ? "#10b981" : "#f59e0b"
    }),
    iconButton: {
      padding: "4px 10px",
      border: `1px solid ${isDark ? "#4b5563" : "#d1d5db"}`,
      borderRadius: "4px",
      background: isDark ? "#1f2937" : "#fff",
      color: isDark ? "#e5e7eb" : "#374151",
      fontSize: "11px",
      fontWeight: 500,
      cursor: "pointer",
      display: "inline-flex",
      alignItems: "center",
      gap: "4px"
    },
    iconRow: {
      display: "inline-flex",
      flexWrap: "wrap",
      gap: "6px"
    },
    runModeWrap: {
      display: "inline-flex",
      border: `1px solid ${isDark ? "#4b5563" : "#d1d5db"}`,
      borderRadius: "10px",
      overflow: "hidden",
      fontSize: "11px",
      fontWeight: 600,
      userSelect: "none"
    },
    runModeChip: active => ({
      padding: "2px 10px",
      cursor: "pointer",
      background: active ? isDark ? "#1f2937" : "#fff" : "transparent",
      color: active ? isDark ? "#e5e7eb" : "#111827" : isDark ? "#9ca3af" : "#6b7280",
      borderRight: `1px solid ${isDark ? "#4b5563" : "#d1d5db"}`
    }),
    runModeChipLast: active => ({
      padding: "2px 10px",
      cursor: "pointer",
      background: active ? isDark ? "#1f2937" : "#fff" : "transparent",
      color: active ? isDark ? "#e5e7eb" : "#111827" : isDark ? "#9ca3af" : "#6b7280"
    }),
    headerLeft: {
      display: "inline-flex",
      flexWrap: "wrap",
      alignItems: "center",
      gap: "8px"
    },
    modalBackdrop: {
      position: "fixed",
      inset: 0,
      background: "rgba(0,0,0,0.5)",
      display: "flex",
      alignItems: "center",
      justifyContent: "center",
      zIndex: 9999
    },
    modalBox: {
      background: isDark ? "#1f2937" : "#fff",
      color: isDark ? "#e5e7eb" : "#111827",
      borderRadius: "8px",
      padding: "20px",
      maxWidth: "720px",
      width: "92%",
      maxHeight: "85vh",
      overflowY: "auto",
      border: `1px solid ${isDark ? "#374151" : "#e5e7eb"}`,
      boxShadow: "0 10px 25px rgba(0,0,0,0.25)"
    },
    modalHeader: {
      display: "flex",
      justifyContent: "space-between",
      alignItems: "center",
      marginBottom: "12px"
    },
    modalTitle: {
      fontSize: "15px",
      fontWeight: 600
    },
    modalCloseBtn: {
      background: "transparent",
      border: "none",
      color: "inherit",
      fontSize: "20px",
      cursor: "pointer",
      padding: "0 6px",
      lineHeight: 1
    },
    formField: {
      display: "flex",
      flexDirection: "column",
      gap: "4px",
      marginBottom: "10px"
    },
    formLabel: {
      fontSize: "12px",
      fontWeight: 500,
      color: isDark ? "#9ca3af" : "#4b5563"
    },
    formInput: {
      padding: "6px 10px",
      fontSize: "13px",
      border: `1px solid ${isDark ? "#4b5563" : "#d1d5db"}`,
      borderRadius: "4px",
      background: isDark ? "#111827" : "#fff",
      color: isDark ? "#e5e7eb" : "#111827",
      fontFamily: "'Menlo', 'Monaco', 'Courier New', monospace"
    },
    sectionHeading: {
      fontSize: "12px",
      fontWeight: 600,
      textTransform: "uppercase",
      letterSpacing: "0.04em",
      color: isDark ? "#9ca3af" : "#6b7280",
      margin: "12px 0 6px 0"
    },
    primaryBtn: {
      padding: "6px 14px",
      background: "#D45D44",
      color: "white",
      border: "none",
      borderRadius: "4px",
      cursor: "pointer",
      fontSize: "13px",
      fontWeight: 500
    },
    benchCard: {
      padding: "8px 12px",
      border: `1px solid ${isDark ? "#374151" : "#e5e7eb"}`,
      borderLeft: `3px solid ${isDark ? "#E85D4D" : "#D45D44"}`,
      borderRadius: "4px",
      background: isDark ? "#1f2937" : "#fff",
      display: "flex",
      flexDirection: "column",
      gap: "8px"
    },
    benchHeader: {
      display: "flex",
      flexWrap: "wrap",
      alignItems: "baseline",
      justifyContent: "space-between",
      gap: "6px 12px"
    },
    benchTitle: {
      fontSize: "13px",
      fontWeight: 600,
      color: isDark ? "#e5e7eb" : "inherit"
    },
    benchVersion: {
      fontSize: "11px",
      color: isDark ? "#9ca3af" : "#6b7280"
    },
    benchHeaderRight: {
      display: "flex",
      flexWrap: "wrap",
      alignItems: "center",
      gap: "6px 10px",
      flexShrink: 0
    },
    benchChipRow: {
      display: "flex",
      alignItems: "center",
      gap: "6px",
      flexWrap: "wrap",
      margin: "2px 0 8px"
    },
    benchChip: {
      padding: "2px 10px",
      fontSize: "12px",
      cursor: "pointer",
      border: `1px solid ${isDark ? "#4b5563" : "#d1d5db"}`,
      borderRadius: "4px",
      background: isDark ? "#1f2937" : "#fff",
      color: isDark ? "#e5e7eb" : "#374151",
      fontFamily: "'Menlo', 'Monaco', 'Courier New', monospace"
    },
    benchChipActive: {
      background: "#D45D44",
      color: "white",
      borderColor: "#D45D44"
    },
    benchBlock: {
      border: `1px solid ${isDark ? "#374151" : "#e5e7eb"}`,
      borderRadius: "4px",
      padding: "8px 10px",
      background: isDark ? "#111827" : "#fafafa"
    },
    benchBlockTitle: {
      fontSize: "11px",
      fontWeight: 600,
      textTransform: "uppercase",
      letterSpacing: "0.04em",
      color: isDark ? "#9ca3af" : "#6b7280",
      marginBottom: "4px"
    },
    benchWorkload: {
      fontSize: "11px",
      fontStyle: "italic",
      color: isDark ? "#9ca3af" : "#6b7280",
      marginBottom: "6px",
      lineHeight: "1.3"
    },
    benchRow: {
      display: "flex",
      justifyContent: "space-between",
      fontSize: "12px",
      padding: "2px 0"
    },
    benchKey: {
      color: isDark ? "#9ca3af" : "#6b7280"
    },
    benchVal: {
      color: isDark ? "#e5e7eb" : "#111827",
      fontFamily: "'Menlo', 'Monaco', 'Courier New', monospace",
      fontWeight: 500
    },
    benchNotes: {
      fontSize: "11px",
      fontStyle: "italic",
      color: isDark ? "#9ca3af" : "#6b7280"
    },
    benchLegend: {
      fontSize: "10px",
      fontStyle: "italic",
      color: isDark ? "#6b7280" : "#9ca3af",
      marginTop: "6px",
      fontFamily: "'Menlo', 'Monaco', 'Courier New', monospace"
    },
    benchEmpty: {
      fontSize: "12px",
      fontStyle: "italic",
      color: isDark ? "#9ca3af" : "#6b7280"
    },
    benchTable: {
      display: "grid",
      columnGap: 0,
      rowGap: "3px",
      marginTop: "4px",
      alignItems: "baseline"
    },
    benchTableHead: {
      textAlign: "right",
      fontWeight: 500,
      fontSize: "11px",
      color: isDark ? "#9ca3af" : "#6b7280",
      paddingLeft: "16px",
      paddingBottom: "4px",
      whiteSpace: "nowrap"
    },
    benchTableCornerHead: {
      paddingBottom: "4px"
    },
    benchTableSeparator: {
      gridColumn: "1 / -1",
      height: "1px",
      background: isDark ? "#374151" : "#e5e7eb",
      marginTop: "-3px"
    },
    benchTableLabel: {
      textAlign: "left",
      fontSize: "12px",
      color: isDark ? "#9ca3af" : "#6b7280",
      whiteSpace: "nowrap"
    },
    benchTableValue: {
      textAlign: "right",
      fontSize: "12px",
      color: isDark ? "#e5e7eb" : "#111827",
      fontFamily: "'Menlo', 'Monaco', 'Courier New', monospace",
      fontWeight: 500,
      paddingLeft: "16px",
      whiteSpace: "nowrap"
    },
    benchTableValueMissing: {
      color: isDark ? "#6b7280" : "#9ca3af"
    }
  });
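  // Selection axes, from most to least significant: a value chosen on one axis
  // narrows which values are available on every axis that follows it.
  const DIMENSIONS = ["hw", "variant", "quant", "strategy", "nodes"];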
  const findCell = (cells, sel) => cells.find(c => DIMENSIONS.every(d => c.match[d] === sel[d]));
  const findBenchmark = (list, sel) => (list || []).find(b => DIMENSIONS.every(d => b.match[d] === sel[d])) || null;
  const normalizeSpeed = speed => {
    if (!speed) return [];
    return Array.isArray(speed) ? speed : [speed];
  };
  const effectiveAccuracy = (entry, sel) => entry ? {
    ...config.defaultAccuracy && config.defaultAccuracy[sel.variant] || ({}),
    ...entry.accuracy || ({})
  } : {};
  const benchmarkIsEmpty = (entry, accuracy) => {
    for (const m of normalizeSpeed(entry && entry.speed)) {
      if (m && typeof m === "object") {
        for (const [key, v] of Object.entries(m)) {
          if (key === "workload") continue;
          if (v !== null && v !== undefined) return false;
        }
      }
    }
    if (accuracy && typeof accuracy === "object") {
      for (const v of Object.values(accuracy)) {
        if (v !== null && v !== undefined) return false;
      }
    }
    return true;
  };
  // True if some cell has `value` on `dim` and agrees with `sel` on every axis before `dim`.
  const isOptionAvailable = (cells, sel, dim, value) => {
    const idx = DIMENSIONS.indexOf(dim);
    const higher = DIMENSIONS.slice(0, idx);
    return cells.some(c => c.match[dim] === value && higher.every(d => c.match[d] === sel[d]));
  };
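  // Sets `dim` to `value`, then fills every later axis from the compatible cell that
  // preserves the most of the current later-axis choices (the first such cell wins ties).
  // Returns `sel` unchanged if no cell is compatible.
  const snapToValidCell = (cells, sel, dim, value) => {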
    const idx = DIMENSIONS.indexOf(dim);
    const higher = DIMENSIONS.slice(0, idx);
    const lower = DIMENSIONS.slice(idx + 1);
    let best = null, bestLowerMatches = -1;
    for (const c of cells) {
      if (c.match[dim] !== value) continue;
      if (!higher.every(d => c.match[d] === sel[d])) continue;
      let s = 0;
      for (const d of lower) if (c.match[d] === sel[d]) s++;
      if (s > bestLowerMatches) {
        bestLowerMatches = s;
        best = c;
      }
    }
    if (!best) return sel;
    const next = {
      ...sel,
      [dim]: value
    };
    for (const d of lower) next[d] = best.match[d];
    return next;
  };
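  // Worked example (hypothetical cells; the ids are illustrative, not taken from config):
  //   cells = [
  //     { match: { hw: "h100", variant: "base", quant: "fp8", strategy: "tp", nodes: "single" } },
  //     { match: { hw: "b200", variant: "base", quant: "fp4", strategy: "tp", nodes: "single" } },
  //   ]
  //   sel = cells[0].match
  //   snapToValidCell(cells, sel, "hw", "b200")
  //   // -> { hw: "b200", variant: "base", quant: "fp4", strategy: "tp", nodes: "single" }
  // There is no b200 + fp8 cell, so quant snaps to the b200 cell's value, while variant,
  // strategy and nodes end up with the same values they already had.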
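  // Sanitizes a selection parsed from the URL hash: walking the axes in order, it keeps
  // each requested value when some cell supports it given the axes already accepted,
  // and otherwise falls back to the first such cell's value (or keeps the request if
  // no cell is compatible at all).
  const validateSelection = (cells, parsed) => {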
    const valid = {};
    for (const dim of DIMENSIONS) {
      const want = parsed[dim];
      const works = cells.some(c => c.match[dim] === want && DIMENSIONS.slice(0, DIMENSIONS.indexOf(dim)).every(d => c.match[d] === valid[d]));
      if (works) {
        valid[dim] = want;
      } else {
        const fallback = cells.find(c => DIMENSIONS.slice(0, DIMENSIONS.indexOf(dim)).every(d => c.match[d] === valid[d]));
        valid[dim] = fallback ? fallback.match[dim] : want;
      }
    }
    return valid;
  };
  const resolveModelName = sel => {
    const triple = `${sel.hw}|${sel.variant}|${sel.quant}`;
    const pair = `${sel.variant}|${sel.quant}`;
    return (config.modelNames[triple] ?? config.modelNames[pair]) ?? "";
  };
  const interpolate = (text, env, modelName) => text.replace(/{{(\w+)}}/g, (_, key) => key === "MODEL_NAME" ? modelName : env[key] ?? `{{${key}}}`);
  const parseNnodes = id => {
    if (id === "single") return 1;
    const m = (/^multi-(\d+)$/).exec(id);
    return m ? parseInt(m[1], 10) : 1;
  };
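  // Examples (hypothetical values):
  //   interpolate("--port {{PORT}} --model-path {{MODEL_NAME}} --x {{FOO}}", { PORT: "30000" }, "org/model")
  //   // -> "--port 30000 --model-path org/model --x {{FOO}}"   (unknown keys are left in place)
  //   parseNnodes("single")   // -> 1
  //   parseNnodes("multi-4")  // -> 4
  //   parseNnodes("multi-x")  // -> 1   (anything unrecognized falls back to one node)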
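  // Builds the launch command for `cell`: a bare `sglang serve` invocation in "python"
  // mode, or the same flags wrapped in `docker run` in "docker" mode. Multi-node
  // selections also get --nnodes / --node-rank / --dist-init-addr and a header comment.
  const renderCommand = (cell, sel, envValues, mode = "python") => {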
    if (!cell) return "# No command available for the current selection.";
    const modelName = resolveModelName(sel);
    const nnodes = parseNnodes(sel.nodes);
    const multinode = nnodes > 1;
    const cellEnv = cell.env || [];
    const flags = [...cell.flags || []];
    if (multinode) {
      const PARALLELISM_ANCHORS = ["--enable-dp-attention", "--dp", "--tp"];
      let i = -1;
      for (const anchor of PARALLELISM_ANCHORS) {
        i = flags.findIndex(f => f.split(/[\s=]/)[0] === anchor);
        if (i !== -1) break;
      }
      if (i === -1) i = flags.findIndex(f => f.startsWith("--model-path"));
      flags.splice(i + 1, 0, `--nnodes ${nnodes}`, `--node-rank {{NODE_RANK}}`, `--dist-init-addr {{NODE0_IP}}:20000`);
    }
    let cmd;
    if (mode === "docker") {
      const image = config.dockerImages && config.dockerImages[sel.hw] || "lmsysorg/sglang:dev";
      const portFlag = flags.find(x => x.split(/[\s=]/)[0] === "--port");
      // Accept both "--port 30000" and "--port=30000".
      const servePort = portFlag ? portFlag.slice(("--port").length).replace(/^[\s=]+/, "").trim() : "{{PORT}}";
      const dockerLines = ["docker run --gpus all", "  --shm-size 32g", multinode ? "  --network host" : `  -p ${servePort}:${servePort}`, "  -v ~/.cache/huggingface:/root/.cache/huggingface", `  --env "HF_TOKEN={{HF_TOKEN}}"`, ...cellEnv.map(e => `  --env ${e}`), "  --ipc=host", `  ${image}`, "  sglang serve", ...flags.map(f => "    " + f)];
      cmd = dockerLines.join(" \\\n");
    } else {
      const flagBlock = flags.map(f => "  " + f).join(" \\\n");
      const envBlock = cellEnv.length ? cellEnv.join(" \\\n") + " \\\n" : "";
      cmd = `${envBlock}sglang serve \\\n${flagBlock}`;
    }
    if (multinode && config.multiNodeHints && config.multiNodeHints[sel.hw]) {
      const hint = config.multiNodeHints[sel.hw].map(line => line.length ? "# " + line : "#").join("\n");
      cmd = `${hint}\n${cmd}`;
    }
    cmd = interpolate(cmd, envValues, modelName);
    if (multinode) {
      const others = nnodes === 2 ? "1 on the other node" : `1..${nnodes - 1} on the others`;
      const header = `# Multi-node (${nnodes} nodes). Run the same command on every node with:\n` + `#   <node-rank> = 0 on the head node, ${others}\n` + `#   <node0-ip>  = IP of the head node (reachable from all others)`;
      cmd = `${header}\n${cmd}`;
    }
    return cmd;
  };
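  // Worked example (hypothetical cell; assumes resolveModelName(sel) returns "org/model"
  // and config.multiNodeHints has no entry for sel.hw):
  //   cell.flags = ["--model-path {{MODEL_NAME}}", "--tp 8", "--port 30000"], sel.nodes = "multi-2"
  //   renderCommand(cell, sel, {}, "python") returns the multi-node header comments, then:
  //     sglang serve \
  //       --model-path org/model \
  //       --tp 8 \
  //       --nnodes 2 \
  //       --node-rank {{NODE_RANK}} \
  //       --dist-init-addr {{NODE0_IP}}:20000 \
  //       --port 30000
  // The multi-node flags are spliced in right after the first anchor present, checked in
  // the order --enable-dp-attention, --dp, --tp (falling back to --model-path).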
  const ACCURACY_LABELS = config.accuracyLabels || [];
  const renderBenchmarkCard = entry => {
    const SPEED_LABELS = [["ttft_ms", "TTFT", "ms"], ["tpot_ms", "TPOT", "ms"], ["tokens_per_sec_per_gpu", "tokens/sec/GPU", ""], ["interactivity", "interactivity", "tok/s", m => m.tpot_ms != null && m.tpot_ms !== 0 ? Math.round(1000 / m.tpot_ms * 10) / 10 : null]];
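    // interactivity is derived from TPOT rather than stored, rounded to one decimal:
    // e.g. tpot_ms = 12.5 -> 80 tok/s, tpot_ms = 7 -> 142.9 tok/s. A missing or zero
    // TPOT yields null, which the table renders as the missing-value placeholder.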
    const WORKLOAD_KEYS = ["dataset", "isl", "osl", "max_concurrency"];
    const fmt = (val, unit) => {
      if (val === null || val === undefined) return null;
      return `${val}${unit ? " " + unit : ""}`;
    };
    const formatWorkloadParts = (workload, keys) => {
      if (!workload) return "";
      const parts = [];
      if (keys.has("dataset") && workload.dataset) parts.push(workload.dataset);
      if (keys.has("isl") || keys.has("osl")) {
        if (workload.isl != null || workload.osl != null) {
          parts.push(`in/out=${workload.isl != null ? workload.isl : "?"}/${workload.osl != null ? workload.osl : "?"}`);
        }
      }
      if (keys.has("max_concurrency") && workload.max_concurrency != null) {
        parts.push(`max-concurrency=${workload.max_concurrency}`);
      }
      return parts.join(", ");
    };
    const ALWAYS_PER_COLUMN = new Set(["max_concurrency"]);
    const partitionWorkload = measurements => {
      const shared = new Set();
      const differing = new Set();
      for (const k of WORKLOAD_KEYS) {
        const seen = new Set();
        let anyPresent = false;
        for (const m of measurements) {
          const v = m && m.workload ? m.workload[k] : undefined;
          if (v != null) anyPresent = true;
          seen.add(v);
        }
        if (!anyPresent) continue;
        if (ALWAYS_PER_COLUMN.has(k) || seen.size > 1) differing.add(k); else shared.add(k);
      }
      return {
        shared,
        differing
      };
    };
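    // Worked example (hypothetical measurements):
    //   partitionWorkload([
    //     { workload: { dataset: "random", isl: 1024, osl: 1024, max_concurrency: 4 } },
    //     { workload: { dataset: "random", isl: 1024, osl: 1024, max_concurrency: 64 } },
    //   ])
    //   // -> shared = {dataset, isl, osl}, differing = {max_concurrency}
    // buildSpeedTable then shows "random, in/out=1024/1024" once above the table and uses
    // "max-concurrency=4" / "max-concurrency=64" as column headers. max_concurrency is
    // always per-column (ALWAYS_PER_COLUMN), even when there is only one measurement.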
    const renderBenchTable = ({title, sharedText, colHeaders, rows, colCount, legend}) => {
      if (rows.length === 0) return null;
      const showColHeaders = colHeaders.length > 0 && colHeaders.some(h => h !== "");
      return <div style={s.benchBlock}>
          <div style={s.benchBlockTitle}>{title}</div>
          {sharedText && <div style={s.benchWorkload}>{sharedText}</div>}
          <div style={{
        ...s.benchTable,
        gridTemplateColumns: `max-content repeat(${colCount}, minmax(0, 1fr))`
      }}>
            {showColHeaders && <div key="corner" style={s.benchTableCornerHead}></div>}
            {showColHeaders && colHeaders.map((h, i) => <div key={`hdr-${i}`} style={s.benchTableHead}>{h}</div>)}
            {showColHeaders && <div key="sep" style={s.benchTableSeparator}></div>}
            {rows.map(r => [<div key={`lbl-${r.label}`} style={s.benchTableLabel}>{r.label}</div>, ...r.values.map((v, i) => <div key={`val-${r.label}-${i}`} style={v === null ? {
        ...s.benchTableValue,
        ...s.benchTableValueMissing
      } : s.benchTableValue}>
                  {v !== null ? v : "—"}
                </div>)])}
          </div>
          {legend && <div style={s.benchLegend}>{legend}</div>}
        </div>;
    };
    const buildSpeedTable = measurements => {
      if (measurements.length === 0) return null;
      const {shared, differing} = partitionWorkload(measurements);
      const sharedText = formatWorkloadParts(measurements[0] && measurements[0].workload, shared);
      const colHeaders = measurements.map(m => formatWorkloadParts(m && m.workload, differing));
      const rows = SPEED_LABELS.map(tup => {
        const [key, label, unit, compute] = tup;
        const values = measurements.map(m => {
          const raw = compute ? compute(m) : m[key];
          return fmt(raw, unit);
        });
        return {
          label,
          values
        };
      });
      return {
        title: "Speed",
        sharedText,
        colHeaders,
        rows,
        colCount: measurements.length,
        legend: "interactivity = 1000 / TPOT(ms)"
      };
    };
    const buildAccuracyTable = accuracy => {
      if (!accuracy) return null;
      const rows = ACCURACY_LABELS.map(([key, label, unit]) => {
        const v = fmt(accuracy[key], unit);
        if (v === null) return null;
        return {
          label,
          values: [v]
        };
      }).filter(r => r !== null);
      if (rows.length === 0) return null;
      return {
        title: "Accuracy",
        sharedText: null,
        colHeaders: [],
        rows,
        colCount: 1
      };
    };
    const accuracy = effectiveAccuracy(entry, sel);
    const isEmpty = benchmarkIsEmpty(entry, accuracy);
    const measurements = !isEmpty ? normalizeSpeed(entry && entry.speed) : [];
    const accuracyTable = !isEmpty ? buildAccuracyTable(accuracy) : null;
    const speedTable = !isEmpty ? buildSpeedTable(measurements) : null;
    const hasBenchCmds = !isEmpty && buildBenchCommands(entry, sel) !== null;
    return <div style={s.benchCard}>
        <div style={s.benchHeader}>
          <div style={s.benchTitle}>Benchmark</div>
          <div style={s.benchHeaderRight}>
            {!isEmpty && entry && entry.sglang_version && <div style={s.benchVersion}>measured on sglang <code>{entry.sglang_version}</code></div>}
            {hasBenchCmds && <button style={s.iconButton} onClick={() => setModal("bench")}>⚡ Reproduce</button>}
          </div>
        </div>
        {isEmpty ? <div style={s.benchEmpty}>
            Benchmark data pending for this combination — submit yours via the Playground's Submit ↗ button.
          </div> : <>
            {accuracyTable && renderBenchTable(accuracyTable)}
            {speedTable && renderBenchTable(speedTable)}
            {entry && entry.notes && <div style={s.benchNotes}>{entry.notes}</div>}
          </>}
      </div>;
  };
  const buildBenchCommands = (entry, sel) => {
    const bc = config.benchmarkCommands;
    if (!bc) return null;
    const acc = effectiveAccuracy(entry, sel);
    const accuracy = [];
    if (bc.accuracy) {
      for (const [key, label] of ACCURACY_LABELS) {
        if (acc[key] == null) continue;
        const tmpl = bc.accuracy[key];
        const resolved = typeof tmpl === "string" ? tmpl : tmpl && tmpl[sel.variant] || null;
        if (resolved) accuracy.push({
          key,
          label,
          template: resolved
        });
      }
    }
    let speed = null;
    if (bc.speed && entry) {
      const ms = normalizeSpeed(entry.speed).filter(m => m && m.workload && m.workload.max_concurrency != null);
      const concurrencies = [...new Set(ms.map(m => m.workload.max_concurrency))].sort((a, b) => a - b);
      if (concurrencies.length) {
        speed = {
          template: bc.speed,
          concurrencies,
          workload: ms[0].workload,
          numPromptsOf: c => {
            const m = ms.find(x => x.workload.max_concurrency === c);
            if (m && m.workload.num_prompts != null) return m.workload.num_prompts;
            const tbl = bc.numPromptsByConc;
            if (tbl && tbl[c] != null) return tbl[c];
            return Math.max(c * 2, 200);
          }
        };
      }
    }
    if (accuracy.length === 0 && !speed) return null;
    return {
      accuracy,
      speed
    };
  };
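  // Prompt-count fallback in numPromptsOf (illustrative): when a measurement has no
  // workload.num_prompts and config.benchmarkCommands.numPromptsByConc has no entry for
  // that concurrency, it uses max(2 * concurrency, 200), e.g. 8 -> 200, 64 -> 200, 256 -> 512.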
  const buildHardwareGroups = () => {
    const supported = new Set(config.supportedHardware);
    const catalog = {};
    for (const [vendor, list] of Object.entries(HARDWARE_CATALOG)) catalog[vendor] = [...list];
    for (const hw of config.hardware || []) {
      const vendor = hw.vendor || "nvidia";
      const list = catalog[vendor] || (catalog[vendor] = []);
      const entry = {
        id: hw.id,
        label: hw.label,
        vram: hw.vram
      };
      const i = list.findIndex(x => x.id === hw.id);
      if (i >= 0) list[i] = entry; else list.push(entry);
    }
    const groups = [];
    for (const [vendor, list] of Object.entries(catalog)) {
      const items = list.filter(hw => supported.has(hw.id)).map(hw => ({
        id: hw.id,
        label: hw.label,
        subtitle: hw.vram
      }));
      if (items.length) groups.push({
        label: vendor.toUpperCase(),
        items
      });
    }
    return groups;
  };
  const initialSelectionFromCells = () => {
    const first = config.cells[0];
    if (!first) return Object.fromEntries(DIMENSIONS.map(d => [d, ""]));
    return {
      hw: first.match.hw,
      variant: first.match.variant,
      quant: first.match.quant,
      strategy: first.match.strategy,
      nodes: first.match.nodes
    };
  };
  const placeholderDefaults = schema => {
    const out = {};
    for (const [k, v] of Object.entries(schema || ({}))) out[k] = v.default ?? "";
    return out;
  };
  const [isDark, setIsDark] = useState(false);
  useEffect(() => {
    const check = () => {
      const html = document.documentElement;
      setIsDark(html.classList.contains("dark") || html.getAttribute("data-theme") === "dark" || html.style.colorScheme === "dark");
    };
    check();
    const observer = new MutationObserver(check);
    observer.observe(document.documentElement, {
      attributes: true,
      attributeFilter: ["class", "data-theme", "style"]
    });
    return () => observer.disconnect();
  }, []);
  const STORAGE_KEY = "sglang-deploy-env";
  const [env, setEnv] = useState(() => placeholderDefaults(config.placeholders));
  useEffect(() => {
    try {
      const raw = window.localStorage.getItem(STORAGE_KEY);
      if (raw) {
        const parsed = JSON.parse(raw);
        setEnv({
          ...placeholderDefaults(config.placeholders),
          ...parsed
        });
      }
    } catch {}
  }, []);
  const saveEnv = next => {
    setEnv(next);
    try {
      window.localStorage.setItem(STORAGE_KEY, JSON.stringify(next));
    } catch {}
  };
  const [sel, setSel] = useState(() => initialSelectionFromCells());
  useEffect(() => {
    const hydrate = () => {
      const raw = window.location.hash.replace(/^#/, "");
      if (!raw) return;
      const params = new URLSearchParams(raw);
      const initial = initialSelectionFromCells();
      const parsed = {
        ...initial
      };
      let touched = false;
      params.forEach((value, key) => {
        if ((key in parsed)) {
          parsed[key] = value;
          touched = true;
        }
      });
      if (!touched) return;
      setSel(validateSelection(config.cells, parsed));
      const el = document.getElementById("deployment") || document.getElementById("deploy");
      if (el) el.scrollIntoView({
        behavior: "smooth",
        block: "start"
      });
    };
    hydrate();
    window.addEventListener("hashchange", hydrate);
    return () => window.removeEventListener("hashchange", hydrate);
  }, []);
  useEffect(() => {
    const target = "#" + new URLSearchParams(sel).toString();
    if (window.location.hash !== target) {
      window.history.replaceState(null, "", target);
    }
    window.dispatchEvent(new CustomEvent("sglang-deploy-sel", {
      detail: sel
    }));
  }, [sel]);
  const [modal, setModal] = useState(null);
  useEffect(() => {
    if (modal === null) return;
    const onKey = e => {
      if (e.key === "Escape") setModal(null);
    };
    const prev = document.body.style.overflow;
    document.body.style.overflow = "hidden";
    window.addEventListener("keydown", onKey);
    return () => {
      window.removeEventListener("keydown", onKey);
      document.body.style.overflow = prev;
    };
  }, [modal]);
  const [copied, setCopied] = useState(false);
  const [curlCopied, setCurlCopied] = useState(false);
  const [envDraft, setEnvDraft] = useState(env);
  const [benchConc, setBenchConc] = useState(null);
  const [benchAcc, setBenchAcc] = useState(null);
  const [benchCopied, setBenchCopied] = useState(null);
  const [runMode, setRunMode] = useState("python");
  useEffect(() => {
    if (modal === "env") setEnvDraft(env);
  }, [modal, env]);
  const s = makeStyles(isDark);
  const cell = findCell(config.cells, sel);
  const command = renderCommand(cell, sel, env, runMode);
  const mtpHint = !!cell && (cell.flags || []).some(f => f.split(/[\s=]/)[0] === "--speculative-algorithm") && !(cell.flags || []).some(f => f.split(/[\s=]/)[0] === "--max-running-requests");
  const modelName = resolveModelName(sel);
  const curlText = interpolate(config.curl || "", env, modelName);
  const hwGroups = buildHardwareGroups();
  const benchEntry = benchmarks ? findBenchmark(benchmarks, sel) : null;
  const isEnabled = (dim, value) => isOptionAvailable(config.cells, sel, dim, value);
  const handleSelect = (dim, value) => {
    setSel(prev => snapToValidCell(config.cells, prev, dim, value));
  };
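  // Clipboard writes can fail (e.g. non-secure context or denied permission), so the
  // "Copied" feedback is shown only after the write succeeds.
  const handleCopy = () => {
    navigator.clipboard?.writeText(command).then(() => {
      setCopied(true);
      setTimeout(() => setCopied(false), 1200);
    }, () => {});
  };
  const copyCurl = () => {
    navigator.clipboard?.writeText(curlText).then(() => {
      setCurlCopied(true);
      setTimeout(() => setCurlCopied(false), 1200);
    }, () => {});
  };
  const copyBench = (key, text) => {
    navigator.clipboard?.writeText(text).then(() => {
      setBenchCopied(key);
      setTimeout(() => setBenchCopied(null), 1200);
    }, () => {});
  };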
  const placeholderGroups = (() => {
    const out = {
      command: [],
      curl: []
    };
    for (const [key, meta] of Object.entries(config.placeholders || ({}))) {
      (out[meta.target] || (out[meta.target] = [])).push({
        key,
        ...meta
      });
    }
    return out;
  })();
  const renderButton = (item, dim, selectedId) => {
    const checked = selectedId === item.id;
    const disabled = !isEnabled(dim, item.id);
    return <label key={item.id} style={{
      ...s.labelBase,
      ...checked ? s.checked : {},
      ...disabled ? s.disabled : {}
    }} title={disabled ? "Not supported for current selection" : ""} onClick={e => {
      if (disabled) {
        e.preventDefault();
        return;
      }
      handleSelect(dim, item.id);
    }}>
        <input type="radio" checked={checked} disabled={disabled} readOnly style={{
      display: "none"
    }} />
        <span>{item.label}</span>
        {item.subtitle && <small style={{
      ...s.subtitle,
      color: checked ? "rgba(255,255,255,0.85)" : "inherit"
    }}>
            {item.subtitle}
          </small>}
      </label>;
  };
  const renderFlatSection = (title, options, dim, selectedId) => <div style={s.card}>
      <div style={s.title}>{title}</div>
      <div style={s.itemsGrid(options.length)}>
        {options.map(item => renderButton(item, dim, selectedId))}
      </div>
    </div>;
  const maxHwCols = Math.max(...hwGroups.map(x => x.items.length));
  return <div style={s.container} className="not-prose">
      {/* Hardware platform, grouped by vendor */}
      <div style={s.cardColumn}>
        <div style={{
    ...s.title,
    marginBottom: "2px"
  }}>Hardware Platform</div>
        {hwGroups.map(g => <div key={g.label} style={s.vendorRow}>
            <div style={s.vendorLabel}>{g.label}</div>
            <div style={s.itemsGrid(maxHwCols)}>
              {g.items.map(item => renderButton(item, "hw", sel.hw))}
              {Array.from({
    length: maxHwCols - g.items.length
  }).map((_, i) => <div key={`pad-${i}`} />)}
            </div>
          </div>)}
      </div>

      {/* Model variant, quantization, strategy and node count */}
      {renderFlatSection("Model Variant", config.variants, "variant", sel.variant)}
      {renderFlatSection("Quantization", config.quantizations, "quant", sel.quant)}
      {renderFlatSection("Strategy", config.strategies, "strategy", sel.strategy)}
      {renderFlatSection("Nodes", config.nodesOptions, "nodes", sel.nodes)}

      {/* Generated launch command */}
      <div style={s.card}>
        <div style={s.title}>Run this Command:</div>
        <div style={s.commandWrap}>
          <div style={s.commandHeader}>
            <div style={s.headerLeft}>
              <div style={s.badge(Boolean(cell && cell.verified))}>
                <span style={s.badgeDot(Boolean(cell && cell.verified))} />
                {cell && cell.verified ? "Verified" : "Not Verified"}
              </div>
              <div style={s.runModeWrap} role="tablist" aria-label="Output format">
                <span style={s.runModeChip(runMode === "python")} onClick={() => setRunMode("python")} role="tab" aria-selected={runMode === "python"}>
                  Python
                </span>
                <span style={s.runModeChipLast(runMode === "docker")} onClick={() => setRunMode("docker")} role="tab" aria-selected={runMode === "docker"}>
                  Docker
                </span>
              </div>
            </div>
            <div style={s.iconRow}>
              <button style={s.iconButton} onClick={handleCopy}>
                {copied ? "✓ Copied" : "⧉ Copy"}
              </button>
              <button style={s.iconButton} onClick={() => setModal("curl")}>$ cURL</button>
              <button style={s.iconButton} onClick={() => setModal("env")}>⚙ Env</button>
            </div>
          </div>
          <pre style={s.commandPre}>{command}</pre>
          {mtpHint && <div style={s.mtpWarn}>
              ⚠️ Speculative decoding (MTP) is on: SGLang resets <code>--max-running-requests</code> to <strong>48</strong> when it isn't set. Add <code>--max-running-requests {"<N>"}</code> sized for your target concurrency.
            </div>}
        </div>
      </div>

{}
      {benchmarks && cell && renderBenchmarkCard(benchEntry)}

{}
      <div style={{
    padding: "6px 12px",
    fontSize: "12px",
    color: isDark ? "#9ca3af" : "#6b7280",
    display: "flex",
    alignItems: "center",
    gap: "6px"
  }}>
        <span>Need to go beyond the verified matrix?</span>
        <button type="button" onClick={() => {
    const el = document.getElementById("playground");
    if (el) el.scrollIntoView({
      behavior: "smooth",
      block: "start"
    });
  }} style={{
    background: "transparent",
    border: "none",
    padding: 0,
    color: isDark ? "#FDBA74" : "#C2410C",
    cursor: "pointer",
    fontSize: "12px",
    fontWeight: 600,
    textDecoration: "underline",
    textUnderlineOffset: "2px"
  }}>
          Open the Playground →
        </button>
      </div>

{}
      {modal === "curl" && <div style={s.modalBackdrop} onClick={() => setModal(null)}>
          <div style={s.modalBox} onClick={e => e.stopPropagation()}>
            <div style={s.modalHeader}>
              <div style={s.modalTitle}>cURL example</div>
              <button style={s.modalCloseBtn} onClick={() => setModal(null)} aria-label="Close">×</button>
            </div>
            <div style={s.commandWrap}>
              <div style={s.commandHeader}>
                <div style={{
    fontSize: 11,
    opacity: 0.7
  }}>
                  Model: <code>{modelName || "(unresolved)"}</code>
                </div>
                <button style={s.iconButton} onClick={copyCurl}>
                  {curlCopied ? "✓ Copied" : "⧉ Copy"}
                </button>
              </div>
              <pre style={s.commandPre}>{curlText}</pre>
            </div>
            <p style={{
    fontSize: 11,
    opacity: 0.7,
    marginTop: 8
  }}>
              Edit <code>CURL_HOST</code> / <code>CURL_PORT</code> in the Env panel.
            </p>
          </div>
        </div>}

{}
      {modal === "env" && <div style={s.modalBackdrop} onClick={() => setModal(null)}>
          <div style={s.modalBox} onClick={e => e.stopPropagation()}>
            <div style={s.modalHeader}>
              <div style={s.modalTitle}>Env / placeholder values</div>
              <button style={s.modalCloseBtn} onClick={() => setModal(null)} aria-label="Close">×</button>
            </div>
            {placeholderGroups.curl.length > 0 && <div>
                <div style={s.sectionHeading}>cURL placeholders</div>
                {placeholderGroups.curl.map(({key, label}) => <div key={key} style={s.formField}>
                    <label style={s.formLabel}>
                      {label} <code style={{
    opacity: 0.6
  }}>{`{{${key}}}`}</code>
                    </label>
                    <input style={s.formInput} value={envDraft[key] ?? ""} onChange={e => setEnvDraft({
    ...envDraft,
    [key]: e.target.value
  })} />
                  </div>)}
              </div>}
            {placeholderGroups.command.length > 0 && <div>
                <div style={s.sectionHeading}>Command placeholders</div>
                {placeholderGroups.command.map(({key, label}) => <div key={key} style={s.formField}>
                    <label style={s.formLabel}>
                      {label} <code style={{
    opacity: 0.6
  }}>{`{{${key}}}`}</code>
                    </label>
                    <input style={s.formInput} value={envDraft[key] ?? ""} onChange={e => setEnvDraft({
    ...envDraft,
    [key]: e.target.value
  })} />
                  </div>)}
              </div>}
            <div style={{
    display: "flex",
    justifyContent: "flex-end",
    gap: 8,
    marginTop: 16
  }}>
              <button style={{
    ...s.iconButton,
    padding: "6px 14px"
  }} onClick={() => setModal(null)}>Cancel</button>
              <button style={s.primaryBtn} onClick={() => {
    saveEnv(envDraft);
    setModal(null);
  }}>Save</button>
            </div>
            <p style={{
    fontSize: 11,
    opacity: 0.7,
    marginTop: 10
  }}>
              Values persist in localStorage and are reused the next time you visit any cookbook.
            </p>
          </div>
        </div>}

{}
      {modal === "bench" && benchEntry && (() => {
    const bc = buildBenchCommands(benchEntry, sel);
    if (!bc) return null;
    const selSummary = `${sel.hw.toUpperCase()} · ${sel.variant} · ${sel.quant.toUpperCase()} · ${sel.strategy} · ${sel.nodes}`;
    let selConc = null;
    let speedCmd = null;
    if (bc.speed) {
      selConc = bc.speed.concurrencies.includes(benchConc) ? benchConc : bc.speed.concurrencies[0];
      const w = bc.speed.workload;
      speedCmd = interpolate(bc.speed.template, {
        ...env,
        DATASET: w.dataset,
        ISL: w.isl,
        OSL: w.osl,
        MAX_CONCURRENCY: selConc,
        NUM_PROMPTS: bc.speed.numPromptsOf(selConc)
      }, modelName);
    }
    let selAcc = null;
    let accCmd = null;
    if (bc.accuracy.length > 0) {
      selAcc = bc.accuracy.find(a => a.key === benchAcc) || bc.accuracy[0];
      accCmd = interpolate(selAcc.template, env, modelName);
    }
    return <div style={s.modalBackdrop} onClick={() => setModal(null)}>
            <div style={s.modalBox} onClick={e => e.stopPropagation()}>
              <div style={s.modalHeader}>
                <div style={s.modalTitle}>Benchmark commands</div>
                <button style={s.modalCloseBtn} onClick={() => setModal(null)} aria-label="Close">×</button>
              </div>
              <p style={{
      fontSize: 11,
      opacity: 0.7,
      margin: "0 0 12px"
    }}>
                For <code>{selSummary}</code>. Start the server with the Deploy command above, then run these against it.
              </p>

{selAcc && <div>
                  <div style={s.sectionHeading}>Accuracy</div>
                  {bc.accuracy.length > 1 && <div style={s.benchChipRow}>
                      <span style={{
      fontSize: 11,
      opacity: 0.7
    }}>benchmark:</span>
                      {bc.accuracy.map(a => <button key={a.key} style={{
      ...s.benchChip,
      ...a.key === selAcc.key ? s.benchChipActive : {}
    }} onClick={() => setBenchAcc(a.key)}>
                          {a.label}
                        </button>)}
                    </div>}
                  <div style={{
      ...s.commandWrap,
      marginBottom: 6
    }}>
                    <div style={s.commandHeader}>
                      <div style={{
      fontSize: 11,
      opacity: 0.7
    }}>{selAcc.label}</div>
                      <button style={s.iconButton} onClick={() => copyBench("acc", accCmd)}>
                        {benchCopied === "acc" ? "✓ Copied" : "⧉ Copy"}
                      </button>
                    </div>
                    <pre style={s.commandPre}>{accCmd}</pre>
                  </div>
                  {bc.accuracy.length > 1 && <p style={{
      fontSize: 11,
      opacity: 0.7,
      margin: "0 0 4px"
    }}>
                      Switch the benchmark chip to see each eval's command.
                    </p>}
                </div>}

{bc.speed && <div>
                  <div style={s.sectionHeading}>Speed</div>
                  {bc.speed.concurrencies.length > 1 && <div style={s.benchChipRow}>
                      <span style={{
      fontSize: 11,
      opacity: 0.7
    }}>max-concurrency:</span>
                      {bc.speed.concurrencies.map(c => <button key={c} style={{
      ...s.benchChip,
      ...c === selConc ? s.benchChipActive : {}
    }} onClick={() => setBenchConc(c)}>
                          {c}
                        </button>)}
                    </div>}
                  <div style={{
      ...s.commandWrap,
      marginBottom: 6
    }}>
                    <div style={s.commandHeader}>
                      <div style={{
      fontSize: 11,
      opacity: 0.7
    }}>max-concurrency = {selConc}</div>
                      <button style={s.iconButton} onClick={() => copyBench("speed", speedCmd)}>
                        {benchCopied === "speed" ? "✓ Copied" : "⧉ Copy"}
                      </button>
                    </div>
                    <pre style={s.commandPre}>{speedCmd}</pre>
                  </div>
                  <p style={{
      fontSize: 11,
      opacity: 0.7,
      margin: "0 0 4px"
    }}>
                    One command — switch the concurrency chip (or edit <code>--max-concurrency</code>) to reproduce each Speed column.
                  </p>
                </div>}

<p style={{
      fontSize: 11,
      opacity: 0.7,
      marginTop: 12
    }}>
                Edit <code>CURL_HOST</code> / <code>CURL_PORT</code> in the Env panel.
              </p>
            </div>
          </div>;
  })()}
    </div>;
};

## Deployment

<Accordion title="Install SGLang">
  For all methods and hardware platforms, see the [official SGLang installation guide](../../../docs/get-started/install). The two paths below match the **Python / Docker** toggle in the command panel.

<Tabs>
    <Tab title="Python (pip / uv)">
      ```bash Command theme={null}
      pip install --upgrade pip
      pip install uv
      uv pip install sglang
      ```

<Note>
        LFM2.5 support — the dense / MoE / VL model classes and the `lfm2` tool-call parser — ships on SGLang `main`. If your installed release predates it, install from source or use the Docker dev image.
      </Note>

Then run the **Python** output of the command panel below in that environment.
    </Tab>

<Tab title="Docker">
      LFM2.5 support ships in the pinned SGLang dev image:

```bash Command theme={null}
      docker pull lmsysorg/sglang:dev-cu13
      ```

For how to launch the image, see [Install → Method 3: Using Docker](../../../docs/get-started/install#method-3-using-docker). A minimal example (substitute the inner `sglang serve ...` with whatever the command generator below produces):

```bash Command theme={null}
      docker run --gpus all \
          --shm-size 32g \
          -p 30000:30000 \
          -v ~/.cache/huggingface:/root/.cache/huggingface \
          --env "HF_TOKEN=<your-hf-token>" \
          --ipc=host \
          lmsysorg/sglang:dev-cu13 \
          sglang serve <use args below>
      ```
    </Tab>
  </Tabs>
</Accordion>

Every LFM2.5 model runs on a **single GPU (TP=1)**. Pick your hardware and model variant to generate the launch command. One recipe covers all operating points per variant. The commands differ only in three things: the parsers a model needs, the VL-specific flags, and, on Blackwell, the attention backend. The `lfm2` tool-call parser and each reasoning model's `--reasoning-parser` are already part of the verified command.

<div style={{fontSize: "0.85em", lineHeight: "1.55", color: "#6b7280", margin: "0.5rem 0 1rem 0"}}>
  <p style={{margin: "0 0 0.3rem 0"}}><strong>Panel controls</strong> (top of the command box):</p>

<ul style={{margin: 0, paddingLeft: "1.25rem"}}>
    <li style={{marginBottom: "0.2rem"}}><strong>Python / Docker</strong> — bare <code>sglang serve …</code> for an existing SGLang env, or a <code>docker run … sglang serve …</code> wrap against the dev image from the <a href="#install">Install SGLang</a> panel above.</li>
    <li style={{marginBottom: "0.2rem"}}><strong>⧉ Copy</strong> — copies the current command (with whichever framing is active) to your clipboard.</li>
    <li style={{marginBottom: "0.2rem"}}><strong>\$ cURL</strong> — a sample request against <code>localhost:30000</code> to confirm the server is up.</li>
    <li style={{marginBottom: "0.2rem"}}><strong>⚙ Env</strong> — edits the placeholder values filled into the command (e.g. <code>HOST\_IP</code>, <code>PORT</code>, <code>HF\_TOKEN</code>) and into the cURL example (<code>CURL\_HOST</code>, <code>CURL\_PORT</code>). Values persist in localStorage across cookbooks.</li>
    <li><strong>Verified / Not Verified</strong> badge — green when the <code>(hw, variant, quant, strategy, nodes)</code> combo has been run end-to-end on real hardware; yellow when auto-derived from a neighbor and not yet re-checked.</li>
  </ul>
</div>

## Playground

The Playground is where you experiment with **SGLang features beyond the verified matrix**. The Deploy panel above only emits combinations that have been verified end-to-end. The Playground lets you layer additional knobs on top of whichever cell the Deploy panel is currently showing. The base is read live from your Deploy selection, and only your overrides change.

For LFM2.5 the exposed knob is the **TP override** (every variant is verified at TP=1; TP=2 is available for experimentation on the larger checkpoints). The reasoning and tool-call parsers are not playground toggles here — they are variant-intrinsic and already baked into each verified command.

Lines highlighted **green** are added by your overrides; lines with **red strikethrough** were in the verified base but stripped by an override. When no override differs from the base cell, the playground inherits the base's **Verified** badge; any actual change flips it to **Not Verified** until the new configuration is run end-to-end and submitted back.

<div style={{fontSize: "0.85em", lineHeight: "1.55", color: "#6b7280", margin: "0.5rem 0 1rem 0"}}>
  <p style={{margin: "0 0 0.3rem 0"}}><strong>Panel controls</strong> reuse <strong>Python / Docker</strong> · <strong>⧉ Copy</strong> · <strong>\$ cURL</strong> · <strong>⚙ Env</strong> from the Deploy panel, plus one extra:</p>

<ul style={{margin: 0, paddingLeft: "1.25rem"}}>
    <li><strong>Submit ↗</strong> — opens a pre-filled GitHub issue so you can land your override combo as a new verified cookbook cell. Shown only while the badge says <strong>Not Verified</strong>; click it once you've actually run the command on your hardware and confirmed it works.</li>
  </ul>
</div>
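The diff-and-badge rule described above can be sketched as a line-by-line comparison. `diff_command` and `inherits_verified` are hypothetical helpers written for illustration; they are not the cookbook's component code. The command lines, including `--tp-size`, are placeholder values rather than output copied from the generator.

```python
import difflib


def diff_command(base_lines, playground_lines):
    """Return (marker, line) pairs: '+' added, '-' removed, ' ' unchanged."""
    out = []
    for entry in difflib.ndiff(base_lines, playground_lines):
        marker, line = entry[0], entry[2:]
        if marker == "?":
            continue  # ndiff's intraline hint rows are not part of either command
        out.append((marker, line))
    return out


def inherits_verified(base_verified, diff):
    # The badge stays "Verified" only if the base cell was verified and no line changed.
    return base_verified and all(marker == " " for marker, _ in diff)


if __name__ == "__main__":
    base = ["sglang serve", "--model-path LiquidAI/LFM2.5-1.2B-Instruct", "--tp-size 1"]
    override = ["sglang serve", "--model-path LiquidAI/LFM2.5-1.2B-Instruct", "--tp-size 2"]
    for marker, line in diff_command(base, override):
        print(marker, line)
```

Any single changed line flips the result to "Not Verified", even a pure value change such as the TP override.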

## 1. Model Introduction

LFM2.5 is [Liquid AI](https://www.liquid.ai/)'s family of hybrid models for on-device deployment, released under the [LFM Open License v1.0](https://huggingface.co/LiquidAI/LFM2.5-8B-A1B/blob/main/LICENSE). It builds on the LFM2 architecture with extended pre-training — 10T → 28T tokens for the dense models, 12T → 38T for the 8B-A1B MoE — and large-scale reinforcement learning.

The backbone interleaves **gated short convolution blocks** with a small minority of **grouped query attention (GQA) blocks**. Each convolution block applies input-dependent multiplicative gating around a depthwise short convolution, giving fast local mixing at low compute and memory cost. The GQA blocks handle global context and long-range retrieval.

This minimal hybrid layout was selected by a hardware-in-the-loop architecture search under edge latency and memory budgets. On CPUs it delivers up to 2× faster prefill and decode than similarly sized models (see the [LFM2 Technical Report](https://arxiv.org/abs/2511.23404)).

**Key Features:**

* **Hybrid gated short conv + GQA layout**: the 1.2B / 350M dense models are 16 layers (10 conv + 6 GQA); the 8B-A1B MoE is 24 layers (18 conv + 6 GQA). With only 6 attention layers per model, the KV cache stays small even at long context.
* **Block details**: depthwise convolutions with kernel size 3; GQA with 8 KV groups and head size 64, plus RoPE and QK-Norm; pre-norm RMSNorm and SwiGLU MLPs throughout.
* **Sparse MoE (8B-A1B)**: 8.3B total / 1.5B active parameters. Every layer except the first two replaces its dense MLP with a 32-expert MoE block; each token is routed to the top-4 SwiGLU experts by a normalized sigmoid router with adaptive bias load balancing.
* **New in 2.5 (8B-A1B)**: the blocks are unchanged from LFM2-8B-A1B. Two things change. The context window grows from 32K to 128K, through a RoPE base-θ increase plus long-context midtraining. The vocabulary nearly doubles, from 65,536 to 128,000 tokens, for more efficient non-Latin tokenization.
* **Pythonic tool calling**: function calls are emitted as a Python list between `<|tool_call_start|>` and `<|tool_call_end|>` tokens. The `lfm2` tool-call parser surfaces these as standard `message.tool_calls`.
* **Reasoning variants**: the 8B-A1B and 1.2B-Thinking checkpoints are reasoning-only models that always emit an explicit `<think>...</think>` chain-of-thought before the answer. The MoE's 1.5B active parameters keep those reasoning tokens cheap.
* **Multilingual**: every model except the JP checkpoints covers at least English, Arabic, Chinese, French, German, Japanese, Korean, and Spanish (some variants add more). The dedicated JP chat checkpoints focus on Japanese (Japanese + English only).
* **Vision**: LFM2.5-VL-1.6B pairs the 1.2B language backbone with a SigLIP2 So400M NaFlex encoder for OCR, document understanding, and multilingual vision. LFM2.5-VL-450M pairs the 350M backbone with a SigLIP2 Base-86M encoder for captioning and object detection at edge sizes; bounding-box grounding and function calling are new in the 2.5 release.
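The sketch below makes the Pythonic tool-call format concrete by showing roughly what a parser for it has to do. It assumes one or more marker-delimited lists of calls, each using keyword arguments with literal values. `parse_pythonic_tool_calls` and its output shape (an OpenAI-style `name` plus JSON-encoded `arguments`) are hypothetical. This is not SGLang's `lfm2` parser, which is what you should rely on when serving.

```python
import ast
import json

START, END = "<|tool_call_start|>", "<|tool_call_end|>"


def parse_pythonic_tool_calls(text: str) -> list[dict]:
    """Extract {name, arguments} dicts from marker-delimited Pythonic call lists.

    Hypothetical illustration: keyword arguments only, literal values only.
    """
    calls = []
    pos = 0
    while (start := text.find(START, pos)) != -1:
        end = text.find(END, start)
        if end == -1:
            break  # unterminated block: a streaming parser would wait for more tokens
        payload = text[start + len(START):end].strip()
        tree = ast.parse(payload, mode="eval")  # parse only, never execute
        if not isinstance(tree.body, ast.List):
            raise ValueError(f"expected a list of calls, got {payload!r}")
        for node in tree.body.elts:
            if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)) or node.args:
                raise ValueError(f"unsupported call syntax: {ast.unparse(node)}")
            # literal_eval accepts only constants and containers, so no model-written code runs.
            args = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
            calls.append({"name": node.func.id, "arguments": json.dumps(args)})
        pos = end + len(END)
    return calls


if __name__ == "__main__":
    raw = ('Checking.<|tool_call_start|>[get_weather(city="Paris", unit="celsius"), '
           'get_time(tz="Europe/Paris")]<|tool_call_end|>')
    print(parse_pythonic_tool_calls(raw))
```

Parsing with `ast` rather than `eval` is the key design choice here: model output is untrusted, so the sketch only inspects the syntax tree.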

**Available Models:**

<table style={{width: "100%", borderCollapse: "collapse"}}>
  <thead>
    <tr style={{borderBottom: "2px solid #d55816"}}>
      <th style={{textAlign: "left", padding: "10px 12px", fontWeight: 700, whiteSpace: "nowrap", backgroundColor: "rgba(255,255,255,0.02)"}}>Model</th>
      <th style={{textAlign: "left", padding: "10px 12px", fontWeight: 700, whiteSpace: "nowrap", backgroundColor: "rgba(255,255,255,0.05)"}}>Parameters</th>
      <th style={{textAlign: "right", padding: "10px 12px", fontWeight: 700, whiteSpace: "nowrap", backgroundColor: "rgba(255,255,255,0.02)"}}>Context</th>
      <th style={{textAlign: "left", padding: "10px 12px", fontWeight: 700, whiteSpace: "nowrap", backgroundColor: "rgba(255,255,255,0.05)"}}>Role</th>
    </tr>
  </thead>

<tbody>
    <tr>
      <td style={{padding: "9px 12px", fontWeight: 500, backgroundColor: "rgba(255,255,255,0.02)"}}><strong><a href="https://huggingface.co/LiquidAI/LFM2.5-8B-A1B">LFM2.5-8B-A1B</a></strong></td>
      <td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.05)"}}>8.3B total / 1.5B active (MoE)</td>
      <td style={{padding: "9px 12px", textAlign: "right", backgroundColor: "rgba(255,255,255,0.02)"}}>128K</td>
      <td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.05)"}}>Reasoning-tuned, agentic / tool use</td>
    </tr>

<tr>
      <td style={{padding: "9px 12px", fontWeight: 500, backgroundColor: "rgba(255,255,255,0.02)"}}><strong><a href="https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct">LFM2.5-1.2B-Instruct</a></strong></td>
      <td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.05)"}}>1.17B (dense)</td>
      <td style={{padding: "9px 12px", textAlign: "right", backgroundColor: "rgba(255,255,255,0.02)"}}>32K</td>
      <td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.05)"}}>General instruct, RAG, data extraction</td>
    </tr>

<tr>
      <td style={{padding: "9px 12px", fontWeight: 500, backgroundColor: "rgba(255,255,255,0.02)"}}><strong><a href="https://huggingface.co/LiquidAI/LFM2.5-1.2B-Thinking">LFM2.5-1.2B-Thinking</a></strong></td>
      <td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.05)"}}>1.17B (dense)</td>
      <td style={{padding: "9px 12px", textAlign: "right", backgroundColor: "rgba(255,255,255,0.02)"}}>32K</td>
      <td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.05)"}}>Reasoning (always-on chain-of-thought)</td>
    </tr>

<tr>
      <td style={{padding: "9px 12px", fontWeight: 500, backgroundColor: "rgba(255,255,255,0.02)"}}><strong><a href="https://huggingface.co/LiquidAI/LFM2.5-350M">LFM2.5-350M</a></strong></td>
      <td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.05)"}}>350M (dense)</td>
      <td style={{padding: "9px 12px", textAlign: "right", backgroundColor: "rgba(255,255,255,0.02)"}}>32K</td>
      <td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.05)"}}>Compact instruct, structured output</td>
    </tr>

<tr>
      <td style={{padding: "9px 12px", fontWeight: 500, backgroundColor: "rgba(255,255,255,0.02)"}}><strong><a href="https://huggingface.co/LiquidAI/LFM2.5-1.2B-JP-202606">LFM2.5-1.2B-JP-202606</a></strong></td>
      <td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.05)"}}>1.17B (dense)</td>
      <td style={{padding: "9px 12px", textAlign: "right", backgroundColor: "rgba(255,255,255,0.02)"}}>32K</td>
      <td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.05)"}}>Japanese chat (latest)</td>
    </tr>

<tr>
      <td style={{padding: "9px 12px", fontWeight: 500, backgroundColor: "rgba(255,255,255,0.02)"}}><a href="https://huggingface.co/LiquidAI/LFM2.5-1.2B-JP">LFM2.5-1.2B-JP</a></td>
      <td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.05)"}}>1.17B (dense)</td>
      <td style={{padding: "9px 12px", textAlign: "right", backgroundColor: "rgba(255,255,255,0.02)"}}>32K</td>
      <td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.05)"}}>Japanese chat (original)</td>
    </tr>

<tr>
      <td style={{padding: "9px 12px", fontWeight: 500, backgroundColor: "rgba(255,255,255,0.02)"}}><strong><a href="https://huggingface.co/LiquidAI/LFM2.5-VL-1.6B">LFM2.5-VL-1.6B</a></strong></td>
      <td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.05)"}}>1.2B LM + SigLIP2 400M</td>
      <td style={{padding: "9px 12px", textAlign: "right", backgroundColor: "rgba(255,255,255,0.02)"}}>32K</td>
      <td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.05)"}}>Vision-language (OCR, docs, multi-image)</td>
    </tr>

<tr>
      <td style={{padding: "9px 12px", fontWeight: 500, backgroundColor: "rgba(255,255,255,0.02)"}}><strong><a href="https://huggingface.co/LiquidAI/LFM2.5-VL-450M">LFM2.5-VL-450M</a></strong></td>
      <td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.05)"}}>350M LM + SigLIP2 86M</td>
      <td style={{padding: "9px 12px", textAlign: "right", backgroundColor: "rgba(255,255,255,0.02)"}}>32K</td>
      <td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.05)"}}>Compact vision-language (captioning, object detection)</td>
    </tr>

<tr>
      <td style={{padding: "9px 12px", fontWeight: 500, backgroundColor: "rgba(255,255,255,0.02)"}}><a href="https://huggingface.co/LiquidAI/LFM2.5-1.2B-Base">LFM2.5-1.2B-Base</a></td>
      <td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.05)"}}>1.17B (dense)</td>
      <td style={{padding: "9px 12px", textAlign: "right", backgroundColor: "rgba(255,255,255,0.02)"}}>32K</td>
      <td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.05)"}}>Pre-trained base (no post-training)</td>
    </tr>
  </tbody>
</table>

The Deploy panel above covers the seven serving variants. Two further checkpoints launch the same way with the model path swapped:

* **LFM2.5-1.2B-JP**, the original JP release. Launch it without `--tool-call-parser`.
* The **Base** checkpoints, which are pre-trained only with no post-training. See [§3.5](#35-base-checkpoints).

**Choosing a variant:**

* **8B-A1B** — flagship for agentic and tool-calling workloads; the only 128K-context option.
* **1.2B-Thinking** — reasoning-heavy tasks: math, tool use, programming.
* **1.2B-Instruct** — the recommended pick for chat and creative writing.
* **350M** — tool use, data extraction, and structured output; not recommended for math, code, or creative writing.

**License:** [LFM Open License v1.0](https://huggingface.co/LiquidAI/LFM2.5-8B-A1B/blob/main/LICENSE).

**Resources:** [LFM2.5 announcement](https://www.liquid.ai/blog/introducing-lfm2-5-the-next-generation-of-on-device-ai), [LFM2.5-8B-A1B blog](https://www.liquid.ai/blog/lfm2-5-8b-a1b), [LFM docs](https://docs.liquid.ai/lfm/getting-started/welcome), [LFM2 Technical Report (arXiv:2511.23404)](https://arxiv.org/abs/2511.23404).

## 2. Configuration Tips

* **Reasoning parser**: LFM2.5 reasoning models wrap their chain-of-thought in `<think>...</think>` tags. The command generator passes `--reasoning-parser qwen3` for **8B-A1B** (it emits an explicit opening `<think>`) and `--reasoning-parser qwen3-thinking` for **1.2B-Thinking** (always-on reasoning). This splits the thinking process into `reasoning_content`; without it the chain-of-thought stays inline in `content`.
* **Tool calling**: `--tool-call-parser lfm2` surfaces LFM2.5's Pythonic `<|tool_call_start|>[...]<|tool_call_end|>` calls as standard `message.tool_calls`. The original **1.2B-JP** does not expose tool calling; **Base** has no post-training (see [§3.5](#35-base-checkpoints)).
* **Attention backend on Blackwell (B200/sm100)**: SGLang defaults to the `trtllm_mha` backend on sm100, which is fastest for the dense text models. The **8B-A1B** uses a mamba-style state cache that needs a page-size-1 backend, so the generator picks `--attention-backend flashinfer` for it. The **VL** language model also uses that state cache and supports two configurations:
  * `--attention-backend flashinfer` keeps prefix (radix) caching. This is what the generator emits.
  * `--attention-backend trtllm_mha --disable-radix-cache` runs the language model on Blackwell `trtllm_mha` attention. Disabling the radix cache lifts the page-size-1 requirement, at the cost of prefix caching.

  Pair either configuration with `--mm-attention-backend fa4` for the vision tower.
* **VL vision tower (`--mm-attention-backend`)**: on sm100 the `trtllm_mha` default is fastest for text but applies *causal* attention to image tokens. For the VL model, pass `--mm-attention-backend fa4` on B200/B300 (or `fa3` on H100/H200) to restore bidirectional image-token attention and full vision quality.
* **VL multimodal feature transport**: the generator launches the VL models with `SGLANG_USE_CUDA_IPC_TRANSPORT=1 SGLANG_USE_IPC_POOL_HANDLE_CACHE=1`. The first moves the processor→scheduler image-feature handoff onto CUDA IPC instead of serializing tensors between processes. The second ships the pool handle so the scheduler opens it once and caches it, instead of opening a per-item handle on every request. This pair was measured on VL-1.6B on H100 and B200, using the image serving workload (1 image @ 720p). Compared with running without them, it gives roughly 30–50% higher image throughput and 30–40% lower image TTFT. Decode speed (TPOT) is unaffected.
* **VL-450M memory headroom (`--mem-fraction-static 0.8`)**: with the default memory fraction, the 450M's small weights make SGLang size its static KV/mamba pools to nearly the whole GPU, leaving no headroom for image-feature tensors — under sustained concurrent image load the scheduler can crash with a CUDA OOM in the radix-cache free path. The generator caps `--mem-fraction-static 0.8` for VL-450M; the pool is still far larger than this model ever needs.
* **Mamba scheduling**: LFM2.5 runs on the default `no_buffer` mamba scheduler strategy — no `--mamba-scheduler-strategy` flag is needed. The `extra_buffer` strategy (an overlap-scheduling throughput optimization available for some Gated-DeltaNet hybrids) does not apply to LFM2.5, whose convolution blocks use `mamba_chunk_size=1`.
* **Hardware requirements**: all LFM2.5 models run on a single GPU (TP=1) on either Hopper or Blackwell. The 1.2B / 350M dense models fit in a few GB. The 8B-A1B MoE needs roughly 16 GB for its bf16 weights alone, plus KV cache on top; all 8.3B parameters stay resident even though only 1.5B are active per token. Multi-GPU tensor parallelism is not required for any variant.
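The sketch below works through the arithmetic behind these memory figures and behind the small-KV-cache claim in §1. It makes three assumptions:

* The block details from §1 apply to every variant: 6 GQA layers, with "8 KV groups" read as 8 KV heads, and head size 64.
* Both weights and cache are bf16.
* Real usage also includes conv/mamba state, activations, and CUDA graph buffers, all of which the sketch ignores.

The function names are hypothetical.

```python
BF16_BYTES = 2


def weight_bytes(n_params: float, bytes_per_param: int = BF16_BYTES) -> float:
    # For an MoE, every expert is resident, so pass total (not active) parameters.
    return n_params * bytes_per_param


def kv_bytes_per_token(n_attn_layers: int = 6, n_kv_heads: int = 8,
                       head_dim: int = 64, bytes_per_elem: int = BF16_BYTES) -> int:
    # One K and one V vector per KV head, per attention layer.
    # Conv blocks keep a small fixed-size per-request state instead (not counted here).
    return 2 * n_attn_layers * n_kv_heads * head_dim * bytes_per_elem


if __name__ == "__main__":
    print(f"8B-A1B bf16 weights: {weight_bytes(8.3e9) / 1e9:.1f} GB")
    per_tok = kv_bytes_per_token()
    print(f"KV cache: {per_tok} bytes/token, "
          f"{per_tok * 32_768 / 2**20:.0f} MiB for one 32K-token sequence")
```

Under these assumptions, the cache costs 12,288 bytes per token. A hypothetical layout with attention in all 24 layers (`n_attn_layers=24`) would quadruple that figure.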

**Recommended sampling parameters** — pass these explicitly on every request. Some LFM2.5 checkpoints do not ship sampling defaults in `generation_config.json`, so the server will not apply them for you. `top_k`, `min_p`, and `repetition_penalty` are not standard OpenAI `chat.completions` fields — pass them through **`extra_body`** and SGLang forwards them to its sampler. Do not set `max_tokens` unless you intend to cap output, as it can truncate a response (or a reasoning model's chain-of-thought) mid-stream.

<table style={{width: "100%", borderCollapse: "collapse", tableLayout: "fixed"}}>
  <thead>
    <tr style={{borderBottom: "2px solid #d55816"}}>
      <th style={{textAlign: "left", padding: "10px 12px", fontWeight: 700, whiteSpace: "nowrap", backgroundColor: "rgba(255,255,255,0.02)"}}>Model</th>
      <th style={{textAlign: "left", padding: "10px 12px", fontWeight: 700, whiteSpace: "nowrap", backgroundColor: "rgba(255,255,255,0.05)"}}>temperature</th>
      <th style={{textAlign: "left", padding: "10px 12px", fontWeight: 700, whiteSpace: "nowrap", backgroundColor: "rgba(255,255,255,0.02)"}}>extra\_body (sampler)</th>
    </tr>
  </thead>

<tbody>
    <tr><td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.02)"}}>LFM2.5-8B-A1B</td><td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.05)"}}>0.2</td><td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.02)"}}><code>{`{"top_k": 80, "repetition_penalty": 1.05}`}</code></td></tr>
    <tr><td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.02)"}}>LFM2.5-1.2B-Instruct</td><td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.05)"}}>0.1</td><td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.02)"}}><code>{`{"top_k": 50, "repetition_penalty": 1.05}`}</code></td></tr>
    <tr><td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.02)"}}>LFM2.5-1.2B-Thinking</td><td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.05)"}}>0.05</td><td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.02)"}}><code>{`{"top_k": 50, "repetition_penalty": 1.05}`}</code></td></tr>
    <tr><td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.02)"}}>LFM2.5-350M</td><td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.05)"}}>0.1</td><td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.02)"}}><code>{`{"top_k": 50, "repetition_penalty": 1.05}`}</code></td></tr>
    <tr><td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.02)"}}>LFM2.5-1.2B-JP-202606</td><td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.05)"}}>0.1</td><td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.02)"}}><code>{`{"top_k": 50, "repetition_penalty": 1.05}`}</code></td></tr>
    <tr><td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.02)"}}>LFM2.5-1.2B-JP</td><td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.05)"}}>0.3</td><td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.02)"}}><code>{`{"min_p": 0.15, "repetition_penalty": 1.05}`}</code></td></tr>
    <tr><td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.02)"}}>LFM2.5-VL-1.6B (text)</td><td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.05)"}}>0.1</td><td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.02)"}}><code>{`{"min_p": 0.15, "repetition_penalty": 1.05}`}</code></td></tr>
    <tr><td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.02)"}}>LFM2.5-VL-450M (text)</td><td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.05)"}}>0.1</td><td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.02)"}}><code>{`{"min_p": 0.15, "repetition_penalty": 1.05}`}</code></td></tr>
    <tr><td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.02)"}}>LFM2.5-1.2B-Base</td><td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.05)"}}>0.3</td><td style={{padding: "9px 12px", backgroundColor: "rgba(255,255,255,0.02)"}}><code>{`{"min_p": 0.15, "repetition_penalty": 1.05}`}</code></td></tr>
  </tbody>
</table>
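On the wire, `extra_body` simply adds its keys to the top level of the JSON request body, so any HTTP client can send the same sampler fields. The sketch below uses only the standard library. It assumes the server from the Deploy panel is listening on `localhost:30000`, and `build_chat_payload` and `post_chat` are hypothetical helper names.

```python
import json
import urllib.request


def build_chat_payload(model, messages, temperature, extra_body):
    """Mirror how the OpenAI SDK merges extra_body into the top-level JSON body."""
    payload = {"model": model, "messages": messages, "temperature": temperature}
    payload.update(extra_body)  # e.g. top_k / min_p / repetition_penalty
    return payload


def post_chat(payload, base_url="http://localhost:30000/v1"):
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For example, `post_chat(build_chat_payload("LiquidAI/LFM2.5-350M", [{"role": "user", "content": "Hi"}], 0.1, {"top_k": 50, "repetition_penalty": 1.05}))` sends the 350M preset from the table; it needs a running server.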

## 3. Advanced Usage

### 3.1 Basic Usage

A single client with the recommended sampling presets applied per model (the examples in the following sections reuse this `chat` helper):

```python Example theme={null}
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

# Non-OpenAI fields (top_k / min_p / repetition_penalty) ride in extra_body.
SAMPLING = {
    "LiquidAI/LFM2.5-8B-A1B":         dict(temperature=0.2,  extra_body={"top_k": 80, "repetition_penalty": 1.05}),
    "LiquidAI/LFM2.5-1.2B-Instruct":  dict(temperature=0.1,  extra_body={"top_k": 50, "repetition_penalty": 1.05}),
    "LiquidAI/LFM2.5-1.2B-Thinking":  dict(temperature=0.05, extra_body={"top_k": 50, "repetition_penalty": 1.05}),
    "LiquidAI/LFM2.5-350M":           dict(temperature=0.1,  extra_body={"top_k": 50, "repetition_penalty": 1.05}),
    "LiquidAI/LFM2.5-1.2B-JP-202606": dict(temperature=0.1,  extra_body={"top_k": 50, "repetition_penalty": 1.05}),
    "LiquidAI/LFM2.5-VL-1.6B":        dict(temperature=0.1,  extra_body={"min_p": 0.15, "repetition_penalty": 1.05}),
    "LiquidAI/LFM2.5-VL-450M":        dict(temperature=0.1,  extra_body={"min_p": 0.15, "repetition_penalty": 1.05}),
}

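def chat(model, messages, **overrides):
    cfg = SAMPLING[model]
    # Per-call values take precedence over the preset. extra_body dicts are merged
    # (dict union, Python 3.9+), and an explicit temperature replaces the preset one
    # instead of being passed to create() twice.
    body = cfg["extra_body"] | overrides.pop("extra_body", {})
    temperature = overrides.pop("temperature", cfg["temperature"])
    return client.chat.completions.create(
        model=model, messages=messages,
        temperature=temperature, extra_body=body, **overrides,
    )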

resp = chat(
    "LiquidAI/LFM2.5-1.2B-Instruct",
    [{"role": "user", "content": "What is C. elegans? Answer in one sentence."}],
)
print(resp.choices[0].message.content)
```
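Because the preset merge is plain dictionary logic, it can be checked without a running server. The following is a minimal sketch that uses a hypothetical `build_kwargs` helper and a one-entry copy of `SAMPLING`. It returns the keyword arguments a call would send, with per-call values (including `temperature`) winning over the preset:

```python
# Hypothetical helper (not part of any SDK): reproduces the preset merge without
# sending a request, so the effective sampling parameters can be inspected.
PRESETS = {
    "LiquidAI/LFM2.5-1.2B-Instruct": dict(
        temperature=0.1, extra_body={"top_k": 50, "repetition_penalty": 1.05}
    ),
}

def build_kwargs(model, messages, presets=PRESETS, **overrides):
    cfg = presets[model]
    # Dict union builds a new dict; keys from the right operand (per-call) win.
    body = cfg["extra_body"] | overrides.pop("extra_body", {})
    # An explicit temperature replaces the preset value.
    temperature = overrides.pop("temperature", cfg["temperature"])
    return dict(model=model, messages=messages,
                temperature=temperature, extra_body=body, **overrides)

kwargs = build_kwargs(
    "LiquidAI/LFM2.5-1.2B-Instruct",
    [{"role": "user", "content": "Hi"}],
    temperature=0.7,
    extra_body={"top_k": 20},
    max_tokens=64,
)
print(kwargs["temperature"], kwargs["extra_body"], kwargs["max_tokens"])
# → 0.7 {'top_k': 20, 'repetition_penalty': 1.05} 64
```

Keys that are not overridden, such as `repetition_penalty`, keep their preset values. The preset dictionary itself is never mutated.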

### 3.2 Reasoning

The 8B-A1B and 1.2B-Thinking checkpoints generate a chain of thought before their final answer. The Deploy panel launches them with the matching `--reasoning-parser`, which moves that trace out of `content` and into `reasoning_content`:

```python Example theme={null}
resp = chat(
    "LiquidAI/LFM2.5-8B-A1B",
    [{"role": "user", "content": "If a train travels 60 km/h for 2.5 hours, how far does it go?"}],
)
msg = resp.choices[0].message
print("Reasoning:", msg.reasoning_content)
print("Answer:", msg.content)
```
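If a server is launched without `--reasoning-parser`, the trace stays inline in `content`. A minimal client-side fallback is sketched below. It assumes the raw output keeps the reasoning between `<think>` and `</think>` tags (check the checkpoint's chat template before relying on these tag names). `split_reasoning` is a hypothetical helper, not an SGLang API:

```python
import re

# Assumption: the raw model output wraps its reasoning in <think>...</think>.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_reasoning(text):
    """Return (reasoning, answer); reasoning is None when no think block is found."""
    m = THINK_RE.search(text)
    if m is None:
        return None, text.strip()
    reasoning = m.group(1).strip()
    # Everything outside the think block is treated as the answer.
    answer = (text[:m.start()] + text[m.end():]).strip()
    return reasoning, answer

raw = "<think>60 km/h * 2.5 h = 150 km.</think>\nThe train travels 150 km."
reasoning, answer = split_reasoning(raw)
print(reasoning)  # 60 km/h * 2.5 h = 150 km.
print(answer)     # The train travels 150 km.
```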

### 3.3 Tool Calling

LFM2.5 writes tool calls as Pythonic function calls. With `--tool-call-parser lfm2` (already part of the launch command), SGLang parses them into standard `message.tool_calls`:

```python Example theme={null}
resp = chat(
    "LiquidAI/LFM2.5-1.2B-Instruct",
    [{"role": "user", "content": "What's the weather in Paris?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {"location": {"type": "string"}},
                "required": ["location"],
            },
        },
    }],
)
for call in resp.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```
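The following sketch shows roughly what "Pythonic" means here. It turns a call string such as `[get_weather(location="Paris")]` into the name and JSON-arguments pair that `message.tool_calls` exposes. It is an illustration built on Python's `ast` module, not SGLang's `lfm2` parser. It also assumes the text is a Python list of calls that use only literal keyword arguments, with any surrounding marker tokens already removed. `parse_pythonic_calls` is a hypothetical name:

```python
import ast
import json

def parse_pythonic_calls(text):
    """Parse '[f(a=1), g(b="x")]' into [(name, json_arguments), ...].

    Illustrative only: keyword arguments with Python-literal values are supported.
    """
    tree = ast.parse(text.strip(), mode="eval")
    if not isinstance(tree.body, ast.List):
        raise ValueError("expected a list of calls")
    calls = []
    for node in tree.body.elts:
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)):
            raise ValueError(f"not a simple function call: {ast.dump(node)}")
        if node.args:
            raise ValueError("positional arguments are not supported")
        # literal_eval accepts an AST node and rejects anything but literals.
        args = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
        calls.append((node.func.id, json.dumps(args)))
    return calls

print(parse_pythonic_calls('[get_weather(location="Paris")]'))
# → [('get_weather', '{"location": "Paris"}')]
```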

Tool calling is supported on 8B-A1B, 1.2B-Thinking, 1.2B-Instruct, 350M, 1.2B-JP-202606, VL-1.6B, and VL-450M. On the **VL** models it works only in text turns: do not combine an image and tools in the same turn.
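The tool-calling example prints the calls but does not execute them. To complete the loop, run each call locally and send the results back as `tool`-role messages in the standard OpenAI format. The following is a minimal sketch in which `get_weather` is a stub with made-up data and `run_tool_calls` is a hypothetical helper. `SimpleNamespace` objects stand in for the SDK's tool-call objects so the code runs offline:

```python
import json
from types import SimpleNamespace

# Hypothetical local implementation of the get_weather tool (stub data).
def get_weather(location):
    return {"location": location, "forecast": "sunny", "temp_c": 21}

TOOLS = {"get_weather": get_weather}

def run_tool_calls(tool_calls, content=None):
    """Execute each tool call locally and return the messages to append to the chat."""
    assistant = {
        "role": "assistant",
        "content": content,
        "tool_calls": [
            {"id": c.id, "type": "function",
             "function": {"name": c.function.name, "arguments": c.function.arguments}}
            for c in tool_calls
        ],
    }
    results = []
    for c in tool_calls:
        args = json.loads(c.function.arguments)  # arguments arrive as a JSON string
        output = TOOLS[c.function.name](**args)
        results.append({"role": "tool", "tool_call_id": c.id, "content": json.dumps(output)})
    return [assistant, *results]

# Offline stand-in for resp.choices[0].message.tool_calls.
fake_calls = [SimpleNamespace(
    id="call_0",
    function=SimpleNamespace(name="get_weather", arguments='{"location": "Paris"}'),
)]
followup = run_tool_calls(fake_calls)
for m in followup:
    print(m["role"], m["content"])
```

Against a live server, extend the original `messages` list with `run_tool_calls(msg.tool_calls, msg.content)`. Then call `chat` again with the same `tools` to get the final answer. This assumes the model's chat template accepts `tool`-role messages.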

### 3.4 Vision Input

The VL models (VL-1.6B and VL-450M) accept images via standard OpenAI multimodal content blocks. Base64 data URIs (`data:image/jpeg;base64,...`) work in place of a URL:

```python Example theme={null}
resp = chat(
    "LiquidAI/LFM2.5-VL-1.6B",
    [{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {
                "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}},
            {"type": "text", "text": "What is in this image?"},
        ],
    }],
)
print(resp.choices[0].message.content)
```
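Building such a data URI from a local file takes only the standard library. The following is a minimal sketch with a hypothetical `image_to_data_uri` helper. The MIME type is guessed from the file extension, so files without a recognizable image extension are rejected:

```python
import base64
import mimetypes
import os
import tempfile

def image_to_data_uri(path):
    """Encode a local image file as a data URI for an image_url content block."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type for {path!r}")
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime};base64,{encoded}"

# Demo on a throwaway file holding only the 8-byte PNG signature.
with tempfile.NamedTemporaryFile(suffix=".png", delete=False) as tmp:
    tmp.write(b"\x89PNG\r\n\x1a\n")
uri = image_to_data_uri(tmp.name)
os.remove(tmp.name)
print(uri)  # data:image/png;base64,iVBORw0KGgo=
```

Pass the result as the `url` of an `image_url` block, for example `{"type": "image_url", "image_url": {"url": image_to_data_uri("photo.jpg")}}`, where `photo.jpg` is a placeholder path.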

### 3.5 Base Checkpoints

Each text model size ships a pre-trained Base repo, intended for fine-tuning and continued pre-training: [LFM2.5-1.2B-Base](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Base), [LFM2.5-350M-Base](https://huggingface.co/LiquidAI/LFM2.5-350M-Base), and [LFM2.5-8B-A1B-Base](https://huggingface.co/LiquidAI/LFM2.5-8B-A1B-Base).

The repos include a ChatML-style chat template, so `chat.completions` requests are formatted as usual. The checkpoints have no post-training, though, so don't expect them to follow instructions. For raw text continuation, use the Completions endpoint:

```python Example theme={null}
comp = client.completions.create(
    model="LiquidAI/LFM2.5-1.2B-Base",
    prompt="The capital of France is",
    temperature=0.3,
    extra_body={"min_p": 0.15, "repetition_penalty": 1.05},
)
print(comp.choices[0].text)
```
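ChatML wraps each turn in `<|im_start|>role` and `<|im_end|>` markers and ends with an open assistant turn for the model to continue. The following is a minimal sketch of generic ChatML rendering with a hypothetical `to_chatml` helper. It approximates the repos' template rather than reproducing it (the real template may add a BOS token or differ in whitespace). When the exact prompt matters, render it with `tokenizer.apply_chat_template` from `transformers` instead:

```python
def to_chatml(messages, add_generation_prompt=True):
    """Render chat messages as generic ChatML (an approximation of the repo template)."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        # Open an assistant turn so a completion continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = to_chatml([{"role": "user", "content": "The capital of France is?"}])
print(prompt)
```

Sending a string like this through `client.completions.create` shows how a Base checkpoint continues a chat-formatted prompt. Without post-training, the continuation after the `assistant` marker may not read like an answer.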