Skip to main content
Star Fork

Performance & Runtime

Designed for low-latency, high-throughput inference with RadixAttention, prefix caching, and multi-GPU parallelism.

Models & Ecosystem

Broad support for Llama, Qwen, DeepSeek, and more. Compatible with Hugging Face and OpenAI APIs.

Extensive Hardware Support

Native support across Hardware Platforms including NVIDIA, AMD, Intel Xeon, Google TPU, and Ascend NPU accelerators.

Community & Training

Open-source with widespread adoption, powering 400k+ GPUs and integrated with major RL frameworks.
SGLang powers large-scale production deployments, generating trillions of tokens each day across more than 400,000 GPUs worldwide. It is hosted under the non-profit open-source organization LMSYS.

Get Started

SGLang is an inference framework meant for production level serving. It is designed to deliver low-latency and high-throughput inference across a wide range of setups, from a single GPU to large distributed clusters.

Install SGLang

Install SGLang with pip, from source, or via Docker on your preferred hardware platform.

Quickstart

Launch your first model server and send requests in minutes with OpenAI-compatible APIs.

News and latest blogs

SGLang and Miles Add Day-0 Support for NVIDIA Nemotron 3 Ultra for Long-Running Autonomous Agents
Higgs Audio v3 TTS on SGLang-Omni: Real-Time, Controllable Speech for Voice Agents
Heterogeneous CPU + GPU EPD Disaggregation to Boost VLM Serving
Win on TCO: How AMD Instinct\u2122 MI355X Achieves Cost-Competitive Distributed Inference Through SGLang with MoRI
Updating 1T parameters in seconds \u2014 P2P weight transfer in Large Scale Distributed RL
DeepSeek-V4 on Day 0: From Fast Inference to Verified RL with SGLang and Miles

Learn more and join the community

Stay connected

Development roadmap to follow current priorities and upcoming work.
Weekly public development meeting to hear updates and join open discussions.
Slack for questions, feedback, and community support.
X Twitter and LinkedIn for project updates.
LMSYS blog for release notes, benchmarks, and technical deep dives.
Learning materials for blogs, slides, and videos.