Popular Model Usage (DeepSeek, GPT-OSS, GLM, Llama, MiniMax, Qwen, and more)
For more usage examples and recipes, visit the SGLang Cookbook.
DeepSeek V3/V3.1/R1
DeepSeek V3.2
GLM-4.5
GLM-4.5V
GPT OSS
MiniMax M2
Qwen3
Qwen 3.5
Qwen3-VL
DeepSeek OCR
Llama4