# SGLang Documentation

## Docs

- [DeepSeek-Math-V2](https://docs.sglang.io/cookbook/autoregressive/DeepSeek/DeepSeek-Math-V2.md)
- [DeepSeek-OCR](https://docs.sglang.io/cookbook/autoregressive/DeepSeek/DeepSeek-OCR.md)
- [DeepSeek-OCR-2](https://docs.sglang.io/cookbook/autoregressive/DeepSeek/DeepSeek-OCR-2.md)
- [DeepSeek-R1](https://docs.sglang.io/cookbook/autoregressive/DeepSeek/DeepSeek-R1.md)
- [DeepSeek-V3](https://docs.sglang.io/cookbook/autoregressive/DeepSeek/DeepSeek-V3.md)
- [DeepSeek-V3.1](https://docs.sglang.io/cookbook/autoregressive/DeepSeek/DeepSeek-V3_1.md)
- [DeepSeek-V3.2](https://docs.sglang.io/cookbook/autoregressive/DeepSeek/DeepSeek-V3_2.md)
- [DeepSeek-V4](https://docs.sglang.io/cookbook/autoregressive/DeepSeek/DeepSeek-V4.md)
- [Ernie4.5](https://docs.sglang.io/cookbook/autoregressive/Ernie/Ernie4.5.md)
- [Ernie4.5-VL](https://docs.sglang.io/cookbook/autoregressive/Ernie/Ernie4.5-VL.md)
- [Chroma-1.0](https://docs.sglang.io/cookbook/autoregressive/FlashLabs/Chroma1.0.md)
- [GLM-4.5](https://docs.sglang.io/cookbook/autoregressive/GLM/GLM-4.5.md)
- [GLM-4.5V](https://docs.sglang.io/cookbook/autoregressive/GLM/GLM-4.5V.md)
- [GLM-4.6](https://docs.sglang.io/cookbook/autoregressive/GLM/GLM-4.6.md)
- [GLM-4.6V](https://docs.sglang.io/cookbook/autoregressive/GLM/GLM-4.6V.md)
- [GLM-4.7](https://docs.sglang.io/cookbook/autoregressive/GLM/GLM-4.7.md)
- [GLM-4.7-Flash](https://docs.sglang.io/cookbook/autoregressive/GLM/GLM-4.7-Flash.md)
- [GLM-5](https://docs.sglang.io/cookbook/autoregressive/GLM/GLM-5.md)
- [GLM-5.1](https://docs.sglang.io/cookbook/autoregressive/GLM/GLM-5.1.md)
- [GLM Glyph](https://docs.sglang.io/cookbook/autoregressive/GLM/GLM-Glyph.md)
- [GLM-OCR](https://docs.sglang.io/cookbook/autoregressive/GLM/GLM-OCR.md)
- [Gemma 4](https://docs.sglang.io/cookbook/autoregressive/Google/Gemma4.md)
- [LLaDA 2.1](https://docs.sglang.io/cookbook/autoregressive/InclusionAI/LLaDA-2.1.md)
- [Ling-2.5-1T](https://docs.sglang.io/cookbook/autoregressive/InclusionAI/Ling-2.5-1T.md)
- [Ling-2.6](https://docs.sglang.io/cookbook/autoregressive/InclusionAI/Ling-2.6.md)
- [Ring-2.5-1T](https://docs.sglang.io/cookbook/autoregressive/InclusionAI/Ring-2.5-1T.md)
- [Intern-S1](https://docs.sglang.io/cookbook/autoregressive/InternLM/Intern-S1.md)
- [InternVL3.5](https://docs.sglang.io/cookbook/autoregressive/InternVL/InternVL3.5.md)
- [Jina-reranker-m0](https://docs.sglang.io/cookbook/autoregressive/Jina/Jina-reranker-m0.md)
- [Llama-3.1](https://docs.sglang.io/cookbook/autoregressive/Llama/Llama3.1.md)
- [Llama-3.3-70B](https://docs.sglang.io/cookbook/autoregressive/Llama/Llama3.3-70B.md)
- [Llama 4](https://docs.sglang.io/cookbook/autoregressive/Llama/Llama4.md)
- [MiniMax-M2](https://docs.sglang.io/cookbook/autoregressive/MiniMax/MiniMax-M2.md)
- [MiniMax-M2.5](https://docs.sglang.io/cookbook/autoregressive/MiniMax/MiniMax-M2.5.md)
- [MiniMax-M2.7](https://docs.sglang.io/cookbook/autoregressive/MiniMax/MiniMax-M2.7.md)
- [Devstral 2 (Mistral)](https://docs.sglang.io/cookbook/autoregressive/Mistral/Devstral-2.md)
- [Ministral-3](https://docs.sglang.io/cookbook/autoregressive/Mistral/Ministral-3.md)
- [Mistral Small 4](https://docs.sglang.io/cookbook/autoregressive/Mistral/Mistral-Small-4.md)
- [Kimi-K2](https://docs.sglang.io/cookbook/autoregressive/Moonshotai/Kimi-K2.md)
- [Kimi-K2.5](https://docs.sglang.io/cookbook/autoregressive/Moonshotai/Kimi-K2.5.md)
- [Kimi-K2.6](https://docs.sglang.io/cookbook/autoregressive/Moonshotai/Kimi-K2.6.md)
- [Kimi-Linear](https://docs.sglang.io/cookbook/autoregressive/Moonshotai/Kimi-Linear.md)
- [Nemotron3-Nano](https://docs.sglang.io/cookbook/autoregressive/NVIDIA/Nemotron3-Nano.md)
- [Nemotron 3 Nano Omni](https://docs.sglang.io/cookbook/autoregressive/NVIDIA/Nemotron3-Nano-Omni.md)
- [NVIDIA Nemotron3-Super](https://docs.sglang.io/cookbook/autoregressive/NVIDIA/Nemotron3-Super.md)
- [GPT-OSS](https://docs.sglang.io/cookbook/autoregressive/OpenAI/GPT-OSS.md)
- [Qwen2.5-VL](https://docs.sglang.io/cookbook/autoregressive/Qwen/Qwen2.5-VL.md)
- [Qwen3](https://docs.sglang.io/cookbook/autoregressive/Qwen/Qwen3.md)
- [Qwen3-Coder](https://docs.sglang.io/cookbook/autoregressive/Qwen/Qwen3-Coder.md)
- [Qwen3-Coder-Next](https://docs.sglang.io/cookbook/autoregressive/Qwen/Qwen3-Coder-Next.md)
- [Qwen3-Next](https://docs.sglang.io/cookbook/autoregressive/Qwen/Qwen3-Next.md)
- [Qwen3-VL](https://docs.sglang.io/cookbook/autoregressive/Qwen/Qwen3-VL.md)
- [Qwen3.5](https://docs.sglang.io/cookbook/autoregressive/Qwen/Qwen3.5.md)
- [Qwen3.6](https://docs.sglang.io/cookbook/autoregressive/Qwen/Qwen3.6.md)
- [Step3-VL-10B](https://docs.sglang.io/cookbook/autoregressive/StepFun/Step3-VL-10B.md)
- [Step-3.5](https://docs.sglang.io/cookbook/autoregressive/StepFun/Step3.5.md)
- [Hunyuan 3 Preview](https://docs.sglang.io/cookbook/autoregressive/Tencent/Hunyuan3-Preview.md)
- [MiMo-V2-Flash](https://docs.sglang.io/cookbook/autoregressive/Xiaomi/MiMo-V2-Flash.md)
- [MiMo-V2.5](https://docs.sglang.io/cookbook/autoregressive/Xiaomi/MiMo-V2.5.md)
- [Overview](https://docs.sglang.io/cookbook/autoregressive/intro.md): Practical guides for deploying and using large language models and vision language models with SGLang.
- [Autoregressive Model Benchmark Documentation](https://docs.sglang.io/cookbook/base/benchmarks/autoregressive_model_benchmark.md)
- [Diffusion Models Benchmark Documentation](https://docs.sglang.io/cookbook/base/benchmarks/diffusion_model_benchmark.md)
- [Server Arguments](https://docs.sglang.io/cookbook/base/reference/server_arguments.md)
- [FLUX](https://docs.sglang.io/cookbook/diffusion/FLUX/FLUX.md)
- [LTX](https://docs.sglang.io/cookbook/diffusion/LTX/LTX.md): Run LTX-2 and LTX-2.3 video generation pipelines with SGLang Diffusion.
- [MOVA](https://docs.sglang.io/cookbook/diffusion/MOVA/MOVA.md)
- [Qwen-Image](https://docs.sglang.io/cookbook/diffusion/Qwen-Image/Qwen-Image.md)
- [Qwen-Image-Edit-2511](https://docs.sglang.io/cookbook/diffusion/Qwen-Image/Qwen-Image-Edit.md)
- [Wan2.1](https://docs.sglang.io/cookbook/diffusion/Wan/Wan2.1.md)
- [Wan2.2](https://docs.sglang.io/cookbook/diffusion/Wan/Wan2.2.md)
- [Z-Image-Turbo](https://docs.sglang.io/cookbook/diffusion/Z-Image/Z-Image-Turbo.md)
- [Overview](https://docs.sglang.io/cookbook/diffusion/intro.md): Practical guides for deploying and using diffusion models with SGLang.
- [SGLang Cookbook](https://docs.sglang.io/cookbook/intro.md)
- [SpecBundle Usage](https://docs.sglang.io/cookbook/specbundle/specbundle_usage.md)
- [Supported Models](https://docs.sglang.io/cookbook/specbundle/supported_models.md)
- [Adaptive Speculative Decoding](https://docs.sglang.io/docs/advanced_features/adaptive_speculative_decoding.md)
- [Attention Backend](https://docs.sglang.io/docs/advanced_features/attention_backend.md)
- [Breakable CUDA Graph](https://docs.sglang.io/docs/advanced_features/breakable_cuda_graph.md)
- [Checkpoint Engine Integration](https://docs.sglang.io/docs/advanced_features/checkpoint_engine.md)
- [CUDA Graph for Multi-Modal Encoder in SGLang](https://docs.sglang.io/docs/advanced_features/cuda_graph_for_multi_modal_encoder.md)
- [Deterministic Inference](https://docs.sglang.io/docs/advanced_features/deterministic_inference.md)
- [DP, DPA and SGLang DP Router](https://docs.sglang.io/docs/advanced_features/dp_dpa_smg_guide.md)
- [DP for Multi-Modal Encoder in SGLang](https://docs.sglang.io/docs/advanced_features/dp_for_multi_modal_encoder.md)
- [EPD Disaggregation](https://docs.sglang.io/docs/advanced_features/epd_disaggregation.md)
- [Expert Parallelism](https://docs.sglang.io/docs/advanced_features/expert_parallelism.md)
- [Hierarchical KV Caching (HiCache)](https://docs.sglang.io/docs/advanced_features/hicache.md)
- [SGLang HiCache Best Practices](https://docs.sglang.io/docs/advanced_features/hicache_best_practices.md)
- [HiCache System Design and Optimization](https://docs.sglang.io/docs/advanced_features/hicache_design.md)
- [Runtime Attach/Detach HiCache Storage Backend (No Restart)](https://docs.sglang.io/docs/advanced_features/hicache_storage_runtime_attach_detach.md)
- [HiSparse: Hierarchical Sparse Attention](https://docs.sglang.io/docs/advanced_features/hisparse_guide.md)
- [Hyperparameter Tuning](https://docs.sglang.io/docs/advanced_features/hyperparameter_tuning.md)
- [LoRA Serving](https://docs.sglang.io/docs/advanced_features/lora.md)
- [Loading Models from Object Storage](https://docs.sglang.io/docs/advanced_features/object_storage.md)
- [Observability](https://docs.sglang.io/docs/advanced_features/observability.md)
- [Advanced Features](https://docs.sglang.io/docs/advanced_features/overview.md): Advanced configuration, optimization, and deployment features for SGLang.
- [PD Disaggregation](https://docs.sglang.io/docs/advanced_features/pd_disaggregation.md)
- [Piecewise CUDA Graph](https://docs.sglang.io/docs/advanced_features/piecewise_cuda_graph.md)
- [Pipeline Parallelism for Long Context](https://docs.sglang.io/docs/advanced_features/pipeline_parallelism.md)
- [Quantization](https://docs.sglang.io/docs/advanced_features/quantization.md)
- [Quantized KV Cache](https://docs.sglang.io/docs/advanced_features/quantized_kv_cache.md)
- [Reasoning Parser](https://docs.sglang.io/docs/advanced_features/separate_reasoning.md)
- [Server Arguments](https://docs.sglang.io/docs/advanced_features/server_arguments.md)
- [SGLang Model Gateway](https://docs.sglang.io/docs/advanced_features/sgl_model_gateway.md)
- [SGLang for RL Systems](https://docs.sglang.io/docs/advanced_features/sglang_for_rl.md)
- [Speculative Decoding](https://docs.sglang.io/docs/advanced_features/speculative_decoding.md)
- [Structured Outputs](https://docs.sglang.io/docs/advanced_features/structured_outputs.md)
- [Structured Outputs For Reasoning Models](https://docs.sglang.io/docs/advanced_features/structured_outputs_for_reasoning_models.md)
- [Tool Parser](https://docs.sglang.io/docs/advanced_features/tool_parser.md)
- [Query VLM with Offline Engine](https://docs.sglang.io/docs/advanced_features/vlm_query.md)
- [DeepSeek OCR (OCR-1 / OCR-2)](https://docs.sglang.io/docs/basic_usage/deepseek_ocr.md)
- [DeepSeek V3/V3.1/R1 Usage](https://docs.sglang.io/docs/basic_usage/deepseek_v3.md)
- [DeepSeek V3.2/GLM-5 Usage](https://docs.sglang.io/docs/basic_usage/deepseek_v32.md)
- [Launch GLM-4.5 / GLM-4.6 / GLM-4.7 with SGLang](https://docs.sglang.io/docs/basic_usage/glm45.md)
- [GLM-4.6V / GLM-4.5V Usage](https://docs.sglang.io/docs/basic_usage/glmv.md)
- [GPT OSS Usage](https://docs.sglang.io/docs/basic_usage/gpt_oss.md)
- [Kimi-K2.5 Usage](https://docs.sglang.io/docs/basic_usage/kimi_k2_5.md)
- [Llama4 Usage](https://docs.sglang.io/docs/basic_usage/llama4.md)
- [MiniMax M2.5/M2.1/M2 Usage](https://docs.sglang.io/docs/basic_usage/minimax_m2.md)
- [SGLang Native APIs](https://docs.sglang.io/docs/basic_usage/native_api.md)
- [Offline Engine API](https://docs.sglang.io/docs/basic_usage/offline_engine_api.md)
- [Ollama-Compatible API](https://docs.sglang.io/docs/basic_usage/ollama_api.md)
- [OpenAI-Compatible APIs](https://docs.sglang.io/docs/basic_usage/openai_api.md)
- [OpenAI APIs - Completions](https://docs.sglang.io/docs/basic_usage/openai_api_completions.md)
- [OpenAI APIs - Embedding](https://docs.sglang.io/docs/basic_usage/openai_api_embeddings.md)
- [OpenAI APIs - Vision](https://docs.sglang.io/docs/basic_usage/openai_api_vision.md)
- [Basic Usage](https://docs.sglang.io/docs/basic_usage/overview.md): Core APIs and common usage patterns for SGLang.
- [Popular Model Usage](https://docs.sglang.io/docs/basic_usage/popular_model_usage.md): Usage guides for popular models, including DeepSeek, GPT-OSS, GLM, Llama, MiniMax, Qwen, and more.
- [Qwen3-Next Usage](https://docs.sglang.io/docs/basic_usage/qwen3.md)
- [Qwen 3.5 Usage](https://docs.sglang.io/docs/basic_usage/qwen3_5.md)
- [Qwen3-VL Usage](https://docs.sglang.io/docs/basic_usage/qwen3_vl.md)
- [Sampling Parameters](https://docs.sglang.io/docs/basic_usage/sampling_params.md)
- [Tutorial: Sending a request](https://docs.sglang.io/docs/basic_usage/send_request.md)
- [Bench Serving Guide](https://docs.sglang.io/docs/developer_guide/bench_serving.md)
- [Benchmark and Profiling](https://docs.sglang.io/docs/developer_guide/benchmark_and_profiling.md)
- [Contribution Guide](https://docs.sglang.io/docs/developer_guide/contribution_guide.md)
- [Development Guide Using Docker](https://docs.sglang.io/docs/developer_guide/development_guide_using_docker.md)
- [Development Guide for JIT Kernels](https://docs.sglang.io/docs/developer_guide/development_jit_kernel_guide.md)
- [Evaluating New Models with SGLang](https://docs.sglang.io/docs/developer_guide/evaluating_new_models.md)
- [MSProbe Debugging Guide](https://docs.sglang.io/docs/developer_guide/msprobe_debugging_guide.md)
- [Developer Guide](https://docs.sglang.io/docs/developer_guide/overview.md): Contributing to SGLang, covering development setup, benchmarking, and evaluation.
- [Installation](https://docs.sglang.io/docs/get-started/install.md): Install SGLang with pip/uv, source, Docker, Kubernetes, and cloud deployment options.
- [Quickstart](https://docs.sglang.io/docs/get-started/quickstart.md): Get up and running with SGLang in minutes: install, launch a server, and send your first request.
- [AMD GPUs](https://docs.sglang.io/docs/hardware-platforms/amd_gpu.md)
- [Apple Silicon with Metal](https://docs.sglang.io/docs/hardware-platforms/apple_metal.md)
- [Contribution Guide](https://docs.sglang.io/docs/hardware-platforms/ascend-npus/ascend_contribution_guide.md)
- [SGLang Installation with NPU Support](https://docs.sglang.io/docs/hardware-platforms/ascend-npus/ascend_npu.md)
- [Best Practice on Ascend NPU](https://docs.sglang.io/docs/hardware-platforms/ascend-npus/ascend_npu_best_practice.md)
- [DeepSeek Examples](https://docs.sglang.io/docs/hardware-platforms/ascend-npus/ascend_npu_deepseek_example.md)
- [Environment Variables](https://docs.sglang.io/docs/hardware-platforms/ascend-npus/ascend_npu_environment_variables.md)
- [GLM-5 Examples](https://docs.sglang.io/docs/hardware-platforms/ascend-npus/ascend_npu_glm5_examples.md)
- [Quantization on Ascend](https://docs.sglang.io/docs/hardware-platforms/ascend-npus/ascend_npu_quantization.md)
- [Ascend NPU Quickstart](https://docs.sglang.io/docs/hardware-platforms/ascend-npus/ascend_npu_quick_start.md)
- [Qwen3.5 Examples](https://docs.sglang.io/docs/hardware-platforms/ascend-npus/ascend_npu_qwen3_5_examples.md)
- [Qwen3 Examples](https://docs.sglang.io/docs/hardware-platforms/ascend-npus/ascend_npu_qwen3_examples.md)
- [Ascend NPU Ring-SP Performance (Wan2.1-T2V-1.3B)](https://docs.sglang.io/docs/hardware-platforms/ascend-npus/ascend_npu_ring_sp_performance.md)
- [Supported Features on Ascend NPU](https://docs.sglang.io/docs/hardware-platforms/ascend-npus/ascend_npu_support_features.md)
- [Supported Models on Ascend NPU](https://docs.sglang.io/docs/hardware-platforms/ascend-npus/ascend_npu_support_models.md)
- [MindSpore Backend](https://docs.sglang.io/docs/hardware-platforms/ascend-npus/mindspore_backend.md)
- [CPU Servers](https://docs.sglang.io/docs/hardware-platforms/cpu_server.md)
- [Moore Threads GPUs](https://docs.sglang.io/docs/hardware-platforms/mthreads_gpu.md)
- [NVIDIA GPUs](https://docs.sglang.io/docs/hardware-platforms/nvidia-gpus.md)
- [NVIDIA Jetson Orin](https://docs.sglang.io/docs/hardware-platforms/nvidia_jetson.md): Guide for installing and running SGLang on NVIDIA Jetson Orin devices.
- [Hardware Platforms](https://docs.sglang.io/docs/hardware-platforms/overview.md): Platform-specific guides for running SGLang on GPUs, TPUs, NPUs, CPUs, and more.
- [SGLang Plugin System](https://docs.sglang.io/docs/hardware-platforms/plugin.md)
- [TPU](https://docs.sglang.io/docs/hardware-platforms/tpu.md): High-performance TPU inference through the SGLang-JAX backend, optimized for Google Cloud TPUs, delivering high throughput and low latency for LLM serving.
- [XPU](https://docs.sglang.io/docs/hardware-platforms/xpu.md)
- [Custom Chat Template](https://docs.sglang.io/docs/references/custom_chat_template.md)
- [Environment Variables](https://docs.sglang.io/docs/references/environment_variables.md)
- [Troubleshooting and Frequently Asked Questions](https://docs.sglang.io/docs/references/faq.md)
- [Choices Methods in SGLang](https://docs.sglang.io/docs/references/frontend/choices_methods.md)
- [Frontend Language](https://docs.sglang.io/docs/references/frontend/frontend_index.md)
- [SGLang Frontend Language](https://docs.sglang.io/docs/references/frontend/frontend_tutorial.md)
- [Deploy on Kubernetes](https://docs.sglang.io/docs/references/multi_node_deployment/deploy_on_k8s.md)
- [LWS-Based PD Deploy](https://docs.sglang.io/docs/references/multi_node_deployment/lws_pd/lws_pd_deploy.md)
- [Multi-Node Deployment](https://docs.sglang.io/docs/references/multi_node_deployment/multi_node.md)
- [Multi-Node Deployment](https://docs.sglang.io/docs/references/multi_node_deployment/multi_node_index.md)
- [DeepSeekV32-Exp RBG-Based PD Deploy](https://docs.sglang.io/docs/references/multi_node_deployment/rbg_pd/deepseekv32_pd.md)
- [References](https://docs.sglang.io/docs/references/overview.md): FAQ, environment variables, production metrics, deployment guides, and more.
- [Post-Training Integration](https://docs.sglang.io/docs/references/post_training_integration.md)
- [Production Metrics](https://docs.sglang.io/docs/references/production_metrics.md)
- [Production Request Tracing](https://docs.sglang.io/docs/references/production_request_trace.md)
- [CLI Reference](https://docs.sglang.io/docs/sglang-diffusion/api/cli.md): Run one-off generation tasks and launch the HTTP server from the command line.
- [OpenAI API](https://docs.sglang.io/docs/sglang-diffusion/api/openai_api.md): Image and video generation endpoints with LoRA adapter management.
- [Post-Processing](https://docs.sglang.io/docs/sglang-diffusion/api/post_processing.md)
- [Attention Backends](https://docs.sglang.io/docs/sglang-diffusion/attention_backends.md): Select and configure attention backends for SGLang diffusion pipelines.
- [Cache-DiT Acceleration](https://docs.sglang.io/docs/sglang-diffusion/cache_dit.md): Configure Cache-DiT acceleration for diffusion inference.
- [Caching Acceleration](https://docs.sglang.io/docs/sglang-diffusion/caching-acceleration.md): Compare caching acceleration strategies for diffusion models.
- [CI Performance Baselines](https://docs.sglang.io/docs/sglang-diffusion/ci_perf.md): Generate and update diffusion performance baselines used in CI.
- [Supported Models](https://docs.sglang.io/docs/sglang-diffusion/compatibility_matrix.md): Check model compatibility across diffusion optimizations and backends.
- [Contributing to SGLang Diffusion](https://docs.sglang.io/docs/sglang-diffusion/contributing.md)
- [Disaggregated Diffusion Pipeline](https://docs.sglang.io/docs/sglang-diffusion/disaggregation.md)
- [Environment Variables](https://docs.sglang.io/docs/sglang-diffusion/environment_variables.md): Configure SGLang diffusion behavior with environment variables.
- [SGLang Diffusion](https://docs.sglang.io/docs/sglang-diffusion/index.md): Accelerated image and video generation with diffusion models.
- [Install SGLang Diffusion](https://docs.sglang.io/docs/sglang-diffusion/installation.md): Install SGLang Diffusion on NVIDIA, AMD, MUSA, and Ascend platforms.
- [Performance Optimization](https://docs.sglang.io/docs/sglang-diffusion/performance-optimization.md): Optimize SGLang diffusion performance with caching, kernels, and profiling.
- [Profiling](https://docs.sglang.io/docs/sglang-diffusion/profiling.md): Profile SGLang diffusion workloads with PyTorch Profiler and Nsight Systems.
- [Quantization](https://docs.sglang.io/docs/sglang-diffusion/quantization.md)
- [Ring SP Benchmark: Wan2.2-TI2V-5B (u1r2 vs Baseline)](https://docs.sglang.io/docs/sglang-diffusion/ring_sp_performance.md)
- [How to Support New Diffusion Models](https://docs.sglang.io/docs/sglang-diffusion/support_new_models.md)
- [TeaCache Acceleration](https://docs.sglang.io/docs/sglang-diffusion/teacache.md): Configure TeaCache for temporal similarity-based diffusion acceleration.
- [Supported Models](https://docs.sglang.io/docs/supported-models.md): See which families of SGLang-compatible models are actively maintained.
- [Classification Models](https://docs.sglang.io/docs/supported-models/classify_models.md)
- [Diffusion Language Models](https://docs.sglang.io/docs/supported-models/diffusion_language_models.md)
- [Embedding Models](https://docs.sglang.io/docs/supported-models/embedding_models.md): Dense and sparse embedding models with FlashInfer acceleration and SGLang's batching infrastructure.
- [Large Language Models](https://docs.sglang.io/docs/supported-models/generative_models.md)
- [MindSpore Models](https://docs.sglang.io/docs/supported-models/mindspore_models.md)
- [Use Models From ModelScope](https://docs.sglang.io/docs/supported-models/modelscope.md)
- [Multimodal Language Models](https://docs.sglang.io/docs/supported-models/multimodal_language_models.md)
- [Rerank Models](https://docs.sglang.io/docs/supported-models/rerank_models.md)
- [Reward Models](https://docs.sglang.io/docs/supported-models/reward_models.md)
- [How to Support New Models](https://docs.sglang.io/docs/supported-models/support_new_models.md): How to add support for new language models and multimodal large language models (MLLMs) in SGLang, test new models, and register external implementations.
- [Transformers Fallback in SGLang](https://docs.sglang.io/docs/supported-models/transformers_fallback.md)
- [Welcome to SGLang](https://docs.sglang.io/index.md): High-performance serving framework for large language and multimodal models.