This section lists the models supported on Ascend NPUs: Large Language Models, Multimodal Language Models, Diffusion Language Models, Embedding Models, Reward Models, and Rerank Models. The mainstream DeepSeek, Qwen, and GLM series are included. In each table, the A2 Supported and A3 Supported columns indicate whether a model runs on Ascend A2 and A3 hardware, respectively.
You can enable any of the listed models to suit your business requirements.
Large Language Models
| Models | Model Family | A2 Supported | A3 Supported |
|---|---|---|---|
| DeepSeek V3/V3.1 | DeepSeek | ✅ | ✅ |
| DeepSeek-V3.2-W8A8 | DeepSeek | ✅ | ✅ |
| DeepSeek-R1-0528-W8A8 | DeepSeek | ✅ | ✅ |
| DeepSeek-V2-Lite-W8A8 | DeepSeek | ✅ | ✅ |
| Eco-Tech/Qwen3.6-35B-A3B-w8a8 | Qwen3.6 | ✅ | ✅ |
| Eco-Tech/Qwen3.6-27B-w8a8 | Qwen3.6 | ✅ | ✅ |
| Eco-Tech/Qwen3.5-397B-A17B-w8a8-mtp | Qwen3.5 | ✅ | ✅ |
| Eco-Tech/Qwen3.5-122B-A10B-w8a8-mtp | Qwen3.5 | ✅ | ✅ |
| Eco-Tech/Qwen3.5-35B-A3B-w8a8-mtp | Qwen3.5 | ✅ | ✅ |
| Eco-Tech/Qwen3.5-27B-w8a8-mtp | Qwen3.5 | ✅ | ✅ |
| Qwen/Qwen3.5-9B | Qwen3.5 | ✅ | ✅ |
| Qwen/Qwen3.5-4B | Qwen3.5 | ✅ | ✅ |
| Qwen/Qwen3.5-0.8B | Qwen3.5 | ✅ | ✅ |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Qwen3 | ✅ | ✅ |
| Qwen/Qwen3-32B | Qwen3 | ✅ | ✅ |
| Qwen/Qwen3-0.6B | Qwen3 | ✅ | ✅ |
| Qwen3-235B-A22B-W8A8 | Qwen3 | ✅ | ✅ |
| Qwen/Qwen3-Next-80B-A3B-Instruct | Qwen3 | ✅ | ✅ |
| Qwen3-Coder-480B-A35B-Instruct-w8a8-QuaRot | Qwen3 | ✅ | ✅ |
| Qwen/Qwen2.5-7B-Instruct | Qwen2.5 | ✅ | ✅ |
| QWQ-32B-W8A8 | QwQ | ✅ | ✅ |
| meta-llama/Llama-4-Scout-17B-16E-Instruct | Llama | ✅ | ✅ |
| AI-ModelScope/Llama-3.1-8B-Instruct | Llama | ✅ | ✅ |
| LLM-Research/llama-2-7b | Llama | ✅ | ✅ |
| LLM-Research/Llama-3.2-1B-Instruct | Llama | ✅ | ✅ |
| mistralai/Mistral-7B-Instruct-v0.2 | Mistral | ✅ | ✅ |
| google/gemma-3-4b-it | Gemma | ✅ | ✅ |
| microsoft/Phi-4-multimodal-instruct | Phi | ✅ | ✅ |
| allenai/OLMoE-1B-7B-0924 | OLMoE | ✅ | ✅ |
| stabilityai/stablelm-2-1_6b | StableLM | ✅ | ✅ |
| CohereForAI/c4ai-command-r-v01 | Command-R | ✅ | ✅ |
| huihui-ai/grok-2 | Grok | ✅ | ✅ |
| ZhipuAI/chatglm2-6b | ChatGLM | ✅ | ✅ |
| Shanghai_AI_Laboratory/internlm2-7b | InternLM 2 | ✅ | ✅ |
| LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct | EXAONE 3 | ✅ | ✅ |
| xverse/XVERSE-MoE-A36B | XVERSE | ✅ | ✅ |
| HuggingFaceTB/SmolLM-1.7B | SmolLM | ✅ | ✅ |
| Eco-Tech/GLM-5.1-w4a8 | GLM-5.1 | ✅ | ✅ |
| Eco-Tech/GLM-5-w4a8 | GLM-5 | ✅ | ✅ |
| ZhipuAI/glm-4-9b-chat | GLM-4 | ✅ | ✅ |
| XiaomiMiMo/MiMo-7B-RL | MiMo | ✅ | ✅ |
| arcee-ai/AFM-4.5B-Base | Arcee AFM-4.5B | ✅ | ✅ |
| Howeee/persimmon-8b-chat | Persimmon | ✅ | ✅ |
| inclusionAI/Ling-lite | Ling | ✅ | ✅ |
| ibm-granite/granite-3.1-8b-instruct | Granite | ✅ | ✅ |
| ibm-granite/granite-3.0-3b-a800m-instruct | Granite MoE | ✅ | ✅ |
| AI-ModelScope/dbrx-instruct | DBRX (Databricks) | ✅ | ✅ |
| baichuan-inc/Baichuan2-13B-Chat | Baichuan 2 (7B, 13B) | ✅ | ✅ |
| baidu/ERNIE-4.5-21B-A3B-PT | ERNIE-4.5 (4.5, 4.5MoE series) | ✅ | ✅ |
| OpenBMB/MiniCPM3-4B | MiniCPM (v3, 4B) | ✅ | ✅ |
| Eco-Tech/Kimi-K2.6-w4a8 | Kimi | ✅ | ✅ |
| Eco-Tech/Kimi-K2.5-w4a8 | Kimi | ✅ | ✅ |
| moonshotai/Kimi-K2-Thinking | Kimi | ✅ | ✅ |
| moonshotai/Kimi-Linear-48B-A3B-Instruct | Kimi Linear (48B-A3B) | ✅ | ✅ |
| eigen-ai-labs/gpt-oss-120b-bf16 | GPT-OSS | ✅ | ✅ |
| allenai/OLMo-2-1124-7B-Instruct | OLMo | ✅ | ✅ |
| Eco-Tech/MiniMax-M2.5-w8a8-QuaRot | MiniMax-M2.5 | ✅ | ✅ |
| cyankiwi/MiniMax-M2-BF16 | MiniMax-M2 | ✅ | ✅ |
| upstage/SOLAR-10.7B-Instruct-v1.0 | Solar | ✅ | ✅ |
| FLM/Tele-FLM | Tele-FLM (52B-1T) | ✅ | ✅ |
| bigcode/starcoder2-7b | StarCoder2 | ✅ | ✅ |
| arcee-ai/Trinity-Mini | Trinity (Nano, Mini) | ✅ | ✅ |
| OrionStarAI/Orion-14B-Base | Orion (14B) | ✅ | ✅ |
| EleutherAI/gpt-j-6b | GPT-J (6B) | ✅ | ✅ |
Multimodal Language Models
| Models | Model Family (Variants) | A2 Supported | A3 Supported |
|---|---|---|---|
| Qwen/Qwen2.5-VL-3B-Instruct | Qwen2.5-VL | ✅ | ✅ |
| Qwen/Qwen2.5-VL-72B-Instruct | Qwen2.5-VL | ✅ | ✅ |
| Qwen/Qwen3-VL-30B-A3B-Instruct | Qwen3-VL | ✅ | ✅ |
| Qwen/Qwen3-VL-8B-Instruct | Qwen3-VL | ✅ | ✅ |
| Qwen/Qwen3-VL-4B-Instruct | Qwen3-VL | ✅ | ✅ |
| Qwen/Qwen3-VL-235B-A22B-Instruct | Qwen3-VL | ✅ | ✅ |
| deepseek-ai/deepseek-vl2 | DeepSeek-VL2 | ✅ | ✅ |
| deepseek-ai/Janus-Pro-1B | Janus-Pro (1B, 7B) | ✅ | ✅ |
| deepseek-ai/Janus-Pro-7B | Janus-Pro (1B, 7B) | ✅ | ✅ |
| openbmb/MiniCPM-V-2_6 | MiniCPM-V / MiniCPM-o | ✅ | ✅ |
| openbmb/MiniCPM-o-2_6 | MiniCPM-V / MiniCPM-o | ✅ | ✅ |
| google/gemma-3-4b-it | Gemma 3 (Multimodal) | ✅ | ✅ |
| mistralai/Mistral-Small-3.1-24B-Instruct-2503 | Mistral-Small-3.1-24B | ✅ | ✅ |
| microsoft/Phi-4-multimodal-instruct | Phi-4-multimodal-instruct | ✅ | ✅ |
| XiaomiMiMo/MiMo-VL-7B-RL | MiMo-VL (7B) | ✅ | ✅ |
| AI-ModelScope/llava-v1.6-34b | LLaVA (v1.5 & v1.6) | ✅ | ✅ |
| lmms-lab/llava-next-72b | LLaVA-NeXT (8B, 72B) | ✅ | ✅ |
| lmms-lab/llava-onevision-qwen2-7b-ov | LLaVA-OneVision | ✅ | ✅ |
| moonshotai/Kimi-VL-A3B-Instruct | Kimi-VL (A3B) | ✅ | ✅ |
| ZhipuAI/GLM-4.5V | GLM-4.5V (106B) | ✅ | ✅ |
| LLM-Research/Llama-3.2-11B-Vision-Instruct | Llama 3.2 Vision (11B) | ✅ | ✅ |
| rednote-hilab/dots.ocr | DotsVLM-OCR | ✅ | ✅ |
| PaddlePaddle/ERNIE-4.5-VL-28B-A3B-PT | ERNIE-4.5-VL | ✅ | ✅ |
| Qwen/Qwen3-Omni-30B-A3B-Instruct | Qwen3-Omni | ✅ | ✅ |
| stepfun-ai/Step3-VL-10B | Step3-VL (10B) | ✅ | ✅ |
Diffusion Language Models
| Models | Model Family | A2 Supported | A3 Supported |
|---|---|---|---|
| inclusionAI/LLaDA2.0-flash | LLaDA2.0 (mini, flash) | ✅ | ✅ |
| JetLM/SDAR-8B-Chat | SDAR (JetLM) | ✅ | ✅ |
| JetLM/SDAR-30B-A3B-Chat | SDAR (JetLM) | ✅ | ✅ |
Embedding Models
| Models | Model Family | A2 Supported | A3 Supported |
|---|---|---|---|
| intfloat/e5-mistral-7b-instruct | E5 (Llama/Mistral-based) | ✅ | ✅ |
| iic/gte_Qwen2-1.5B-instruct | GTE-Qwen2 | ✅ | ✅ |
| Qwen/Qwen3-Embedding-8B | Qwen3-Embedding | ✅ | ✅ |
| Alibaba-NLP/gme-Qwen2-VL-2B-Instruct | GME (Multimodal) | ✅ | ✅ |
| AI-ModelScope/clip-vit-large-patch14-336 | CLIP | ✅ | ✅ |
| BAAI/bge-large-en-v1.5 | BGE | ✅ | ✅ |
Reward Models
| Models | Model Family | A2 Supported | A3 Supported |
|---|---|---|---|
| Skywork/Skywork-Reward-Llama-3.1-8B-v0.2 | Llama3.1 Reward | ✅ | ✅ |
| Shanghai_AI_Laboratory/internlm2-7b-reward | InternLM 2 Reward | ✅ | ✅ |
| Qwen/Qwen2.5-Math-RM-72B | Qwen2.5 Reward - Math | ✅ | ✅ |
| Howeee/Qwen2.5-1.5B-apeach | Qwen2.5 Reward - Sequence | ✅ | ✅ |
| AI-ModelScope/Skywork-Reward-Gemma-2-27B-v0.2 | Gemma 2-27B Reward | ✅ | ✅ |
Rerank Models
| Models | Model Family | A2 Supported | A3 Supported |
|---|---|---|---|
| BAAI/bge-reranker-v2-m3 | BGE-Reranker | ✅ | ✅ |
| Qwen/Qwen3-Reranker-8B | Qwen3-Reranker (decoder-only yes/no) | ✅ | ✅ |
| Qwen/Qwen3-VL-Reranker-2B | Qwen3-VL-Reranker (multimodal yes/no) | ✅ | ✅ |
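The "yes/no" label on the Qwen3 rerankers refers to how a decoder-only model produces a relevance score: the model is prompted to judge whether a document matches a query, and the score is the probability of the "yes" token renormalized over just the "yes" and "no" tokens. The sketch below shows only that final scoring step in plain Python, assuming the scheme published with the Qwen3-Reranker models. The prompt template, tokenization, and retrieval of logits from an SGLang deployment are not shown, and the `yes_no_score` and `rerank` helpers and the example logits are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def yes_no_score(logit_yes: float, logit_no: float) -> float:
    """Return P("yes") from a softmax restricted to the "yes"/"no" logits.

    Hypothetical helper for illustration; assumes the caller already has the
    final-position logits for the "yes" and "no" tokens.
    """
    # Subtract the larger logit before exponentiating to avoid overflow.
    m = max(logit_yes, logit_no)
    e_yes = math.exp(logit_yes - m)
    e_no = math.exp(logit_no - m)
    return e_yes / (e_yes + e_no)


def rerank(candidates: Dict[str, Tuple[float, float]]) -> List[Tuple[str, float]]:
    """Sort documents by their yes/no score, highest first.

    `candidates` maps a document ID to the (yes, no) logits produced for the
    prompt built from one query-document pair (hypothetical input format).
    """
    scored = [(doc, yes_no_score(y, n)) for doc, (y, n) in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up logits for illustration only; not produced by a real model.
    example = {"doc_a": (2.0, -1.0), "doc_b": (-0.5, 1.5), "doc_c": (0.0, 0.0)}
    for doc, score in rerank(example):
        print(f"{doc}: {score:.3f}")
```

Restricting the softmax to two tokens makes the score equal to the sigmoid of `logit_yes - logit_no`, so only the gap between the two logits matters, and every score lands in [0, 1] regardless of the rest of the vocabulary.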