Support Models on Ascend NPU#

This section describes the models supported on the Ascend NPU, including Large Language Models, Multimodal Language Models, Embedding Models, Reward Models and Rerank Models. Mainstream DeepSeek/Qwen/GLM series are included. You are welcome to enable various models based on your business requirements.

Large Language Models#

Models

Model Family

A2 Supported

A3 Supported

DeepSeek V3/V3.1

DeepSeek

vllm-ascend/DeepSeek-V3.2-Exp-W8A8

DeepSeek

vllm-ascend/DeepSeek-R1-0528-W8A8

DeepSeek

vllm-ascend/DeepSeek-V2-Lite-W8A8

DeepSeek

Qwen/Qwen3-30B-A3B-Instruct-2507

Qwen

Qwen/Qwen3-32B

Qwen

Qwen/Qwen3-0.6B

Qwen

vllm-ascend/Qwen3-235B-A22B-W8A8

Qwen

Qwen/Qwen3-Next-80B-A3B-Instruct

Qwen

Qwen3-Coder-480B-A35B-Instruct-w8a8-QuaRot

Qwen

Qwen/Qwen2.5-7B-Instruct

Qwen

vllm-ascend/QWQ-32B-W8A8

Qwen

meta-llama/Llama-4-Scout-17B-16E-Instruct

Llama

AI-ModelScope/Llama-3.1-8B-Instruct

Llama

LLM-Research/Llama-3.2-1B-Instruct

Llama

mistralai/Mistral-7B-Instruct-v0.2

Mistral

google/gemma-3-4b-it

Gemma

microsoft/Phi-4-multimodal-instruct

Phi

allenai/OLMoE-1B-7B-0924

OLMoE

stabilityai/stablelm-2-1_6b

StableLM

CohereForAI/c4ai-command-r-v01

Command-R

huihui-ai/grok-2

Grok

ZhipuAI/chatglm2-6b

ChatGLM

Shanghai_AI_Laboratory/internlm2-7b

InternLM 2

LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct

ExaONE 3

xverse/XVERSE-MoE-A36B

XVERSE

HuggingFaceTB/SmolLM-1.7B

SmolLM

ZhipuAI/glm-4-9b-chat

GLM-4

XiaomiMiMo/MiMo-7B-RL

MiMo

arcee-ai/AFM-4.5B-Base

Arcee AFM-4.5B

Howeee/persimmon-8b-chat

Persimmon

inclusionAI/Ling-lite

Ling

ibm-granite/granite-3.1-8b-instruct

Granite

ibm-granite/granite-3.0-3b-a800m-instruct

Granite MoE

databricks/dbrx-instruct

DBRX (Databricks)

baichuan-inc/Baichuan2-13B-Chat

Baichuan 2 (7B, 13B)

baidu/ERNIE-4.5-21B-A3B-PT

ERNIE-4.5 (4.5, 4.5MoE series)

openbmb/MiniCPM3-4B

MiniCPM (v3, 4B)

openai/gpt-oss-120b

GPTOSS

×

×

Multimodal Language Models#

Models

Model Family (Variants)

A2 Supported

A3 Supported

Qwen/Qwen2.5-VL-3B-Instruct

Qwen-VL

Qwen/Qwen2.5-VL-72B-Instruct

Qwen-VL

Qwen/Qwen3-VL-30B-A3B-Instruct

Qwen-VL

Qwen/Qwen3-VL-8B-Instruct

Qwen-VL

Qwen/Qwen3-VL-4B-Instruct

Qwen-VL

Qwen/Qwen3-VL-235B-A22B-Instruct

Qwen-VL

deepseek-ai/deepseek-vl2

DeepSeek-VL2

deepseek-ai/Janus-Pro-7B

Janus-Pro (1B, 7B)

openbmb/MiniCPM-V-2_6

MiniCPM-V / MiniCPM-o

google/gemma-3-4b-it

Gemma 3 (Multimodal)

mistralai/Mistral-Small-3.1-24B-Instruct-2503

Mistral-Small-3.1-24B

microsoft/Phi-4-multimodal-instruct

Phi-4-multimodal-instruct

XiaomiMiMo/MiMo-VL-7B-RL

MiMo-VL (7B)

AI-ModelScope/llava-v1.6-34b

LLaVA (v1.5 & v1.6)

lmms-lab/llava-next-72b

LLaVA-NeXT (8B, 72B)

lmms-lab/llava-onevision-qwen2-7b-ov

LLaVA-OneVision

Kimi/Kimi-VL-A3B-Instruct

Kimi-VL (A3B)

ZhipuAI/GLM-4.5V

GLM-4.5V (106B)

meta-llama/Llama-3.2-11B-Vision-Instruct

Llama 3.2 Vision (11B)

×

×

Embedding Models#

Models

Model Family

A2 Supported

A3 Supported

intfloat/e5-mistral-7b-instruct

E5 (Llama/Mistral based)

iic/gte_Qwen2-1.5B-instruct

GTE-Qwen2

Qwen/Qwen3-Embedding-8B

Qwen3-Embedding

Alibaba-NLP/gme-Qwen2-VL-2B-Instruct

GME (Multimodal)

AI-ModelScope/clip-vit-large-patch14-336

CLIP

BAAI/bge-large-en-v1.5

BGE

×

×

Reward Models#

Models

Model Family

A2 Supported

A3 Supported

Skywork/Skywork-Reward-Llama-3.1-8B-v0.2

Llama3.1 Reward

Shanghai_AI_Laboratory/internlm2-7b-reward

InternLM 2 Reward

Qwen/Qwen2.5-Math-RM-72B

Qwen2.5 Reward - Math

jason9693/Qwen2.5-1.5B-apeach

Qwen2.5 Reward - Sequence

Skywork/Skywork-Reward-Gemma-2-27B-v0.2

Gemma 2-27B Reward

×

×

Rerank Models#

Models

Model Family

A2 Supported

A3 Supported

BAAI/bge-reranker-v2-m3

BGE-Reranker