MindSpore Models#

Introduction#

MindSpore is a high-performance AI framework optimized for Ascend NPUs. This document guides users through running MindSpore models in SGLang.

Requirements#

MindSpore currently supports only Ascend NPU devices. Users need to install CANN 8.5 first; the CANN software packages can be downloaded from the Ascend Official Website.
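
You can verify that the NPU devices are visible to the system by running the npu-smi info command shipped with the Ascend driver.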

Supported Models#

Currently, the following models are supported:

  • Qwen3: Dense and MoE models

  • DeepSeek V3/R1

  • More models coming soon…

Installation#

Note: MindSpore models are currently provided by an independent package, sgl-mindspore. MindSpore support is built on top of SGLang's existing support for the Ascend NPU platform, so please first install SGLang for Ascend NPU and then install sgl-mindspore:

git clone https://github.com/mindspore-lab/sgl-mindspore.git
cd sgl-mindspore
pip install -e .
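
To verify the installation, here is a minimal check, assuming the distribution name matches the repository name sgl-mindspore (it may differ):

import importlib.metadata

# Both packages should report a version after a successful install;
# "sgl-mindspore" is an assumed distribution name and may differ.
for pkg in ("sglang", "sgl-mindspore"):
    print(pkg, importlib.metadata.version(pkg))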

Run a Model#

SGLang-MindSpore currently supports the Qwen3 and DeepSeek V3/R1 models. This document uses Qwen3-8B as an example.

Offline inference#

Use the following script for offline inference:

import sglang as sgl

# Initialize the engine with MindSpore backend
llm = sgl.Engine(
    model_path="/path/to/your/model",  # Local model path
    device="npu",                      # Use NPU device
    model_impl="mindspore",            # MindSpore implementation
    attention_backend="ascend",        # Attention backend
    tp_size=1,                         # Tensor parallelism size
    dp_size=1                          # Data parallelism size
)

# Generate text
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is"
]

sampling_params = {"temperature": 0, "top_p": 0.9}
outputs = llm.generate(prompts, sampling_params)

for prompt, output in zip(prompts, outputs):
    print(f"Prompt: {prompt}")
    print(f"Generated: {output['text']}")
    print("---")

Start server#

Launch a server with the MindSpore backend:

# Basic server startup
python3 -m sglang.launch_server \
    --model-path /path/to/your/model \
    --host 0.0.0.0 \
    --device npu \
    --model-impl mindspore \
    --attention-backend ascend \
    --tp-size 1 \
    --dp-size 1
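
Once the server is up, you can send requests to it. The sketch below uses SGLang's native /generate endpoint and assumes the server listens on the default port 30000 (override with --port):

import requests

# Send a single generation request to the running SGLang server
response = requests.post(
    "http://127.0.0.1:30000/generate",
    json={
        "text": "The capital of France is",
        "sampling_params": {"temperature": 0, "max_new_tokens": 32},
    },
)
print(response.json()["text"])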

For a distributed server with multiple nodes:

# Multi-node distributed server
python3 -m sglang.launch_server \
    --model-path /path/to/your/model \
    --host 0.0.0.0 \
    --device npu \
    --model-impl mindspore \
    --attention-backend ascend \
    --dist-init-addr 127.0.0.1:29500 \
    --nnodes 2 \
    --node-rank 0 \
    --tp-size 4 \
    --dp-size 2
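
On the second node, run the same command with --node-rank 1, keeping --dist-init-addr identical on both nodes and replacing 127.0.0.1 with an address that is reachable from every node.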

Troubleshooting#

Debug Mode#

Enable SGLang debug logging via the --log-level argument:

python3 -m sglang.launch_server \
    --model-path /path/to/your/model \
    --host 0.0.0.0 \
    --device npu \
    --model-impl mindspore \
    --attention-backend ascend \
    --log-level DEBUG

Enable MindSpore INFO or DEBUG logging by setting the GLOG_v environment variable (choose one):

export GLOG_v=1  # INFO
export GLOG_v=0  # DEBUG

Explicitly select devices#

Use the ASCEND_RT_VISIBLE_DEVICES environment variable to explicitly select which devices to use.

export ASCEND_RT_VISIBLE_DEVICES=4,5,6,7  # make only NPUs 4-7 visible

Communication environment issues#

In environments with special communication setups, users need to set the following environment variable:

export MS_ENABLE_LCCL=off  # the LCCL communication mode is not currently supported in SGLang-MindSpore

Protobuf dependencies#

If the environment ships a special protobuf version, set the following environment variable to avoid a binary version mismatch:

export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python  # to avoid protobuf binary version mismatch

Support#

For MindSpore-specific issues, please file an issue at the sgl-mindspore repository: https://github.com/mindspore-lab/sgl-mindspore/issues