System Settings section to ensure the clusters are running at maximum performance. Feel free to open an issue at sglang if you run into any problems.
Component Version Mapping For SGLang
| Component | Version | How to Obtain |
|---|---|---|
| HDK | 25.5.2 | link |
| CANN | 8.5.0 | Obtain Images |
| PyTorch Adapter | 7.3.0 | link |
| MemFabric | 1.0.5 | pip install memfabric-hybrid==1.0.5 |
| Triton | 3.2.0 | pip install triton-ascend |
| SGLang NPU Kernel | NA | link |
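The two rows in the table that give a pip command can be collected into one small helper for setup scripts. This is only a sketch: the function name `pip_components` is hypothetical, and the package specs are copied from the table as-is (Triton is left unpinned because the table's command does not pin it).

```shell
# Pip-installable components from the table above, one requirement per line.
pip_components() {
  printf '%s\n' \
    "memfabric-hybrid==1.0.5" \
    "triton-ascend"
}

# Example (not run here): install both into the active environment.
#   pip install $(pip_components)
```

The remaining components (HDK, CANN, PyTorch Adapter, SGLang NPU Kernel) are obtained through the links and images named in the table, not through pip.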
Obtain CANN Image
You can obtain the dependencies for a specific version of CANN through an image.
Command
Preparing the Running Environment
Method 1: Installing from source with prerequisites
Python Version
Only python==3.11 is currently supported. If you don’t want to break the system's pre-installed Python, try installing with conda.
Command
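As a minimal sketch of the conda route, the helpers below check that an interpreter reports version 3.11 before the remaining steps run. The helper names and the environment name `sglang_npu` are hypothetical; `conda create -n <name> python=3.11` is the standard conda invocation and is shown only as a comment.

```shell
# Succeeds only when the given "major.minor" string is the supported 3.11.
is_supported_python() {
  [ "$1" = "3.11" ]
}

# Prints the major.minor version of an interpreter (defaults to python3).
python_minor_version() {
  "${1:-python3}" -c 'import sys; print("%d.%d" % sys.version_info[:2])'
}

# Example (not run here), with the hypothetical environment name "sglang_npu":
#   conda create -n sglang_npu python=3.11 -y
#   conda activate sglang_npu
#   is_supported_python "$(python_minor_version python)" || echo "need Python 3.11" >&2
```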
CANN
Before you start working with SGLang on Ascend, you need to install the CANN Toolkit, the Kernels operator package, and NNAL, version 8.5.0; check the installation guide.
MemFabric-Hybrid
If you want to use PD disaggregation mode, you need to install MemFabric-Hybrid. MemFabric-Hybrid is a drop-in replacement for Mooncake Transfer Engine that enables KV cache transfer on Ascend NPU clusters.
Command
PyTorch and PyTorch Framework Adaptor on Ascend
Command
To install torch and torch_npu, check the installation guide.
Triton on Ascend
We provide our own implementation of Triton for Ascend.
Command
SGLang Kernels NPU
We provide SGL kernels for Ascend NPU; check the installation guide.
DeepEP-compatible Library
We provide a DeepEP-compatible library as a drop-in replacement for deepseek-ai’s DeepEP library; check the installation guide.
Some other dependencies
Command
Installing SGLang from source
Command
Method 2: Using Docker Image
Obtain Image
To obtain the Ascend NPU image, you can either download the SGLang image or build one from the Dockerfile.
- Download SGLang image
- Build an image based on Dockerfile
Command
Create Docker
Notice: --privileged and --network=host are required by RDMA, which is typically needed on Ascend NPU clusters.
Notice: The following docker command is based on Atlas 800I A3 machines. If you are using Atlas 800I A2, make sure only davinci[0-7] are mapped into the container.
Command
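Because the set of mapped davinci devices differs between machine types, the device flags can be generated instead of typed by hand. The helper below is a sketch under assumptions: the function name is hypothetical, the devices are assumed to appear as /dev/davinciN, and the device count is left as a parameter (8 covers davinci[0-7] on Atlas 800I A2).

```shell
# Print "--device=/dev/davinciN" flags for N = 0 .. count-1 on one line.
# The parenthesised body runs in a subshell, so its variables stay local.
davinci_device_flags() (
  count="$1"
  i=0
  flags=""
  while [ "$i" -lt "$count" ]; do
    flags="$flags --device=/dev/davinci$i"
    i=$((i + 1))
  done
  # Drop the leading space introduced by the first append.
  printf '%s\n' "${flags# }"
)

# Example (not run here); other options and the image name are placeholders:
#   docker run --privileged --network=host $(davinci_device_flags 8) <other options> <image>
```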
System Settings
CPU performance power scheme
The default power scheme on Ascend hardware is ondemand, which can hurt performance; changing it to performance is recommended.
Command
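One way to switch the governor, sketched here on the assumption that the host exposes the usual Linux cpufreq files under /sys/devices/system/cpu, is to write the governor name into each CPU's scaling_governor file. The function name and its optional second argument (an alternative root directory, useful for dry runs) are hypothetical.

```shell
# Write the given governor into every cpuN/cpufreq/scaling_governor file.
# $1: governor name, $2: optional sysfs CPU root (defaults to the real one).
set_cpu_governor() (
  governor="$1"
  cpu_root="${2:-/sys/devices/system/cpu}"
  for f in "$cpu_root"/cpu[0-9]*/cpufreq/scaling_governor; do
    # Skip the unexpanded glob when no CPU exposes cpufreq.
    [ -e "$f" ] || continue
    printf '%s\n' "$governor" > "$f"
  done
)

# Example (not run here; needs root on a real host):
#   set_cpu_governor performance
#   cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
```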
Disable NUMA balancing
Command
Prevent swapping out system memory
Command
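The two kernel settings above (NUMA balancing and swapping) correspond to the standard Linux keys kernel.numa_balancing and vm.swappiness under /proc/sys. The helper below is a sketch: its name and the optional root argument are hypothetical, and the swappiness value 10 in the example is an illustrative low value, not one taken from this guide.

```shell
# Write a value to the /proc/sys file behind a sysctl-style key.
# $1: key (e.g. vm.swappiness), $2: value, $3: optional root (default /proc/sys).
write_kernel_setting() (
  key="$1"
  value="$2"
  proc_root="${3:-/proc/sys}"
  # For keys like these two, the path is the key with dots turned into slashes.
  path="$proc_root/$(printf '%s' "$key" | tr '.' '/')"
  printf '%s\n' "$value" > "$path"
)

# Example (not run here; needs root on a real host):
#   write_kernel_setting kernel.numa_balancing 0
#   write_kernel_setting vm.swappiness 10
```

On a real host the same effect is usually achieved with `sysctl -w key=value`; writing the files directly just makes the mapping explicit.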
Running SGLang Service
Running Service For Large Language Models
PD Mixed Scene
Command
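For orientation, a PD mixed launch has roughly the shape sketched below. Everything here is an assumption to check against `python -m sglang.launch_server --help` on your installation: the flag names, the `npu` device value, the `ascend` attention backend, and the hypothetical helper name. The helper only prints the command so it can be reviewed before being run.

```shell
# Print a candidate launch command for review.
# $1: model path, $2: tensor-parallel size, $3: port
sglang_launch_cmd() {
  printf 'python -m sglang.launch_server --model-path %s --tp-size %s --host 0.0.0.0 --port %s --device npu --attention-backend ascend\n' \
    "$1" "$2" "$3"
}

# Example (placeholders for model path; values are illustrative):
#   sglang_launch_cmd <model-path> 8 30000
```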
PD Disaggregation Scene
- Launch Prefill Server
Command
- Launch Decode Server
Command
- Launch Router
Command
Running Service For Multimodal Language Models
PD Mixed Scene
Command
