Environment Variables
SGLang supports several environment variables for Ascend NPU that configure its runtime behavior. This document lists the commonly used ones and is updated over time.
Directly Used in SGLang

| Environment Variable | Description | Default Value |
|---|---|---|
| | Adopts the | |
| | Reshapes KV Cache for FIA NZ format. | |
| | Enables dual-stream computation of shared experts | |
| | Disables casting of model weight tensors to a specific NPU | |
| | The maximum number of dispatched tokens on each rank. | |

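
The settings in the table above are read from the process environment. As a minimal sketch of how such settings are commonly parsed (not SGLang's actual implementation), the helpers below read a boolean flag and an integer limit with a fallback default. The names `EXAMPLE_NPU_FLAG` and `EXAMPLE_MAX_DISPATCH_TOKENS`, the accepted spellings of true and false, and the default of 64 are hypothetical placeholders chosen for illustration.

```python
import os

# Accepted spellings are an assumption; a real project may accept fewer or more.
_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off", ""}


def env_bool(name, default=False, environ=None):
    """Parse a boolean flag from the environment, falling back to `default`."""
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None:
        return default
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise ValueError(f"{name}={raw!r} is not a recognised boolean")


def env_int(name, default, environ=None):
    """Parse an integer setting from the environment, falling back to `default`."""
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    return default if raw is None else int(raw)


if __name__ == "__main__":
    # Hypothetical variable names, used only to exercise the helpers.
    print(env_bool("EXAMPLE_NPU_FLAG"))
    print(env_int("EXAMPLE_MAX_DISPATCH_TOKENS", 64))
```

Passing an explicit `environ` mapping keeps the helpers easy to test without touching the real process environment.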
Used in DeepEP Ascend

| Environment Variable | Description | Default Value |
|---|---|---|
| | Enables the ant-moving function in the dispatch stage. Indicates | |
| | Enables the ant-moving function in the dispatch stage. Indicates | |
| | Enables the ant-moving function in the combine stage. | |
| | Needs to be enabled when the expert ID to be processed by | |
| | Quantizes x to int8 and returns (tensor, scales) in the dispatch operator. | |

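
To illustrate what a `(tensor, scales)` pair means for int8 quantization, here is a small pure-Python sketch of symmetric per-row quantization. It is an assumption for illustration only: the table does not state the granularity, rounding rule, or value range that the dispatch operator actually uses, and the function names `quantize_int8` and `dequantize` are hypothetical.

```python
def quantize_int8(x):
    """Symmetric per-row int8 quantization of a 2-D list of floats.

    Returns (q, scales), where q[i][j] is an int in [-127, 127] and
    x[i][j] is approximately q[i][j] * scales[i].
    """
    q, scales = [], []
    for row in x:
        max_abs = max((abs(v) for v in row), default=0.0)
        # An all-zero row gets scale 1.0 to avoid dividing by zero.
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        q.append([max(-127, min(127, round(v / scale))) for v in row])
        scales.append(scale)
    return q, scales


def dequantize(q, scales):
    """Recover approximate floats from the quantized rows and their scales."""
    return [[v * s for v in row] for row, s in zip(q, scales)]


if __name__ == "__main__":
    q, scales = quantize_int8([[1.0, -2.0, 0.5], [0.0, 0.0, 0.0]])
    print(q, scales)
    print(dequantize(q, scales))
```

Each row carries its own scale, so the receiver needs both the int8 values and the scales to reconstruct the input; the reconstruction error per element is at most about half a scale step.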
Others

| Environment Variable | Description | Default Value |
|---|---|---|
| | Used to control the optimization level of the dispatch queue | |
| | Controls whether the chip uses saturation mode or INF_NAN mode. Detail | |
| | Configures the maximum number of streams for the stream pool. Detail | |
| | Controls the behavior of the cache allocator. | |
| | The address of config store in MemFabric during PD separation, | |
| | Controls whether synchronous mode is enabled during operator execution. Detail | |
| | Configures the expansion position for communication algorithm scheduling. Detail | |
| | Controls the size of the buffer area for shared data between two NPUs. | |
| | Configures the name of the network card used by the Host | |
| | Configures the network interface name for GLOO communication. | |
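
Several of the settings above take a network interface name or a numeric size, and a typo in either usually surfaces only as a late communication failure. The sketch below shows one way to assemble an environment for a child process and to check that an interface name exists on the host before launching. The variable names `EXAMPLE_SOCKET_IFNAME` and `EXAMPLE_BUFFER_SIZE`, and the helper names, are hypothetical placeholders rather than names taken from the tables.

```python
import os
import socket


def available_interfaces():
    """Return the names of network interfaces visible on this host."""
    return [name for _, name in socket.if_nameindex()]


def check_interface(name, interfaces=None):
    """Return `name` if it is a known interface, otherwise raise ValueError."""
    interfaces = available_interfaces() if interfaces is None else interfaces
    if name not in interfaces:
        raise ValueError(f"interface {name!r} not found; available: {interfaces}")
    return name


def build_env(overrides, base=None):
    """Copy the base environment and apply overrides as strings."""
    env = dict(os.environ if base is None else base)
    for key, value in overrides.items():
        env[key] = str(value)  # environment values must be strings
    return env


if __name__ == "__main__":
    names = available_interfaces()
    print("interfaces:", names)
    if names:
        # Hypothetical variable names, for illustration only.
        env = build_env({
            "EXAMPLE_SOCKET_IFNAME": check_interface(names[0]),
            "EXAMPLE_BUFFER_SIZE": 200,
        })
        print(env["EXAMPLE_SOCKET_IFNAME"], env["EXAMPLE_BUFFER_SIZE"])
```

The resulting dictionary can be handed to `subprocess.run(..., env=env)` so the overrides apply only to the launched process and not to the parent shell.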