Environment Variables

SGLang supports a number of Ascend NPU-related environment variables that configure its runtime behavior. This document lists the commonly used ones and will be updated over time.

Directly Used in SGLang

| Environment Variable | Description | Default Value |
|---|---|---|
| SGLANG_NPU_USE_MLAPO | Uses the MLAPO fused operator in the attention preprocessing stage of MLA models. | false |
| SGLANG_USE_FIA_NZ | Reshapes the KV cache into the FIA NZ format. Must be enabled together with SGLANG_NPU_USE_MLAPO. | false |
| SGLANG_NPU_USE_MULTI_STREAM | Enables dual-stream computation of the shared experts and routed experts in DeepSeek models, as well as dual-stream computation in the DeepSeek NSA indexer. | false |
| SGLANG_NPU_DISABLE_ACL_FORMAT_WEIGHT | Disables casting model weight tensors to a specific NPU ACL format. | false |
| SGLANG_DEEPEP_NUM_MAX_DISPATCH_TOKENS_PER_RANK | Maximum number of tokens dispatched on each rank. | 128 |
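
These variables are read from the process environment, so they have to be set before the server starts. The sketch below shows one way to do that from Python while enforcing the documented rule that SGLANG_USE_FIA_NZ must be enabled together with SGLANG_NPU_USE_MLAPO. The helper names `npu_flags` and `launch_server` are hypothetical, and writing the boolean switches as the strings "true"/"false" is an assumption based on the defaults listed above.

```python
import os
import subprocess
import sys


def _flag(enabled):
    # Assumption: boolean switches accept "true"/"false", matching the
    # default values listed in the table.
    return "true" if enabled else "false"


def npu_flags(use_mlapo=False, use_fia_nz=False, multi_stream=False,
              max_dispatch_tokens_per_rank=128):
    """Hypothetical helper: map options to the variables in the table above."""
    # Documented constraint: SGLANG_USE_FIA_NZ must be enabled together with
    # SGLANG_NPU_USE_MLAPO, so reject the invalid combination up front.
    if use_fia_nz and not use_mlapo:
        raise ValueError("SGLANG_USE_FIA_NZ requires SGLANG_NPU_USE_MLAPO")
    return {
        "SGLANG_NPU_USE_MLAPO": _flag(use_mlapo),
        "SGLANG_USE_FIA_NZ": _flag(use_fia_nz),
        "SGLANG_NPU_USE_MULTI_STREAM": _flag(multi_stream),
        "SGLANG_DEEPEP_NUM_MAX_DISPATCH_TOKENS_PER_RANK": str(max_dispatch_tokens_per_rank),
    }


def launch_server(model_path, **options):
    """Start an SGLang server process with the flags set in its environment."""
    # Merge into a copy so the parent process's os.environ is left untouched.
    env = {**os.environ, **npu_flags(**options)}
    cmd = [sys.executable, "-m", "sglang.launch_server", "--model-path", model_path]
    return subprocess.Popen(cmd, env=env)
```

For example, `npu_flags(use_mlapo=True, use_fia_nz=True)` sets both switches to "true", while `npu_flags(use_fia_nz=True)` raises `ValueError` instead of starting a server with an unsupported combination.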

Used in DeepEP Ascend

| Environment Variable | Description | Default Value |
|---|---|---|
| DEEPEP_NORMAL_LONG_SEQ_PER_ROUND_TOKENS | Enables the ant-moving function (transferring tokens over multiple rounds) in the dispatch stage and sets the number of tokens transmitted per round on each rank. | 8192 |
| DEEPEP_NORMAL_LONG_SEQ_ROUND | Enables the ant-moving function in the dispatch stage and sets the number of rounds transmitted on each rank. | 1 |
| DEEPEP_NORMAL_COMBINE_ENABLE_LONG_SEQ | Enables the ant-moving function in the combine stage. 0 means disabled. | 0 |
| MOE_ENABLE_TOPK_NEG_ONE | Must be enabled when the expert IDs processed by DeepEP contain -1. | 0 |
| DEEP_NORMAL_MODE_USE_INT8_QUANT | Makes the dispatch operator quantize the input x to int8 and return (tensor, scales). | 0 |
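
Read together, the two dispatch-stage descriptions suggest that each rank's tokens are moved in DEEPEP_NORMAL_LONG_SEQ_ROUND rounds of up to DEEPEP_NORMAL_LONG_SEQ_PER_ROUND_TOKENS tokens each. The sketch below assumes exactly that chunking (the actual operator may split work differently) and picks the smallest round count that covers a given number of tokens per rank. `plan_rounds` and `round_ranges` are hypothetical helpers, not part of DeepEP or SGLang.

```python
def plan_rounds(num_tokens, per_round_tokens=8192):
    """Hypothetical helper: smallest round count covering num_tokens, assuming
    each round moves at most per_round_tokens tokens on a rank."""
    if per_round_tokens < 1:
        raise ValueError("per_round_tokens must be >= 1")
    # Integer ceiling division; never fewer than one round (the default).
    return max(1, (num_tokens + per_round_tokens - 1) // per_round_tokens)


def round_ranges(num_tokens, per_round_tokens, rounds):
    """Illustration only: the [start, end) token range handled in each round."""
    if rounds * per_round_tokens < num_tokens:
        raise ValueError("rounds * per_round_tokens must cover num_tokens")
    ranges = []
    for r in range(rounds):
        start = min(r * per_round_tokens, num_tokens)
        ranges.append((start, min(start + per_round_tokens, num_tokens)))
    return ranges


tokens_on_rank = 20000
rounds = plan_rounds(tokens_on_rank)
env = {
    "DEEPEP_NORMAL_LONG_SEQ_PER_ROUND_TOKENS": "8192",
    "DEEPEP_NORMAL_LONG_SEQ_ROUND": str(rounds),
}
print(env["DEEPEP_NORMAL_LONG_SEQ_ROUND"])  # 3
print(round_ranges(tokens_on_rank, 8192, rounds))
# [(0, 8192), (8192, 16384), (16384, 20000)]
```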

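The exact quantization scheme used by the dispatch operator is not described here. To illustrate what returning (tensor, scales) means, the sketch below uses generic symmetric per-token int8 quantization in pure Python; the per-token granularity and the [-127, 127] range are assumptions, and the function names are hypothetical.

```python
def quantize_int8_per_token(x):
    """Symmetric per-token int8 quantization of a tokens-by-hidden matrix.

    Returns (tensor, scales) with x[i][j] ~= tensor[i][j] * scales[i].
    """
    tensor, scales = [], []
    for row in x:
        amax = max((abs(v) for v in row), default=0.0)
        # One scale per token; all-zero rows get scale 1.0 to avoid dividing by 0.
        scale = amax / 127.0 if amax > 0 else 1.0
        # Round to nearest and clamp to the symmetric int8 range [-127, 127].
        tensor.append([max(-127, min(127, round(v / scale))) for v in row])
        scales.append(scale)
    return tensor, scales


def dequantize(tensor, scales):
    """Recover an approximation of x by multiplying each row by its scale."""
    return [[q * s for q in row] for row, s in zip(tensor, scales)]


q, s = quantize_int8_per_token([[127.0, -63.0, 10.2], [0.0, 0.0, 0.0]])
print(q)  # [[127, -63, 10], [0, 0, 0]]
print(s)  # [1.0, 1.0]
```

Sending int8 values plus one float scale per token, rather than full-precision activations, is what makes this kind of quantization attractive for a communication-heavy step such as dispatch; the reconstruction error per element is at most half of that token's scale.
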
Others

| Environment Variable | Description | Default Value |
|---|---|---|
| TASK_QUEUE_ENABLE | Controls the optimization level of the task_queue operator dispatch queue. | 1 |
| INF_NAN_MODE_ENABLE | Controls whether the chip uses saturation mode or INF_NAN mode. | 1 |
| STREAMS_PER_DEVICE | Configures the maximum number of streams in the stream pool. | 32 |
| PYTORCH_NPU_ALLOC_CONF | Controls the behavior of the caching allocator. Changing it affects memory usage and may cause performance fluctuations. | |
| ASCEND_MF_STORE_URL | Address of the MemFabric config store used in PD (prefill-decode) disaggregation. Generally set to the IP address of the primary P (prefill) node with an arbitrary port number. | |
| ASCEND_LAUNCH_BLOCKING | Controls whether operators execute in synchronous mode. | 0 |
| HCCL_OP_EXPANSION_MODE | Configures the expansion position for communication algorithm scheduling. | |
| HCCL_BUFFSIZE | Size of the buffer that two NPUs use to share data, in MB. The value must be at least 1. | 200 |
| HCCL_SOCKET_IFNAME | Name of the network interface the host uses during HCCL initialization. | |
| GLOO_SOCKET_IFNAME | Name of the network interface used for Gloo communication. | |
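
In a PD-disaggregated deployment several of these variables are set together on every node. The sketch below assembles them and checks the constraints stated in the table: the config store points at the primary prefill node with an arbitrary port, and HCCL_BUFFSIZE is in MB and at least 1. The helper `pd_comm_env`, the `tcp://<ip>:<port>` URL form, the example address and port, and the interface name `eth0` are all assumptions for illustration; check the MemFabric and HCCL documentation for the exact formats your versions expect.

```python
import ipaddress


def pd_comm_env(prefill_primary_ip, store_port, hccl_ifname,
                gloo_ifname=None, hccl_buffsize_mb=200):
    """Hypothetical helper: communication variables for one node of a
    PD-disaggregated deployment, following the descriptions in the table."""
    # IPv4 only, so the URL needs no IPv6 brackets; raises ValueError if malformed.
    ipaddress.IPv4Address(prefill_primary_ip)
    if not 0 < store_port < 65536:
        raise ValueError("store_port must be in 1..65535")
    if hccl_buffsize_mb < 1:
        raise ValueError("HCCL_BUFFSIZE is in MB and must be >= 1")
    return {
        # Assumption: written as tcp://<ip>:<port>, pointing at the primary
        # prefill (P) node with any free port.
        "ASCEND_MF_STORE_URL": f"tcp://{prefill_primary_ip}:{store_port}",
        "HCCL_BUFFSIZE": str(hccl_buffsize_mb),
        "HCCL_SOCKET_IFNAME": hccl_ifname,
        # Assumption: Gloo uses the same NIC as HCCL unless told otherwise.
        "GLOO_SOCKET_IFNAME": gloo_ifname or hccl_ifname,
    }


# Hypothetical address, port, and interface name.
env = pd_comm_env("192.168.1.10", 24667, "eth0")
for name, value in sorted(env.items()):
    print(f"export {name}={value}")
```

Printing `export` lines keeps the result usable from a shell script; alternatively, the mapping can be merged into a copy of `os.environ` and passed as `env=` to `subprocess.Popen` when starting each server process.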