SGLang supports a number of environment variables for Ascend NPU that configure its runtime behavior. This document lists the commonly used ones and is updated over time.

Directly Used in SGLang

| Environment Variable | Description | Default Value |
| --- | --- | --- |
| `SGLANG_NPU_USE_MLAPO` | Uses the MLAPO fusion operator in the attention preprocessing stage of MLA models. | false |
| `SGLANG_USE_FIA_NZ` | Reshapes the KV cache into the FIA NZ format.<br/> Must be enabled together with `SGLANG_NPU_USE_MLAPO`. | false |
| `SGLANG_NPU_USE_MULTI_STREAM` | Enables dual-stream computation of shared experts and routed experts in DeepSeek models.<br/> Enables dual-stream computation in the DeepSeek NSA Indexer. | false |
| `SGLANG_NPU_DISABLE_ACL_FORMAT_WEIGHT` | Disables casting model weight tensors to a specific NPU ACL format. | false |
| `SGLANG_DEEPEP_NUM_MAX_DISPATCH_TOKENS_PER_RANK` | The maximum number of tokens dispatched on each rank. | 128 |
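
The dependency between `SGLANG_USE_FIA_NZ` and `SGLANG_NPU_USE_MLAPO` is easy to miss when flags are set by hand. The following is a minimal pre-launch check, not part of SGLang: the helper names `is_enabled` and `check_npu_flags` are hypothetical, and the rule that `1` or `true` counts as enabled is an assumption about how the flags are parsed.

```python
import os


def is_enabled(env, name):
    # Assumption: "1" or "true" (any case) means enabled; unset means "false".
    return env.get(name, "false").strip().lower() in ("1", "true")


def check_npu_flags(env):
    """Return a list of problems found in an environment mapping."""
    problems = []

    # Documented constraint: FIA NZ must be enabled together with MLAPO.
    if is_enabled(env, "SGLANG_USE_FIA_NZ") and not is_enabled(env, "SGLANG_NPU_USE_MLAPO"):
        problems.append("SGLANG_USE_FIA_NZ requires SGLANG_NPU_USE_MLAPO")

    # The per-rank dispatch limit is a token count, so it should parse as a
    # positive integer (documented default: 128).
    raw = env.get("SGLANG_DEEPEP_NUM_MAX_DISPATCH_TOKENS_PER_RANK", "128")
    if not raw.isdigit() or int(raw) < 1:
        problems.append(
            "SGLANG_DEEPEP_NUM_MAX_DISPATCH_TOKENS_PER_RANK must be a positive integer"
        )

    return problems


if __name__ == "__main__":
    # Check the current process environment before launching the server.
    for problem in check_npu_flags(os.environ):
        print("warning:", problem)
```

Running the check against `os.environ` in a launch script surfaces a misconfiguration before the model is loaded.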

Used in DeepEP Ascend

| Environment Variable | Description | Default Value |
| --- | --- | --- |
| `DEEPEP_NORMAL_LONG_SEQ_PER_ROUND_TOKENS` | Enables the ant-moving function in the dispatch stage. Sets the number of tokens transmitted per round on each rank. | 8192 |
| `DEEPEP_NORMAL_LONG_SEQ_ROUND` | Enables the ant-moving function in the dispatch stage. Sets the number of rounds transmitted on each rank. | 1 |
| `DEEPEP_NORMAL_COMBINE_ENABLE_LONG_SEQ` | Enables the ant-moving function in the combine stage. A value of 0 means disabled. | 0 |
| `MOE_ENABLE_TOPK_NEG_ONE` | Must be enabled when the expert IDs processed by DeepEP contain -1. | 0 |
| `DEEP_NORMAL_MODE_USE_INT8_QUANT` | Quantizes x to int8 and returns (tensor, scales) in the dispatch operator. | 0 |
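
The two long-sequence dispatch variables above are easiest to reason about together. The sketch below assumes that the tokens a rank can dispatch equal tokens per round multiplied by the number of rounds; that relationship is an assumption and is not stated in the descriptions above. The helper `long_seq_settings` is hypothetical.

```python
def long_seq_settings(max_tokens_per_rank, per_round_tokens=8192):
    """Suggest values for the two long-sequence dispatch variables.

    Assumption: capacity per rank = per_round_tokens * rounds.
    """
    if max_tokens_per_rank < 1 or per_round_tokens < 1:
        raise ValueError("token counts must be positive")

    # Ceiling division: the smallest round count that covers all tokens.
    rounds = -(-max_tokens_per_rank // per_round_tokens)

    # Environment variable values are always strings.
    return {
        "DEEPEP_NORMAL_LONG_SEQ_PER_ROUND_TOKENS": str(per_round_tokens),
        "DEEPEP_NORMAL_LONG_SEQ_ROUND": str(rounds),
    }


if __name__ == "__main__":
    for name, value in long_seq_settings(20000).items():
        print(f"export {name}={value}")
```

With the documented defaults (8192 tokens per round, 1 round), any rank that needs to dispatch more than 8192 tokens would need a higher round count under this assumption.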

Others

| Environment Variable | Description | Default Value |
| --- | --- | --- |
| `TASK_QUEUE_ENABLE` | Controls the optimization level of the task_queue operator dispatch queue. Detail | 1 |
| `INF_NAN_MODE_ENABLE` | Controls whether the chip uses saturation mode or INF_NAN mode. Detail | 1 |
| `STREAMS_PER_DEVICE` | Configures the maximum number of streams for the stream pool. Detail | 32 |
| `PYTORCH_NPU_ALLOC_CONF` | Controls the behavior of the cache allocator.<br/> This variable changes memory usage and may cause performance fluctuations. Detail | |
| `ASCEND_MF_STORE_URL` | The address of the config store in MemFabric during PD (prefill/decode) separation. It is generally set to the IP address of the primary prefill (P) node with an arbitrary port number. | |
| `ASCEND_LAUNCH_BLOCKING` | Controls whether synchronous mode is enabled during operator execution. Detail | 0 |
| `HCCL_OP_EXPANSION_MODE` | Configures the expansion position for communication algorithm scheduling. Detail | |
| `HCCL_BUFFSIZE` | Controls the size of the buffer area for shared data between two NPUs.<br/> The unit is MB, and the value must be greater than or equal to 1. Detail | 200 |
| `HCCL_SOCKET_IFNAME` | Configures the name of the network card used by the host during HCCL initialization. Detail | |
| `GLOO_SOCKET_IFNAME` | Configures the network interface name for GLOO communication. | |
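
Two of the settings above lend themselves to a quick sanity check before launch: `HCCL_BUFFSIZE` has a documented lower bound of 1 MB, and the `*_SOCKET_IFNAME` variables should name an interface that exists on the host. The sketch below is illustrative only. The helpers `check_hccl_buffsize` and `check_ifname` are hypothetical, treating `HCCL_BUFFSIZE` as an integer is an assumption, and the check assumes a single plain interface name.

```python
import os
import socket


def check_hccl_buffsize(value):
    """Return True if value parses as an integer of at least 1 (MB)."""
    try:
        return int(value) >= 1
    except (TypeError, ValueError):
        return False


def check_ifname(name, known=None):
    """Return True if name is a network interface on this host.

    known can be passed explicitly (for example in tests); otherwise the
    interfaces are read from the operating system.
    """
    if known is None:
        known = [ifname for _, ifname in socket.if_nameindex()]
    return name in known


if __name__ == "__main__":
    # Documented default for HCCL_BUFFSIZE is 200 (MB).
    if not check_hccl_buffsize(os.environ.get("HCCL_BUFFSIZE", "200")):
        print("warning: HCCL_BUFFSIZE must be an integer >= 1 (MB)")

    for var in ("HCCL_SOCKET_IFNAME", "GLOO_SOCKET_IFNAME"):
        ifname = os.environ.get(var)
        if ifname and not check_ifname(ifname):
            print(f"warning: {var}={ifname} is not an interface on this host")
```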