# Environment Variables

SGLang supports various environment variables related to Ascend NPU that can be used to configure its runtime behavior. This document provides a list of commonly used environment variables and aims to stay up to date over time.

## Directly Used in SGLang

| Environment Variable | Description | Default Value |
| --- | --- | --- |
| `SGLANG_NPU_USE_MLAPO` | Uses the `MLAPO` fused operator in the attention preprocessing stage of MLA models. | `false` |
| `SGLANG_USE_FIA_NZ` | Reshapes the KV cache into the FIA NZ format. Must be enabled together with `SGLANG_NPU_USE_MLAPO`. | `false` |
| `SGLANG_NPU_USE_MULTI_STREAM` | Enables dual-stream computation of shared experts and routed experts in DeepSeek models, as well as dual-stream computation in the DeepSeek DSA Indexer. | `false` |
| `SGLANG_NPU_DISABLE_ACL_FORMAT_WEIGHT` | Disables casting model weight tensors to a specific NPU ACL format. | `false` |
| `SGLANG_DEEPEP_NUM_MAX_DISPATCH_TOKENS_PER_RANK` | Maximum number of tokens dispatched on each rank. | `128` |
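Because `SGLANG_USE_FIA_NZ` only works together with `SGLANG_NPU_USE_MLAPO`, a pre-launch check can catch the mismatch before the server starts. The following is a minimal sketch, not part of SGLang: `is_enabled` and `check_npu_env` are hypothetical names, and it assumes a boolean flag counts as enabled when set to `1` or `true` (case-insensitive) and that the per-rank dispatch limit should be a positive integer.

```python
import os

# Assumption: a boolean flag is enabled when set to "1" or "true"
# (case-insensitive). This is not taken from SGLang's source.
_TRUTHY = {"1", "true"}


def is_enabled(env: dict, name: str) -> bool:
    """Return True if the boolean variable `name` is enabled in `env` (default: false)."""
    return env.get(name, "false").strip().lower() in _TRUTHY


def check_npu_env(env: dict) -> list:
    """Hypothetical helper: collect configuration problems for the variables above."""
    problems = []
    # SGLANG_USE_FIA_NZ must be enabled together with SGLANG_NPU_USE_MLAPO.
    if is_enabled(env, "SGLANG_USE_FIA_NZ") and not is_enabled(env, "SGLANG_NPU_USE_MLAPO"):
        problems.append("SGLANG_USE_FIA_NZ requires SGLANG_NPU_USE_MLAPO")
    # Assumption: the per-rank dispatch limit is expected to be a positive integer.
    raw = env.get("SGLANG_DEEPEP_NUM_MAX_DISPATCH_TOKENS_PER_RANK", "128")
    if not raw.isdigit() or int(raw) <= 0:
        problems.append("SGLANG_DEEPEP_NUM_MAX_DISPATCH_TOKENS_PER_RANK must be a positive integer")
    return problems


if __name__ == "__main__":
    # Enable MLAPO and FIA NZ together on a copy of the current environment.
    env = dict(os.environ)
    env["SGLANG_NPU_USE_MLAPO"] = "true"
    env["SGLANG_USE_FIA_NZ"] = "true"
    print(check_npu_env(env) or "ok")
```

An empty list means none of the checked constraints is violated; it does not guarantee the settings suit a particular model.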

## Used in DeepEP Ascend

| Environment Variable | Description | Default Value |
| --- | --- | --- |
| `DEEPEP_NORMAL_LONG_SEQ_PER_ROUND_TOKENS` | Enables long-sequence token pipelining in the dispatch stage. Specifies the number of tokens transmitted per round on each rank. | `8192` |
| `DEEPEP_NORMAL_LONG_SEQ_ROUND` | Enables long-sequence token pipelining in the dispatch stage. Specifies the number of rounds transmitted on each rank. | `1` |
| `DEEPEP_NORMAL_COMBINE_ENABLE_LONG_SEQ` | Enables long-sequence token pipelining in the combine stage. The value `0` disables it. | `0` |
| `MOE_ENABLE_TOPK_NEG_ONE` | Must be enabled when the expert IDs processed by DeepEP contain `-1`. | `0` |
| `DEEP_NORMAL_MODE_USE_INT8_QUANT` | Deprecated and will be removed in a future release. When set to `1`, quantizes intermediate activations to INT8 in the DeepEP dispatch operator in normal mode, reducing communication volume for W8A8-quantized MoE models. This variable will become a no-op, and the quantization behavior will be inferred automatically. | `0` |
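When debugging long-sequence runs, it can help to log which DeepEP Ascend settings are in effect. The following is a hypothetical sketch of how a launcher script might read them with the defaults documented above; it is not how DeepEP Ascend itself parses these variables. `DeepEPLongSeqSettings` and `read_deepep_settings` are illustrative names, and treating any value other than `0` as "enabled" is an assumption.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class DeepEPLongSeqSettings:
    """Hypothetical container for the DeepEP Ascend variables above."""
    per_round_tokens: int   # DEEPEP_NORMAL_LONG_SEQ_PER_ROUND_TOKENS
    rounds: int             # DEEPEP_NORMAL_LONG_SEQ_ROUND
    combine_long_seq: bool  # DEEPEP_NORMAL_COMBINE_ENABLE_LONG_SEQ
    topk_neg_one: bool      # MOE_ENABLE_TOPK_NEG_ONE


def read_deepep_settings(env=None) -> DeepEPLongSeqSettings:
    """Parse the variables from `env` (default: os.environ), using the documented defaults."""
    env = os.environ if env is None else env
    return DeepEPLongSeqSettings(
        per_round_tokens=int(env.get("DEEPEP_NORMAL_LONG_SEQ_PER_ROUND_TOKENS", "8192")),
        rounds=int(env.get("DEEPEP_NORMAL_LONG_SEQ_ROUND", "1")),
        # Assumption: any value other than "0" enables the flag.
        combine_long_seq=env.get("DEEPEP_NORMAL_COMBINE_ENABLE_LONG_SEQ", "0") != "0",
        topk_neg_one=env.get("MOE_ENABLE_TOPK_NEG_ONE", "0") != "0",
    )


# Example: four dispatch rounds with long-sequence pipelining also enabled in combine.
print(read_deepep_settings({
    "DEEPEP_NORMAL_LONG_SEQ_ROUND": "4",
    "DEEPEP_NORMAL_COMBINE_ENABLE_LONG_SEQ": "1",
}))
```

The deprecated `DEEP_NORMAL_MODE_USE_INT8_QUANT` is intentionally left out, since its behavior is slated to be inferred automatically.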

## Others

| Environment Variable | Description | Default Value |
| --- | --- | --- |
| `TASK_QUEUE_ENABLE` | Controls the optimization level of the task queue used for operator dispatch. | `1` |
| `INF_NAN_MODE_ENABLE` | Controls whether the chip uses saturation mode or INF_NAN mode. | `1` |
| `STREAMS_PER_DEVICE` | Maximum number of streams in the stream pool. | `32` |
| `PYTORCH_NPU_ALLOC_CONF` | Controls the behavior of the caching allocator. Changing it affects memory usage and may cause performance fluctuations. | |
| `ASCEND_MF_STORE_URL` | Address of the config store in MemFabric for prefill-decode (PD) disaggregation. Usually set to the IP address of the prefill (P) primary node with an arbitrary port number. | |
| `ASCEND_LAUNCH_BLOCKING` | Controls whether operators execute in synchronous mode. | `0` |
| `HCCL_OP_EXPANSION_MODE` | Configures the expansion location for communication algorithm scheduling. | |
| `HCCL_BUFFSIZE` | Size of the buffer used to share data between two NPUs, in MB. Must be greater than or equal to 1. | `200` |
| `HCCL_SOCKET_IFNAME` | Name of the network interface used by the host during HCCL initialization. | |
| `GLOO_SOCKET_IFNAME` | Name of the network interface used for Gloo communication. | |
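Communication variables such as `HCCL_SOCKET_IFNAME` are used during initialization, so they are typically set in the environment of the process before it starts. The following is a minimal sketch under that assumption: `build_comm_env` is a hypothetical helper, `"eth0"` is a placeholder interface name, and using the same interface for HCCL and Gloo is an illustrative choice, not a requirement stated here.

```python
import os


def build_comm_env(ifname: str, hccl_buffsize_mb: int = 200) -> dict:
    """Hypothetical helper: communication variables for a child process's environment."""
    # HCCL_BUFFSIZE is expressed in MB and must be >= 1.
    if hccl_buffsize_mb < 1:
        raise ValueError("HCCL_BUFFSIZE must be >= 1 (MB)")
    return {
        # Environment values must be strings.
        "HCCL_BUFFSIZE": str(hccl_buffsize_mb),
        # Assumption: HCCL and Gloo use the same host network interface here.
        "HCCL_SOCKET_IFNAME": ifname,
        "GLOO_SOCKET_IFNAME": ifname,
    }


# "eth0" is a placeholder; the merged dict can be passed as the `env` argument
# of subprocess.run or subprocess.Popen when launching a server process.
child_env = {**os.environ, **build_comm_env("eth0", hccl_buffsize_mb=400)}
print(child_env["HCCL_BUFFSIZE"], child_env["HCCL_SOCKET_IFNAME"])
```

Merging into a copy of `os.environ` keeps the parent process unchanged while still passing through variables such as `PATH`.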