# Environment Variables

SGLang supports several environment variables that configure its runtime behavior on Ascend NPUs.
This document lists the commonly used ones and is updated over time.

## Directly Used in SGLang

<table>
  <thead>
    <tr>
      <th>Environment Variable</th>
      <th>Description</th>
      <th>Default Value</th>
    </tr>
  </thead>

  <tbody>
    <tr>
      <td><code>SGLANG\_NPU\_USE\_MLAPO</code></td>
      <td>Uses the <code>MLAPO</code> fused operator in the attention preprocessing stage of MLA models.</td>
      <td><code>false</code></td>
    </tr>

    <tr>
      <td><code>SGLANG\_USE\_FIA\_NZ</code></td>
      <td>Reshapes the KV cache into the FIA NZ format.<br/> <code>SGLANG\_USE\_FIA\_NZ</code> must be enabled together with <code>SGLANG\_NPU\_USE\_MLAPO</code>.</td>
      <td><code>false</code></td>
    </tr>

    <tr>
      <td><code>SGLANG\_NPU\_USE\_MULTI\_STREAM</code></td>
      <td>Enables dual-stream computation of shared experts and routed experts in DeepSeek models.<br/> Also enables dual-stream computation in the DeepSeek NSA Indexer.</td>
      <td><code>false</code></td>
    </tr>

    <tr>
      <td><code>SGLANG\_NPU\_DISABLE\_ACL\_FORMAT\_WEIGHT</code></td>
      <td>Disables casting of model weight tensors to a specific NPU ACL format.</td>
      <td><code>false</code></td>
    </tr>

    <tr>
      <td><code>SGLANG\_DEEPEP\_NUM\_MAX\_DISPATCH\_TOKENS\_PER\_RANK</code></td>
      <td>The maximum number of dispatched tokens on each rank.</td>
      <td><code>128</code></td>
    </tr>
  </tbody>
</table>
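
The table notes that `SGLANG_USE_FIA_NZ` must be enabled together with `SGLANG_NPU_USE_MLAPO`. A minimal sketch of checking that dependency before launching a server is shown below. The helper name `build_sglang_npu_env` and its parameters are hypothetical, and the `"true"`/`"false"` strings are an assumption based on the default values listed in the table.

```python
import os


def build_sglang_npu_env(use_mlapo=False, use_fia_nz=False,
                         multi_stream=False, base_env=None):
    """Return a copy of the environment with SGLang NPU switches set.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    # Dependency stated in the table: FIA NZ requires MLAPO.
    if use_fia_nz and not use_mlapo:
        raise ValueError(
            "SGLANG_USE_FIA_NZ must be enabled with SGLANG_NPU_USE_MLAPO"
        )

    # Copy so the caller's (or the process's) environment is not mutated.
    env = dict(os.environ if base_env is None else base_env)

    flags = {
        "SGLANG_NPU_USE_MLAPO": use_mlapo,
        "SGLANG_USE_FIA_NZ": use_fia_nz,
        "SGLANG_NPU_USE_MULTI_STREAM": multi_stream,
    }
    for name, enabled in flags.items():
        # Assumption: boolean switches are written as "true" / "false".
        env[name] = "true" if enabled else "false"
    return env
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` or `subprocess.Popen` when starting the server process.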

## Used in DeepEP Ascend

<table>
  <thead>
    <tr>
      <th>Environment Variable</th>
      <th>Description</th>
      <th>Default Value</th>
    </tr>
  </thead>

  <tbody>
    <tr>
      <td><code>DEEPEP\_NORMAL\_LONG\_SEQ\_PER\_ROUND\_TOKENS</code></td>
      <td>Enables the ant-moving function in the dispatch stage. Specifies the number of tokens transmitted per round on each rank.</td>
      <td><code>8192</code></td>
    </tr>

    <tr>
      <td><code>DEEPEP\_NORMAL\_LONG\_SEQ\_ROUND</code></td>
      <td>Enables the ant-moving function in the dispatch stage. Specifies the number of rounds transmitted on each rank.</td>
      <td><code>1</code></td>
    </tr>

    <tr>
      <td><code>DEEPEP\_NORMAL\_COMBINE\_ENABLE\_LONG\_SEQ</code></td>
      <td>Enables the ant-moving function in the combine stage. The value <code>0</code> means disabled.</td>
      <td><code>0</code></td>
    </tr>

    <tr>
      <td><code>MOE\_ENABLE\_TOPK\_NEG\_ONE</code></td>
      <td>Must be enabled when the expert IDs processed by DeepEP contain -1.</td>
      <td><code>0</code></td>
    </tr>

    <tr>
      <td><code>DEEP\_NORMAL\_MODE\_USE\_INT8\_QUANT</code></td>
      <td>Quantizes <code>x</code> to int8 and returns <code>(tensor, scales)</code> in the dispatch operator.</td>
      <td><code>0</code></td>
    </tr>
  </tbody>
</table>
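
The two dispatch-stage variables describe a per-round token count and a round count. The sketch below derives a round count from a target number of tokens per rank, assuming that per-round tokens multiplied by rounds should cover the largest number of tokens a rank dispatches. That relationship is an interpretation of the descriptions above, not a rule stated by DeepEP Ascend, and the helper name `long_seq_env` is hypothetical.

```python
def long_seq_env(max_tokens_per_rank, per_round_tokens=8192):
    """Return dispatch-stage settings as a dict of environment strings.

    Hypothetical helper. Assumption: per_round_tokens * rounds should be
    at least max_tokens_per_rank.
    """
    if max_tokens_per_rank <= 0 or per_round_tokens <= 0:
        raise ValueError("token counts must be positive integers")

    # Integer ceiling division: smallest round count that covers the target.
    rounds = (max_tokens_per_rank + per_round_tokens - 1) // per_round_tokens

    return {
        "DEEPEP_NORMAL_LONG_SEQ_PER_ROUND_TOKENS": str(per_round_tokens),
        "DEEPEP_NORMAL_LONG_SEQ_ROUND": str(rounds),
    }
```

With the default of 8192 tokens per round, a target of 8192 tokens yields one round, which matches the default values in the table.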

## Others

<table>
  <thead>
    <tr>
      <th>Environment Variable</th>
      <th>Description</th>
      <th>Default Value</th>
    </tr>
  </thead>

  <tbody>
    <tr>
      <td><code>TASK\_QUEUE\_ENABLE</code></td>
      <td>Controls the optimization level of the task\_queue operator dispatch queue. <a href="https://www.hiascend.com/document/detail/zh/Pytorch/730/comref/Envvariables/docs/zh/environment_variable_reference/TASK_QUEUE_ENABLE.md">Detail</a></td>
      <td><code>1</code></td>
    </tr>

    <tr>
      <td><code>INF\_NAN\_MODE\_ENABLE</code></td>
      <td>Controls whether the chip uses saturation mode or INF\_NAN mode. <a href="https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/800alpha001/apiref/envref/envref_07_0056.html">Detail</a></td>
      <td><code>1</code></td>
    </tr>

    <tr>
      <td><code>STREAMS\_PER\_DEVICE</code></td>
      <td>Configures the maximum number of streams for the stream pool. <a href="https://www.hiascend.com/document/detail/zh/Pytorch/720/comref/Envvariables/Envir_041.html">Detail</a></td>
      <td><code>32</code></td>
    </tr>

    <tr>
      <td><code>PYTORCH\_NPU\_ALLOC\_CONF</code></td>
      <td>Controls the behavior of the cache allocator.<br/>This variable changes memory usage and may cause performance fluctuations. <a href="https://www.hiascend.com/document/detail/zh/Pytorch/700/comref/Envvariables/Envir_012.html">Detail</a></td>

      <td />
    </tr>

    <tr>
      <td><code>ASCEND\_MF\_STORE\_URL</code></td>
      <td>The address of the MemFabric config store used during PD separation. It is generally set to the IP address of the primary P node with an arbitrary port number.</td>

      <td />
    </tr>

    <tr>
      <td><code>ASCEND\_LAUNCH\_BLOCKING</code></td>
      <td>Controls whether synchronous mode is enabled during operator execution. <a href="https://www.hiascend.com/document/detail/zh/Pytorch/710/comref/Envvariables/Envir_006.html">Detail</a></td>
      <td><code>0</code></td>
    </tr>

    <tr>
      <td><code>HCCL\_OP\_EXPANSION\_MODE</code></td>
      <td>Configures the expansion position for communication algorithm scheduling. <a href="https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/800alpha001/apiref/envref/envref_07_0094.html">Detail</a></td>

      <td />
    </tr>

    <tr>
      <td><code>HCCL\_BUFFSIZE</code></td>
      <td>Controls the size of the buffer area for shared data between two NPUs.<br/>The unit is MB, and the value must be greater than or equal to 1. <a href="https://www.hiascend.com/document/detail/zh/Pytorch/60RC3/ptmoddevg/trainingmigrguide/performance_tuning_0047.html">Detail</a></td>
      <td><code>200</code></td>
    </tr>

    <tr>
      <td><code>HCCL\_SOCKET\_IFNAME</code></td>
      <td>Configures the name of the host network interface used during HCCL initialization. <a href="https://www.hiascend.com/document/detail/zh/canncommercial/81RC1/apiref/envvar/envref_07_0075.html">Detail</a></td>

      <td />
    </tr>

    <tr>
      <td><code>GLOO\_SOCKET\_IFNAME</code></td>
      <td>Configures the network interface name for GLOO communication.</td>

      <td />
    </tr>
  </tbody>
</table>
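
Several of the communication variables above carry constraints, such as `HCCL_BUFFSIZE` being at least 1 MB and `ASCEND_MF_STORE_URL` combining an IP address with a port. A minimal sketch of assembling them with basic validation follows. The helper name `comm_env` is hypothetical, the `tcp://` scheme is an assumption not stated in the table, and reusing one interface name for both HCCL and GLOO is an illustrative choice rather than a requirement.

```python
def comm_env(buffsize_mb=200, store_ip=None, store_port=None, ifname=None):
    """Return communication-related settings as environment strings.

    Hypothetical helper; only the variable names come from the table.
    """
    # The table requires an HCCL_BUFFSIZE value of at least 1 (in MB).
    if not isinstance(buffsize_mb, int) or buffsize_mb < 1:
        raise ValueError("HCCL_BUFFSIZE must be an integer >= 1 (MB)")
    env = {"HCCL_BUFFSIZE": str(buffsize_mb)}

    if store_ip is not None:
        if not isinstance(store_port, int) or not 1 <= store_port <= 65535:
            raise ValueError("store_port must be an integer in 1..65535")
        # Assumption: the store URL uses a tcp:// scheme.
        env["ASCEND_MF_STORE_URL"] = f"tcp://{store_ip}:{store_port}"

    if ifname:
        # Illustrative choice: use the same interface for HCCL and GLOO.
        env["HCCL_SOCKET_IFNAME"] = ifname
        env["GLOO_SOCKET_IFNAME"] = ifname
    return env
```

The result can be merged over a copy of `os.environ` before starting the server process, so variables that are not set here keep their existing values.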
