Support Features on Ascend NPU
This section describes the basic functions and features supported on Ascend NPUs. If you encounter issues or have any questions, please open an issue.
For the meaning and usage of each parameter, see Server Arguments. In the tables below, √ means the argument is supported and × means it is not supported on the corresponding hardware (A2 or A3).
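If you want to check a launch configuration against these tables in a script, one option is to transcribe the rows into records. The following is a minimal Python sketch. It assumes:

- Each row is written as a Markdown table row with the five columns Argument | Defaults | Options | A2 | A3.
- Support is marked with √ (supported) or × (not supported).

The helper names (`SupportRow`, `parse_row`, `unsupported_args`) and the argument names in the usage example are hypothetical placeholders, not part of any server API.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Support marks used in the A2 and A3 columns.
SUPPORTED = "√"
UNSUPPORTED = "×"


@dataclass(frozen=True)
class SupportRow:
    """One row of a support table: Argument | Defaults | Options | A2 | A3."""
    argument: str
    default: str
    options: str
    a2: bool
    a3: bool


def _mark_to_bool(mark: str) -> bool:
    """Translate a √ / × cell into True / False."""
    if mark == SUPPORTED:
        return True
    if mark == UNSUPPORTED:
        return False
    raise ValueError(f"unexpected support mark: {mark!r}")


def parse_row(line: str) -> SupportRow:
    """Parse a Markdown table row such as '| --arg | | Type: int | √ | × |'."""
    body = line.strip()
    # Drop exactly one outer pipe on each side so empty leading cells survive.
    if body.startswith("|"):
        body = body[1:]
    if body.endswith("|"):
        body = body[:-1]
    cells = [cell.strip() for cell in body.split("|")]
    if len(cells) != 5:
        raise ValueError(f"expected 5 cells, got {len(cells)}: {line!r}")
    argument, default, options, a2, a3 = cells
    return SupportRow(argument, default, options, _mark_to_bool(a2), _mark_to_bool(a3))


def unsupported_args(rows: Iterable[SupportRow], requested: Iterable[str], series: str) -> List[str]:
    """Return the requested arguments marked × for the given series ("A2" or "A3").

    Arguments that do not appear in `rows` are skipped, not reported.
    """
    if series not in ("A2", "A3"):
        raise ValueError(f"unknown series: {series!r}")
    by_name = {row.argument: row for row in rows}
    flagged = []
    for name in requested:
        row = by_name.get(name)
        if row is None:
            continue
        supported = row.a2 if series == "A2" else row.a3
        if not supported:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    # Hypothetical argument names; replace them with rows transcribed from the tables.
    rows = [
        parse_row("| --hypothetical-int-arg | | Type: int | √ | √ |"),
        parse_row("| --hypothetical-flag | | bool flag | × | × |"),
    ]
    print(unsupported_args(rows, ["--hypothetical-int-arg", "--hypothetical-flag"], "A3"))
    # prints ['--hypothetical-flag']
```

The parser strips only one outer pipe on each side rather than using `str.strip("|")`. This keeps rows whose first cells are empty from collapsing into fewer than five columns.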
Model and tokenizer

| Argument | Defaults | Options | A2 | A3 |
|---|---|---|---|---|
| | | Type: str | √ | √ |
| | | Type: str | √ | √ |
| | | | √ | √ |
| | | Type: int | √ | √ |
| | | bool flag (set to enable) | √ | √ |
| | | | √ | √ |
| | {} | Type: str | √ | √ |
| | | bool flag (set to enable) | √ | √ |
| | | Type: int | √ | √ |
| | | bool flag (set to enable) | √ | √ |
| | | bool flag (set to enable) | √ | √ |
| | | Type: str | × | × |
| | | | √ | √ |

HTTP server

| Argument | Defaults | Options | A2 | A3 |
|---|---|---|---|---|
| | | Type: str | √ | √ |
| | | Type: int | √ | √ |
| | | bool flag | √ | √ |
| | | Type: str | √ | √ |
| | | Type: int | √ | √ |
| | | Type: str | × | × |
| | | bool flag | × | × |
| | | bool flag | × | × |

Quantization and data type

| Argument | Defaults | Options | A2 | A3 |
|---|---|---|---|---|
| | | | √ | √ |
| | | | √ | √ |
| | | Type: str | × | × |
| | | | √ | √ |
| | | bool flag | × | × |
| | | Type: str | × | × |
| | | Type: str | × | × |
| | | Type: str | × | × |
| | | Type: str | × | × |
| | | bool flag | × | × |
| | | Type: str | × | × |

Memory and scheduling

| Argument | Defaults | Options | A2 | A3 |
|---|---|---|---|---|
| | | Type: float | √ | √ |
| | | Type: int | √ | √ |
| | | Type: int | √ | √ |
| | | Type: int | √ | √ |
| | | Type: int | √ | √ |
| | | Type: int | √ | √ |
| | | Type: int | √ | √ |
| | | | √ | √ |
| | | bool flag | √ | √ |
| | | bool flag | √ | √ |
| | | Type: int | √ | √ |
| | | Type: float | √ | √ |
| | | Type: int | √ | √ |
| | | Optional[float] | × | × |
| | | Type: float | × | × |
| | | bool flag | × | × |
| | | bool flag | × | × |
| | | bool flag | × | × |

Runtime options

| Argument | Defaults | Options | A2 | A3 |
|---|---|---|---|---|
| | | Type: str | √ | √ |
| | | Type: int | √ | √ |
| | | Type: int | × | × |
| | | Type: int | × | × |
| | | Type: int | × | × |
| | | Type: int | √ | √ |
| | | bool flag (set to enable) | √ | √ |
| | | Type: int | √ | √ |
| | | Type: str | × | × |
| | | bool flag (set to enable) | × | × |
| | | Type: float | √ | √ |
| | | Type: float | √ | √ |
| | | Type: int | √ | √ |
| | | Type: int | √ | √ |
| | | Type: int | √ | √ |
| | | bool flag (set to enable) | √ | √ |
| | | Optional[Callable] | × | × |

Logging

| Argument | Defaults | Options | A2 | A3 |
|---|---|---|---|---|
| | | Type: str | √ | √ |
| | | Type: str | √ | √ |
| | | bool flag | √ | √ |
| | | | √ | √ |
| | text | text, json | × | × |
| | | Type: str | × | × |
| | | Type: str | × | × |
| | | bool flag | × | × |
| | | bool flag | × | × |
| | | Type: str | × | × |
| | | List[str] | × | × |
| | | List[float] | × | × |
| | | List[float] | × | × |
| | | List[float] | × | × |
| | | bool flag | × | × |
| | | List[str] | × | × |
| | | List[str] | × | × |
| | | Type: float | × | × |
| | | Type: int | √ | √ |
| | | bool flag | √ | √ |
| | | Type: str | × | × |
| | | bool flag | × | × |
| | | Type: str | × | × |

RequestMetricsExporter configuration

| Argument | Defaults | Options | A2 | A3 |
|---|---|---|---|---|
| | | bool flag | × | × |
| | | Type: str | × | × |

Data parallelism

| Argument | Defaults | Options | A2 | A3 |
|---|---|---|---|---|
| | | Type: int | √ | √ |
| | | | √ | √ |
| | | bool flag | √ | √ |

Multi-node distributed serving

| Argument | Defaults | Options | A2 | A3 |
|---|---|---|---|---|
| | | Type: str | √ | √ |
| | | Type: int | √ | √ |
| | | Type: int | √ | √ |

Model override args

| Argument | Defaults | Options | A2 | A3 |
|---|---|---|---|---|
| | | Type: str | √ | √ |
| | | Type: str | √ | √ |

LoRA

| Argument | Defaults | Options | A2 | A3 |
|---|---|---|---|---|
| | | bool flag | √ | √ |
| | | Type: int | √ | √ |
| | | | √ | √ |
| | | Type: List[str] / | √ | √ |
| | | Type: int | √ | √ |
| | | Type: int | √ | √ |
| | | | √ | √ |
| | | | √ | √ |
| | | | × | × |

Kernel Backends (Attention, Sampling, Grammar, GEMM)

| Argument | Defaults | Options | A2 | A3 |
|---|---|---|---|---|
| | | | √ | √ |
| | | | √ | √ |
| | | | √ | √ |
| | | | √ | √ |
| | | | √ | √ |
| | | | √ | √ |
| | | | × | × |
| | | | × | × |
| | | | × | × |
| | | bool flag | × | × |

Speculative decoding

| Argument | Defaults | Options | A2 | A3 |
|---|---|---|---|---|
| | | | √ | √ |
| | | Type: str | √ | √ |
| | | Type: str | √ | √ |
| | | | × | × |
| | | Type: int | √ | √ |
| | | Type: int | √ | √ |
| | | Type: int | √ | √ |
| | | Type: float | √ | √ |
| | | Type: float | √ | √ |
| | | Type: str | × | × |
| | | | √ | √ |
| | | | √ | √ |
| | | | √ | √ |
| | | Type: str | √ | √ |
| | | | √ | √ |

Ngram speculative decoding

| Argument | Defaults | Options | A2 | A3 |
|---|---|---|---|---|
| | | Type: int | × | × |
| | | Type: int | × | × |
| | | Type: int | × | × |
| | | Type: int | × | × |
| | | | × | × |
| | | Type: int | × | × |
| | | Type: int | × | × |

Expert parallelism

| Argument | Defaults | Options | A2 | A3 |
|---|---|---|---|---|
| | | Type: int | √ | √ |
| | | | √ | √ |
| | | | √ | √ |
| | | | × | × |
| | | bool flag | × | × |
| | | | √ | √ |
| | | Type: str | × | × |
| | | Type: int | × | × |
| | | Type: str | × | × |
| | | Type: str | × | × |
| | | bool flag | × | × |
| | | Type: str | × | × |
| | | Type: int | × | × |
| | | Type: float | × | × |
| | | Type: str | × | × |
| | | Type: int | × | × |
| | | bool flag | × | × |
| | | Type: int | √ | √ |
| | | | × | × |
| | | Type: str | × | × |

Mamba Cache

| Argument | Defaults | Options | A2 | A3 |
|---|---|---|---|---|
| | | Type: int | × | × |
| | | | × | × |
| | | Type: float | × | × |
| | | | × | × |
| | | Type: int | × | × |

Hierarchical cache

| Argument | Defaults | Options | A2 | A3 |
|---|---|---|---|---|
| | | bool flag | √ | √ |
| | | Type: float | √ | √ |
| | | Type: int | √ | √ |
| | | | √ | √ |
| | | | √ | √ |
| | | | √ | √ |
| | | | √ | √ |
| | | | √ | √ |
| | | | × | × |
| | | Type: str | × | × |

LMCache

| Argument | Defaults | Options | A2 | A3 |
|---|---|---|---|---|
| | | bool flag | × | × |

Ktransformer server args

| Argument | Defaults | Options | A2 | A3 |
|---|---|---|---|---|
| | | Type: str | × | × |
| | | Type: str | × | × |
| | | Type: int | × | × |
| | 2 | Type: int | × | × |
| | | Type: int | × | × |
| | | Type: int | × | × |

Double Sparsity

| Argument | Defaults | Options | A2 | A3 |
|---|---|---|---|---|
| | | bool flag | × | × |
| | | Type: str | × | × |
| | | Type: int | × | × |
| | | Type: int | × | × |
| | | Type: str | × | × |
| | | Type: int | × | × |

Offloading

| Argument | Defaults | Options | A2 | A3 |
|---|---|---|---|---|
| | | Type: int | √ | √ |
| | | Type: int | × | × |
| | | Type: int | × | × |
| | | Type: int | × | × |
| | | Type: str | × | × |

Args for multi-item scoring

| Argument | Defaults | Options | A2 | A3 |
|---|---|---|---|---|
| | | Type: int | × | × |

Optimization/debug options

| Argument | Defaults | Options | A2 | A3 |
|---|---|---|---|---|
| | | bool flag | √ | √ |
| | | Type: int | √ | √ |
| | | List[int] | √ | √ |
| | | bool flag | √ | √ |
| | | bool flag | √ | √ |
| | | bool flag | √ | √ |
| | | bool flag | × | × |
| | | bool flag | × | × |
| | | bool flag | × | × |
| | | bool flag | × | × |
| | | bool flag | √ | √ |
| | | bool flag | × | × |
| | | bool flag | √ | √ |
| | | bool flag | √ | √ |
| | | bool flag | × | × |
| | | bool flag | × | × |
| | | bool flag | √ | √ |
| | | bool flag | √ | √ |
| | | bool flag | √ | √ |
| | | bool flag | √ | √ |
| | | bool flag | × | × |
| | | bool flag | × | × |
| | | Type: float | × | × |
| | | bool flag | √ | √ |
| | | bool flag | × | × |
| | | bool flag | × | × |
| | | Type: JSON | × | × |
| | | ["eager", "inductor"] | × | × |
| | | Type: int | × | × |
| | | Type: int | × | × |
| | `""` | Type: str | × | × |
| | | bool flag | × | × |
| | | bool flag | × | × |
| | | bool flag | × | × |
| | | Type: int | × | × |
| | | Type: int | × | × |
| | | Type: int | × | × |
| | | bool flag | × | × |
| | | bool flag | × | × |
| | | bool flag | × | × |
| | | bool flag | × | × |
| | | bool flag | √ | √ |
| | | bool flag | × | × |
| | | bool flag | × | × |
| | | bool flag | × | × |
| | | bool flag | × | × |
| | | bool flag | × | × |
| | | bool flag | × | × |
| | | bool flag | √ | √ |
| | | bool flag | × | × |
| | | Type: int | × | × |
| | | List[int] | × | × |
| | | | × | × |
| | | bool flag | × | × |
| | | bool flag | × | × |
| | | bool flag | × | × |
| | | bool flag | × | × |

Dynamic batch tokenizer

| Argument | Defaults | Options | A2 | A3 |
|---|---|---|---|---|
| | | bool flag | √ | √ |
| | | Type: int | √ | √ |
| | | Type: float | √ | √ |

Debug tensor dumps

| Argument | Defaults | Options | A2 | A3 |
|---|---|---|---|---|
| | | Type: str | × | × |
| | | List[int] | × | × |
| | | Type: str | √ | √ |
| | | Type: str | × | × |

PD disaggregation

| Argument | Defaults | Options | A2 | A3 |
|---|---|---|---|---|
| | | | √ | √ |
| | | | √ | √ |
| | | Type: int | √ | √ |
| | | Type: int | √ | √ |
| | | Type: int | √ | √ |
| | | Type: int | × | × |
| | | Type: str | × | × |
| | | bool flag | × | × |
| | | bool flag | × | × |
| | | Type: int | √ | √ |
| | | Type: int | √ | √ |

Encode prefill disaggregation

| Argument | Defaults | Options | A2 | A3 |
|---|---|---|---|---|
| | | bool flag | × | × |
| | | bool flag | × | × |
| | | | × | × |
| | | List[str] | × | × |

Custom weight loader

| Argument | Defaults | Options | A2 | A3 |
|---|---|---|---|---|
| | | List[str] | × | × |
| | | bool flag | √ | √ |
| | | Type: str | × | × |
| | | Type: int | × | × |
| | | Type: JSON | × | × |
| | | | × | × |
| | | bool flag | × | × |

For PD-Multiplexing

| Argument | Defaults | Options | A2 | A3 |
|---|---|---|---|---|
| | | bool flag | × | × |
| | | Type: str | × | × |
| | | Type: int | × | × |

For Multi-Modal

| Argument | Defaults | Options | A2 | A3 |
|---|---|---|---|---|
| | 32 | Type: int | × | × |
| | 10.0 | Type: float | × | × |
| | | bool flag | × | × |
| | | Type: JSON / Dict | × | × |
| | | bool flag | × | × |
| | | Type: JSON / Dict | √ | √ |

For checkpoint decryption

| Argument | Defaults | Options | A2 | A3 |
|---|---|---|---|---|
| | | Type: str | × | × |
| | | Type: str | × | × |
| | | bool flag | × | × |

For deterministic inference

| Argument | Defaults | Options | A2 | A3 |
|---|---|---|---|---|
| | | bool flag | × | × |

For registering hooks

| Argument | Defaults | Options | A2 | A3 |
|---|---|---|---|---|
| | | Type: JSON list | × | × |

Configuration file support

| Argument | Defaults | Options | A2 | A3 |
|---|---|---|---|---|
| | | Type: str | × | × |