Low Latency
| Model | Hardware | Cards | Deploy Mode | Dataset | TPOT | Quantization | Configuration |
|---|---|---|---|---|---|---|---|
| Qwen3.6-35B-A3B | Atlas 800I A3 | 1 | PD Mixed | 254K+1K | 16.1ms | W8A8 INT8 | Optimal Configuration |
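TPOT (time per output token) covers only the decode phase. Serving benchmarks commonly compute it per request as (end-to-end latency - time to first token) / (output tokens - 1), then report a mean or median across requests. The sketch below assumes that definition. The `RequestTiming` record and its field names are hypothetical and do not come from any specific benchmark tool.

```python
from dataclasses import dataclass
from statistics import mean, median


@dataclass
class RequestTiming:
    """Per-request timings from a hypothetical benchmark client, in seconds."""
    ttft_s: float         # time to first token
    e2e_latency_s: float  # time from sending the request to receiving the last token
    output_tokens: int    # number of generated tokens


def tpot_ms(r: RequestTiming) -> float:
    """Time per output token in ms, excluding the first token (already counted in TTFT)."""
    if r.output_tokens < 2:
        raise ValueError("TPOT needs at least two output tokens")
    return (r.e2e_latency_s - r.ttft_s) / (r.output_tokens - 1) * 1000.0


def summarize(requests: list[RequestTiming]) -> dict[str, float]:
    """Aggregate per-request TPOT into mean and median values."""
    values = [tpot_ms(r) for r in requests]
    return {"mean_tpot_ms": mean(values), "median_tpot_ms": median(values)}


# Illustrative timings only: 1 s TTFT, then 1023 more tokens over 51.15 s.
example = RequestTiming(ttft_s=1.0, e2e_latency_s=52.15, output_tokens=1024)
print(f"{tpot_ms(example):.2f} ms")
```

Because the first token is excluded, a long prefill (such as a 254K-token prompt) raises TTFT without inflating TPOT.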
High Throughput
| Model | Hardware | Cards | Deploy Mode | Dataset | TPOT | Quantization | Configuration |
|---|---|---|---|---|---|---|---|
| Qwen3.6-35B-A3B | Atlas 800I A3 | 1 | PD Mixed | 1024x1024 (30)+1024 | 50ms | W8A8 INT8 | Optimal Configuration |
| Qwen3.6-35B-A3B | Atlas 800I A3 | 1 | PD Mixed | 1080p_30+256 | 50ms | W8A8 INT8 | Optimal Configuration |
| Qwen3.6-35B-A3B | Atlas 800I A3 | 1 | PD Mixed | 128K+1K | 50ms | W8A8 INT8 | Optimal Configuration |
| Qwen3.6-35B-A3B | Atlas 800I A3 | 1 | PD Mixed | 128K+1K (90% prefix cache hit rate) | 50ms | W8A8 INT8 | Optimal Configuration |
| Qwen3.6-35B-A3B | Atlas 800I A3 | 1 | PD Mixed | 3.5K+1.5K | 50ms | W8A8 INT8 | Optimal Configuration |
| Qwen3.6-35B-A3B | Atlas 800I A3 | 1 | PD Mixed | 64K+1K | 50ms | W8A8 INT8 | Optimal Configuration |
| Qwen3.6-35B-A3B | Atlas 800I A3 | 1 | PD Mixed | 64K+1K (90% prefix cache hit rate) | 50ms | W8A8 INT8 | Optimal Configuration |
| Qwen3.6-35B-A3B | Atlas 800I A3 | 2 | PD Mixed | 984K+1K | 40.91ms | W8A8 INT8 | Optimal Configuration |
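A TPOT target is easier to read when inverted into a decode rate. The sketch below converts the TPOT values reported in these tables into tokens per second per request. The optional `concurrency` multiplier is a rough steady-state approximation that assumes every concurrent request decodes at the same TPOT and ignores prefill and scheduling gaps; no concurrency figures are given in the tables.

```python
def decode_rate_tokens_per_s(tpot_ms: float, concurrency: int = 1) -> float:
    """Approximate decode throughput for `concurrency` requests that each
    decode at the given TPOT (ignores prefill and scheduling gaps)."""
    if tpot_ms <= 0 or concurrency < 1:
        raise ValueError("tpot_ms must be positive and concurrency >= 1")
    return concurrency * 1000.0 / tpot_ms


# TPOT values reported in the tables.
reported = {"254K+1K": 16.1, "128K+1K": 50.0, "984K+1K": 40.91}
for dataset, tpot in reported.items():
    print(f"{dataset}: ~{decode_rate_tokens_per_s(tpot):.1f} tokens/s per request")
```

For example, the 50ms target corresponds to about 20 generated tokens per second for each request.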
Optimal Configuration
Qwen3.6-35B-A3B 1P IN1024X1024 30 OUT1024 50ms
Model: Qwen3.6-35B-A3B
Hardware: Atlas 800I A3
Cards: 1
Deploy Mode: PD Mixed
Quantization: W8A8 INT8
Dataset: 1024x1024 (30)+1024
Format: resolution (input tokens) + output tokens
TPOT: 50ms
Model Deployment
Command
Benchmark
We benchmarked this configuration on the RANDOM dataset.
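As a rough mental model, a random-style benchmark dataset is synthetic: each request gets a prompt of a target input length and a target number of output tokens. The sketch below assumes lengths are drawn uniformly from [range_ratio × target, target], so `range_ratio=1.0` gives every request exactly the nominal lengths. The function, its parameters, and the token counts are hypothetical and illustrative, not the actual benchmark tool's implementation.

```python
import random
from dataclasses import dataclass


@dataclass
class SyntheticRequest:
    input_len: int   # prompt length in tokens
    output_len: int  # requested generation length in tokens


def sample_random_workload(num_prompts: int, input_len: int, output_len: int,
                           range_ratio: float = 1.0, seed: int = 0) -> list[SyntheticRequest]:
    """Hypothetical generator for a random-length synthetic workload.

    Each length is drawn uniformly from [max(1, int(target * range_ratio)), target].
    """
    if not 0.0 < range_ratio <= 1.0:
        raise ValueError("range_ratio must be in (0, 1]")
    rng = random.Random(seed)  # seeded so runs are reproducible

    def draw(target: int) -> int:
        return rng.randint(max(1, int(target * range_ratio)), target)

    return [SyntheticRequest(draw(input_len), draw(output_len)) for _ in range(num_prompts)]


# Illustrative 3.5K+1.5K-style workload with fixed lengths.
reqs = sample_random_workload(num_prompts=4, input_len=3500, output_len=1500)
print(reqs[0])
```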
Command
Qwen3.6-35B-A3B 1P IN1080P 30 OUT256 50ms
Model: Qwen3.6-35B-A3B
Hardware: Atlas 800I A3
Cards: 1
Deploy Mode: PD Mixed
Quantization: W8A8 INT8
Dataset: 1080p_30+256
TPOT: 50ms
Model Deployment
Command
Benchmark
We benchmarked this configuration on the RANDOM dataset.
Command
Qwen3.6-35B-A3B 1P IN128K OUT1K 50ms
Model: Qwen3.6-35B-A3B
Hardware: Atlas 800I A3
Cards: 1
Deploy Mode: PD Mixed
Quantization: W8A8 INT8
Dataset: 128K+1K
TPOT: 50ms
Model Deployment
Command
Benchmark
We benchmarked this configuration on the RANDOM dataset.
Command
Qwen3.6-35B-A3B 1P IN128K OUT1K PREFIX90 50ms
Model: Qwen3.6-35B-A3B
Hardware: Atlas 800I A3
Cards: 1
Deploy Mode: PD Mixed
Quantization: W8A8 INT8
Dataset: 128K+1K (90% prefix cache hit rate)
TPOT: 50ms
Model Deployment
Command
Benchmark
We benchmarked this configuration on the generated-shared-prefix dataset with a 90% prefix cache hit rate (repeat_rate = 0.9):
--gsp-system-prompt-len 57600 = int(64000 * 0.9) is the shared prefix portion.
--gsp-question-len 6399 = int(64000 * (1 - 0.9)) is the unique per-request suffix.
--gsp-num-groups 1 keeps all requests in one prefix group for maximum cache reuse.
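The two `--gsp-*` lengths above are plain Python `int(...)` truncations of a total input length. The sketch below reproduces them with a hypothetical helper, `gsp_split`. Because `1 - 0.9` is slightly below 0.1 in binary floating point, the suffix truncates to 6399 rather than 6400, so prefix plus suffix comes to 63999 tokens instead of 64000.

```python
def gsp_split(total_input_len: int, repeat_rate: float) -> tuple[int, int]:
    """Split a per-request input length into (shared prefix, unique suffix)
    lengths using the same int(...) expressions as the flags above."""
    if not 0.0 <= repeat_rate <= 1.0:
        raise ValueError("repeat_rate must be in [0, 1]")
    system_prompt_len = int(total_input_len * repeat_rate)  # --gsp-system-prompt-len
    question_len = int(total_input_len * (1 - repeat_rate))  # --gsp-question-len
    return system_prompt_len, question_len


prefix, suffix = gsp_split(64000, 0.9)
print(prefix, suffix)   # 57600 6399
print(prefix + suffix)  # 63999
```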
Command
Qwen3.6-35B-A3B 1P IN254K OUT1K
Model: Qwen3.6-35B-A3B
Hardware: Atlas 800I A3
Cards: 1
Deploy Mode: PD Mixed
Quantization: W8A8 INT8
Dataset: 254K+1K
TPOT: 16.1ms
Model Deployment
Command
Benchmark
We benchmarked this configuration on the RANDOM dataset.
Command
Qwen3.6-35B-A3B 1P IN3K5 OUT1K5 50ms
Model: Qwen3.6-35B-A3B
Hardware: Atlas 800I A3
Cards: 1
Deploy Mode: PD Mixed
Quantization: W8A8 INT8
Dataset: 3.5K+1.5K
TPOT: 50ms
Model Deployment
Command
Benchmark
We benchmarked this configuration on the RANDOM dataset.
Command
Qwen3.6-35B-A3B 1P IN64K OUT1K 50ms
Model: Qwen3.6-35B-A3B
Hardware: Atlas 800I A3
Cards: 1
Deploy Mode: PD Mixed
Quantization: W8A8 INT8
Dataset: 64K+1K
TPOT: 50ms
Model Deployment
Command
Benchmark
We benchmarked this configuration on the RANDOM dataset.
Command
Qwen3.6-35B-A3B 1P IN64K OUT1K PREFIX90 50ms
Model: Qwen3.6-35B-A3B
Hardware: Atlas 800I A3
Cards: 1
Deploy Mode: PD Mixed
Quantization: W8A8 INT8
Dataset: 64K+1K (90% prefix cache hit rate)
TPOT: 50ms
Model Deployment
Command
Benchmark
We benchmarked this configuration on the generated-shared-prefix dataset with a 90% prefix cache hit rate (repeat_rate = 0.9):
--gsp-system-prompt-len 58982 = int(65536 * 0.9) is the shared prefix portion.
--gsp-question-len 6553 = int(65536 * (1 - 0.9)) is the unique per-request suffix.
--gsp-num-groups 1 keeps all requests in one prefix group for maximum cache reuse.
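With a single prefix group, the 90% figure is a per-request ratio: 58982 of 65535 input tokens are shared. Across a whole run, the first request must still prefill the shared prefix once. The sketch below estimates the overall fraction of input tokens served from cache under idealized assumptions (the prefix is never evicted, and only one request pays for the prefix prefill). The helper name is hypothetical, and real hit rates also depend on scheduling and eviction.

```python
def expected_prefix_hit_rate(prefix_len: int, suffix_len: int, num_requests: int) -> float:
    """Idealized fraction of input tokens served from the prefix cache when all
    requests share one prefix group and the shared prefix is never evicted."""
    if num_requests < 1:
        raise ValueError("num_requests must be >= 1")
    total_tokens = num_requests * (prefix_len + suffix_len)
    cached_tokens = (num_requests - 1) * prefix_len  # first request prefills the prefix
    return cached_tokens / total_tokens


per_request = 58982 / (58982 + 6553)
print(f"per-request hit rate after warm-up: {per_request:.4f}")
for n in (1, 8, 64):
    print(n, round(expected_prefix_hit_rate(58982, 6553, n), 4))
```

As the number of requests grows, the overall rate approaches the per-request ratio of about 0.9.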
Command
Qwen3.6-35B-A3B 2P IN984K OUT1K
Model: Qwen3.6-35B-A3B
Hardware: Atlas 800I A3
Cards: 2
Deploy Mode: PD Mixed
Quantization: W8A8 INT8
Dataset: 984K+1K
TPOT: 40.91ms
Model Deployment
Command
Benchmark
We benchmarked this configuration on the RANDOM dataset.
Command
