High Throughput
| Model | Hardware | Cards | Deploy Mode | Dataset | TPOT | Quantization | Configuration |
|---|---|---|---|---|---|---|---|
| Qwen3.6-27B | Atlas 800I A3 | 1 | PD Mixed | 1024x1024 (30)+1024 | 50ms | W8A8 INT8 | Optimal Configuration |
| Qwen3.6-27B | Atlas 800I A3 | 1 | PD Mixed | 1080p_30+256 | 50ms | W8A8 INT8 | Optimal Configuration |
| Qwen3.6-27B | Atlas 800I A3 | 2 | PD Mixed | 64K+1K (90% prefix cache hit rate) | 50ms | W8A8 INT8 | Optimal Configuration |
| Qwen3.6-27B | Atlas 800I A3 | 1 | PD Mixed | 3.5K+1.5K | 50ms | W8A8 INT8 | Optimal Configuration |
| Qwen3.6-27B | Atlas 800I A3 | 2 | PD Mixed | 128K+1K | 50ms | W8A8 INT8 | Optimal Configuration |
| Qwen3.6-27B | Atlas 800I A3 | 2 | PD Mixed | 16K+1K | 50ms | W8A8 INT8 | Optimal Configuration |
| Qwen3.6-27B | Atlas 800I A3 | 2 | PD Mixed | 64K+1K | 50ms | W8A8 INT8 | Optimal Configuration |
Optimal Configuration
Qwen3.6-27B 1P IN1024X1024 30 OUT1024 50ms
- Model: Qwen3.6-27B
- Hardware: Atlas 800I A3
- Cards: 1
- Deploy Mode: PD Mixed
- Quantization: W8A8 INT8
- Dataset: 1024x1024 (30)+1024
- Format: resolution (input tokens) + output tokens
- TPOT: 50ms

Model Deployment
Command
Benchmark
We tested this configuration on the RANDOM dataset.
Command
Qwen3.6-27B 1P IN1080P 30 OUT256 50ms
- Model: Qwen3.6-27B
- Hardware: Atlas 800I A3
- Cards: 1
- Deploy Mode: PD Mixed
- Quantization: W8A8 INT8
- Dataset: 1080p_30+256
- TPOT: 50ms

Model Deployment
Command
Benchmark
We tested this configuration on the RANDOM dataset.
Command
Qwen3.6-27B 2P IN64K OUT1K PREFIX90 50ms
- Model: Qwen3.6-27B
- Hardware: Atlas 800I A3
- Cards: 2
- Deploy Mode: PD Mixed
- Quantization: W8A8 INT8
- Dataset: 64K+1K (90% prefix cache hit rate)
- TPOT: 50ms

Model Deployment
Command
Benchmark
We tested this configuration on the generated-shared-prefix dataset with a 90% prefix cache hit rate (repeat_rate = 0.9):
- --gsp-system-prompt-len 57600 is the shared prefix portion, int(64000 * 0.9).
- --gsp-question-len 6399 is the unique per-request suffix, int(64000 * (1 - 0.9)). The value is 6399 rather than 6400 because 1 - 0.9 evaluates to slightly less than 0.1 in floating-point arithmetic and int() truncates.
- --gsp-num-groups 1 keeps all requests in one prefix group for maximum cache reuse.
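The arithmetic behind these values can be reproduced with a short Python sketch. The helper name `split_prefix_lengths` is hypothetical and used only for illustration; it is not part of any benchmark tool, and the sketch assumes the lengths were derived with ordinary Python float arithmetic as the `int(...)` expressions above suggest.

```python
# Hypothetical helper (not part of the benchmark tool): split a total input
# length into a shared prefix and a unique per-request suffix.
def split_prefix_lengths(total_input_len: int, repeat_rate: float) -> tuple[int, int]:
    # Shared portion that every request in the group reuses.
    system_prompt_len = int(total_input_len * repeat_rate)
    # Unique portion; int() truncates, and 1 - 0.9 is slightly below 0.1
    # in binary floating point, so the result is one token short of 6400.
    question_len = int(total_input_len * (1 - repeat_rate))
    return system_prompt_len, question_len


system_prompt_len, question_len = split_prefix_lengths(64000, 0.9)
print(f"--gsp-system-prompt-len {system_prompt_len}")
print(f"--gsp-question-len {question_len}")
print("--gsp-num-groups 1")
```

With these values each request carries 57600 + 6399 = 63999 input tokens, one short of the nominal 64000.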
Command
Qwen3.6-27B W8A8 1P IN3K5 OUT1K5 50ms
- Model: Qwen3.6-27B
- Hardware: Atlas 800I A3
- Cards: 1
- Deploy Mode: PD Mixed
- Quantization: W8A8 INT8
- Dataset: 3.5K+1.5K
- TPOT: 50ms

Model Deployment
Command
Benchmark
We tested this configuration on the RANDOM dataset.
Command
Qwen3.6-27B W8A8 2P IN128K OUT1K 50ms
- Model: Qwen3.6-27B
- Hardware: Atlas 800I A3
- Cards: 2
- Deploy Mode: PD Mixed
- Quantization: W8A8 INT8
- Dataset: 128K+1K
- TPOT: 50ms

Model Deployment
Command
Benchmark
We tested this configuration on the RANDOM dataset.
Command
Qwen3.6-27B W8A8 2P IN16K OUT1K 50ms
- Model: Qwen3.6-27B
- Hardware: Atlas 800I A3
- Cards: 2
- Deploy Mode: PD Mixed
- Quantization: W8A8 INT8
- Dataset: 16K+1K
- TPOT: 50ms

Model Deployment
Command
Benchmark
We tested this configuration on the RANDOM dataset.
Command
Qwen3.6-27B W8A8 2P IN64K OUT1K 50ms
- Model: Qwen3.6-27B
- Hardware: Atlas 800I A3
- Cards: 2
- Deploy Mode: PD Mixed
- Quantization: W8A8 INT8
- Dataset: 64K+1K
- TPOT: 50ms

Model Deployment
Command
Benchmark
We tested this configuration on the RANDOM dataset.
Command
