Low Latency
| Model | Hardware | Cards | Deploy Mode | Dataset | TPOT | Quantization | Configuration |
|---|---|---|---|---|---|---|---|
| Qwen3-32B | Atlas 800I A3 | 8 | PD Mixed | 18K+4K | 6ms | BF16 | Optimal Configuration |
High Throughput
| Model | Hardware | Cards | Deploy Mode | Dataset | TPOT | Quantization | Configuration |
|---|---|---|---|---|---|---|---|
| Qwen3-32B | Atlas 800I A2 | 2 | PD Mixed | 3.5K+1.5K | 50ms | W8A8 INT8 | Optimal Configuration |
| Qwen3-32B | Atlas 800I A3 | 2 | PD Mixed | 3.5K+1.5K | 50ms | W8A8 INT8 | Optimal Configuration |
Optimal Configuration
Qwen3-32B BF16 8P IN18K OUT4K 6ms
Model: Qwen3-32B
Hardware: Atlas 800I A3
Cards: 8
Deploy Mode: PD Mixed
Quantization: BF16
Dataset: 18K+4K
TPOT: 6ms

Model Deployment
Command
Benchmark
We benchmarked this configuration on the RANDOM dataset.
Command
Qwen3-32B W8A8 2P IN3K5 OUT1K5 50ms A2
Model: Qwen3-32B
Hardware: Atlas 800I A2
Cards: 2
Deploy Mode: PD Mixed
Quantization: W8A8 INT8
Dataset: 3.5K+1.5K
TPOT: 50ms

Model Deployment
Command
Benchmark
We benchmarked this configuration on the RANDOM dataset.
Command
Qwen3-32B W8A8 2P IN3K5 OUT1K5 50ms
Model: Qwen3-32B
Hardware: Atlas 800I A3
Cards: 2
Deploy Mode: PD Mixed
Quantization: W8A8 INT8
Dataset: 3.5K+1.5K
TPOT: 50ms

Model Deployment
Command
Benchmark
We benchmarked this configuration on the RANDOM dataset.
Command
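
The TPOT targets listed for each configuration (6ms and 50ms) can be checked against your own benchmark results. The snippet below is a minimal sketch that assumes the common definition of TPOT (time per output token): the decode time after the first token, divided by the number of remaining output tokens. The function names and the sample timings are hypothetical and are not taken from any benchmark tool or from the measurements behind the tables above.

```python
def tpot_ms(e2e_latency_s: float, ttft_s: float, output_tokens: int) -> float:
    """Mean time per output token in milliseconds, excluding the first token.

    Assumed definition: (end-to-end latency - time to first token)
    divided by (output tokens - 1).
    """
    if output_tokens < 2:
        raise ValueError("TPOT needs at least two output tokens")
    return (e2e_latency_s - ttft_s) * 1000.0 / (output_tokens - 1)


def meets_target(measured_ms: float, target_ms: float) -> bool:
    """Return True when the measured TPOT is within the target."""
    return measured_ms <= target_ms


if __name__ == "__main__":
    # Hypothetical request: 1501 output tokens, 1.0 s to first token,
    # 76.0 s end to end. These numbers are illustrative only.
    measured = tpot_ms(e2e_latency_s=76.0, ttft_s=1.0, output_tokens=1501)
    print(f"TPOT: {measured:.2f} ms, within 50 ms target: {meets_target(measured, 50.0)}")
```

Benchmark tools usually report TPOT directly, so a helper like this is mainly useful when you only have raw per-request timings.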
