Low Latency

| Model | Hardware | Cards | Deploy Mode | Dataset | TPOT | Quantization | Configuration |
|---|---|---|---|---|---|---|---|
| Qwen3-235B-A22B | Atlas 800I A3 | 8 | PD Mixed | 11K+1.5K | 8ms | BF16 | Optimal Configuration |

High Throughput

| Model | Hardware | Cards | Deploy Mode | Dataset | TPOT | Quantization | Configuration |
|---|---|---|---|---|---|---|---|
| Qwen3-235B-A22B | Atlas 800I A3 | 8 | PD Mixed | 3.5K+1.5K | 50.1ms | W8A8 INT8 | Optimal Configuration |
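
To make the TPOT column easier to compare, the following minimal sketch converts a reported TPOT into a per-request decode speed and an approximate decode time. It rests on two assumptions that the tables do not state: TPOT is taken in its common sense of average time per output token after the first one, and "1.5K" output tokens is read as 1500 tokens. The function names are illustrative only and do not belong to any benchmarking tool.

```python
"""Convert a reported TPOT into per-request decode speed and decode time."""


def tokens_per_second(tpot_ms: float) -> float:
    """Per-request decode speed implied by a TPOT given in milliseconds."""
    if tpot_ms <= 0:
        raise ValueError("tpot_ms must be positive")
    return 1000.0 / tpot_ms


def decode_seconds(output_tokens: int, tpot_ms: float) -> float:
    """Approximate decode time for one request.

    Assumption: TPOT averages over every output token after the first,
    so the first token (covered by time-to-first-token) is excluded.
    """
    if output_tokens < 1:
        raise ValueError("output_tokens must be at least 1")
    return (output_tokens - 1) * tpot_ms / 1000.0


if __name__ == "__main__":
    # TPOT values copied from the two tables above.
    # Assumption: "1.5K" output tokens means 1500 tokens.
    scenarios = {
        "Low Latency (BF16, 11K+1.5K)": 8.0,
        "High Throughput (W8A8 INT8, 3.5K+1.5K)": 50.1,
    }
    for name, tpot_ms in scenarios.items():
        print(
            f"{name}: {tokens_per_second(tpot_ms):.1f} tok/s per request, "
            f"~{decode_seconds(1500, tpot_ms):.1f} s to decode 1500 tokens"
        )
```

These are per-request figures only. A higher TPOT in the High Throughput row does not by itself indicate lower aggregate throughput, since that also depends on how many requests are served concurrently, which the tables do not report.
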

Optimal Configuration

Qwen3-235B-A22B BF16 8P IN11K OUT1K5 8ms

- Model: Qwen3-235B-A22B
- Hardware: Atlas 800I A3
- Cards: 8
- Deploy Mode: PD Mixed
- Quantization: BF16
- Dataset: 11K+1.5K
- TPOT: 8ms

Model Deployment
Command
Benchmark
We benchmarked this configuration with the RANDOM dataset.
Command
Qwen3-235B-A22B W8A8 8P IN3K5 OUT1K5 50.1ms

- Model: Qwen3-235B-A22B
- Hardware: Atlas 800I A3
- Cards: 8
- Deploy Mode: PD Mixed
- Quantization: W8A8 INT8
- Dataset: 3.5K+1.5K
- TPOT: 50.1ms

Model Deployment
Command
Benchmark
We benchmarked this configuration with the RANDOM dataset.
Command
