Low Latency
| Model | Hardware | Cards | Deploy Mode | Dataset | TPOT | Quantization | Configuration |
|---|---|---|---|---|---|---|---|
| Qwen3-30B-A3B | Atlas 800I A3 | 1 | PD Mixed | 3.5K+1.5K | 10ms | W8A8 INT8 | Optimal Configuration |
| Qwen3-30B-A3B | Atlas 800I A3 | 1 | PD Mixed | 6K+1.5K | 10.25ms | W8A8 INT8 | Optimal Configuration |

High Throughput
| Model | Hardware | Cards | Deploy Mode | Dataset | TPOT | Quantization | Configuration |
|---|---|---|---|---|---|---|---|
| Qwen3-30B-A3B | Atlas 800I A3 | 1 | PD Mixed | 1K+100 | 10000ms | BF16 | Optimal Configuration |
| Qwen3-30B-A3B | Atlas 800I A3 | 1 | PD Mixed | 3.5K+1.5K | 50ms | W8A8 INT8 | Optimal Configuration |

Optimal Configuration
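Each configuration in this section is labelled with a TPOT (time per output token) target. As a rough guide to reading those figures, the sketch below computes TPOT using one common definition: end-to-end latency minus time to first token, divided by the number of output tokens after the first. The function name `tpot_ms` and the sample numbers are illustrative assumptions, not values or code taken from the benchmark tooling used for these results.

```python
def tpot_ms(e2e_latency_s: float, ttft_s: float, output_tokens: int) -> float:
    """Mean time per output token in milliseconds, excluding the first token.

    Assumes the common definition (latency - TTFT) / (output_tokens - 1);
    the benchmark tool used for this page may compute it differently.
    """
    if output_tokens < 2:
        raise ValueError("TPOT needs at least two output tokens")
    return (e2e_latency_s - ttft_s) / (output_tokens - 1) * 1000.0


# Hypothetical request: 1500 output tokens, first token after 0.5 s,
# last token after 15.49 s. This works out to roughly 10 ms per token.
print(f"{tpot_ms(15.49, 0.5, 1500):.2f} ms")
```

Under this definition, a 1.5K-token response at a 10 ms TPOT spends about 15 seconds in decoding after the first token arrives.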
Qwen3-30B-A3B BF16 1P IN1K OUT100
Model: Qwen3-30B-A3B Hardware: Atlas 800I A3 Cards: 1 Deploy Mode: PD Mixed Quantization: BF16 Dataset: 1K+100 TPOT: 10000ms
Model Deployment
Command
Benchmark
We tested this configuration using the RANDOM dataset.
Command
Qwen3-30B-A3B W8A8 1P IN3K5 OUT1K5 10ms
Model: Qwen3-30B-A3B Hardware: Atlas 800I A3 Cards: 1 Deploy Mode: PD Mixed Quantization: W8A8 INT8 Dataset: 3.5K+1.5K TPOT: 10ms
Model Deployment
Command
Benchmark
We tested this configuration using the RANDOM dataset.
Command
Qwen3-30B-A3B W8A8 1P IN3K5 OUT1K5 50ms
Model: Qwen3-30B-A3B Hardware: Atlas 800I A3 Cards: 1 Deploy Mode: PD Mixed Quantization: W8A8 INT8 Dataset: 3.5K+1.5K TPOT: 50ms
Model Deployment
Command
Benchmark
We tested this configuration using the RANDOM dataset.
Command
Qwen3-30B-A3B W8A8 1P IN6K OUT1K5 BS16
Model: Qwen3-30B-A3B Hardware: Atlas 800I A3 Cards: 1 Deploy Mode: PD Mixed Quantization: W8A8 INT8 Dataset: 6K+1.5K TPOT: 10.25ms
Model Deployment
Command
Benchmark
We tested this configuration using the RANDOM dataset.
Command
