Low Latency
| Model | Hardware | Cards | Deploy Mode | Dataset | TPOT | Quantization | Configuration |
|---|---|---|---|---|---|---|---|
| Qwen3-8B | Atlas 800I A3 | 1 | PD Mixed | 3.5K+1.5K | 5ms | W8A8 INT8 | Optimal Configuration |
| Qwen3-8B | Atlas 800I A3 | 1 | PD Mixed | 6K+1.5K | 11.79ms | W8A8 INT8 | Optimal Configuration |
High Throughput
| Model | Hardware | Cards | Deploy Mode | Dataset | TPOT | Quantization | Configuration |
|---|---|---|---|---|---|---|---|
| Qwen3-8B | Atlas 800I A3 | 1 | PD Mixed | 3.5K+1.5K | 37ms | W8A8 INT8 | Optimal Configuration |
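
The tables above report TPOT (time per output token) but do not say how it is derived. A common convention, assumed here and not confirmed by this page, is to average the decode time over every output token after the first. The sketch below illustrates that convention; the function name `tpot_ms` and its parameters are hypothetical and do not come from any benchmark tool.

```python
def tpot_ms(e2e_latency_s: float, ttft_s: float, output_tokens: int) -> float:
    """Mean time per output token in milliseconds.

    Assumed definition: (end-to-end latency - time to first token)
    divided by (output tokens - 1), so the first token is excluded.
    """
    if output_tokens < 2:
        raise ValueError("TPOT needs at least two output tokens")
    if e2e_latency_s < ttft_s:
        raise ValueError("end-to-end latency cannot be shorter than TTFT")
    decode_time_s = e2e_latency_s - ttft_s
    return decode_time_s / (output_tokens - 1) * 1000.0


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from this page.
    print(f"{tpot_ms(e2e_latency_s=2.0, ttft_s=0.5, output_tokens=1501):.2f} ms")
```

Under this definition, a lower TPOT means faster streaming for a single request, which is why the low-latency and high-throughput rows can report different TPOT values for the same model and dataset shape.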
Optimal Configuration
Qwen3-8B W8A8 1P IN3K5 OUT1K5 37ms
Model: Qwen3-8B; Hardware: Atlas 800I A3; Cards: 1; Deploy Mode: PD Mixed; Quantization: W8A8 INT8; Dataset: 3.5K+1.5K; TPOT: 37ms
Model Deployment
Command
Benchmark
We benchmarked this configuration on the RANDOM dataset.
Command
Qwen3-8B W8A8 1P IN3K5 OUT1K5 5ms
Model: Qwen3-8B; Hardware: Atlas 800I A3; Cards: 1; Deploy Mode: PD Mixed; Quantization: W8A8 INT8; Dataset: 3.5K+1.5K; TPOT: 5ms
Model Deployment
Command
Benchmark
We benchmarked this configuration on the RANDOM dataset.
Command
Qwen3-8B W8A8 1P IN6K OUT1K5 BS16
Model: Qwen3-8B; Hardware: Atlas 800I A3; Cards: 1; Deploy Mode: PD Mixed; Quantization: W8A8 INT8; Dataset: 6K+1.5K; TPOT: 11.79ms
Model Deployment
Command
Benchmark
We benchmarked this configuration on the RANDOM dataset.
Command
