Qwen3 examples
Running Qwen3
Running Qwen3-32B on 1 x Atlas 800I A3.
Model weights could be found hereLaunch Server
Running Qwen3-32B on 1 x Atlas 800I A3 with Qwen3-32B-Eagle3.
Model weights could be found here Speculative model weights could be found hereLaunch Server with Eagle3
Running Qwen3-30B-A3B MOE on 1 x Atlas 800I A3.
Model weights could be found hereLaunch Server
Running Qwen3-235B-A22B-Instruct-2507 MOE on 1 x Atlas 800I A3.
Model weights could be found hereLaunch Server
Running Qwen3-235B-A22B-Instruct-2507 with 256K long sequence on 2 x Atlas 800I A3 without CP
This example uses PD disaggregation for long-sequence inference and keeps context parallel disabled. Set the shared environment variables on both nodes first:Command
Command
Command
Command
Running Qwen3-235B-A22B-Instruct-2507-W8A8 with Prefill Context Parallel (CP) on 2 x Atlas 800I A3
This example enables Prefill Context Parallel (--enable-prefill-context-parallel) to split the context across CP ranks during prefill, reducing per-device memory pressure and improving TTFT for long sequences. PD disaggregation is required.
ConstraintsPrefill node
- Prefill side must set
--max-running-requests 1(PCP only supports batch_size=1)--attn-cp-sizemust evenly divide--tp-size; each CP rank occupiestp_size / cp_sizeNPUs
<PREFILL_HOST_IP>:
Launch Server
| Parameter | Value | Description |
|---|---|---|
--enable-prefill-context-parallel | flag | Enable PCP feature |
--attn-cp-size | 2 | Split context across 2 CP ranks (each rank handles half the sequence) |
--moe-dp-size | 2 | MoE DP size, should match --attn-cp-size |
--max-running-requests | 1 | Required by PCP (batch_size=1 constraint) |
<DECODE_HOST_IP>):
Launch Server
Note:ASCEND_MF_STORE_URLon both nodes must point to the same KV store (typically the Prefill node IP).ASCEND_USE_FIA=Trueenables fast interconnect aggregation for KV transfer. PCP is a Prefill-only feature; the Decode side needs no CP-related flags.
Running Qwen3-VL-8B-Instruct on 1 x Atlas 800I A3.
Model weights could be found hereLaunch Server
Testing the Service
Once the server printsThe server is fired up and ready to roll! in the logs, it is ready to accept requests. For testing examples (Health Check, Generate, Chat Completions, and port usage guidance), see Testing the Service.