1. Model Introduction
MiniMax-M2.7 is the first MiniMax model to participate deeply in its own evolution. Built for real-world productivity, M2.7 excels at building complex agent harnesses and completing elaborate productivity tasks, leveraging Agent Teams, complex Skills, and dynamic tool search. Key highlights:
- Model Self-Evolution: During development, M2.7 updated its own memory, built complex skills for RL experiments, and improved its own learning process. An internal version autonomously optimized a programming scaffold over 100+ rounds, achieving a 30% performance improvement. On MLE Bench Lite, M2.7 achieved a 66.6% medal rate.
- Professional Software Engineering: Delivers outstanding real-world programming capabilities. On SWE-Pro, M2.7 achieved 56.22%, with strong results on SWE Multilingual (76.5) and Multi SWE Bench (52.7). On Terminal Bench 2 (57.0%) and NL2Repo (39.8%), M2.7 demonstrates deep understanding of complex engineering systems.
- Professional Work: Achieved an ELO score of 1495 on GDPval-AA (highest among open-source models). On Toolathon, M2.7 reached 46.3% accuracy (global top tier).
- Native Agent Teams: Supports multi-agent collaboration with stable role identity and autonomous decision-making.
2. SGLang Installation
SGLang offers multiple installation methods; choose the one that best suits your hardware platform and requirements. Please refer to the official SGLang installation guide for instructions.

Docker Images by Hardware Platform:

| Hardware Platform | Docker Image |
|---|---|
| NVIDIA A100 / H100 / H200 / B200 | lmsysorg/sglang:v0.5.10.post1 |
| NVIDIA B300 / GB300 | lmsysorg/sglang:v0.5.10.post1-cu130 |
| AMD MI300X / MI325X | lmsysorg/sglang:v0.5.10.post1-rocm720-mi30x |
| AMD MI355X | lmsysorg/sglang:v0.5.10.post1-rocm720-mi35x |
3. Model Deployment
This section provides deployment configurations optimized for different hardware platforms and use cases.

3.1 Basic Configuration
Interactive Command Generator: Use the configuration selector below to automatically generate the appropriate deployment command for your hardware platform, deployment strategy, and feature capabilities.

3.2 Configuration Tips
Key Parameters:

| Parameter | Description | Recommended Value |
|---|---|---|
| --tool-call-parser | Tool call parser for function calling support | minimax-m2 |
| --reasoning-parser | Reasoning parser for thinking mode | minimax-append-think |
| --trust-remote-code | Required for MiniMax model loading | Always enabled |
| --mem-fraction-static | Static memory fraction for KV cache | 0.85 |
| --tp | Tensor parallelism size | 2 / 4 / 8 depending on hardware |
| --ep | Expert parallelism size | 8 (NVIDIA 8-GPU) or EP=TP (AMD) |
| --kv-cache-dtype | KV cache data type (AMD only) | fp8_e4m3 |
| --attention-backend | Attention backend (AMD only) | triton |

NVIDIA A100 / H100 / H200 / B200:
- 4-GPU deployment: Requires 4× high-memory GPUs (e.g., H200, B200, A100, H100) with TP=4
- 8-GPU deployment: Requires 8× GPUs (e.g., H200, B200, A100, H100) with TP=8 and EP=8

NVIDIA GB300:
- 2-GPU deployment: GB300 (275GB per die) can host the model with TP=2
- 4-GPU deployment: Maximum single-node TP for GB300, recommended for higher throughput

AMD MI300X / MI325X / MI355X:
- 2-GPU deployment: Requires 2× high-memory GPUs (e.g., MI300X, MI325X, MI355X) with TP=2, EP=2
- 4-GPU deployment: Requires 4× GPUs (e.g., MI300X, MI325X, MI355X) with TP=4, EP=4
- 8-GPU deployment: Requires 8× GPUs (e.g., MI300X, MI325X, MI355X) with TP=8, EP=8
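Putting the pieces together, the parameter table above can be turned into a concrete launch command. The sketch below assembles an 8-GPU NVIDIA command in Python; the model path (MiniMaxAI/MiniMax-M2.7), host, and port are assumptions, while the flag values are taken from the table.

```python
import shlex

# Hypothetical 8-GPU NVIDIA launch command assembled from the parameter
# table above. The model path, host, and port are assumptions; adjust
# --tp/--ep per the deployment options listed for your hardware.
args = [
    "python", "-m", "sglang.launch_server",
    "--model-path", "MiniMaxAI/MiniMax-M2.7",    # assumed HF repo id
    "--tp", "8",                                  # tensor parallelism
    "--ep", "8",                                  # expert parallelism (NVIDIA 8-GPU)
    "--tool-call-parser", "minimax-m2",
    "--reasoning-parser", "minimax-append-think",
    "--trust-remote-code",
    "--mem-fraction-static", "0.85",
    "--host", "0.0.0.0",
    "--port", "30000",
]
command = shlex.join(args)
print(command)
```

For AMD deployments, the table additionally calls for --kv-cache-dtype fp8_e4m3 and --attention-backend triton, with EP set equal to TP.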
4. Model Invocation
4.1 Basic Usage
For basic API usage and request examples, please refer to the official documentation.
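With the server running, invocation goes through the OpenAI-compatible chat-completions endpoint. The sketch below builds a request body in Python; the endpoint URL, port, and model id are assumptions to adapt to your deployment.

```python
import json

# Minimal chat-completions request against a locally deployed server.
# The model id and endpoint below are assumptions; adjust to your setup.
payload = {
    "model": "MiniMaxAI/MiniMax-M2.7",  # assumed model id
    "messages": [
        {"role": "user", "content": "Write a haiku about tensor parallelism."}
    ],
    "temperature": 0.6,
    "top_p": 0.95,
    "max_tokens": 1024,
}
body = json.dumps(payload)
print(body[:60])

# To send the request (requires a running server):
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:30000/v1/chat/completions",  # assumed endpoint
#     data=body.encode(), headers={"Content-Type": "application/json"})
# print(urllib.request.urlopen(req).read().decode())
```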
4.2 Advanced Usage
4.2.1 Reasoning Parser
MiniMax-M2.7 supports Thinking mode. Enable the reasoning parser during deployment by passing --reasoning-parser minimax-append-think. With this parser, the thinking content is wrapped in <think>...</think> tags within the content field, and you can parse these tags on the client side to separate the thinking and content sections.
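With --reasoning-parser minimax-append-think, client-side separation amounts to splitting on the <think>...</think> wrapper. A minimal sketch (the sample content string is made up):

```python
import re

# Sample response content as produced with --reasoning-parser
# minimax-append-think: thinking is wrapped in <think>...</think>.
content = "<think>The user wants a greeting in French.</think>Bonjour !"

# Split the content into the thinking section and the final answer.
match = re.match(r"<think>(.*?)</think>(.*)", content, flags=re.DOTALL)
if match:
    thinking, answer = match.group(1), match.group(2)
else:
    thinking, answer = "", content

print("thinking:", thinking)
print("answer:", answer)
```

For streaming responses, the same split can be applied incrementally by buffering tokens until the closing tag arrives.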
4.2.2 Tool Calling
MiniMax-M2.7 supports tool calling. Enable the tool call parser during deployment by passing --tool-call-parser minimax-m2.
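In practice this follows the OpenAI-compatible function-calling convention: the request declares a tools list, and the assistant message comes back with a tool_calls array whose arguments field is a JSON string. A minimal sketch with made-up tool and response data:

```python
import json

# A sample function tool in OpenAI-compatible format (names are illustrative).
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# A sample assistant message of the shape returned when the tool call
# parser is enabled; the payload content here is made up.
message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_0",
        "type": "function",
        "function": {"name": "get_weather", "arguments": '{"city": "Shanghai"}'},
    }],
}

# Dispatch each requested call: decode the JSON-encoded arguments string.
for call in message.get("tool_calls", []):
    name = call["function"]["name"]
    args = json.loads(call["function"]["arguments"])
    print(name, args)
```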
5. Benchmark
This section uses industry-standard configurations for comparable benchmark results.

Test Environment:
- Hardware: 2× NVIDIA GB300 (275GB per die)
- Docker Image: lmsysorg/sglang:v0.5.10.post1-cu130
- Model: MiniMax-M2.7 (FP8)
- Tensor Parallelism: 2
- SGLang version: 0.5.10.post1
5.1 Accuracy Benchmark
Evaluation Tool: NVIDIA NeMo-Skills
Evaluation Settings: temperature=0.6, top_p=0.95, 8 seeds, max_tokens=120,000, parse_reasoning=True
5.1.1 GPQA Diamond
- Dataset: GPQA Diamond (198 questions)
- Prompt: eval/aai/mcq-4choices (4-choice multiple choice, matching Artificial Analysis methodology)
- Test Results:
| Evaluation Mode | Accuracy | No Answer |
|---|---|---|
| pass@1 (avg-of-8) | 84.91% | 3.54% |
| majority@8 | 88.89% | 0.00% |
| pass@8 | 96.46% | 0.00% |
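The three evaluation modes can be read as follows: pass@1 (avg-of-8) averages per-seed correctness, majority@8 scores the most frequent answer across the 8 seeds, and pass@8 counts a question as solved if any seed gets it right. A toy illustration (all data made up):

```python
from collections import Counter

# Toy data: per-question answers from 8 seeds, plus the reference answer.
runs = {
    "q1": (["A"] * 6 + ["B"] * 2, "A"),
    "q2": (["C"] * 3 + ["D"] * 5, "C"),
    "q3": (["B"] * 8, "B"),
}

def pass_at_1(answers, ref):
    # Average correctness over seeds.
    return sum(a == ref for a in answers) / len(answers)

def majority_at_k(answers, ref):
    # Score the most common answer across seeds.
    return Counter(answers).most_common(1)[0][0] == ref

def pass_at_k(answers, ref):
    # Solved if any seed is correct.
    return any(a == ref for a in answers)

n = len(runs)
p1 = sum(pass_at_1(a, r) for a, r in runs.values()) / n
maj = sum(majority_at_k(a, r) for a, r in runs.values()) / n
p8 = sum(pass_at_k(a, r) for a, r in runs.values()) / n
print(p1, maj, p8)
```

As in the table above, pass@8 upper-bounds majority@8, which in turn typically exceeds pass@1.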
5.1.2 AIME 2025
- Dataset: AIME 2025 (30 problems)
- Prompt: generic/math (boxed answer format)
- Test Results:
| Evaluation Mode | Accuracy | No Answer |
|---|---|---|
| pass@1 (avg-of-8) | 92.50% ± 5.56% | 2.92% |
| majority@8 | 97.08% | 0.00% |
| pass@8 | 100.00% | 0.00% |
5.1.3 MMLU-Pro
- Dataset: MMLU-Pro (12,032 questions, 10-choice)
- Prompt: eval/aai/mcq-10choices (10-choice multiple choice)
- Test Results:
| Evaluation Mode | Accuracy | No Answer |
|---|---|---|
| pass@1 (greedy) | 69.41% | 18.75% |
Note: The high no-answer rate is due to the 32K token limit being insufficient for M2.7’s extended thinking on some questions. A rerun with 120K tokens is expected to improve accuracy significantly.
5.1.4 GSM8K Benchmark
- Benchmark Method: 8-shot Chain-of-Thought, evaluated via OpenAI-compatible API
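GSM8K grading compares the final number in the model's chain of thought against the reference answer. A minimal extraction sketch (the regex and sample response are illustrative, not the harness's actual code):

```python
import re

def extract_final_number(text):
    # Take the last integer/decimal in the response, ignoring thousands commas.
    nums = re.findall(r"-?\d[\d,]*(?:\.\d+)?", text)
    return nums[-1].replace(",", "") if nums else None

response = "She sells 16 - 3 - 4 = 9 eggs, earning 9 * 2 = 18. The answer is 18."
print(extract_final_number(response))  # -> "18"
```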
5.2 Speed Benchmark
5.2.1 Low Concurrency
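For reference, a low-concurrency run can be sketched as an sglang.bench_serving invocation. The flag names and values below are assumptions based on common SGLang benchmarking usage and may differ across versions; check bench_serving --help for your installation.

```python
import shlex

# Hypothetical low-concurrency benchmark command (flag names may vary
# across SGLang versions; input/output lengths are illustrative).
cmd = shlex.join([
    "python", "-m", "sglang.bench_serving",
    "--backend", "sglang",
    "--dataset-name", "random",
    "--random-input-len", "1024",
    "--random-output-len", "1024",
    "--num-prompts", "16",
    "--max-concurrency", "1",
])
print(cmd)
```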
5.2.2 High Concurrency
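A high-concurrency run can be sketched the same way, raising the prompt count and concurrency. As above, the sglang.bench_serving flag names and values are assumptions that may differ across versions.

```python
import shlex

# Hypothetical high-concurrency benchmark command (flag names may vary
# across SGLang versions; prompt count and concurrency are illustrative).
cmd = shlex.join([
    "python", "-m", "sglang.bench_serving",
    "--backend", "sglang",
    "--dataset-name", "random",
    "--random-input-len", "1024",
    "--random-output-len", "1024",
    "--num-prompts", "512",
    "--max-concurrency", "128",
])
print(cmd)
```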
