1. Model Introduction
MiniMax-M2 is a compact, fast, and cost-effective MoE model (230 billion total parameters, 10 billion active) built for elite performance in coding and agentic tasks while maintaining strong general intelligence. This generation delivers comprehensive upgrades across the board:
- Superior Intelligence: MiniMax-M2 demonstrates highly competitive general intelligence across mathematics, science, instruction following, coding, and agentic tool use on Artificial Analysis, where its composite score ranks #1 among open-source models globally.
- Advanced Coding: Engineered for end-to-end developer workflows, MiniMax-M2 excels at multi-file edits, coding-run-fix loops, and test-validated repairs. Strong performance on Terminal-Bench and (Multi-)SWE-Bench–style tasks demonstrates practical effectiveness in terminals, IDEs, and CI across languages.
- Agent Performance: MiniMax-M2 plans and executes complex, long-horizon toolchains across shell, browser, retrieval, and code runners. In BrowseComp-style evaluations, it consistently locates hard-to-surface sources, keeps evidence traceable, and gracefully recovers from flaky steps.
- Efficient Design: With 10 billion activated parameters (230 billion in total), MiniMax-M2 delivers lower latency, lower cost, and higher throughput for interactive agents and batched sampling—perfectly aligned with the shift toward highly deployable models that still shine on coding and agentic tasks.
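The efficiency claim above follows directly from the activation ratio: per-token compute scales with the active parameters, not the full model size. A quick back-of-envelope check using the numbers from the text:

```python
total_params = 230e9   # total MoE parameters (from the text)
active_params = 10e9   # parameters activated per token (from the text)

# Only ~4.3% of the weights participate in each forward pass, which is
# where the latency, cost, and throughput advantage comes from.
active_fraction = active_params / total_params
```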
2. SGLang Installation
SGLang offers multiple installation methods; choose the one that best suits your hardware platform and requirements, following the official SGLang installation guide. On AMD hardware, SGLang is currently available via a Docker image.
2.1 AMD Docker
2.1.1 Launch docker
Command
Command
2.1.2 Make modifications inside the docker
Command
2.1.3 Fix torch compile
Comment out the following line in /sgl-workspace/sglang/python/sglang/srt/models/minimax_m2.py: @torch.compile(dynamic=True, backend=get_compiler_backend())
Command
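The edit above can also be scripted instead of done by hand. A minimal sketch (the decorator line and in-container path come from the instruction above; the helper name is mine):

```python
def comment_out_compile_decorator(path):
    """Prefix the @torch.compile decorator line with '#' so it is skipped."""
    target = "@torch.compile(dynamic=True, backend=get_compiler_backend())"
    with open(path) as f:
        lines = f.readlines()
    with open(path, "w") as f:
        for line in lines:
            if line.strip() == target and not line.lstrip().startswith("#"):
                # Preserve indentation so the surrounding class stays valid.
                indent = line[: len(line) - len(line.lstrip())]
                f.write(indent + "# " + line.lstrip())
            else:
                f.write(line)
```

Run it against /sgl-workspace/sglang/python/sglang/srt/models/minimax_m2.py inside the container, then restart the server.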
3. Model Deployment
This section provides a progressive guide from quick deployment to performance optimization, suitable for users at different levels.
3.1 Basic Configuration
Interactive Command Generator: Use the configuration selector below to automatically generate the appropriate deployment command for your hardware platform, model variant, deployment strategy, and thinking capabilities.
4. Model Invocation
4.1 Basic Usage
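As a hedged sketch of what a basic request looks like, SGLang exposes an OpenAI-compatible endpoint; the port 30000 is SGLang's default, and the model name and helper below are illustrative assumptions:

```python
import json

# Hypothetical helper: build an OpenAI-compatible /v1/chat/completions
# payload for a locally served MiniMax-M2.
def build_chat_request(prompt, model="MiniMaxAI/MiniMax-M2", temperature=1.0):
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }

payload = build_chat_request("Write a Python function that reverses a string.")
# POST this to the running server, e.g. with requests:
# requests.post("http://localhost:30000/v1/chat/completions", json=payload)
print(json.dumps(payload, indent=2))
```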
For basic API usage and request examples, please refer to:
4.2 Advanced Usage
4.2.1 Reasoning Parser
Server Command:
Command
Example
Output
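With a reasoning parser enabled, the server separates the model's thinking from its final answer in the response message. A hedged client-side sketch (the `reasoning_content` field follows the OpenAI-compatible convention SGLang uses; the sample response is fabricated for illustration):

```python
# Fabricated sample of a parsed chat-completion response: the parser splits
# the raw output into `reasoning_content` (thinking) and `content` (answer).
sample_response = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "reasoning_content": "First I consider the base case...",
                "content": "The answer is 4.",
            }
        }
    ]
}

def split_reasoning(response):
    """Return (reasoning, answer) from a parsed chat-completion response."""
    msg = response["choices"][0]["message"]
    return msg.get("reasoning_content", ""), msg.get("content", "")

reasoning, answer = split_reasoning(sample_response)
```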
4.2.2 Tool Calling
Server Command:
Command
Example
Output
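Tool calling follows the OpenAI-style schema: the client declares tools, and the server's tool-call parser returns structured `tool_calls` instead of free text. A hedged sketch (the weather tool and sample message are illustrative, not from this document):

```python
import json

# Illustrative OpenAI-style tool definition sent with the request.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# Fabricated assistant message as a tool-call parser would return it:
# arguments arrive as a JSON string that the client must decode.
sample_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_0",
        "type": "function",
        "function": {"name": "get_weather",
                     "arguments": '{"city": "Shanghai"}'},
    }],
}

def extract_tool_calls(message):
    """Return a list of (name, args_dict) pairs from an assistant message."""
    return [
        (tc["function"]["name"], json.loads(tc["function"]["arguments"]))
        for tc in message.get("tool_calls", [])
    ]

calls = extract_tool_calls(sample_message)
```

The client then runs each requested function and feeds results back as `tool` role messages.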
5. Benchmark
5.1 Speed Benchmark
Test Environment:
- Hardware: AMD MI300X GPU (4x)
- Model: MiniMax-M2
- Tensor Parallelism: 4
- sglang version: 0.5.7
Command
5.1.1 Low Concurrency (Latency-Optimized)
- Benchmark Command:
Command
- Test Results:
Output
5.1.2 Medium Concurrency (Balanced)
- Benchmark Command:
Command
- Test Results:
Output
5.1.3 High Concurrency (Throughput-Optimized)
- Benchmark Command:
Command
- Test Results:
Output
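The three regimes above trade per-request latency for aggregate throughput. A rough back-of-envelope model of that trade-off (all numbers are hypothetical placeholders, not measured results):

```python
# Rough model: aggregate decode throughput ~ concurrency * output_len / latency.
def aggregate_throughput(concurrency, output_tokens, end_to_end_latency_s):
    """Tokens generated per second across all concurrent requests."""
    return concurrency * output_tokens / end_to_end_latency_s

low = aggregate_throughput(concurrency=1, output_tokens=1024,
                           end_to_end_latency_s=20.0)
high = aggregate_throughput(concurrency=128, output_tokens=1024,
                            end_to_end_latency_s=160.0)
# Higher concurrency raises per-request latency (20s -> 160s here) yet still
# multiplies aggregate throughput, which is why batch workloads prefer it.
```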
5.2 Accuracy Benchmark
5.2.1 GSM8K Benchmark
- Server Command:
Command
- Benchmark Command:
Command
- Result:
- MiniMax-M2
Output
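GSM8K grading typically reduces to extracting the final number from the model's answer and comparing it against the reference, which in the dataset ends with a "#### <answer>" line. A minimal sketch of that check (the helper names and heuristic are mine, not from the benchmark script above):

```python
import re

def extract_final_number(text):
    """Return the last number in the text (commas stripped), or None."""
    nums = re.findall(r"-?\d[\d,]*\.?\d*", text)
    return nums[-1].replace(",", "").rstrip(".") if nums else None

def is_correct(model_output, reference):
    """Reference strings in GSM8K end with '#### <answer>'."""
    gold = reference.split("####")[-1].strip().replace(",", "")
    return extract_final_number(model_output) == gold

# Example: a correct answer matches the number after '####'.
ok = is_correct("So the total is 18 eggs.", "She sells the rest... #### 18")
```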
