Chinese AI startup MiniMax dropped M2.5 on February 25, and the numbers are hard to ignore. The model scores 80.2% on SWE-Bench Verified -- matching Claude Opus 4.6 -- while running at a cost of $1 per hour when processing 100 tokens per second. Drop to 50 tokens per second and you are looking at $0.30 per hour.
That is not a typo. It is cheaper than a vending machine soda to run a frontier-grade coding model for an entire hour.
What M2.5 Actually Does
MiniMax trained M2.5 across more than 200,000 real-world coding environments spanning 13 languages -- Go, C, C++, TypeScript, Rust, Kotlin, Python, Java, JavaScript, PHP, Lua, Dart, and Ruby. The focus was not just fixing bugs. The model was trained to handle the full software development lifecycle: from initial system design through environment setup, feature iteration, and comprehensive testing.
A notable quirk emerged during training: M2.5 developed what MiniMax calls a "spec-writing tendency." Before writing code, the model actively decomposes projects from the perspective of a software architect, planning features, structure, and UI design. Whether this slows it down or yields better code in practice remains to be seen, but it is a different approach from the direct code generation most developers are used to.
The Speed Factor
MiniMax claims M2.5 completes SWE-Bench Verified evaluations 37% faster than its predecessor M2.1, while matching the speed of Claude Opus 4.6. The company offers two API variants -- standard M2.5 and M2.5-Highspeed -- which deliver identical results at different throughput rates.
The model also includes automatic caching with no configuration required -- a small but meaningful quality-of-life improvement for production deployments.
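For teams evaluating the API, the call shape matters less than the variant switch. The sketch below builds a chat-completion request body and toggles between the standard and Highspeed variants. It is a minimal sketch, not official usage: the endpoint URL, model identifiers, and parameter names are assumptions patterned on common OpenAI-style APIs -- check MiniMax's own API documentation before relying on any of them.

```python
import json

# Placeholder endpoint -- verify against MiniMax's official API docs.
API_URL = "https://api.example-minimax-endpoint/v1/chat/completions"

def build_request(prompt: str, highspeed: bool = False) -> dict:
    """Build a chat-completion request body.

    The model identifiers below are hypothetical names for the
    standard M2.5 variant vs. the Highspeed variant.
    """
    return {
        "model": "MiniMax-M2.5-Highspeed" if highspeed else "MiniMax-M2.5",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,  # low temperature for more repeatable code edits
    }

body = build_request("Refactor this function to remove the global state.")
print(json.dumps(body, indent=2))
```

Because the two variants are claimed to deliver identical results, switching between them should be a one-field change like this rather than a code rewrite, which makes it easy to A/B throughput against cost.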
Benchmarks Tell Part of the Story
M2.5 achieved state-of-the-art results on several coding and agentic benchmarks:
- SWE-Bench Verified: 80.2%
- Multi-SWE-Bench: 51.3%
- BrowseComp (with context management): 76.3%
Multi-SWE-Bench is particularly worth watching -- it tests a model's ability to handle multiple interdependent code changes simultaneously, which is where many coding agents fall apart in production.
MiniMax also tested M2.5 against Opus 4.6 using different agent harnesses. On the Droid harness, M2.5 scored 79.7% versus Opus 4.6's 78.9%. On OpenCode, M2.5 hit 76.1% versus Opus 4.6's 75.9%. These are narrow margins, but they suggest M2.5 is competitive at the very top of the leaderboard.
Open Weights, Open Questions
Unlike many frontier models, M2.5 ships with openly released weights on Hugging Face. MiniMax recommends vLLM or SGLang for self-hosted deployments to achieve optimal performance.
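Self-hosting with vLLM typically looks like the sketch below: install the package, then expose the model behind vLLM's OpenAI-compatible HTTP server. The Hugging Face repo id and the flag values are assumptions -- consult the model card for the recommended serving configuration and hardware requirements.

```shell
pip install vllm

# Serve the model behind an OpenAI-compatible endpoint (default port 8000).
# The repo id is a placeholder; the parallelism and context-length values
# are assumptions -- size them to your GPUs and the model card's guidance.
vllm serve MiniMaxAI/MiniMax-M2.5 \
  --tensor-parallel-size 8 \
  --max-model-len 131072
```

Because the server speaks the OpenAI wire format, existing agent harnesses can usually point at the local endpoint by changing only a base URL, which is what makes the air-gapped deployment story practical.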
This matters for businesses evaluating AI infrastructure decisions. If your use case involves sensitive codebases or requires air-gapped deployment, open weights eliminate the compliance headaches that come with sending proprietary code to third-party APIs.
The API pricing has not changed from previous MiniMax releases, meaning existing customers get the performance improvement at no additional cost.
Where This Fits for Small Businesses
If you are currently paying Anthropic $15-20 per million tokens for Opus-level performance, M2.5's pricing structure warrants a look. At $0.10 per million tokens according to third-party reports, the economics shift dramatically for high-volume applications.
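The back-of-the-envelope math is worth doing explicitly. Using the figures quoted above -- roughly $15 per million tokens for Opus-level API access versus the reported $0.10 per million for M2.5 -- the snippet below compares monthly spend; the 500M-token monthly volume is an illustrative assumption, not a figure from the announcement.

```python
# Compare monthly spend at two per-million-token prices.
def monthly_cost(price_per_million: float, tokens_per_month: int) -> float:
    """Total monthly cost in dollars for a given token volume."""
    return price_per_million * tokens_per_month / 1_000_000

TOKENS = 500_000_000  # assumed volume: a high-traffic agent, 500M tokens/month

opus_cost = monthly_cost(15.00, TOKENS)  # $7,500/month
m25_cost = monthly_cost(0.10, TOKENS)    # $50/month

print(f"Opus-tier: ${opus_cost:,.0f}/mo  M2.5: ${m25_cost:,.0f}/mo")
print(f"Ratio: {opus_cost / m25_cost:.0f}x")
```

At that volume the gap is two orders of magnitude, which is why the pricing matters far more for high-throughput agent workloads than for occasional interactive use.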
The caveats:
- MiniMax is a Chinese company with headquarters in Shanghai, which may raise data sovereignty concerns for certain industries
- The model's training corpus and RLHF methods are less transparent than Anthropic's or OpenAI's
- Real-world latency and uptime under production load remain unproven compared to established providers
For teams already comfortable with open-source models and self-hosting, M2.5 offers a legitimate alternative to premium APIs. For teams needing vendor-managed reliability and support, the traditional providers likely still justify their premiums.
The broader significance: a model achieving Claude-level coding performance at 5-10% of the API cost commoditizes what was frontier-grade intelligence just months ago. The race to the bottom on inference pricing is accelerating, and businesses building AI-native products should expect their cost advantage from proprietary models to compress rapidly.
Source: MiniMax M2.5 Announcement
