Chinese AI startup MiniMax dropped M2.5 on February 25, and the numbers are hard to ignore. The model scores 80.2% on SWE-Bench Verified -- matching Claude Opus 4.6 -- while running at a cost of $1 per hour when processing 100 tokens per second. Drop to 50 tokens per second and you are looking at $0.30 per hour.
That is not a typo. It is cheaper than a vending machine soda to run a frontier-grade coding model for an entire hour.
What M2.5 Actually Does
MiniMax trained M2.5 across more than 200,000 real-world coding environments spanning 13 languages -- Go, C, C++, TypeScript, Rust, Kotlin, Python, Java, JavaScript, PHP, Lua, Dart, and Ruby. The focus was not just fixing bugs. The model was trained to handle the full software development lifecycle: from initial system design through environment setup, feature iteration, and comprehensive testing.
A notable quirk emerged during training: M2.5 developed what MiniMax calls a "spec-writing tendency." Before writing code, the model actively decomposes projects from the perspective of a software architect, planning features, structure, and UI design. Whether this slows it down or yields better code in practice remains to be seen, but it is a different approach from the direct code generation most developers are used to.
The Speed Factor
MiniMax claims M2.5 completes SWE-Bench Verified evaluations 37% faster than its predecessor M2.1, while matching the speed of Claude Opus 4.6. The company offers two API variants -- standard M2.5 and M2.5-Highspeed -- which deliver identical results at different throughput rates.
The model also includes automatic caching with no configuration required -- a small but meaningful quality-of-life improvement for production deployments.
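For teams evaluating the API, the call shape matters less than the variant switch. The sketch below builds a chat-completion request body and toggles between the standard and Highspeed variants. It is a minimal sketch, not official usage: the endpoint URL, model identifiers, and parameter names are assumptions patterned on common OpenAI-style APIs -- check MiniMax's own API documentation before relying on any of them.

```python
import json

# Placeholder endpoint -- verify against MiniMax's official API docs.
API_URL = "https://api.example-minimax-endpoint/v1/chat/completions"

def build_request(prompt: str, highspeed: bool = False) -> dict:
    """Build a chat-completion request body.

    The model identifiers below are hypothetical names for the
    standard M2.5 variant vs. the Highspeed variant.
    """
    return {
        "model": "MiniMax-M2.5-Highspeed" if highspeed else "MiniMax-M2.5",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,  # low temperature for more repeatable code edits
    }

body = build_request("Refactor this function to remove the global state.")
print(json.dumps(body, indent=2))
```

Because the two variants are claimed to deliver identical results, switching between them should be a one-field change like this rather than a code rewrite, which makes it easy to A/B throughput against cost.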
Benchmarks Tell Part of the Story
M2.5 achieved state-of-the-art results on several coding and agentic benchmarks:
- SWE-Bench Verified: 80.2%
- Multi-SWE-Bench: 51.3%
- BrowseComp (with context management): 76.3%
Multi-SWE-Bench is particularly worth watching -- it tests a model's ability to handle multiple interdependent code changes simultaneously, which is where many coding agents fall apart in production.
MiniMax also tested M2.5 against Opus 4.6 using different agent harnesses. On the Droid harness, M2.5 scored 79.7% versus Opus 4.6's 78.9%. On OpenCode, M2.5 hit 76.1% versus Opus 4.6's 75.9%. These are narrow margins, but they suggest M2.5 is competitive at the very top of the leaderboard.
Open Weights, Open Questions
Unlike many frontier models, M2.5 ships with openly released weights on Hugging Face. MiniMax recommends vLLM or SGLang for self-hosted deployments to achieve optimal performance.
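Self-hosting with vLLM typically looks like the sketch below: install the package, then expose the model behind vLLM's OpenAI-compatible HTTP server. The Hugging Face repo id and the flag values are assumptions -- consult the model card for the recommended serving configuration and hardware requirements.

```shell
pip install vllm

# Serve the model behind an OpenAI-compatible endpoint (default port 8000).
# The repo id is a placeholder; the parallelism and context-length values
# are assumptions -- size them to your GPUs and the model card's guidance.
vllm serve MiniMaxAI/MiniMax-M2.5 \
  --tensor-parallel-size 8 \
  --max-model-len 131072
```

Because the server speaks the OpenAI wire format, existing agent harnesses can usually point at the local endpoint by changing only a base URL, which is what makes the air-gapped deployment story practical.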
This matters for businesses evaluating AI infrastructure decisions. If your use case involves sensitive codebases or requires air-gapped deployment, open weights eliminate the compliance headaches that come with sending proprietary code to third-party APIs.
The API pricing has not changed from previous MiniMax releases, meaning existing customers get the performance improvement at no additional cost.
Where This Fits for Small Businesses
If you are currently paying Anthropic $15-20 per million tokens for Opus-level performance, M2.5's pricing structure warrants a look. At $0.10 per million tokens according to third-party reports, the economics shift dramatically for high-volume applications.
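The back-of-the-envelope math is worth doing explicitly. Using the figures quoted above -- roughly $15 per million tokens for Opus-level API access versus the reported $0.10 per million for M2.5 -- the snippet below compares monthly spend; the 500M-token monthly volume is an illustrative assumption, not a figure from the announcement.

```python
# Compare monthly spend at two per-million-token prices.
def monthly_cost(price_per_million: float, tokens_per_month: int) -> float:
    """Total monthly cost in dollars for a given token volume."""
    return price_per_million * tokens_per_month / 1_000_000

TOKENS = 500_000_000  # assumed volume: a high-traffic agent, 500M tokens/month

opus_cost = monthly_cost(15.00, TOKENS)  # $7,500/month
m25_cost = monthly_cost(0.10, TOKENS)    # $50/month

print(f"Opus-tier: ${opus_cost:,.0f}/mo  M2.5: ${m25_cost:,.0f}/mo")
print(f"Ratio: {opus_cost / m25_cost:.0f}x")
```

At that volume the gap is two orders of magnitude, which is why the pricing matters far more for high-throughput agent workloads than for occasional interactive use.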
The caveats:
- MiniMax is a Chinese company with headquarters in Shanghai, which may raise data sovereignty concerns for certain industries
- The model's training corpus and RLHF methods are less transparent than Anthropic's or OpenAI's
- Real-world latency and uptime under production load remain unproven compared to established providers
For teams already comfortable with open-source models and self-hosting, M2.5 offers a legitimate alternative to premium APIs. For teams needing vendor-managed reliability and support, the traditional providers likely still justify their premiums.
The broader significance: a model achieving Claude-level coding performance at 5-10% of the API cost commoditizes what was frontier-grade intelligence just months ago. The race to the bottom on inference pricing is accelerating, and businesses building AI-native products should expect their cost advantage from proprietary models to compress rapidly.
Source: MiniMax M2.5 Announcement
