While the US tech ecosystem spent the last 48 hours discussing Anthropic's anti-ad campaign versus ChatGPT's new ad slots and dissecting Elon Musk's public xAI all-hands, the center of gravity in open-weight AI quietly shifted East.
In what is being dubbed "Chinese AI Week," three major labs released frontier-class models within a 24-hour window, effectively resetting the leaderboard for open weights and efficient inference.
Here is the breakdown of the three models you need to know about: GLM-5, StepFun Flash 3.5, and MiniMax M2.5.
## Zhipu GLM-5: The 744B MoE Leviathan
Zhipu AI (often called the "OpenAI of China") dropped GLM-5, a massive 744B parameter Mixture-of-Experts (MoE) model with open weights.
This is not a toy model. GLM-5 represents a significant leap in open-weight capabilities, rivaling the best closed-source models from late 2025. Its MoE architecture routes each token to a small subset of experts, so only a fraction of the total parameters is active per token, which keeps inference speeds reasonable despite the enormous total size.
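Zhipu hasn't published router details alongside the announcement, so treat the following as a generic illustration rather than GLM-5's actual design: a toy top-k MoE layer in PyTorch showing why only a small slice of the parameters does work on any given token. Every size in it (expert count, hidden dims, k) is made up for the example.

```python
# Toy top-k Mixture-of-Experts block. Sizes are illustrative, NOT GLM-5's real config.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=64, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)   # scores each token against every expert
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                             # x: (n_tokens, d_model)
        scores, idx = self.router(x).topk(self.top_k, dim=-1)
        weights = F.softmax(scores, dim=-1)           # mixing weights over the chosen experts
        out = torch.zeros_like(x)
        # Only the k selected experts run for each token; every other expert's
        # parameters sit idle, which is where the compute savings come from.
        for slot in range(self.top_k):
            for e in idx[:, slot].unique().tolist():
                mask = idx[:, slot] == e
                out[mask] += weights[mask, slot:slot + 1] * self.experts[e](x[mask])
        return out

moe = TinyMoE()
print(moe(torch.randn(8, 512)).shape)  # torch.Size([8, 512]); each token touched 2 of 64 experts
```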
For developers and enterprise users, an open release at this scale means access to GPT-5 class reasoning without the API dependency, assuming you have the VRAM to run it (or the cluster to serve it).
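If the checkpoint lands on Hugging Face in the usual format, serving it with an off-the-shelf engine like vLLM would look roughly like the sketch below. The repository ID and tensor-parallel size are placeholders; swap in the published checkpoint and whatever hardware your cluster actually has.

```python
# Rough sketch of serving a large open-weight MoE with vLLM.
# "zhipuai/GLM-5" is a placeholder repo ID, not a confirmed checkpoint name.
from vllm import LLM, SamplingParams

llm = LLM(
    model="zhipuai/GLM-5",      # placeholder -- substitute the published checkpoint
    tensor_parallel_size=8,     # shard the weights across 8 GPUs (adjust to your hardware)
    trust_remote_code=True,
)

params = SamplingParams(temperature=0.2, max_tokens=512)
outputs = llm.generate(["Explain mixture-of-experts routing in two sentences."], params)
print(outputs[0].outputs[0].text)
```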
## StepFun Flash 3.5: Efficiency is the New King
While GLM-5 went big, StepFun went fast. The new StepFun Flash 3.5 is being hailed as the efficiency champion of 2026.
- Architecture: Sparse MoE (196B total parameters, ~11B active per token; see the back-of-envelope sketch after this list).
- Performance: Claims the #1 spot on MathArena, beating GPT-5.2.
- Speed: Extremely fast inference, well suited to agentic workflows.
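The back-of-envelope math referenced above (assuming bf16 weights and the usual ~2 FLOPs per active parameter per token) shows where the efficiency claim comes from: the weights still need a cluster's worth of memory, but per-token compute is roughly 18x lower than a dense model of the same total size.

```python
# Back-of-envelope: weight memory and per-token compute for a sparse MoE
# vs. a hypothetical dense model of the same total size.
# Assumes bf16 (2 bytes/param) and ~2 FLOPs per active parameter per token.
TOTAL_PARAMS    = 196e9   # StepFun Flash 3.5 total (as reported)
ACTIVE_PARAMS   = 11e9    # parameters used per token (as reported, approximate)
BYTES_PER_PARAM = 2       # bf16

weight_memory_gb     = TOTAL_PARAMS * BYTES_PER_PARAM / 1e9
sparse_flops_per_tok = 2 * ACTIVE_PARAMS
dense_flops_per_tok  = 2 * TOTAL_PARAMS

print(f"weights in bf16:       {weight_memory_gb:,.0f} GB")                 # ~392 GB just to hold
print(f"sparse compute/token:  {sparse_flops_per_tok / 1e9:,.0f} GFLOPs")   # ~22 GFLOPs
print(f"dense compute/token:   {dense_flops_per_tok / 1e9:,.0f} GFLOPs "
      f"(~{TOTAL_PARAMS / ACTIVE_PARAMS:.0f}x more)")
```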
As Andrej Karpathy noted in his recent viral post, "Libraries are over, LLMs are the new compiler." StepFun Flash 3.5 is exactly the kind of "compiler" he's talking about—smart enough to write complex logic but fast enough to run in a loop without breaking the bank.
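To make "run in a loop" concrete: if Flash 3.5 ends up served behind an OpenAI-compatible endpoint (vLLM and most inference gateways expose one), the pattern is just a tight generate-act-observe cycle. The base URL, model ID, and run_tool helper below are placeholders, not anything StepFun ships.

```python
# Minimal agentic loop against an OpenAI-compatible endpoint.
# Base URL, model name, and run_tool() are placeholders -- a sketch of the pattern,
# not any lab's official agent harness.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed-locally")

def run_tool(text: str) -> str:
    """Placeholder: parse the model's proposed action and execute it (shell, tests, etc.)."""
    return "tool output goes here"

messages = [{"role": "user", "content": "Refactor utils.py and make the tests pass."}]
for step in range(10):                                    # cheap, fast models make long loops affordable
    reply = client.chat.completions.create(
        model="stepfun/flash-3.5",                        # placeholder model ID
        messages=messages,
    )
    action = reply.choices[0].message.content
    messages.append({"role": "assistant", "content": action})
    if "DONE" in action:                                  # toy stopping condition
        break
    messages.append({"role": "user", "content": run_tool(action)})
```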
## MiniMax M2.5: The Dark Horse
Rounding out the trio is MiniMax M2.5, released Wednesday evening. Known for its character-centric models and exceptional roleplay capabilities, MiniMax has pivoted hard into reasoning and coding with M2.5.
Early benchmarks suggest it trades blows with Llama 4 405B, particularly in creative writing and multi-turn instruction following.
The "Sputnik Moment" for Open Source?
The simultaneous release of these three models underscores a critical trend: the gap between closed and open models is closing faster than anticipated.
Just six months ago, the conventional wisdom was that US labs had an insurmountable lead in reasoning and coding. Today, a developer in Shanghai (or San Francisco) can download a model like GLM-5 or StepFun Flash 3.5 and run a local agent that rivals the best hosted APIs from OpenAI or Anthropic.
For Barista Labs clients and partners, this reinforces our core thesis: Model independence is a strategic asset. Relying solely on a single API provider is a liability when open-weight models are iterating this quickly.
We are currently benchmarking GLM-5 and StepFun Flash 3.5 for our internal coding agents. Expect a technical deep dive later this week.
