If you run AI workflows in a small business, this is the number to watch: tool selection in 385ms across 67 tools and 13 MCP servers on a local machine.
Liquid AI shared that result in its primary X announcement for LFM2-24B-A2B, alongside a reported 14.5GB memory footprint on an M4 Max for this on-device tool-calling setup.
That is not a benchmark for chat quality alone. It is an operations signal for local agent orchestration.
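A number like that is also easy to sanity-check on your own hardware. Below is a minimal harness sketch using the official MCP Python SDK's stdio client to enumerate tools across servers and time a selection loop; the `SERVERS` entries and `select_tool` body are placeholders for your own server configs and local model call, not anything Liquid AI has published.

```python
import asyncio
import statistics
import time

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Placeholder: list one entry per MCP server you actually run.
# (The X post describes 13 servers exposing 67 tools in total.)
SERVERS = [
    StdioServerParameters(
        command="npx",
        args=["-y", "@modelcontextprotocol/server-filesystem", "/tmp"],
    ),
]

async def collect_tools(params: StdioServerParameters) -> list[str]:
    """Connect to one MCP server and return its tool names."""
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.list_tools()
            return [tool.name for tool in result.tools]

def select_tool(query: str, tools: list[str]) -> str:
    """Stand-in for the local model call that routes to a tool.

    This keyword overlap exists only so the harness runs end to end;
    swap in your real inference call (llama.cpp, MLX, Ollama, etc.).
    """
    words = set(query.lower().split())
    return max(tools, key=lambda t: len(words & set(t.lower().replace("_", " ").split())))

async def main() -> None:
    tools: list[str] = []
    for params in SERVERS:
        tools.extend(await collect_tools(params))
    print(f"{len(tools)} tools across {len(SERVERS)} server(s)")

    latencies = []
    for _ in range(50):  # repeated trials give stable percentiles
        start = time.perf_counter()
        select_tool("update the CRM record for this inbound lead", tools)
        latencies.append((time.perf_counter() - start) * 1000)
    print(f"p50 tool selection: {statistics.median(latencies):.1f}ms")

asyncio.run(main())
```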
What Was Announced
From Liquid AI’s public materials:
- LFM2-24B-A2B is presented as a sparse model in the LFM2 family with roughly 24B total parameters and about 2B active parameters per token, the "A2B" in the name (per Liquid's model card and blog).
- The X post reports a 385ms tool-selection latency in a setup with 67 tools across 13 MCP servers.
- The same post reports a 14.5GB memory footprint on Apple M4 Max for the demonstrated setup.
For context, Liquid AI’s model release pages position LFM2-24B-A2B as a scaling step for its hybrid architecture while keeping inference efficient enough for local deployment.
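If you want to poke at the checkpoint yourself, the model card linked in the sources is the authority on supported runtimes. As a rough sketch, assuming this release loads through the standard transformers Auto* APIs the way earlier LFM2 checkpoints do (verify against the card before relying on it):

```python
# Sketch only: assumes LFM2-24B-A2B loads through the standard
# transformers Auto* APIs like earlier LFM2 checkpoints; confirm the
# supported path on the Hugging Face model card before relying on this.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LiquidAI/LFM2-24B-A2B"  # repo name from the sources below
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# All ~24B parameters are stored on disk, but the sparse design activates
# only ~2B per token, which is what keeps local inference tractable.
prompt = "Pick the best tool to update a CRM record for an inbound lead."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```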
Why This Matters for SMB Teams
Most SMB agent deployments fail on reliability and cost control, not a lack of model options.
This announcement matters because it suggests three practical shifts:
- Laptop-class orchestration becomes plausible. If tool routing can stay sub-second on commodity high-end laptops, teams can run more workflows without shipping every tool decision to cloud APIs.
- Lower privacy and compliance friction. Local inference for tool selection reduces data movement. That helps firms in legal, healthcare-adjacent, and financial workflows where cloud data paths trigger extra controls.
- More predictable operating costs. When core routing logic runs on-device, variable inference spend drops for routine tasks, and cloud capacity can be reserved for heavy generation or escalation paths (a minimal routing sketch follows this list).
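Here is that routing sketch: send every request through the local selector first and escalate to the cloud only when confidence is low. `local_select` and `cloud_select` are hypothetical stand-ins for your local model and cloud API calls.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Routing:
    tool: str
    confidence: float  # 0.0-1.0, however your local selector scores it
    used_cloud: bool

def route(
    query: str,
    local_select: Callable[[str], tuple[str, float]],
    cloud_select: Callable[[str], str],
    threshold: float = 0.8,  # tune against your pilot's error-rate data
) -> Routing:
    """Local-first tool routing with cloud escalation.

    Routine queries stay on-device; only low-confidence cases pay for a
    cloud call, so variable spend tracks the hard queries, not volume.
    """
    tool, confidence = local_select(query)
    if confidence >= threshold:
        return Routing(tool=tool, confidence=confidence, used_cloud=False)
    return Routing(tool=cloud_select(query), confidence=confidence, used_cloud=True)
```

The threshold is the operating lever: raising it shifts more traffic to the cloud (more spend, fewer routing errors), lowering it does the reverse, and your pilot data should tell you where to set it.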
A Practical Test Plan (14 Days)
If you are an SMB with existing MCP-enabled tooling, run a short pilot:
- Week 1: Move one bounded workflow (for example, inbound lead qualification + CRM update) to local-first tool routing.
- Week 2: Measure median tool-selection latency, error rate, and cloud-token spend compared with your current stack.
Success criteria should be explicit (a scoring sketch follows this list):
- p50 and p95 routing latency
- percentage of tasks completed without cloud fallback
- net cost per completed workflow
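To keep the pilot honest, compute those criteria mechanically from run logs rather than eyeballing them. A minimal sketch, assuming you record latency, cloud fallback, completion status, and per-run cost for each workflow execution (the field names are illustrative):

```python
import statistics
from dataclasses import dataclass

@dataclass
class RunRecord:
    latency_ms: float  # tool-selection latency for this run
    used_cloud: bool   # did the run escalate past local routing?
    completed: bool    # did the workflow finish successfully?
    cost_usd: float    # cloud tokens plus any metered local cost

def pilot_report(runs: list[RunRecord]) -> dict[str, float]:
    """Compute the success criteria above from pilot logs.

    Assumes at least two runs (for percentiles) and at least one
    completed run (for cost per completed workflow).
    """
    latencies = [r.latency_ms for r in runs]
    completed = [r for r in runs if r.completed]
    return {
        "p50_latency_ms": statistics.median(latencies),
        # quantiles(n=20) returns 19 cut points; the last one is the p95
        "p95_latency_ms": statistics.quantiles(latencies, n=20)[-1],
        "local_only_rate": sum(not r.used_cloud for r in runs) / len(runs),
        "cost_per_completed_usd": sum(r.cost_usd for r in runs) / len(completed),
    }
```

Running the same report against your existing cloud-first stack gives the week-2 comparison a real baseline.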
If those numbers do not improve, do not force it.
Bottom Line
Liquid AI’s LFM2-24B-A2B announcement is one of the stronger recent signals that on-device agent infrastructure is becoming operationally credible, not just demo-friendly.
For SMBs, the opportunity is straightforward: treat local tool-calling as a cost, speed, and privacy lever, then validate with hard workflow metrics before broader rollout.
Sources
- Primary source (X): https://x.com/liquidai/status/2029586519389086198
- Liquid AI model release page: https://www.liquid.ai/blog/lfm2-24b-a2b
- Hugging Face model card: https://huggingface.co/LiquidAI/LFM2-24B-A2B
Source Verification Notes
- X is the primary source for the specific benchmark-style claims in this post (385ms, 67 tools, 13 MCP servers, 14.5GB on M4 Max).
- Supporting technical context (model architecture and scaling framing) was cross-referenced against Liquid AI’s official release page and model card.
