Three things happened today that don't usually happen on the same day: Apple confirmed it is routing its AI ambitions through a competitor's cloud, Google shipped the cheapest capable Gemini model in the product's history, and DeepSeek V4 appeared to land — optimized exclusively on Chinese silicon, with Nvidia and AMD explicitly excluded from the pre-release process. If you're the person at a 20-50 person company who has to answer "what AI stack are we building on," today's news reshuffled the deck.
The story under the story: Apple pays Google ~$1B/year to host Siri
This is the one that will still matter a year from now.
When Apple announced a Gemini deal in January, it read as a feature integration — Gemini powers some Apple Intelligence responses, Apple keeps the brand. What emerged this week is structurally different: Apple is in discussions to have Google host the next Siri entirely on Google Cloud infrastructure, reportedly at ~$1 billion per year.
The implications are more interesting than the headline number. Apple has spent years and billions building Private Cloud Compute — a privacy-preserving inference infrastructure pitched as the reason you could trust Apple with your personal data. Routing Siri through Google's servers doesn't break that promise technically (the privacy controls can persist), but it does expose what actually happened: Apple couldn't close the capability gap on foundation models fast enough to matter. The A19 chip on the iPhone 17e (announced today, shipping March 11 at $599) runs a 16-core Neural Engine that is genuinely faster than its predecessor for on-device inference. Apple Intelligence features — Live Translation, Call Screening, Visual Intelligence — work locally. But the more ambitious "personalized Siri" that Apple teased at WWDC? That's running on Gemini, and now possibly on Google's racks.
For an IT buyer at a mid-size company: the practical signal is that Google has secured two of the three major mobile AI surfaces (Samsung Galaxy S26 launched last month on Gemini; iPhone follows). If you're evaluating which AI vendor to standardize on for productivity and mobile workflows, Google's infrastructure position just got significantly harder to route around.
Samsung's Gemini deployment is worth watching as a preview. CNBC noted last week that the S26 is functioning as a live showcase for what Gemini will do inside Siri. If the experience holds up, Apple-Google becomes the dominant device-side AI pairing by the end of 2026.
The other four
Google Gemini 3.1 Flash-Lite (preview, today). Available now in Google AI Studio and Vertex AI. It's the first Flash-Lite in the Gemini 3 family — designed for high-volume agentic tasks, classification, routing, and anything where cost and latency matter more than peak capability. Early reporting puts performance close to Gemini 2.5 Flash on coding, math, and multimodal tasks, at lower cost. If you're running batch jobs or building automations, this is the model to benchmark first.
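If you do benchmark it, the cheapest way to start is a tiny harness around one representative task — here, ticket routing. This is a sketch, not Google's documented quickstart: the model ID is a placeholder (check AI Studio for the actual preview name), and the stub keeps the harness runnable offline until you swap in a real client such as the google-genai SDK.

```python
import time

# Placeholder model ID for illustration only -- look up the actual
# Flash-Lite preview identifier in Google AI Studio before running
# against the real API.
MODEL_ID = "gemini-flash-lite-preview"


def classify_ticket(text: str, call_model) -> str:
    """Route a support ticket to a queue with one cheap completion.

    `call_model` is any callable mapping a prompt string to a reply
    string, so the same harness runs offline with a stub and later
    with a real API client."""
    prompt = (
        "Classify this support ticket as exactly one of: "
        "billing, technical, account. Reply with the label only.\n\n" + text
    )
    return call_model(prompt).strip().lower()


def timed(call_model, prompt: str):
    """Return (reply, seconds) so latency can be compared across models."""
    start = time.perf_counter()
    reply = call_model(prompt)
    return reply, time.perf_counter() - start


def stub_model(prompt: str) -> str:
    # Offline stand-in for the real API call; keyword match only.
    return "billing" if "invoice" in prompt.lower() else "technical"


if __name__ == "__main__":
    label = classify_ticket("My invoice charged me twice this month.", stub_model)
    print(label)  # billing
```

Swapping `stub_model` for a real call means one line changes and the latency/accuracy numbers become comparable across whatever models you test against it.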
DeepSeek V4 (dropping today or tonight). The Hangzhou lab's first major release in over a year is a multimodal model — text, images, video — and it was optimized exclusively on Huawei and Cambricon hardware. Nvidia and AMD were deliberately excluded from pre-release access. That's an infrastructure bet, not an oversight: it hands Chinese chip vendors a several-week tuning advantage and signals DeepSeek is aligning its roadmap with domestic silicon going forward. Expect the access split to affect inference availability outside China.
Anthropic vs. DeepSeek (and Moonshot and MiniMax): the distillation accusation. Anthropic says three Chinese labs made over 16 million Claude interactions through ~24,000 unauthorized accounts, extracting capabilities to train competing models. Distillation from a frontier model's outputs is technically standard practice — the accusation is about doing it at scale without authorization. Whether this leads to legal action or terms enforcement is unclear, but it's reframing how the industry discusses Chinese lab progress.
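For readers unfamiliar with the mechanics: output distillation just means querying a stronger "teacher" model and keeping its answers as supervised fine-tuning targets for a smaller student. The sketch below is a generic illustration of that loop, not Anthropic's detection method or any lab's actual pipeline; the stub teacher stands in for a frontier-model API call.

```python
def build_distillation_set(prompts, teacher):
    """Collect (prompt, teacher answer) pairs to fine-tune a student on.

    `teacher` is any callable mapping a prompt string to a reply
    string -- in practice an API call to the stronger model."""
    return [{"prompt": p, "target": teacher(p)} for p in prompts]


def stub_teacher(prompt: str) -> str:
    # Illustrative stand-in for a real frontier-model call.
    return f"teacher answer to: {prompt}"


if __name__ == "__main__":
    dataset = build_distillation_set(
        ["Summarize TCP slow start.", "Explain what a mutex does."],
        stub_teacher,
    )
    print(len(dataset))  # 2
```

At 16 million interactions, the loop above is the whole story: the scale and the unauthorized accounts, not the technique, are what Anthropic is objecting to.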
Apple iPhone 17e ($599, ships March 11). The A19 chip with 16-core Neural Engine brings full Apple Intelligence — Live Translation, Call Screening, Visual Intelligence — to the budget tier. This is significant for device fleet decisions: organizations that deferred AI-capable hardware can now upgrade at the $599 price point rather than flagship pricing.
This week is shaping up as a consolidation moment: Google secures the device layer, DeepSeek locks in a Chinese-silicon stack, and the cost floor on inference drops again. The operator who waits another quarter to pick a primary AI vendor is watching the options narrow.
