Three things happened today that don't usually happen on the same day: Apple confirmed it is routing its AI ambitions through a competitor's cloud, Google shipped the cheapest capable Gemini model in the product's history, and DeepSeek V4 appeared to land — optimized exclusively on Chinese silicon, with Nvidia and AMD explicitly excluded from the pre-release process. If you're the person at a 20-50 person company who has to answer "what AI stack are we building on," today's news reshuffled the deck.
The story under the story: Apple pays Google ~$1B/year to host Siri
This is the one that will still matter a year from now.
When Apple announced a Gemini deal in January, it read as a feature integration — Gemini powers some Apple Intelligence responses, Apple keeps the brand. What emerged this week is structurally different: Apple is in discussions to have Google host the next Siri entirely on Google Cloud infrastructure, reportedly at ~$1 billion per year.
The implications are more interesting than the headline number. Apple has spent years and billions building Private Cloud Compute — a privacy-preserving inference infrastructure pitched as the reason you could trust Apple with your personal data. Routing Siri through Google's servers doesn't break that promise technically (the privacy controls can persist), but it does expose what actually happened: Apple couldn't close the capability gap on foundation models fast enough to matter. The A19 chip on the iPhone 17e (announced today, shipping March 11 at $599) runs a 16-core Neural Engine that is genuinely faster than its predecessor for on-device inference. Apple Intelligence features — Live Translation, Call Screening, Visual Intelligence — work locally. But the more ambitious "personalized Siri" that Apple teased at WWDC? That's running on Gemini, and now possibly on Google's racks.
For an IT buyer at a mid-size company: the practical signal is that Google has secured two of the three major mobile AI surfaces (Samsung Galaxy S26 launched last month on Gemini; iPhone follows). If you're evaluating which AI vendor to standardize on for productivity and mobile workflows, Google's infrastructure position just got significantly harder to route around.
Samsung's Gemini deployment is worth watching as a preview. CNBC noted last week that the S26 is functioning as a live showcase for what Gemini will do inside Siri. If the experience holds up, Apple-Google becomes the dominant device-side AI pairing by the end of 2026.
The other four
Google Gemini 3.1 Flash-Lite (preview, today). Available now in Google AI Studio and Vertex AI. It's the first Flash-Lite in the Gemini 3 family — designed for high-volume agentic tasks, classification, routing, and anything where cost and latency matter more than peak capability. Early reporting puts performance close to Gemini 2.5 Flash on coding, math, and multimodal tasks, at lower cost. If you're running batch jobs or building automations, this is the model to benchmark first.
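If you do benchmark it, the cheapest way to start is a tiny harness around one representative task — here, ticket routing. This is a sketch, not Google's documented quickstart: the model ID is a placeholder (check AI Studio for the actual preview name), and the stub keeps the harness runnable offline until you swap in a real client such as the google-genai SDK.

```python
import time

# Placeholder model ID for illustration only -- look up the actual
# Flash-Lite preview identifier in Google AI Studio before running
# against the real API.
MODEL_ID = "gemini-flash-lite-preview"


def classify_ticket(text: str, call_model) -> str:
    """Route a support ticket to a queue with one cheap completion.

    `call_model` is any callable mapping a prompt string to a reply
    string, so the same harness runs offline with a stub and later
    with a real API client."""
    prompt = (
        "Classify this support ticket as exactly one of: "
        "billing, technical, account. Reply with the label only.\n\n" + text
    )
    return call_model(prompt).strip().lower()


def timed(call_model, prompt: str):
    """Return (reply, seconds) so latency can be compared across models."""
    start = time.perf_counter()
    reply = call_model(prompt)
    return reply, time.perf_counter() - start


def stub_model(prompt: str) -> str:
    # Offline stand-in for the real API call; keyword match only.
    return "billing" if "invoice" in prompt.lower() else "technical"


if __name__ == "__main__":
    label = classify_ticket("My invoice charged me twice this month.", stub_model)
    print(label)  # billing
```

Swapping `stub_model` for a real call means one line changes and the latency/accuracy numbers become comparable across whatever models you test against it.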
DeepSeek V4 (dropping today or tonight). The Hangzhou lab's first major release in over a year is a multimodal model — text, images, video — and it was optimized exclusively on Huawei and Cambricon hardware. Nvidia and AMD were deliberately excluded from pre-release access. That's an infrastructure bet, not an oversight: it hands Chinese chip vendors a several-week tuning advantage and signals DeepSeek is aligning its roadmap with domestic silicon going forward. Expect the access split to affect inference availability outside China.
Anthropic vs. DeepSeek (and Moonshot and MiniMax): the distillation accusation. Anthropic says three Chinese labs made over 16 million Claude interactions through ~24,000 unauthorized accounts, extracting capabilities to train competing models. Distillation from a frontier model's outputs is technically standard practice — the accusation is about doing it at scale without authorization. Whether this leads to legal action or terms enforcement is unclear, but it's reframing how the industry discusses Chinese lab progress.
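For readers unfamiliar with the mechanics: output distillation just means querying a stronger "teacher" model and keeping its answers as supervised fine-tuning targets for a smaller student. The sketch below is a generic illustration of that loop, not Anthropic's detection method or any lab's actual pipeline; the stub teacher stands in for a frontier-model API call.

```python
def build_distillation_set(prompts, teacher):
    """Collect (prompt, teacher answer) pairs to fine-tune a student on.

    `teacher` is any callable mapping a prompt string to a reply
    string -- in practice an API call to the stronger model."""
    return [{"prompt": p, "target": teacher(p)} for p in prompts]


def stub_teacher(prompt: str) -> str:
    # Illustrative stand-in for a real frontier-model call.
    return f"teacher answer to: {prompt}"


if __name__ == "__main__":
    dataset = build_distillation_set(
        ["Summarize TCP slow start.", "Explain what a mutex does."],
        stub_teacher,
    )
    print(len(dataset))  # 2
```

At 16 million interactions, the loop above is the whole story: the scale and the unauthorized accounts, not the technique, are what Anthropic is objecting to.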
Apple iPhone 17e ($599, ships March 11). The A19 chip with 16-core Neural Engine brings full Apple Intelligence — Live Translation, Call Screening, Visual Intelligence — to the budget tier. This is significant for device fleet decisions: organizations that deferred AI-capable hardware can now upgrade at the $599 price point rather than flagship pricing.
This week is shaping up as a consolidation moment: Google secures the device layer, DeepSeek locks in a Chinese-silicon stack, and the cost floor on inference drops again. The operator who waits another quarter to pick a primary AI vendor is watching the options narrow.
