AI Compute Is Officially Scarce. SMBs Should Plan for Rationing Now.
March 14, 2026
The strongest signal in AI right now is not a model launch. It is a capacity warning.
Over the last year, major tech CEOs have stopped talking about compute constraints as if they were temporary growing pains. They are describing them as a real operating limit.
Sam Altman said OpenAI was "out of GPUs." Oracle CEO Safra Catz said the company was still turning customers away or pushing their orders into future quarters. Satya Nadella put it bluntly: in some cases, companies may have chips sitting in inventory without enough powered, ready data center capacity to use them. Sundar Pichai said capacity is the question that keeps Google up at night. And Intel CEO Lip-Bu Tan delivered the clearest summary of all: "There's no relief as far as I know. No relief until 2028."
That matters because AI buyers have gotten used to a different story. For the last year, the dominant narrative was falling inference cost, expanding access, and more intelligence for less money.
That trend may not disappear, but it now has a ceiling. If compute stays tight, the market stops behaving like an endless buffet and starts behaving like a rationed utility.
What inference rationing actually means
For most businesses, inference rationing will not show up as a dramatic press release. It will show up in small operational annoyances that add up fast.
You may see:
- higher API pricing on premium models
- stricter rate limits during peak demand
- slower response times for high-volume workloads
- priority tiers that favor bigger enterprise contracts
- more pressure to justify which use cases deserve frontier models
In other words, the question shifts from "can we use AI here?" to "which work is worth spending scarce intelligence on?"
That is a healthy question, honestly. Too many companies have treated frontier models like cheap universal labor. If capacity stays constrained, that mindset gets expensive.
Why SMBs should care now
Large enterprises will respond to scarcity by buying priority, reserving capacity, or signing bigger commitments. Small and midsize businesses usually do not have that luxury.
If your team depends on low-cost API access for content generation, customer support automation, internal copilots, or workflow agents, you are more exposed than you think. A modest price increase or tighter rate limit may not sound catastrophic, but it can break the economics of a workflow that only worked because tokens were cheap.
This also changes the model mix.
For many tasks, the winning setup may no longer be "send everything to the smartest model available." It may be a stack that uses smaller open-source models for classification, extraction, summarization, and first-pass drafting, while reserving premium closed models for the few moments where accuracy or reasoning really matters.
That is not a downgrade. It is operational discipline.
What SMBs should do before prices tighten
Here are the practical moves worth making now.
1. Audit where you are spending tokens
Most companies still do not know which workflows are actually consuming the most inference.
Map your current AI usage by task, team, and model. Which workflows are mission-critical? Which ones are nice-to-have? Which ones produce measurable ROI, and which ones just feel modern?
If you cannot answer that in one page, you are not ready for a tighter market.
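The audit above can start as something very lightweight. Here is a minimal sketch of per-workflow token accounting; the task names, team names, and per-1K-token prices are illustrative assumptions, not real rates, and `record_call` would be wired into whatever wrapper already makes your API calls.

```python
# Minimal per-workflow token accounting sketch.
# Prices and workflow names below are assumed for illustration.
from collections import defaultdict

PRICE_PER_1K = {"premium-model": 0.015, "small-model": 0.0004}  # assumed rates

usage_log = []  # in practice, populate this from your API call wrapper

def record_call(task, team, model, tokens):
    usage_log.append({"task": task, "team": team, "model": model, "tokens": tokens})

def spend_by_task():
    totals = defaultdict(float)
    for call in usage_log:
        rate = PRICE_PER_1K[call["model"]] / 1000
        totals[(call["task"], call["model"])] += call["tokens"] * rate
    # highest-spend workflows first
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

record_call("support-replies", "cx", "premium-model", 120_000)
record_call("ticket-triage", "cx", "small-model", 400_000)
record_call("blog-drafts", "marketing", "premium-model", 60_000)

for (task, model), dollars in spend_by_task():
    print(f"{task:16s} {model:14s} ${dollars:,.2f}")
```

Even a toy table like this usually surprises people: the workflow with the most calls is often not the workflow with the most spend.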
2. Build a model routing strategy
Stop treating model choice as static.
Create a simple routing approach: small model first, premium model only when confidence is low, stakes are high, or the task is unusually complex. That one change can cut cost without wrecking quality. It also makes you less dependent on one provider's pricing or capacity decisions.
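A first version of that router can be a single function. The sketch below uses stand-in model functions, an invented confidence score, and an arbitrary 0.8 threshold; the point is the shape of the logic, not the specific numbers.

```python
# Small-model-first routing sketch: escalate to the premium model only
# when confidence is low or the task is flagged high-stakes.
# The model functions, scores, and threshold are illustrative assumptions.

def small_model(prompt):
    # Stand-in for a cheap model; returns (answer, confidence in 0-1).
    return f"small: {prompt}", 0.6 if "complex" in prompt else 0.9

def premium_model(prompt):
    # Stand-in for a frontier model.
    return f"premium: {prompt}", 0.95

def route(prompt, high_stakes=False, threshold=0.8):
    if high_stakes:
        return premium_model(prompt)[0]
    answer, confidence = small_model(prompt)
    if confidence >= threshold:
        return answer                     # cheap path: most traffic stays here
    return premium_model(prompt)[0]       # escalate only when needed

print(route("classify this ticket"))               # stays on the small model
print(route("complex legal question"))             # escalates on low confidence
print(route("refund approval", high_stakes=True))  # always premium
```

In production the confidence signal might be a classifier score, a heuristic, or the small model's own self-check, but the routing shape stays the same.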
3. Keep an open-source fallback ready
You do not need to self-host everything tomorrow. You do need a contingency plan.
Identify one or two workloads that could move to an open-source model if commercial APIs get slower, pricier, or harder to access. Test them now while you have breathing room. Waiting until a provider changes pricing is the worst time to figure out your fallback.
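A fallback does not need to be elaborate to be tested now. This sketch wraps a (simulated, hypothetical) commercial call so that rate limits or timeouts degrade to an open-source model instead of failing the workflow; the `RateLimited` exception and both provider functions are stand-ins.

```python
# Fallback-chain sketch: try the commercial API, fall back to a
# self-hosted open-source model on rate limits or outages.
# Both providers and the exception type are hypothetical stand-ins.

class RateLimited(Exception):
    pass

def commercial_api(prompt):
    raise RateLimited("429: capacity exceeded")  # simulate a tight market

def open_source_fallback(prompt):
    return f"[local model] {prompt}"

def complete(prompt):
    try:
        return commercial_api(prompt)
    except (RateLimited, TimeoutError):
        # Degrade gracefully instead of failing the workflow outright.
        return open_source_fallback(prompt)

print(complete("summarize this ticket"))
```

The value of writing this wrapper early is that switching providers becomes a config change, not an emergency rewrite.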
4. Redesign prompts and workflows for efficiency
Scarcity rewards companies that waste less compute.
Tighter prompts, better context selection, caching repeated work, batching requests, and trimming unnecessary back-and-forth all reduce token spend. So does rethinking the job itself. Sometimes the best cost optimization is not a different model. It is asking the model to do less.
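Caching repeated work is the easiest of those wins to demonstrate. In this sketch, identical requests are served from a local cache instead of re-spending tokens; the model call is a stub, and the counter exists only to show how many paid calls actually happen.

```python
# Caching-repeated-inference sketch: identical inputs hit a local cache
# instead of re-spending tokens. The summarizer is a stub, not a real API.
import functools

calls = {"count": 0}

@functools.lru_cache(maxsize=1024)
def cached_summarize(text):
    calls["count"] += 1          # counts real (paid) model invocations
    return f"summary of: {text[:20]}"

cached_summarize("quarterly report body ...")
cached_summarize("quarterly report body ...")  # cache hit, no new tokens
print(calls["count"])  # 1 -- only one paid call despite two requests
```

Real deployments usually key the cache on a hash of the full prompt plus model name, and add an expiry, but the economics are identical: every cache hit is inference you did not buy.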
5. Reserve frontier models for high-leverage work
Do not spend top-tier inference on every draft, every summary, and every internal convenience.
Use your best models where the upside is real: revenue-producing workflows, customer-facing moments, decision support, and tasks where a better answer changes an outcome. Everything else should earn its way up the stack.
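One way to make "earn its way up the stack" concrete is an explicit tier policy, so the frontier model is an allow-listed exception rather than the default. The workflow names and tiers below are illustrative assumptions.

```python
# Explicit model-tier policy sketch: each workflow earns its tier
# instead of defaulting to the frontier model. Names are illustrative.

TIER_POLICY = {
    "customer-facing-chat": "frontier",
    "contract-review":      "frontier",
    "ticket-triage":        "small",
    "internal-drafts":      "small",
}

def model_for(workflow):
    # Unknown workflows default to the cheap tier until they prove ROI.
    return TIER_POLICY.get(workflow, "small")

print(model_for("contract-review"))       # frontier: outcome-changing work
print(model_for("weekly-standup-notes"))  # small: must earn its way up
```

Keeping this policy in one place also gives you a single lever to pull when a provider raises prices or tightens limits.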
Constraint creates opportunity
This is the part many SMBs miss.
When compute is abundant, sloppy systems can survive. When compute gets tight, disciplined operators pull ahead.
The businesses that win this next phase of AI will not necessarily be the ones with the biggest budget. They will be the ones that know which workloads matter, route work intelligently, and avoid paying frontier-model prices for commodity tasks.
If relief really does not come until 2028, then AI scarcity is not a short-term inconvenience. It is a planning assumption.
The smart move is to act like rationing is coming before your vendor makes it obvious.
