Anthropic just removed one of the most persistent cost headaches in AI development.
As of today, the 1 million token context window is generally available for both Claude Opus 4.6 and Claude Sonnet 4.6. The announcement came directly from the official @claudeai account: "1 million context window: Now generally available for Claude Opus 4.6 and Claude Sonnet 4.6."
The bigger news, confirmed separately, is that Anthropic no longer charges extra for longer context windows. Previously, using the full context capacity meant paying a premium on top of standard token rates. That surcharge is gone. You pay the same per-token price whether you send 10,000 tokens or 900,000.
For small businesses building on the API, this is not a minor pricing tweak. It changes what kinds of workflows are economically viable.
Why the pricing change matters more than the feature
The 1M context window itself is not new; Claude Opus 4.6 launched with it in February. What is new is general availability at no extra cost, and that is a different thing entirely.
Until now, many SMBs treated the extended context window as a luxury. You could technically send a 500-page contract to Claude, but the premium pricing meant you had to think carefully about whether it was worth the cost versus a chunking workaround. That calculation pushed teams toward building complex retrieval pipelines—splitting documents, embedding chunks in vector databases, and stitching together partial answers.
Those pipelines work. They also take weeks to build, require ongoing maintenance, and introduce their own failure modes. Every chunk boundary is a place where context can be lost.
With standard pricing across the full context window, the math flips. For many workloads, it is now cheaper to just send the whole document than to build and maintain the infrastructure to avoid doing so.
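The flipped math can be sketched with placeholder numbers. The base rate, the old premium multiplier, and the 200,000-token threshold below are all hypothetical stand-ins, not Anthropic's actual figures; substitute the rates from your own pricing page.

```python
# Tiered (old) vs flat (new) input pricing, with hypothetical numbers.
BASE_RATE = 3.00     # hypothetical $ per million input tokens
OLD_PREMIUM = 2.0    # hypothetical long-context multiplier
THRESHOLD = 200_000  # hypothetical token threshold for the old premium

def old_cost(tokens: int) -> float:
    """Tiered pricing: tokens past the threshold billed at a premium."""
    base = min(tokens, THRESHOLD) * BASE_RATE
    premium = max(tokens - THRESHOLD, 0) * BASE_RATE * OLD_PREMIUM
    return (base + premium) / 1_000_000

def new_cost(tokens: int) -> float:
    """Flat pricing: the same per-token rate at any context length."""
    return tokens * BASE_RATE / 1_000_000

for tokens in (100_000, 500_000, 900_000):
    print(f"{tokens:>7,} tokens: old ${old_cost(tokens):.2f} -> new ${new_cost(tokens):.2f}")
```

Under these placeholder rates, a 900,000-token prompt drops from $4.80 to $2.70 per call, while prompts below the threshold cost the same as before. The real decision, though, also includes the engineering time a retrieval pipeline consumes.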
What 1 million tokens actually looks like in practice
A million tokens is roughly 750,000 words. To put that in business terms:
- An entire mid-sized codebase. Most repositories under 50,000 lines of code fit comfortably. That means a refactoring agent can see every file, every dependency, every test—without you having to decide which files to include.
- Five years of contracts. A typical commercial contract runs 5,000 to 15,000 words. You can fit dozens of them in a single prompt and ask Claude to identify conflicting terms, missing clauses, or renewal deadlines across the entire set.
- A full email thread history. Customer support teams can load months of correspondence with a client and get a complete summary without truncation artifacts.
- An entire product catalog. E-commerce businesses with hundreds of SKUs can pass their full catalog for consistency checks, description rewrites, or competitive analysis.
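The capacity claims above can be sanity-checked with the rough 0.75-words-per-token heuristic. Real tokenizers vary, and code or dense markup tokenizes heavier than prose, so treat this as a planning estimate, not a billing figure.

```python
# Rough capacity check using the ~0.75 words-per-token heuristic for English prose.
WORDS_PER_TOKEN = 0.75

def estimate_tokens(text: str) -> int:
    """Approximate token count from whitespace-separated word count."""
    return round(len(text.split()) / WORDS_PER_TOKEN)

def fits_context(texts: list, window: int = 1_000_000) -> bool:
    """Check whether a batch of documents should fit in one prompt."""
    return sum(estimate_tokens(t) for t in texts) <= window

contract = "word " * 10_000           # a ~10,000-word contract stand-in
print(estimate_tokens(contract))      # roughly 13,333 tokens
print(fits_context([contract] * 60))  # five years' worth of contracts, one prompt
```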
Three workflows that just got simpler
1. Legal document review without a pipeline
A five-person law firm or an SMB with in-house counsel can now upload an entire lease portfolio, vendor agreement stack, or regulatory filing set and ask a single question: "Which of these agreements have auto-renewal clauses that trigger in the next 90 days?" No chunking. No vector database. No retrieval logic to debug.
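The no-pipeline pattern reduces to string concatenation: label each agreement, join them, append one question. A minimal sketch follows; the model id and the SDK call in the trailing comment are assumptions, so check Anthropic's API documentation for current model ids and parameters.

```python
# Pack an entire contract set plus one question into a single request payload.
def build_review_request(contracts: dict, question: str) -> dict:
    """Concatenate labeled contracts and a question into one messages payload."""
    corpus = "\n\n".join(
        f"=== {name} ===\n{text}" for name, text in contracts.items()
    )
    return {
        "model": "claude-sonnet-4-6",  # hypothetical model id -- verify in the docs
        "max_tokens": 2048,
        "messages": [
            {"role": "user", "content": f"{corpus}\n\nQuestion: {question}"}
        ],
    }

request = build_review_request(
    {"lease-hq.txt": "...", "vendor-acme.txt": "..."},
    "Which of these agreements have auto-renewal clauses that trigger "
    "in the next 90 days?",
)
# With the official Python SDK, sending it looks roughly like (untested sketch):
#   import anthropic
#   response = anthropic.Anthropic().messages.create(**request)
```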
2. Codebase-wide refactoring agents
Development teams building with Claude Code or the API can now point an agent at a full repository. Instead of carefully selecting which files to include in context, the agent sees everything. That means it can trace a function call from the frontend through the API layer to the database query and back, catching issues that only appear when you see the whole picture.
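"Point an agent at a full repository" can be as simple as walking the tree and concatenating source files with path headers. The sketch below uses a rough 4-characters-per-token heuristic and refuses to proceed, rather than silently truncating, if the result would overflow the window; the suffix list and heuristic are assumptions to adapt.

```python
from pathlib import Path

CHARS_PER_TOKEN = 4  # rough heuristic for source code

def pack_repository(root: str, suffixes=(".py", ".ts", ".sql"),
                    window: int = 1_000_000) -> str:
    """Return the whole codebase as one prompt-ready string, or raise."""
    parts = []
    for path in sorted(Path(root).rglob("*")):
        if path.suffix in suffixes and path.is_file():
            parts.append(f"# --- {path.relative_to(root)} ---\n{path.read_text()}")
    blob = "\n\n".join(parts)
    if len(blob) / CHARS_PER_TOKEN > window:
        raise ValueError("repository exceeds the context window; trim suffixes")
    return blob
```

Failing loudly at the budget check is deliberate: an agent reasoning over a silently truncated repository is worse than one that knows it cannot see everything.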
3. Full-history customer analysis
Sales and support teams can load a customer's complete interaction history—emails, tickets, chat logs, meeting notes—and ask for a relationship summary before a renewal conversation. The difference between a summary built from the last ten interactions and one built from the last two hundred is the difference between guessing and knowing.
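Assembling that complete history is mostly a merge-and-sort problem: pull records from each system, order them chronologically, and render one transcript. The field names below are assumptions about your own data model.

```python
from datetime import date

def build_history(records: list) -> str:
    """Render mixed-source customer records as one chronological transcript."""
    ordered = sorted(records, key=lambda r: r["date"])
    return "\n".join(
        f"[{r['date']}] ({r['source']}) {r['text']}" for r in ordered
    )

history = build_history([
    {"date": date(2025, 3, 2), "source": "ticket", "text": "Reported sync bug."},
    {"date": date(2024, 11, 9), "source": "email", "text": "Asked about renewal terms."},
])
print(history)  # email from 2024 appears first, then the 2025 ticket
```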
What this does not solve
Large context windows are powerful, but they are not magic. A few things to keep in mind:
Accuracy at scale is still imperfect. Research has shown that hallucination rates can increase with context length, particularly when the answer requires synthesizing information from multiple distant sections. For high-stakes decisions—legal compliance, financial reporting—you still want human review on the output.
Cost is per-token, not per-call. The pricing change removes the premium, but you are still paying for every token you send. A 900,000-token prompt costs 90 times as much as a 10,000-token prompt at the same per-token rate. Use the full window when it genuinely adds value, not by default.
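One way to keep agents from defaulting to the full window is an explicit per-call budget gate. A minimal sketch, with a hypothetical flat rate:

```python
RATE_PER_MTOK = 3.00  # hypothetical $ per million input tokens

def call_cost(prompt_tokens: int) -> float:
    """Dollar cost of a single call's input at a flat per-token rate."""
    return prompt_tokens * RATE_PER_MTOK / 1_000_000

def within_budget(prompt_tokens: int, max_dollars: float) -> bool:
    """Gate large-context calls behind an explicit per-call spend limit."""
    return call_cost(prompt_tokens) <= max_dollars

print(call_cost(10_000), call_cost(900_000))   # linear: the big prompt costs 90x
print(within_budget(900_000, max_dollars=1.00))
```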
Latency increases with context size. Larger prompts take longer to process. For real-time applications like chatbots or interactive tools, you may still want to keep context lean for responsiveness.
The bottom line for SMBs
This is one of those changes that matters most to the people building things. If you are an SMB developer, a solo founder with an AI-powered product, or a small team running agents against business data, the removal of context window surcharges lowers your cost floor and simplifies your architecture.
The practical advice is straightforward: if you have been maintaining a chunking pipeline specifically to avoid long-context costs, benchmark the alternative. Send the full document. Compare the output quality and the total cost. For many workloads, the pipeline was always a workaround for a pricing constraint that no longer exists.
Anthropic just made the simplest approach the cheapest one too.
