The first developer's setup takes about twenty minutes.
Set CLAUDE_CODE_USE_VERTEX=1, point it at a Google Cloud project, grant the developer's own credentials roles/aiplatform.user, and Claude Code starts sending inference through Vertex AI instead of the public API. Nothing leaves the company's cloud perimeter. It works the first time. Nobody files a ticket.
The tenth developer's setup takes the same twenty minutes. It creates a different problem entirely.
Now there are ten sets of cloud credentials living on ten laptops, or one shared service account nobody quite owns. Someone pushed a managed-settings.json file through MDM, and there's a quiet argument about whether it actually reaches every machine. Nobody can say with confidence who ran what, because usage shows up under whatever OTEL_RESOURCE_ATTRIBUTES each client happened to set. That's just whatever the developer's laptop was configured to claim. There's no spend ceiling, so the first runaway loop turns into a Slack message to finance instead of an automatic stop. And when someone leaves the team, revoking their access means finding every place their credentials were used and hoping the list is complete.
That's not a Claude Code bug. It's what happens to any per-developer cloud credential the moment a solo workflow becomes a team workflow. On July 1, Anthropic and Google Cloud shipped the missing piece for this exact case: a self-hosted service that sits between Claude Code clients and Google Cloud, turning "every developer configures their own access" into "one gateway configures access for everyone."
What actually shipped
Google Cloud announced the Claude apps gateway on July 1, in a joint post from Anthropic applied AI engineer Roy Arsan and Google Cloud AI engineer Ivan Nardini. Their framing is blunt about what the direct-to-Vertex path was missing at scale: per-developer cloud credentials, MDM-pushed settings files with no server-side enforcement, weak attribution, and no real spend cap.
"The Claude apps gateway closes that gap," they write. "It is a self-hosted service, shipped with the same claude binary, that sits directly between your local Claude Code clients and Google Cloud." The goal, in their words, is to "centralize the governance that developers and platform admins otherwise each carry alone such as identity, policy, cost, and routing."
Read past the announcement, and the thing gets a lot more concrete than "governance."
Identity moves off the laptop first. Instead of a developer holding cloud credentials directly, /login routes through Google Workspace or any OIDC-compatible identity provider, and the gateway swaps that login for a short-lived session. No service-account keys, API keys, or ANTHROPIC_VERTEX_PROJECT_ID values ever land on the developer's machine; the gateway holds those, not the client. Offboarding stops being an audit exercise and becomes removing one person from one IdP group.
Policy moves to the server. RBAC rules sit in a gateway.yaml file the platform team owns, and the availableModels a given user or group can reach gets re-checked on every single /v1/messages call, not cached at login and not trusted from a config file a developer could edit locally.
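To make "re-checked on every call" concrete, here is a minimal sketch of a per-request model check. It is not the gateway's code: the `Policy` class, the `handle_messages` function, the placeholder model names, and the choice of a 403 for a denied model are all assumptions for illustration, and the real gateway.yaml schema may look nothing like this dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    # Hypothetical in-memory form of "group -> models it may reach".
    # The real gateway.yaml schema is not shown in the article.
    available_models: dict[str, set[str]] = field(default_factory=dict)

    def allowed(self, groups: list[str], model: str) -> bool:
        # A user may use a model if any of their groups lists it.
        return any(model in self.available_models.get(g, set()) for g in groups)


def handle_messages(policy: Policy, session_groups: list[str], requested_model: str) -> int:
    """Return an HTTP status for one /v1/messages call.

    The policy is consulted on every call, so a change made by the
    platform team applies to the very next request, with no re-login.
    The 403 status is an assumption, not documented gateway behavior.
    """
    if not policy.allowed(session_groups, requested_model):
        return 403
    return 200
```

The point of the shape is that nothing about the decision is stored on the client: removing a model from a group's entry changes the outcome of the next request, even for a session that logged in hours earlier.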
Telemetry gets a real identity attached to it, and this is the detail worth pausing on. The claude_code.token.usage metric now carries a verified email and group membership pulled from the signed session JWT the gateway issued, not from OTEL_RESOURCE_ATTRIBUTES a client sets on its own. A self-reported attribute is a label. A claim inside a signed session token is something you can actually audit.
Spend gets a ceiling. Admins can set daily, weekly, or monthly caps per user, per group, or org-wide, and the gateway meters usage against a running ledger in Cloud SQL and returns an HTTP 429 once a cap is hit. One caveat deserves emphasis: costs are tracked at list price. That makes the cap a guardrail against a runaway loop burning a month's budget in an afternoon, not a substitute for reconciling an actual cloud bill.
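A rough sketch of the metering idea, assuming a single cap and an in-memory ledger. The real ledger lives in Cloud SQL and its schema is not public in the article, so the `SpendLedger` class and its methods are hypothetical, and the per-million-token prices in the usage note are placeholders rather than actual list prices. The sketch also assumes the request that crosses the cap is allowed to finish and only later requests are refused.

```python
from dataclasses import dataclass


@dataclass
class SpendLedger:
    cap_usd: float          # ceiling for the current window (day, week, or month)
    spent_usd: float = 0.0  # running total, priced at list price

    def check(self) -> int:
        """Status for the next request: 429 once the cap has been reached."""
        return 429 if self.spent_usd >= self.cap_usd else 200

    def record(self, input_tokens: int, output_tokens: int,
               price_in_per_mtok: float, price_out_per_mtok: float) -> float:
        """Add one request's list-price cost to the ledger and return it."""
        cost = (input_tokens * price_in_per_mtok
                + output_tokens * price_out_per_mtok) / 1_000_000
        self.spent_usd += cost
        return cost
```

With a 10.00 USD cap and placeholder prices of 3.00 and 15.00 USD per million input and output tokens, one request of 1,000,000 input and 500,000 output tokens costs 3.00 + 7.50 = 10.50 USD, so the next `check()` returns 429. Because the arithmetic uses list price, the ledger total can differ from what the cloud bill eventually shows.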
Routing stays inside the perimeter but goes through one door. Calls leave under a single Cloud Run service identity, with the option of a global endpoint or ordered upstream failover, and inference stays inside the GCP project. The gateway itself is stateless on Cloud Run. Cloud SQL holds device-code sign-in state and the spend ledger. An OTLP collector receives the attributed metrics on the way out.
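Ordered upstream failover is a simple pattern, sketched below under stated assumptions: the upstream names, the `UpstreamError` type, and the `call_with_failover` function are all hypothetical, and the article does not describe which errors the real gateway treats as a reason to fail over.

```python
from typing import Callable, Sequence


class UpstreamError(Exception):
    """Raised by an upstream sender when the call should fail over."""


def call_with_failover(
    upstreams: Sequence[tuple[str, Callable[[dict], dict]]],
    request: dict,
) -> tuple[str, dict]:
    """Try each upstream in order and return (name, response) from the first success."""
    errors = []
    for name, send in upstreams:
        try:
            return name, send(request)
        except UpstreamError as exc:
            # Remember why this upstream failed, then try the next one.
            errors.append(f"{name}: {exc}")
    raise UpstreamError("all upstreams failed: " + "; ".join(errors))
```

The order of the list is the policy. If every entry points at an endpoint inside the same GCP project, failover never crosses the project boundary, which is the property the control sheet later asks you to write down.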

What it costs to stand up
Standing up the gateway is a real build, not a checkbox in a console: enabling Vertex AI Agent Platform, Cloud SQL, and Secret Manager; creating a claude-gateway service account scoped to roles/aiplatform.user; provisioning a Cloud SQL Postgres instance; registering an OAuth web application client with your identity provider; and storing gateway.yaml, the OIDC secret, the Postgres connection string, and a JWT signing key in Secret Manager.
Anthropic's own documentation is upfront about what this is and isn't. The deployment guide describes the reference setup, built on Cloud Run or GKE, Cloud SQL for PostgreSQL, and Secret Manager, as "a working example for customer-managed infrastructure rather than a supported production deployment; use it to see how the pieces fit together before adapting it to your own environment." Google and Anthropic hand you a blueprint. Your platform team runs it, patches it, and answers for it when it goes down.
That framing carries a second warning worth taking seriously. The deployment and operations docs note that Claude Code will only connect to a gateway whose address is private, precisely because a trusted gateway can push settings to a developer's machine that run commands there. A gateway that centralizes identity and policy is also a gateway that, if exposed or compromised, has real reach into every connected developer's environment. Standing it up on a public endpoint isn't a shortcut. It's a different, larger risk than the one you were trying to close.
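A platform team can turn "the address is private" into a repeatable check. The sketch below is one way to do that with Python's standard `ipaddress` module. It says nothing about how Claude Code itself decides an address is private, which the article does not describe. The function name `all_private` is hypothetical, and it expects IP addresses you have already resolved for the gateway hostname.

```python
import ipaddress


def all_private(addresses: list[str]) -> bool:
    """True only if there is at least one address and every one is private.

    Uses Python's definition of private, which also counts loopback and
    link-local ranges. Raises ValueError on anything that is not an IP literal.
    """
    if not addresses:
        return False
    return all(ipaddress.ip_address(a).is_private for a in addresses)
```

Running this against the resolved addresses after every network change is a cheap way to catch a gateway that has quietly picked up a public IP.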
The same tradeoff shows up wherever a control layer earns this much trust. We saw it with AWS AgentCore Gateway's tool-call interceptors too: the more a layer enforces policy, the more it becomes the thing worth securing first.
Write the sheet before you write the config
gateway.yaml answers "how is this configured." It doesn't answer "who's accountable when it breaks, and what do we do when someone leaves the team." Those questions belong on a shorter document: one page, plain language, something a platform lead can hand to a new engineer or an auditor without walking them through YAML.
Before Claude Code spreads past the first two or three developers, fill in a gateway control sheet with these fields:
- Owner — the specific person or team accountable for the gateway, not "platform team" as an abstraction.
- Identity source — which IdP or Workspace domain issues logins, and who administers group membership.
- Allowed groups and models — which IdP groups map to which entries in availableModels, and who approves changes to that mapping.
- Telemetry destination — where claude_code.token.usage and related metrics land, and who reviews them on a recurring basis.
- Spend cap behavior — the daily, weekly, or monthly ceiling per user, group, and org, plus what a developer actually sees when they hit a 429.
- Routing and failover rule — which upstream is primary, what triggers failover, and whether that failover ever leaves the GCP project boundary.
- Private ingress rule — confirmation that the gateway address is not publicly reachable, and who's responsible for verifying that stays true after the next network change.
- Offboarding test — a specific, repeatable check: remove a test account from the IdP group, confirm Claude Code access dies within a defined window.
- Support boundary — what your team debugs versus what's explicitly out of scope because the deployment is customer-managed, not vendor-operated.
- Incident stop rule — who has the authority to pull the gateway's credentials or drop the Cloud Run service if something looks wrong, and how fast that decision has to happen.
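If you would rather keep the sheet next to the config than in a wiki, the ten fields above fit in a small data structure with a completeness check. This is only a sketch of that idea: the `GatewayControlSheet` class and its field names are made up here to mirror the list and are not part of the gateway or its documentation.

```python
from dataclasses import dataclass, fields


@dataclass
class GatewayControlSheet:
    # One plain-language answer per field; an empty string means "not decided yet".
    owner: str = ""
    identity_source: str = ""
    allowed_groups_and_models: str = ""
    telemetry_destination: str = ""
    spend_cap_behavior: str = ""
    routing_and_failover_rule: str = ""
    private_ingress_rule: str = ""
    offboarding_test: str = ""
    support_boundary: str = ""
    incident_stop_rule: str = ""

    def missing(self) -> list[str]:
        """Names of fields that are still blank or whitespace-only."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]
```

A check like `sheet.missing()` can run in CI alongside changes to gateway.yaml, so a config change cannot merge while a field such as the incident stop rule is still blank.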
This isn't a new genre of paperwork. It's the same instinct behind a scope ladder for coding agents, a merge receipt for agent-written pull requests, or the broader habit of writing workflow controls down before an autonomous tool gets wider access: name the boundary while it's still easy to change, not after the second developer asks a question nobody can answer.
The honest state of this
This is a young feature. Anthropic and Google Cloud shipped it four days ago, and beyond the announcement and the docs, the broader conversation about it hasn't really formed yet. Better to say so than dress it up as a bigger trend than it is.
What's already clear is the shape of the tension. Developers reasonably want Claude Code to feel like a tool they configure once and use freely. Platform and security teams reasonably need to know who's using it, under what identity, against what budget, and how fast they can cut someone off. The gateway is Anthropic and Google Cloud's answer: a self-hosted control point your team runs, not a managed service that runs it for you.
If your team is past the "one developer, one Vertex project" stage, that's the moment to write the sheet, not after the first surprise invoice or the first departing engineer whose access nobody remembered to check.
Our process automation work starts in exactly this spot: mapping identity, policy, and spend boundaries for a tool before it spreads across a team, so the write-up exists before the incident does.
