Copilot cohorts make AI adoption a management problem, not a seat-count problem

GitHub's new Copilot cohort metrics give leaders a better way to ask whether AI is changing delivery work, not just whether licenses are enabled.

Sean McLellan

Lead Architect & Founder

June 1, 2026 (8 min read)

The Monday meeting starts with a slide that says Copilot is enabled for 420 developers.

Someone asks the question everyone knew was coming: "Is it changing delivery?"

The dashboard has license counts. It has active users. It has a few quotes from developers who like autocomplete for boilerplate. It may even have a chart moving up and to the right.

But the VP of engineering still cannot answer the question.

Did teams ship faster, or did they just generate more code? Are developers using Copilot as a slightly better autocomplete, or are they handing work to agents that open pull requests, review code, and operate across the GitHub workflow? Did risk move from writing code to reviewing code? Do approval gates need to change before access expands?

That is the gap GitHub's new Copilot cohort metrics start to expose.

On May 29, GitHub announced that the Copilot usage metrics API now adds cohorts for AI adoption. The change does not prove productivity by itself. It does something more useful: it gives leaders a way to separate "people have the tool" from "people are using AI differently inside the software delivery workflow."

What changed in the Copilot metrics API

GitHub now classifies each engaged user into an AI adoption phase based on Copilot product usage over a rolling 28-day window.

The new user-level field is ai_adoption_phase. Enterprise and organization reports also get a totals_by_ai_adoption_phase array, which groups engagement and activity metrics by phase.

The phase criteria require a surface to be used on at least two days in the last 28 days. That matters. A one-off experiment should not be treated the same as a working habit.

GitHub defines the first version of the phases this way:

Phase 0: no cohort, because the user did not meet engagement criteria for any phase.
Phase 1: code first, which includes code completion and/or IDE agent mode.
Phase 2: agent first, meaning use of a single GitHub-based agent surface such as Copilot cloud agent, Copilot code review, or Copilot CLI.
Phase 3: multi-agent, meaning use of two or more GitHub-based agent surfaces or the new GitHub Copilot app.
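
To make the criteria concrete, here is a minimal Python sketch of the rules as described above. It is not GitHub's implementation. The surface names, the classify_phase function, the inclusive 28-day window, and the order of the checks (Phase 3 first, then 2, then 1) are all assumptions made for illustration.

```python
from datetime import date, timedelta

# Hypothetical surface names; GitHub's actual identifiers may differ.
CODE_SURFACES = {"code_completion", "ide_agent_mode"}
AGENT_SURFACES = {"cloud_agent", "code_review", "cli"}
APP_SURFACE = "copilot_app"

WINDOW_DAYS = 28      # rolling window described in the announcement
MIN_ACTIVE_DAYS = 2   # a surface must be used on at least two days


def qualifying_surfaces(usage, today):
    """Return surfaces used on at least MIN_ACTIVE_DAYS distinct days in the window.

    usage maps a surface name to an iterable of dates on which it was used.
    """
    window_start = today - timedelta(days=WINDOW_DAYS - 1)
    result = set()
    for surface, days in usage.items():
        active_days = {d for d in days if window_start <= d <= today}
        if len(active_days) >= MIN_ACTIVE_DAYS:
            result.add(surface)
    return result


def classify_phase(usage, today):
    """Assumed precedence: multi-agent, then agent first, then code first."""
    surfaces = qualifying_surfaces(usage, today)
    agent_count = len(surfaces & AGENT_SURFACES)
    if agent_count >= 2 or APP_SURFACE in surfaces:
        return 3
    if agent_count == 1:
        return 2
    if surfaces & CODE_SURFACES:
        return 1
    return 0


today = date(2026, 6, 1)
usage = {
    "code_completion": [date(2026, 5, 20), date(2026, 5, 21)],
    "cli": [date(2026, 5, 30)],  # one day only, so it does not qualify
}
print(classify_phase(usage, today))  # 1
```

The single CLI session does not move this developer out of Phase 1, which is the point of the two-day threshold.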

Each ai_adoption_phase includes a version field starting at v1, so GitHub can evolve the logic later.

At the enterprise and organization level, GitHub says the grouped reports include metrics such as total engaged users; per-user averages for user-initiated interactions, code generation and acceptance, lines added and deleted, and pull requests created, merged, and reviewed; and an average of median time to merge.

One important detail should be printed directly on the dashboard: the aggregated metrics are averages per user within each phase, not sums. If Phase 3 shows more pull request activity, that does not automatically mean the whole organization shipped more. It means the average engaged user in that cohort behaved differently on the measured dimensions.
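
A small worked example shows why the distinction matters. The sketch below assumes a simplified row shape (phase, engaged_users, and one per-user average) that is not the API's actual schema, and the numbers are invented for illustration.

```python
def estimated_totals(phase_rows, metric):
    """Convert per-user averages into rough per-phase totals.

    Each row is a hypothetical dict with 'phase', 'engaged_users', and a
    per-user average stored under the chosen metric key.
    """
    return {
        row["phase"]: row["engaged_users"] * row[metric]
        for row in phase_rows
    }


# Illustrative numbers only, not real data.
rows = [
    {"phase": 1, "engaged_users": 300, "avg_prs_merged": 4.0},
    {"phase": 3, "engaged_users": 12, "avg_prs_merged": 9.0},
]
print(estimated_totals(rows, "avg_prs_merged"))  # {1: 1200.0, 3: 108.0}
```

Here the Phase 3 average is more than double the Phase 1 average, yet Phase 1 accounts for roughly eleven times as many merged pull requests. Multiplying back out only makes sense for count-style metrics; it is meaningless for something like median time to merge.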

GitHub's REST documentation also describes Copilot metrics endpoints for organizations and teams, and its rollout guidance positions usage metrics as part of analyzing Copilot adoption over time. The cohort fields add a missing layer: not just whether Copilot is used, but what kind of usage pattern is emerging.

Seat counts were always too weak

Seat counts are procurement metrics. They tell you what you bought.

Active-user metrics are a little better. They tell you whether people touched the tool.

Neither tells you whether work changed.

A developer who accepts autocomplete suggestions for test scaffolding is in a different adoption state than a developer who asks an agent to work through an issue, generate a pull request, respond to code review, and operate through the CLI. Both may count as "active." The management questions are different.

For completion-heavy use, the review burden may stay close to ordinary code review. The developer is still in the loop at the line level. The quality question is whether suggestions are correct, maintainable, and aligned with local patterns.

For agent-heavy use, the work shifts. The developer may be reviewing a larger unit of output. The important evidence is no longer just accepted lines. It includes tool calls, pull request history, approvals, test results, reviewer comments, policy exceptions, and the handoff between human and agent.

That is why we have been watching Copilot move toward managed delivery sessions, not just editor assistance. In our earlier post on the GitHub Copilot app and agent delivery sessions, the important shift was not "more AI." It was that AI work was becoming a managed unit of delivery.

The new cohorts give leaders a way to see whether that shift is actually happening inside their teams.

How to read the cohorts without fooling yourself

Phase 1 is not bad. In many teams, it may be the right steady state for a long time.

If a team is using Copilot mainly for completions and IDE assistance, the management question is not "why are they immature?" It is whether the tool is saving time on the right work without increasing review noise, rework, or hidden defects.

Phase 2 and Phase 3 deserve closer inspection. They may indicate real workflow adoption. They may also indicate experimentation by a small group of power users.

The point is to connect the cohort view to delivery evidence.

If Phase 2 users create more pull requests, look at what happened to review quality. Did reviewers spend more time correcting basic mistakes? Did test failures increase? Did pull requests get smaller and easier to review, or larger and harder to trust?

If Phase 3 users show shorter median time to merge, ask what changed. Did agents remove waiting time, or did teams merge faster by pushing more responsibility onto reviewers? Did incidents or rework show up later?

If code generation and acceptance averages are up, do not treat that as value by itself. More accepted code can be useful. It can also mean more code to own.

A good adoption review should pair cohort metrics with receipts: pull requests, tests, approvals, tool activity, state changes, and exceptions. We made that argument in "Agent evals should test workflow receipts," because agent quality is not visible from the final diff alone.

The same rule applies here. Cohorts tell you where to look. They do not tell you what to believe.

Measurement and control are arriving together

The cohort announcement landed the same week as two other GitHub changes that matter for management.

On May 26, GitHub announced targeted Copilot model rules for organizations. Enterprise owners can allow specific Copilot models for specific organizations instead of relying on one enterprise-wide default. The feature is in public preview for Copilot Business and Enterprise, and default model availability can be set as enabled or optional.

That matters because adoption is not uniform. A platform engineering team, a product team, and a regulated workflow team may need different model access rules. Cohort data can show where behavior is changing. Model rules can help leaders decide where to permit, restrict, or stage that change.

GitHub also announced more controls for Copilot Memory deletion scope and the Copilot CLI. Copilot Memory is in public preview for paid Copilot plans. The update adds improved deletion guidance, a repository-level off switch, CLI commands such as /memory on, /memory off, and /memory show, and clearer capture-time scope.

Repository-level facts can be disabled from repository settings. Existing facts are not deleted by that switch, and user-level preferences are unaffected.

This is the part executives often miss. AI adoption metrics should not live in a productivity dashboard by themselves.

If agents are entering more parts of the workflow, teams need matching controls for model access, memory scope, approvals, and review policy. Otherwise, the dashboard encourages expansion before the operating model can absorb it.

For BaristaLabs, this is where AI governance becomes practical. Responsible AI work is not only a policy document. It is the habit of connecting new capability to review gates, allowed scopes, and evidence. That is the lens behind our Responsible AI work.

A practical 30-day Copilot adoption review loop

Start with one repository or one team.

Do not begin with the whole enterprise. That usually turns the conversation into license utilization, procurement pressure, and executive theater. Pick a team where the work is visible enough to review and important enough to matter.

Baseline the delivery and review metrics before you interpret the Copilot cohorts. Look at pull request volume, median time to merge, review depth, test failure patterns, reverted work, incident links, and rework. Keep developer sentiment in the picture too. If the metrics look better but developers say review got harder, pay attention.
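
As a rough sketch of what a baseline can look like, the snippet below summarizes a list of pull requests into a few numbers. The record shape (opened_at, merged_at, reverted) is a hypothetical export format, not something GitHub provides in this form, and a real baseline would cover more of the signals listed above.

```python
from datetime import datetime
from statistics import median


def baseline(pull_requests):
    """Summarize merged pull requests into simple baseline numbers.

    Each pull request is a hypothetical dict with 'opened_at' and 'merged_at'
    datetimes (merged_at is None if unmerged) and a 'reverted' boolean.
    """
    merged = [pr for pr in pull_requests if pr["merged_at"] is not None]
    if not merged:
        return {"merged_count": 0, "median_hours_to_merge": None, "revert_rate": None}
    hours = [
        (pr["merged_at"] - pr["opened_at"]).total_seconds() / 3600
        for pr in merged
    ]
    reverted = sum(1 for pr in merged if pr["reverted"])
    return {
        "merged_count": len(merged),
        "median_hours_to_merge": median(hours),
        "revert_rate": reverted / len(merged),
    }


# Illustrative records only.
prs = [
    {"opened_at": datetime(2026, 5, 1, 9), "merged_at": datetime(2026, 5, 1, 15), "reverted": False},
    {"opened_at": datetime(2026, 5, 2, 9), "merged_at": datetime(2026, 5, 3, 9), "reverted": True},
    {"opened_at": datetime(2026, 5, 4, 9), "merged_at": None, "reverted": False},
]
print(baseline(prs))
# {'merged_count': 2, 'median_hours_to_merge': 15.0, 'revert_rate': 0.5}
```

Capturing these numbers before looking at cohorts gives the later comparison something to stand on.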

Then group the Copilot metrics by cohort.

For Phase 1 users, ask whether completion-heavy use is improving boring work without increasing cleanup. Are developers using it for tests, scaffolding, refactors, and local patterns? Are reviewers seeing more low-quality suggestions slip through?

For Phase 2 users, inspect the first real agent surfaces. If developers use Copilot code review, Copilot CLI, or cloud agent workflows, gather receipts. Which commands ran? Which comments were accepted? Which recommendations were ignored? Which pull requests needed extra human correction?

For Phase 3 users, treat the workflow as changed until proven otherwise. Multi-agent use can be powerful, but it also changes where responsibility sits. A developer may become more of an editor, reviewer, and operator. The review system has to catch up.

Inspect workflow receipts, not only output volume. Pull requests created, merged, and reviewed are useful signals. They are not enough. Look at approval paths, test evidence, reviewer comments, linked issues, deployment checks, and exception handling.
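
One lightweight way to make that inspection repeatable is a receipt checklist per pull request. The sketch below is an assumption-heavy illustration: the field names and the pull request record are hypothetical, and an empty field is treated as a prompt to look closer, not as proof that something went wrong.

```python
# Hypothetical receipt names, loosely following the evidence listed above.
REQUIRED_RECEIPTS = (
    "approvals",
    "test_results",
    "reviewer_comments",
    "linked_issues",
    "deployment_checks",
)


def missing_receipts(pull_request):
    """Return the required receipt names that are absent or empty."""
    return [name for name in REQUIRED_RECEIPTS if not pull_request.get(name)]


# Illustrative record for an agent-created pull request.
pr = {
    "approvals": ["reviewer-a"],
    "test_results": ["ci: passed"],
    "reviewer_comments": [],
    "linked_issues": ["example-issue"],
}
print(missing_receipts(pr))  # ['reviewer_comments', 'deployment_checks']
```

A pull request that merged with an approval but no reviewer comments and no deployment check is exactly the kind of case worth opening by hand.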

If production failures occur, turn them into regression tests for the agent workflow. We covered that pattern in AWS Bedrock AgentCore agent test suites: failures should become versioned checks, not anecdotes that disappear after the retro.

Then adjust policy.

Maybe one team keeps model access broad because the receipts look strong. Maybe another team stays in completion-only mode while review practices mature. Maybe repository-level memory should be disabled for sensitive projects. Maybe agent-created pull requests need extra review until the team has enough evidence.

The management loop is simple:

Pick a bounded team or repository.
Establish delivery, review, quality, and sentiment baselines.
Read Copilot cohorts as behavior signals, not maturity labels.
Inspect receipts behind agent-heavy work.
Check for review, rework, incident, and failure patterns.
Adjust model, memory, access, and approval policies.
Decide where to expand, pause, or add a review gate.

That is enough for a first 30 days. It is also much better than celebrating enabled seats.
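
The final step of the loop can be sketched as a simple comparison against the baseline. Everything here is an assumption for illustration: the metric names, the 10 percent tolerance, and the mapping from regressions to decisions. It is a conversation aid, not a substitute for reading the receipts.

```python
def next_step(baseline, current, tolerance=0.10):
    """Suggest 'expand', 'add_review_gate', or 'pause' for a team or repository.

    baseline and current are hypothetical dicts with 'test_failure_rate' and
    'revert_rate'. tolerance is the relative worsening allowed before a
    quality signal counts as regressed.
    """
    def regressed(key):
        return current[key] > baseline[key] * (1 + tolerance)

    regressions = [
        key for key in ("test_failure_rate", "revert_rate") if regressed(key)
    ]
    if len(regressions) == 2:
        return "pause"
    if regressions:
        return "add_review_gate"
    return "expand"


# Illustrative numbers only.
before = {"test_failure_rate": 0.08, "revert_rate": 0.02}
after = {"test_failure_rate": 0.08, "revert_rate": 0.05}
print(next_step(before, after))  # add_review_gate
```

In this example test failures held steady but reverts more than doubled, so the suggestion is a review gate instead of wider access.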

What a useful AI adoption dashboard should answer

A useful AI adoption dashboard should tell a leadership team three things.

Where did behavior change?

Where did risk move?

Where does the next review gate belong?

The new Copilot cohort fields help with the first question. They show whether engaged users are staying in code-first assistance or moving into agent-first and multi-agent workflows.

They only help with the second and third questions if teams connect them to real delivery evidence.

That is where most organizations will either get value or fool themselves. A dashboard that says "Phase 3 adoption is up" is not a management system. A review that says "Phase 3 adoption is up in this repository, review time fell, test failures did not increase, memory is disabled for sensitive paths, and agent-created pull requests require one additional reviewer for now" is much closer.

If you are trying to turn AI adoption into a working management loop, start small. Pick the workflow. Pull the receipts. Decide the next gate.

BaristaLabs helps teams design those practical adoption loops through process automation and AI workflow work. If you want a second set of eyes on how to measure Copilot adoption without building a vanity dashboard, start a low-pressure conversation.
