TLDR
- GitHub’s 1 June billing change signals the end of casual AI usage. Companies must move from “enable Copilot and hope” to structured AI engineering practice.
- The real problem is not expensive models — it is unstructured usage. Premium models are justified for complex work, but using them for boilerplate, documentation, or simple Q&A is waste.
- Context windows and prompt discipline matter more than model choice. Vague prompts, unpruned conversations, and unbounded agent sessions drive cost without value.
- Agent mode needs governance: require plans before execution, bounded task scope, and developer checkpoints. Agentic workflows can burn tokens quickly.
- Companies are underprepared: most organisations have licences but lack enablement. Without training on model selection, context management, and cost awareness, developers default to premium models for everything.
On 1 June 2026, GitHub Copilot moved from a premium-request model to usage-based billing through GitHub AI Credits. That sounds like a billing change, but I think it is actually a much bigger signal for engineering organisations.
It tells us that the era of casual, unstructured AI usage inside developer tools is ending.
For the last couple of years, many organisations have treated AI coding assistants as a productivity add-on: enable Copilot, give developers access to the best models available, and assume productivity will follow. That may have worked when usage was simpler and cost exposure was less visible. But GitHub Copilot is no longer just an autocomplete tool. It now includes chat, agent mode, code review, CLI-based workflows, model selection, custom context, larger context windows, and increasingly autonomous coding sessions.
That changes the governance problem.
The question is no longer:
“Should developers use AI?”
The better question is:
“How do we train developers to use the right model, with the right context, for the right task, at the right cost?”
That is where many companies are currently underprepared.
About this series
This is Part 1 of a 2-part series on AI operating models:
- Part 1 (this post): the problems and challenges. What governance gaps exist, why token costs are spiralling, and what risks come with unstructured AI usage.
- Part 2 (Building an AI Operating Model): the practical framework. How to implement model routing, build enablement programmes, govern agent mode, and measure ROI.
This post focuses on the diagnosis. The next post provides the cure. Read Part 2 →
What changed on 1 June?
GitHub announced that from 1 June 2026, Copilot plans would transition to usage-based billing. Instead of counting only premium requests, Copilot now consumes GitHub AI Credits based on token usage. That includes input tokens, output tokens, and cached tokens, with cost varying by model. GitHub’s own explanation is clear: the cost of an interaction depends on both the model used and the number of tokens consumed.
The headline items are:
| Area | What changed | Why it matters |
|---|---|---|
| Billing unit | Premium Request Units were replaced by GitHub AI Credits | Cost is now more directly linked to actual model usage |
| Token accounting | Input, output, and cached tokens are counted | Long prompts, long responses, and large context can materially affect cost |
| Model choice | Different models consume credits at different rates | A premium model used casually can become expensive quickly |
| Agentic workflows | Long-running agent sessions can consume significantly more | Autonomous multi-step work is not equivalent to a quick chat question |
| Enterprise controls | Budgets can be managed at enterprise, cost centre, and user levels | Governance is now a first-class requirement |
| Pooled usage | Business usage can be pooled across the organisation | Organisations need usage analytics, not just seat counts |
| Code review | Copilot code review can consume AI Credits and GitHub Actions minutes | AI use can now cross into adjacent platform cost lines |
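To make the token accounting concrete, here is a minimal sketch of how an interaction's cost could be estimated from the model and the three token counts. The rates, the model tier names, and the `estimate_credits` function are all assumptions made up for illustration; they are not GitHub's published prices or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRate:
    """Hypothetical credits charged per 1,000 tokens (not real GitHub prices)."""
    input_per_1k: float
    output_per_1k: float
    cached_per_1k: float


# Illustrative numbers only, chosen to show the shape of the calculation.
RATES = {
    "standard": ModelRate(input_per_1k=1.0, output_per_1k=4.0, cached_per_1k=0.25),
    "premium": ModelRate(input_per_1k=5.0, output_per_1k=20.0, cached_per_1k=1.25),
}


def estimate_credits(model: str, input_tokens: int, output_tokens: int,
                     cached_tokens: int = 0) -> float:
    """Estimate credits for one interaction from the model and token counts."""
    rate = RATES[model]
    return (
        input_tokens / 1000 * rate.input_per_1k
        + output_tokens / 1000 * rate.output_per_1k
        + cached_tokens / 1000 * rate.cached_per_1k
    )


# The same 2,000-token prompt and 500-token answer on two hypothetical tiers.
print(estimate_credits("standard", 2000, 500))
print(estimate_credits("premium", 2000, 500))
```

The point of the sketch is only that cost scales with both factors at once: the same prompt costs more on a higher-rate model, and the same model costs more as the prompt, response, or cached context grows.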
GitHub has also stated that larger context windows and higher reasoning levels consume more AI Credits per interaction. In June, GitHub announced support for one-million-token context windows and configurable reasoning levels in Copilot, while explicitly recommending that users keep the default context and reasoning level for everyday tasks and reserve extended context or higher reasoning for complex, multi-file problems.
That recommendation is important. It is effectively saying: capability has increased, but so has the need for discipline.
The real problem is not expensive models
There is a common argument that appears whenever AI tooling costs increase:
“We need to use cheaper models.”
That is too simplistic.
A colleague of mine provided an equally common counterargument:
“Cheaper models produce weaker results, and if we need twice as many prompts, we have not saved anything.”
That is also valid, but does not address the core issue.
The real issue is not expensive models. The real issue is unstructured model usage.
A premium model is absolutely justified when the task demands it. If someone is performing complex architecture analysis, debugging a difficult production issue, reviewing a security-sensitive change, or refactoring across a large codebase, then using a stronger model may be the most economical choice.
But that does not mean the same model should be used for every activity in the development lifecycle.
Using the strongest available model for every prompt is like using a senior architect to format YAML, write boilerplate comments, summarise a README, or explain a simple compiler error. It will work, but it is not a sensible operating model.
The goal should not be to minimise model cost in isolation.
The goal should be:
Lowest total cost to an acceptable outcome.
That includes:
| Cost factor | Why it matters |
|---|---|
| Token cost | Direct consumption of AI Credits or equivalent platform spend |
| Developer time | Poor model choice can increase re-prompting and review effort |
| Context size | Unnecessary files, logs, and history increase token usage |
| Output quality | Weak output can create rework or false confidence |
| Review effort | AI-generated code still needs human validation |
| Defect risk | Poor recommendations can create production, security, or maintainability issues |
| Delivery speed | Good model routing can reduce cycle time without uncontrolled spend |
| Governance overhead | Untracked usage makes forecasting and accountability difficult |
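A rough worked example of "lowest total cost to an acceptable outcome", counting only token cost and developer time from the table above. Every number here (credits per attempt, attempts needed, minutes per attempt, credit price, hourly rate) is a made-up assumption, and `total_cost` is a hypothetical helper, not a measurement.

```python
def total_cost(credits_per_attempt: float, attempts: int,
               minutes_per_attempt: float, *,
               credit_price: float = 0.01,
               hourly_rate: float = 60.0) -> float:
    """Token spend plus developer time for one task, in one currency unit."""
    token_cost = credits_per_attempt * attempts * credit_price
    developer_cost = minutes_per_attempt * attempts / 60 * hourly_rate
    return token_cost + developer_cost


# Complex task: assume the cheaper model needs three attempts, the premium one.
complex_cheap = total_cost(4, attempts=3, minutes_per_attempt=10)
complex_premium = total_cost(20, attempts=1, minutes_per_attempt=10)

# Simple task: assume both models succeed on the first attempt.
simple_cheap = total_cost(4, attempts=1, minutes_per_attempt=2)
simple_premium = total_cost(20, attempts=1, minutes_per_attempt=2)

print(f"complex: cheap={complex_cheap:.2f} premium={complex_premium:.2f}")
print(f"simple:  cheap={simple_cheap:.2f} premium={simple_premium:.2f}")
```

With these assumed figures, the premium model is cheaper overall on the complex task because developer time dominates, while the cheaper model wins on the simple task. Both the "use cheaper models" argument and my colleague's counterargument are right, just for different tasks.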
This is why the better discussion is not “cheap models vs expensive models.”
The better discussion is model routing.
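As a minimal sketch of what model routing could look like, the table below maps task types to a model tier and escalates after repeated failed attempts. The task names come from the examples in this post; the tier names, the escalation threshold, and the `route` function are assumptions for illustration, not a Copilot feature.

```python
from enum import Enum


class Tier(Enum):
    STANDARD = "standard"
    PREMIUM = "premium"


# Hypothetical routing table built from the task examples in this post.
ROUTES = {
    "boilerplate": Tier.STANDARD,
    "documentation": Tier.STANDARD,
    "simple_question": Tier.STANDARD,
    "architecture_analysis": Tier.PREMIUM,
    "production_debugging": Tier.PREMIUM,
    "security_review": Tier.PREMIUM,
    "large_refactor": Tier.PREMIUM,
}


def route(task_type: str, failed_attempts: int = 0, escalate_after: int = 2) -> Tier:
    """Pick a tier for a task, escalating once cheaper attempts keep failing."""
    if failed_attempts >= escalate_after:
        return Tier.PREMIUM
    # Unknown task types start on the standard tier and can escalate later.
    return ROUTES.get(task_type, Tier.STANDARD)


print(route("boilerplate"))
print(route("security_review"))
print(route("boilerplate", failed_attempts=2))
```

The escalation rule is the part that answers the "twice as many prompts" concern: a task only stays on the cheaper tier while it is actually succeeding there.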
Context windows are not free memory
One of the most important concepts companies need to teach is the context window.
A context window is the amount of information the model can consider at one time. It includes the developer’s prompt, previous conversation, system instructions, selected files, retrieved repository context, tool outputs, terminal output, and the model’s own responses. In long sessions, this can fill quickly.
Larger context windows are useful, but they are not automatically better. They allow the model to inspect more information, but they can also increase token consumption and cost.
GitHub’s June update is a good example. One-million-token context windows make it possible to work across larger codebases and longer documents. That is powerful for complex multi-file work. But GitHub also states that larger context windows and higher reasoning levels consume more AI Credits, and recommends using default context and reasoning for everyday tasks.
That should become a standard enterprise training point:
| Context practice | Good behaviour | Poor behaviour |
|---|---|---|
| File selection | Attach only the files needed for the task | Attach the whole repository by default |
| Conversation length | Start a fresh session for a new task | Keep reusing one long, polluted chat |
| Terminal output | Paste only relevant errors and logs | Paste thousands of lines without filtering |
| Agent scope | Give a bounded task and stop condition | Tell the agent to “fix everything” |
| Reasoning level | Increase only for complex problems | Use maximum reasoning for routine edits |
| Context window size | Extend for large multi-file analysis | Use large context as the default |
| Documentation | Provide architecture notes and constraints | Assume the model will infer business rules |
The context window should be treated like an engineering resource. It needs to be curated.
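One small example of curation is the terminal output row above. The sketch below keeps only the lines that look like errors, plus a couple of lines of surrounding context, before anything is pasted into a chat. The keyword list and the `trim_log` helper are assumptions for illustration; real logs will need their own patterns.

```python
def trim_log(log_text: str,
             keywords: tuple[str, ...] = ("error", "exception", "failed"),
             context: int = 2) -> str:
    """Return only lines matching a keyword, plus `context` lines around each."""
    lines = log_text.splitlines()
    keep: set[int] = set()
    for index, line in enumerate(lines):
        if any(keyword in line.lower() for keyword in keywords):
            start = max(0, index - context)
            end = min(len(lines), index + context + 1)
            keep.update(range(start, end))
    # Preserve the original ordering of the lines that survive.
    return "\n".join(lines[i] for i in sorted(keep))


# A 100-line log with one failure buried in the middle.
sample = [f"line {i}" for i in range(100)]
sample[50] = "ERROR: upload failed"
trimmed = trim_log("\n".join(sample), context=1)
print(trimmed)
print(f"{len(sample)} lines reduced to {len(trimmed.splitlines())}")
```

The same idea applies to file selection and conversation history: decide what the model needs to see, then send that and nothing else.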
Prompt discipline matters more than most people think
In my experience, much of the waste in AI usage comes from poor prompting, not poor models.
Developers often ask vague questions like:
Can you fix this?
or:
Make this better.
That forces the model to infer intent, inspect unnecessary context, make assumptions, and often generate broad changes that require another round of correction.
A better prompt has structure:
Context:
Goal:
Files involved:
Constraints:
Expected output:
Definition of done:
What not to change:
For example:
Context: This is a Python FastAPI service deployed to Azure Container Apps.
Goal: Add structured logging to the upload endpoint.
Files involved: app/api/upload.py, app/core/logging.py.
Constraints: Do not change the API contract. Do not introduce new dependencies.
Expected output: Minimal code changes and a short explanation.
Definition of done: Existing tests pass, new logging is included, and no secrets are logged.
What not to change: Do not modify authentication or request validation logic.
This kind of prompt reduces ambiguity. It also reduces the number of retries. That is good for cost, but more importantly, it is good for engineering quality.
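If a team wants to make that structure a habit, one option is a small template that refuses to render until every section is filled in. This is a sketch only; the `ScopedPrompt` class and its field names are my own illustration of the structure above, not part of any Copilot tooling.

```python
from dataclasses import dataclass, fields


@dataclass
class ScopedPrompt:
    """Hypothetical template mirroring the prompt structure described above."""
    context: str
    goal: str
    files_involved: str
    constraints: str
    expected_output: str
    definition_of_done: str
    what_not_to_change: str

    def render(self) -> str:
        """Render the prompt, failing loudly if any section was left empty."""
        missing = [f.name for f in fields(self) if not getattr(self, f.name).strip()]
        if missing:
            raise ValueError(f"Prompt is missing: {', '.join(missing)}")
        return "\n".join(
            # "files_involved" becomes the label "Files involved", and so on.
            f"{f.name.replace('_', ' ').capitalize()}: {getattr(self, f.name).strip()}"
            for f in fields(self)
        )


prompt = ScopedPrompt(
    context="This is a Python FastAPI service deployed to Azure Container Apps.",
    goal="Add structured logging to the upload endpoint.",
    files_involved="app/api/upload.py, app/core/logging.py.",
    constraints="Do not change the API contract. Do not introduce new dependencies.",
    expected_output="Minimal code changes and a short explanation.",
    definition_of_done="Existing tests pass, new logging is included, and no secrets are logged.",
    what_not_to_change="Do not modify authentication or request validation logic.",
)
print(prompt.render())
```

The validation matters more than the formatting: an empty "Definition of done" is usually a sign that the developer has not yet decided what they are asking for.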
Agent mode needs governance
Agentic workflows are where the economics change most dramatically.
A normal chat request might be one model interaction. An agentic workflow can involve many model calls, file reads, shell commands, searches, edits, test runs, and follow-up reasoning. That is the point of agent mode, but it also means agent mode needs stronger usage discipline.
Organisations should define explicit agent rules:
| Rule | Policy |
|---|---|
| Start with a plan | Agent must propose a plan before modifying files for non-trivial work |
| Bound the task | The prompt must define scope, files, and expected outcome |
| Limit blast radius | Avoid broad instructions like “refactor the project” without decomposition |
| Require checkpoints | Developer reviews diffs after each meaningful step |
| Use tests deliberately | Agent should run targeted tests before broad test suites |
| Stop on uncertainty | Agent should ask when requirements are unclear instead of guessing |
| Use premium models selectively | Premium models are allowed for complex agentic work, but not for every task |
| Track consumption | Teams should monitor usage by user, repo, workflow, and model |
The key point is not to block agentic development. The key point is to make it auditable and repeatable.
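Several of these rules can be expressed as a simple guard around an agent loop. The sketch below enforces three of them: plan approval before any step, a checkpoint after every few steps, and a hard stop when a credit budget is spent. `AgentSessionGuard`, its thresholds, and the idea of a per-step credit figure are all assumptions for illustration; this is not how Copilot agent mode exposes its controls.

```python
from dataclasses import dataclass


@dataclass
class AgentSessionGuard:
    """Hypothetical wrapper that applies plan, checkpoint, and budget rules."""
    credit_budget: float
    checkpoint_every: int = 3
    plan_approved: bool = False
    steps: int = 0
    credits_used: float = 0.0

    def approve_plan(self) -> None:
        """Called once a developer has reviewed the agent's proposed plan."""
        self.plan_approved = True

    def record_step(self, credits: float) -> str:
        """Record one agent step and return 'continue', 'checkpoint', or 'stop'."""
        if not self.plan_approved:
            raise RuntimeError("Plan must be approved before the agent acts")
        self.steps += 1
        self.credits_used += credits
        if self.credits_used >= self.credit_budget:
            return "stop"  # budget spent: hand control back to the developer
        if self.steps % self.checkpoint_every == 0:
            return "checkpoint"  # pause so the developer can review the diff
        return "continue"


guard = AgentSessionGuard(credit_budget=10, checkpoint_every=2)
guard.approve_plan()
for step_credits in (3, 3, 5):
    print(guard.record_step(step_credits))
```

Because the guard keeps a running count of steps and credits, it also produces the audit trail that the "track consumption" rule asks for.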
Why companies are underprepared
Many organisations have invested in Copilot licences but have not invested enough in operating practices.
That gap is now visible.
A licence gives access. It does not teach:
- how to choose the right model
- how to manage context windows
- how to write scoped prompts
- how to use agent mode safely
- how to evaluate AI-generated code
- how to avoid leaking sensitive data into prompts
- how to measure productivity against AI spend
- how to decide when a premium model is justified
- how to use AI for review rather than just generation
This is where engineering leadership needs to step in.
What happens next
The 1 June billing change is not just a pricing event. It is a maturity test.
Organisations that treat it only as a finance problem will respond with blunt restrictions. Organisations that treat it as an engineering operating model problem will build better habits.
The next post covers the practical framework: how to implement model routing, train developers, govern agent mode, and measure ROI properly.
The question is no longer “Should we use AI?”
The question is: “Are we structured enough to use it well?”
Ready for Part 2?
Continue to Part 2: Building an AI Operating Model →
Part 2 covers the implementation roadmap: model routing tables, enablement programmes, prompt templates, agent governance rules, and how to measure AI ROI properly.
Author’s note
This post was co-written with AI assistance. I used GitHub Copilot to help structure the argument, develop the tables and examples, and refine the prose. The core thesis and governance concerns are my own, but AI was valuable in articulating the problems clearly.
