TLDR

  • GitHub’s 1 June billing change signals the end of casual AI usage. Companies must move from “enable Copilot and hope” to structured AI engineering practice.
  • The real problem is not expensive models — it is unstructured usage. Premium models are justified for complex work, but using them for boilerplate, documentation, or simple Q&A is waste.
  • Context windows and prompt discipline matter more than model choice. Vague prompts, unpruned conversations, and unbounded agent sessions drive cost without value.
  • Agent mode needs governance: require plans before execution, bounded task scope, and developer checkpoints. Agentic workflows can burn tokens quickly.
  • Companies are underprepared: most organisations have licences but lack enablement. Without training on model selection, context management, and cost awareness, developers default to premium models for everything.

On 1 June 2026, GitHub Copilot moved from a premium-request model to usage-based billing through GitHub AI Credits. That sounds like a billing change, but I think it is actually a much bigger signal for engineering organisations.

It tells us that the era of casual, unstructured AI usage inside developer tools is ending.

For the last couple of years, many organisations have treated AI coding assistants as a productivity add-on: enable Copilot, give developers access to the best models available, and assume productivity will follow. That may have worked when usage was simpler and cost exposure was less visible. But GitHub Copilot is no longer just an autocomplete tool. It now includes chat, agent mode, code review, CLI-based workflows, model selection, custom context, larger context windows, and increasingly autonomous coding sessions.

That changes the governance problem.

The question is no longer:

“Should developers use AI?”

The better question is:

“How do we train developers to use the right model, with the right context, for the right task, at the right cost?”

That is where many companies are currently underprepared.


About this series

This is Part 1 of a 2-part series on AI operating models:

  • Part 1 (this post): The problems and challenges. What governance gaps exist, why token costs are spiralling, and what risks come with unstructured AI usage.
  • Part 2: Building an AI Operating Model. The practical framework: how to implement model routing, build enablement programmes, govern agent mode, and measure ROI.

This post focuses on the diagnosis. The next post provides the cure. Read Part 2 →


What changed on 1 June?

GitHub announced that from 1 June 2026, Copilot plans would transition to usage-based billing. Instead of counting only premium requests, Copilot now consumes GitHub AI Credits based on token usage. That includes input tokens, output tokens, and cached tokens, with cost varying by model. GitHub’s own explanation is clear: the cost of an interaction depends on both the model used and the number of tokens consumed.
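
To make that arithmetic concrete, here is a minimal Python sketch of how model and token count combine into cost. The model names and per-million-token credit rates are invented for illustration; they are not GitHub's published prices, and the real billing formula may differ in detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRate:
    """Hypothetical credit rates per one million tokens (not real GitHub prices)."""
    input_rate: float
    output_rate: float
    cached_rate: float


# Placeholder tiers for illustration; substitute the rates published for your plan.
RATES = {
    "standard-model": ModelRate(input_rate=1.0, output_rate=4.0, cached_rate=0.1),
    "premium-model": ModelRate(input_rate=5.0, output_rate=25.0, cached_rate=0.5),
}


def interaction_credits(model: str, input_tokens: int, output_tokens: int,
                        cached_tokens: int = 0) -> float:
    """Estimate credits for one interaction: tokens x rate, summed per token type."""
    rate = RATES[model]
    return (
        input_tokens * rate.input_rate
        + output_tokens * rate.output_rate
        + cached_tokens * rate.cached_rate
    ) / 1_000_000


# Same prompt and same answer length, sent to two different models.
standard = interaction_credits("standard-model", input_tokens=20_000, output_tokens=2_000)
premium = interaction_credits("premium-model", input_tokens=20_000, output_tokens=2_000)
print(f"standard: {standard:.3f} credits, premium: {premium:.3f} credits")
```

With these made-up rates the identical request costs roughly five times more on the premium tier, which is the effect the billing change makes visible.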

The headline items are:

| Area | What changed | Why it matters |
|---|---|---|
| Billing unit | Premium Request Units were replaced by GitHub AI Credits | Cost is now more directly linked to actual model usage |
| Token accounting | Input, output, and cached tokens are counted | Long prompts, long responses, and large context can materially affect cost |
| Model choice | Different models consume credits at different rates | A premium model used casually can become expensive quickly |
| Agentic workflows | Long-running agent sessions can consume significantly more | Autonomous multi-step work is not equivalent to a quick chat question |
| Enterprise controls | Budgets can be managed at enterprise, cost centre, and user levels | Governance is now a first-class requirement |
| Pooled usage | Business usage can be pooled across the organisation | Organisations need usage analytics, not just seat counts |
| Code review | Copilot code review can consume AI Credits and GitHub Actions minutes | AI use can now cross into adjacent platform cost lines |
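
The "usage analytics, not just seat counts" point can be sketched in a few lines of Python. The records and field names below are assumptions made up for this example; they do not reflect any real GitHub export schema or budget API.

```python
from collections import defaultdict

# Hypothetical usage records; field names are assumptions, not a GitHub export format.
records = [
    {"user": "alice", "cost_centre": "payments", "model": "premium-model", "credits": 12.5},
    {"user": "alice", "cost_centre": "payments", "model": "standard-model", "credits": 1.5},
    {"user": "bob", "cost_centre": "platform", "model": "premium-model", "credits": 30.0},
    {"user": "carol", "cost_centre": "platform", "model": "standard-model", "credits": 2.0},
]


def total_by(records, key):
    """Sum credits grouped by one field, e.g. 'user', 'cost_centre', or 'model'."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key]] += record["credits"]
    return dict(totals)


def over_budget(totals, budget):
    """Return the groups whose spend exceeds a flat per-group budget."""
    return sorted(name for name, spent in totals.items() if spent > budget)


by_cost_centre = total_by(records, "cost_centre")
by_model = total_by(records, "model")
print(by_cost_centre)
print(by_model)
print(over_budget(by_cost_centre, budget=20.0))
```

Four seats tell you nothing here; the grouped totals show that most spend sits with one cost centre and one model tier.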

GitHub has also stated that larger context windows and higher reasoning levels consume more AI Credits per interaction. In June, GitHub announced support for one-million-token context windows and configurable reasoning levels in Copilot, while explicitly recommending that users keep the default context and reasoning level for everyday tasks and reserve extended context or higher reasoning for complex, multi-file problems.

That recommendation is important. It is effectively saying: capability has increased, but so has the need for discipline.


The real problem is not expensive models

There is a common argument that appears whenever AI tooling costs increase:

“We need to use cheaper models.”

That is too simplistic.

A colleague of mine provided an equally common counterargument:

“Cheaper models produce weaker results, and if we need twice as many prompts, we have not saved anything.”

That is also valid, but does not address the core issue.

The real issue is not expensive models. The real issue is unstructured model usage.

A premium model is absolutely justified when the task demands it. If someone is performing complex architecture analysis, debugging a difficult production issue, reviewing a security-sensitive change, or refactoring across a large codebase, then using a stronger model may be the most economical choice.

But that does not mean the same model should be used for every activity in the development lifecycle.

Using the strongest available model for every prompt is like using a senior architect to format YAML, write boilerplate comments, summarise a README, or explain a simple compiler error. It will work, but it is not a sensible operating model.

The goal should not be to minimise model cost in isolation.

The goal should be:

Lowest total cost to an acceptable outcome.

That includes:

| Cost factor | Why it matters |
|---|---|
| Token cost | Direct consumption of AI Credits or equivalent platform spend |
| Developer time | Poor model choice can increase re-prompting and review effort |
| Context size | Unnecessary files, logs, and history increase token usage |
| Output quality | Weak output can create rework or false confidence |
| Review effort | AI-generated code still needs human validation |
| Defect risk | Poor recommendations can create production, security, or maintainability issues |
| Delivery speed | Good model routing can reduce cycle time without uncontrolled spend |
| Governance overhead | Untracked usage makes forecasting and accountability difficult |
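
A rough way to reason about "lowest total cost to an acceptable outcome" is to add token spend and developer time across the attempts a task is expected to need. The sketch below covers only those two factors, and every number in it (credit price, hourly rate, attempts, review minutes) is an illustrative assumption rather than measured data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    """One way of getting a task done. All figures are illustrative assumptions."""
    name: str
    credits_per_attempt: float         # token cost of a single attempt
    expected_attempts: float           # attempts needed to reach an acceptable result
    review_minutes_per_attempt: float  # developer time to read and validate each attempt


def total_cost(option, credit_price, developer_rate_per_hour):
    """Token spend plus developer review time, across all expected attempts."""
    token_cost = option.credits_per_attempt * option.expected_attempts * credit_price
    review_hours = option.review_minutes_per_attempt * option.expected_attempts / 60
    return token_cost + review_hours * developer_rate_per_hour


def cheapest(options, credit_price=0.5, developer_rate_per_hour=60.0):
    """Pick the option with the lowest total cost under the given (assumed) prices."""
    return min(options, key=lambda o: total_cost(o, credit_price, developer_rate_per_hour))


# A hard refactor: the cheaper model needs several rounds of re-prompting.
refactor = [
    Option("standard", credits_per_attempt=2.0, expected_attempts=4, review_minutes_per_attempt=15),
    Option("premium", credits_per_attempt=10.0, expected_attempts=1, review_minutes_per_attempt=15),
]
# Boilerplate: both models succeed first time.
boilerplate = [
    Option("standard", credits_per_attempt=0.2, expected_attempts=1, review_minutes_per_attempt=2),
    Option("premium", credits_per_attempt=1.0, expected_attempts=1, review_minutes_per_attempt=2),
]
print(cheapest(refactor).name)
print(cheapest(boilerplate).name)
```

Under these assumptions the premium model is the cheaper choice for the refactor and the standard model is cheaper for boilerplate, so both the "use cheaper models" argument and its counterargument hold for different tasks.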

This is why the better discussion is not “cheap models vs expensive models.”

The better discussion is model routing.
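
At its simplest, model routing is a lookup from task type to model tier. The following is a minimal sketch: the task categories are taken from the examples in this post, while the tier names and the default for unknown tasks are assumptions, not a GitHub feature or recommendation.

```python
from enum import Enum


class Tier(Enum):
    STANDARD = "standard"
    PREMIUM = "premium"


# Task categories come from the examples in this post; the tier assigned to each
# is an illustrative starting point that a team would tune for itself.
ROUTES = {
    "format_yaml": Tier.STANDARD,
    "boilerplate_comments": Tier.STANDARD,
    "summarise_readme": Tier.STANDARD,
    "explain_compiler_error": Tier.STANDARD,
    "architecture_analysis": Tier.PREMIUM,
    "production_debugging": Tier.PREMIUM,
    "security_review": Tier.PREMIUM,
    "large_refactor": Tier.PREMIUM,
}


def route(task_type: str) -> Tier:
    """Pick a model tier for a task; unknown tasks start on the standard tier."""
    return ROUTES.get(task_type, Tier.STANDARD)


print(route("format_yaml").value)
print(route("security_review").value)
```

Defaulting unknown tasks to the standard tier is a design choice: the developer escalates when the result is not acceptable, instead of starting every task on the most expensive model.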


Context windows are not free memory

One of the most important concepts companies need to teach is the context window.

A context window is the amount of information the model can consider at one time. It includes the developer’s prompt, previous conversation, system instructions, selected files, retrieved repository context, tool outputs, terminal output, and the model’s own responses. In long sessions, this can fill quickly.

Larger context windows are useful, but they are not automatically better. They allow the model to inspect more information, but they can also increase token consumption and cost.

GitHub’s June update is a good example. One-million-token context windows make it possible to work across larger codebases and longer documents. That is powerful for complex multi-file work. But GitHub also states that larger context windows and higher reasoning levels consume more AI Credits, and recommends using default context and reasoning for everyday tasks.

These practices should become standard enterprise training points:

| Context practice | Good behaviour | Poor behaviour |
|---|---|---|
| File selection | Attach only the files needed for the task | Attach the whole repository by default |
| Conversation length | Start a fresh session for a new task | Keep reusing one long, polluted chat |
| Terminal output | Paste only relevant errors and logs | Paste thousands of lines without filtering |
| Agent scope | Give a bounded task and stop condition | Tell the agent to “fix everything” |
| Reasoning level | Increase only for complex problems | Use maximum reasoning for routine edits |
| Context window size | Extend for large multi-file analysis | Use large context as the default |
| Documentation | Provide architecture notes and constraints | Assume the model will infer business rules |

The context window should be treated like an engineering resource. It needs to be curated.
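
As one small example of curation, terminal output can be filtered before it is pasted into a chat. This sketch keeps only lines that look like errors plus a little surrounding context. The pattern is a simple heuristic, and the four-characters-per-token estimate is a crude assumption; real tokenisers vary by model.

```python
import re

# Heuristic pattern for "interesting" log lines; adjust for your own tooling.
ERROR_PATTERN = re.compile(r"error|exception|traceback|failed", re.IGNORECASE)


def relevant_lines(log: str, context: int = 2) -> str:
    """Keep only lines matching ERROR_PATTERN, plus `context` lines either side."""
    lines = log.splitlines()
    keep = set()
    for index, line in enumerate(lines):
        if ERROR_PATTERN.search(line):
            start = max(0, index - context)
            end = min(len(lines), index + context + 1)
            keep.update(range(start, end))
    return "\n".join(lines[i] for i in sorted(keep))


def rough_token_estimate(text: str) -> int:
    """Very rough estimate assuming about 4 characters per token."""
    return len(text) // 4


# A synthetic log: 2,000 routine lines around a single failure.
log = "\n".join(
    [f"INFO step {n} ok" for n in range(1000)]
    + ["ERROR upload failed: connection reset"]
    + [f"INFO retry {n}" for n in range(1000)]
)
trimmed = relevant_lines(log)
print(rough_token_estimate(log), "->", rough_token_estimate(trimmed))
```

The trimmed version carries the same diagnostic signal in five lines instead of two thousand.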


Prompt discipline matters more than most people think

Much AI waste comes from poor prompting, not poor models.

Developers often ask vague questions like:

Can you fix this?

or:

Make this better.

That forces the model to infer intent, inspect unnecessary context, make assumptions, and often generate broad changes that require another round of correction.

A better prompt has structure:

Context:
Goal:
Files involved:
Constraints:
Expected output:
Definition of done:
What not to change:

For example:

Context: This is a Python FastAPI service deployed to Azure Container Apps.
Goal: Add structured logging to the upload endpoint.
Files involved: app/api/upload.py, app/core/logging.py.
Constraints: Do not change the API contract. Do not introduce new dependencies.
Expected output: Minimal code changes and a short explanation.
Definition of done: Existing tests pass, new logging is included, and no secrets are logged.
What not to change: Do not modify authentication or request validation logic.

This kind of prompt reduces ambiguity. It also reduces the number of retries. That is good for cost, but more importantly, it is good for engineering quality.
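
Teams that want to make the structure habitual could encode it in a small helper. The sketch below is one hypothetical way to do that in Python: the `ScopedPrompt` class and its method names are invented for this example, and it simply refuses to render a prompt with blank fields.

```python
from dataclasses import dataclass, fields


@dataclass
class ScopedPrompt:
    """The seven fields of the prompt structure described above; all are required."""
    context: str
    goal: str
    files_involved: str
    constraints: str
    expected_output: str
    definition_of_done: str
    what_not_to_change: str

    def missing(self) -> list[str]:
        """Names of fields that were left blank."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def render(self) -> str:
        """Render as 'Label: value' lines, refusing incomplete prompts."""
        blank = self.missing()
        if blank:
            raise ValueError(f"Prompt is missing: {', '.join(blank)}")
        return "\n".join(
            f"{f.name.replace('_', ' ').capitalize()}: {getattr(self, f.name)}"
            for f in fields(self)
        )


prompt = ScopedPrompt(
    context="This is a Python FastAPI service deployed to Azure Container Apps.",
    goal="Add structured logging to the upload endpoint.",
    files_involved="app/api/upload.py, app/core/logging.py.",
    constraints="Do not change the API contract. Do not introduce new dependencies.",
    expected_output="Minimal code changes and a short explanation.",
    definition_of_done="Existing tests pass, new logging is included, and no secrets are logged.",
    what_not_to_change="Do not modify authentication or request validation logic.",
)
print(prompt.render())
```

A vague request like "Can you fix this?" cannot be expressed in this form without the developer first deciding on scope and a definition of done.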


Agent mode needs governance

Agentic workflows are where the economics change most dramatically.

A normal chat request might be one model interaction. An agentic workflow can involve many model calls, file reads, shell commands, searches, edits, test runs, and follow-up reasoning. That is the point of agent mode, but it also means agent mode needs stronger usage discipline.

Organisations should define explicit agent rules:

| Rule | Policy |
|---|---|
| Start with a plan | Agent must propose a plan before modifying files for non-trivial work |
| Bound the task | The prompt must define scope, files, and expected outcome |
| Limit blast radius | Avoid broad instructions like “refactor the project” without decomposition |
| Require checkpoints | Developer reviews diffs after each meaningful step |
| Use tests deliberately | Agent should run targeted tests before broad test suites |
| Stop on uncertainty | Agent should ask when requirements are unclear instead of guessing |
| Use premium models selectively | Premium models are allowed for complex agentic work, but not for every task |
| Track consumption | Teams should monitor usage by user, repo, workflow, and model |

The key point is not to block agentic development. The key point is to make it auditable and repeatable.
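
Some of these rules can be checked mechanically before an agent session starts. The following Python sketch is a hypothetical pre-flight check covering four of them. The `AgentTask` fields, the list of vague phrases, and the use of a step limit as a stand-in for checkpoints are all assumptions for illustration; none of this is part of Copilot itself.

```python
from dataclasses import dataclass, field


@dataclass
class AgentTask:
    """A request to run an agent session. Field names are illustrative assumptions."""
    description: str
    files_in_scope: list[str] = field(default_factory=list)
    stop_condition: str = ""
    plan_approved: bool = False
    max_steps: int = 0  # 0 means no limit was set before a developer review


# Example phrases that signal an unbounded instruction; extend as needed.
VAGUE_PHRASES = ("fix everything", "refactor the project", "make this better")


def policy_violations(task: AgentTask) -> list[str]:
    """Return the rules this task breaks; an empty list means it may run."""
    problems = []
    if not task.plan_approved:
        problems.append("Start with a plan: no approved plan")
    if not task.files_in_scope or not task.stop_condition:
        problems.append("Bound the task: missing scope or stop condition")
    if any(phrase in task.description.lower() for phrase in VAGUE_PHRASES):
        problems.append("Limit blast radius: instruction is too broad")
    if task.max_steps <= 0:
        problems.append("Require checkpoints: no step limit before review")
    return problems


unbounded = AgentTask(description="Fix everything in the repo")
bounded = AgentTask(
    description="Add structured logging to the upload endpoint",
    files_in_scope=["app/api/upload.py", "app/core/logging.py"],
    stop_condition="Existing tests pass and the new log lines appear",
    plan_approved=True,
    max_steps=5,
)
print(policy_violations(unbounded))
print(policy_violations(bounded))
```

A check like this does not replace developer judgement, but it makes the rules explicit and gives teams something to audit.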


Why companies are underprepared

Many organisations have invested in Copilot licences but have not invested enough in operating practices.

That gap is now visible.

A licence gives access. It does not teach:

  • how to choose the right model
  • how to manage context windows
  • how to write scoped prompts
  • how to use agent mode safely
  • how to evaluate AI-generated code
  • how to avoid leaking sensitive data into prompts
  • how to measure productivity against AI spend
  • how to decide when a premium model is justified
  • how to use AI for review rather than just generation

This is where engineering leadership needs to step in.


What happens next

The 1 June billing change is not just a pricing event. It is a maturity test.

Organisations that treat it only as a finance problem will respond with blunt restrictions. Organisations that treat it as an engineering operating model problem will build better habits.

The next post covers the practical framework: how to implement model routing, train developers, govern agent mode, and measure ROI properly.

The question is no longer “Should we use AI?”

The question is: “Are we structured enough to use it well?”


Ready for Part 2?

Continue to Part 2: Building an AI Operating Model →

Part 2 covers the implementation roadmap: model routing tables, enablement programmes, prompt templates, agent governance rules, and how to measure AI ROI properly.


Author’s note

This post was co-written with AI assistance. I used GitHub Copilot to help structure the argument, develop the tables and examples, and refine the prose. The core thesis and governance concerns are my own, but AI was valuable in articulating the problems clearly.