The AI Governance Problem: Why Casual Model Usage No Longer Works

TLDR

GitHub’s 1 June billing change signals the end of casual AI usage. Companies must move from “enable Copilot and hope” to structured AI engineering practice.
The real problem is not expensive models; it is unstructured usage. Premium models are justified for complex work, but using them for boilerplate, documentation, or simple Q&A is wasteful.
Context windows and prompt discipline matter more than model choice. Vague prompts, unpruned conversations, and unbounded agent sessions drive cost without value.
Agent mode needs governance: require a plan before execution, a bounded task scope, and developer checkpoints, because agentic workflows can burn tokens quickly.
Why companies are underprepared: Most organisations have licences but lack enablement. Without training on model selection, context management, and cost awareness, developers default to premium models for everything.

On 1 June 2026, GitHub Copilot moved from a premium-request model to usage-based billing through GitHub AI Credits. That sounds like a billing change, but I think it is actually a much bigger signal for engineering organisations.

It tells us that the era of casual, unstructured AI usage inside developer tools is ending.

For the last couple of years, many organisations have treated AI coding assistants as a productivity add-on: enable Copilot, give developers access to the best models available, and assume productivity will follow. That may have worked when usage was simpler and cost exposure was less visible. But GitHub Copilot is no longer just an autocomplete tool. It now includes chat, agent mode, code review, CLI-based workflows, model selection, custom context, larger context windows, and increasingly autonomous coding sessions.

That changes the governance problem.

The question is no longer:

“Should developers use AI?”

The better question is:

“How do we train developers to use the right model, with the right context, for the right task, at the right cost?”

That is where many companies are currently underprepared.

About this series

This is Part 1 of a 2-part series on AI operating models:

Part 1 (this post): The problems and challenges. What governance gaps exist, why token costs are spiralling, and what risks come with unstructured AI usage.
Part 2: Building an AI Operating Model. The practical framework: how to implement model routing, build enablement programmes, govern agent mode, and measure ROI.

This post focuses on the diagnosis. The next post provides the cure. Read Part 2 →

What changed on 1 June?

GitHub announced that from 1 June 2026, Copilot plans would transition to usage-based billing. Instead of counting only premium requests, Copilot now consumes GitHub AI Credits based on token usage. That includes input tokens, output tokens, and cached tokens, with cost varying by model. GitHub’s own explanation is clear: the cost of an interaction depends on both the model used and the number of tokens consumed.
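
To make that arithmetic concrete, here is a minimal sketch of how the cost of a single interaction could be estimated. The model names and rates are invented placeholders, not GitHub's published prices; only the shape of the calculation (token counts by type, multiplied by a per-model rate) follows from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRate:
    """Hypothetical credit rates per one million tokens (placeholders, not real prices)."""
    input_per_m: float
    output_per_m: float
    cached_per_m: float


# Assumed example rates; replace them with figures from your own billing data.
RATES = {
    "standard-model": ModelRate(input_per_m=1.0, output_per_m=4.0, cached_per_m=0.1),
    "premium-model": ModelRate(input_per_m=10.0, output_per_m=40.0, cached_per_m=1.0),
}


def estimate_credits(model, input_tokens, output_tokens, cached_tokens=0):
    """Estimate the credits consumed by one interaction with the given model."""
    rate = RATES[model]
    weighted_tokens = (
        input_tokens * rate.input_per_m
        + output_tokens * rate.output_per_m
        + cached_tokens * rate.cached_per_m
    )
    return weighted_tokens / 1_000_000


# The same prompt and response size, priced on two different models.
for name in RATES:
    print(name, estimate_credits(name, input_tokens=20_000, output_tokens=2_000))
```

With these placeholder rates, the identical interaction costs ten times more on the premium model. The real ratio will differ, but the point holds: both the model and the token count drive the bill.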

The headline items are:

Area	What changed	Why it matters
Billing unit	Premium Request Units were replaced by GitHub AI Credits	Cost is now more directly linked to actual model usage
Token accounting	Input, output, and cached tokens are counted	Long prompts, long responses, and large context can materially affect cost
Model choice	Different models consume credits at different rates	A premium model used casually can become expensive quickly
Agentic workflows	Long-running agent sessions can consume significantly more	Autonomous multi-step work is not equivalent to a quick chat question
Enterprise controls	Budgets can be managed at enterprise, cost centre, and user levels	Governance is now a first-class requirement
Pooled usage	Business usage can be pooled across the organisation	Organisations need usage analytics, not just seat counts
Code review	Copilot code review can consume AI Credits and GitHub Actions minutes	AI use can now cross into adjacent platform cost lines

GitHub has also stated that larger context windows and higher reasoning levels consume more AI Credits per interaction. In June, GitHub announced support for one-million-token context windows and configurable reasoning levels in Copilot, while explicitly recommending that users keep the default context and reasoning level for everyday tasks and reserve extended context or higher reasoning for complex, multi-file problems.

That recommendation is important. It is effectively saying: capability has increased, but so has the need for discipline.

The real problem is not expensive models

There is a common argument that appears whenever AI tooling costs increase:

“We need to use cheaper models.”

That is too simplistic.

A colleague of mine provided an equally common counterargument:

“Cheaper models produce weaker results, and if we need twice as many prompts, we have not saved anything.”

That is also valid, but does not address the core issue.

The real issue is not expensive models. The real issue is unstructured model usage.

A premium model is absolutely justified when the task demands it. If someone is performing complex architecture analysis, debugging a difficult production issue, reviewing a security-sensitive change, or refactoring across a large codebase, then using a stronger model may be the most economical choice.

But that does not mean the same model should be used for every activity in the development lifecycle.

Using the strongest available model for every prompt is like using a senior architect to format YAML, write boilerplate comments, summarise a README, or explain a simple compiler error. It will work, but it is not a sensible operating model.

The goal should not be to minimise model cost in isolation.

The goal should be:

Lowest total cost to an acceptable outcome.

That includes:

Cost factor	Why it matters
Token cost	Direct consumption of AI Credits or equivalent platform spend
Developer time	Poor model choice can increase re-prompting and review effort
Context size	Unnecessary files, logs, and history increase token usage
Output quality	Weak output can create rework or false confidence
Review effort	AI-generated code still needs human validation
Defect risk	Poor recommendations can create production, security, or maintainability issues
Delivery speed	Good model routing can reduce cycle time without uncontrolled spend
Governance overhead	Untracked usage makes forecasting and accountability difficult
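
As a rough illustration of "lowest total cost to an acceptable outcome", the sketch below combines token spend with developer time across retries. Every number here is an assumption chosen for the example (token cost per attempt, minutes per attempt, hourly rate), and the function deliberately ignores factors such as defect risk that are harder to price.

```python
def total_cost(token_cost_per_attempt, attempts, dev_minutes_per_attempt,
               review_minutes, dev_rate_per_hour):
    """Token spend plus developer time for all attempts and the final review."""
    token_cost = token_cost_per_attempt * attempts
    dev_minutes = dev_minutes_per_attempt * attempts + review_minutes
    return token_cost + dev_minutes * dev_rate_per_hour / 60


# Hard task (assumed): the cheaper model needs three attempts and more review.
cheap_hard = total_cost(0.05, attempts=3, dev_minutes_per_attempt=10,
                        review_minutes=15, dev_rate_per_hour=60)
premium_hard = total_cost(0.50, attempts=1, dev_minutes_per_attempt=10,
                          review_minutes=10, dev_rate_per_hour=60)

# Routine task (assumed): both models succeed on the first attempt.
cheap_easy = total_cost(0.05, attempts=1, dev_minutes_per_attempt=2,
                        review_minutes=2, dev_rate_per_hour=60)
premium_easy = total_cost(0.50, attempts=1, dev_minutes_per_attempt=2,
                          review_minutes=2, dev_rate_per_hour=60)

print("hard task:", cheap_hard, "vs", premium_hard)
print("routine task:", cheap_easy, "vs", premium_easy)
```

Under these assumptions the premium model wins on the hard task and loses on the routine one, which is why neither "always cheap" nor "always premium" is a sensible default.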

This is why the better discussion is not “cheap models vs expensive models.”

The better discussion is model routing.
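
In its simplest form, model routing is a lookup from task category to model tier. The sketch below is illustrative only: the category names are taken from the examples in this post, while the two-tier split and the choice of a standard default are assumptions, not a Copilot feature.

```python
from enum import Enum


class Tier(Enum):
    STANDARD = "standard"
    PREMIUM = "premium"


# Hypothetical routing table built from the task examples discussed above.
ROUTES = {
    "boilerplate": Tier.STANDARD,
    "documentation": Tier.STANDARD,
    "simple_qa": Tier.STANDARD,
    "architecture_analysis": Tier.PREMIUM,
    "production_debugging": Tier.PREMIUM,
    "security_review": Tier.PREMIUM,
    "large_refactor": Tier.PREMIUM,
}


def route(task_category, default=Tier.STANDARD):
    """Return the model tier for a task; unknown categories fall back to the default."""
    return ROUTES.get(task_category, default)


print(route("boilerplate").value)
print(route("security_review").value)
```

Defaulting unknown work to the standard tier makes premium usage a deliberate choice rather than the path of least resistance.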

Context windows are not free memory

One of the most important concepts companies need to teach is the context window.

A context window is the amount of information the model can consider at one time. It includes the developer’s prompt, previous conversation, system instructions, selected files, retrieved repository context, tool outputs, terminal output, and the model’s own responses. In long sessions, this can fill quickly.

Larger context windows are useful, but they are not automatically better. They allow the model to inspect more information, but they can also increase token consumption and cost.

GitHub’s June update is a good example. One-million-token context windows make it possible to work across larger codebases and longer documents. That is powerful for complex multi-file work. But GitHub also states that larger context windows and higher reasoning levels consume more AI Credits, and recommends using default context and reasoning for everyday tasks.

That should become a standard enterprise training point:

Context practice	Good behaviour	Poor behaviour
File selection	Attach only the files needed for the task	Attach the whole repository by default
Conversation length	Start a fresh session for a new task	Keep reusing one long, polluted chat
Terminal output	Paste only relevant errors and logs	Paste thousands of lines without filtering
Agent scope	Give a bounded task and stop condition	Tell the agent to “fix everything”
Reasoning level	Increase only for complex problems	Use maximum reasoning for routine edits
Context window size	Extend for large multi-file analysis	Use large context as the default
Documentation	Provide architecture notes and constraints	Assume the model will infer business rules

The context window should be treated like an engineering resource. It needs to be curated.
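
One small example of curation is trimming terminal output before pasting it into a chat. The helper below is a sketch under simple assumptions: the keyword list and the number of surrounding lines are arbitrary choices, and real logs may need smarter filtering.

```python
def extract_relevant_lines(log_text, keywords=("error", "exception", "failed"), context=2):
    """Keep only lines that mention a keyword, plus a few lines around each match."""
    lines = log_text.splitlines()
    keep = set()
    for index, line in enumerate(lines):
        if any(keyword in line.lower() for keyword in keywords):
            start = max(0, index - context)
            end = min(len(lines), index + context + 1)
            keep.update(range(start, end))
    # Preserve the original order of the lines that survive the filter.
    return "\n".join(lines[i] for i in sorted(keep))


# Example: a 100-line log with a single failure in the middle.
log_lines = [f"INFO step {i}" for i in range(100)]
log_lines[50] = "ERROR boom"
trimmed = extract_relevant_lines("\n".join(log_lines), context=1)
print(trimmed)
```

Here a hundred lines shrink to three, and the model still sees the failure and its immediate surroundings.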

Prompt discipline matters more than most people think

Much AI waste comes from poor prompting, not poor models.

Developers often ask vague questions like:

Can you fix this?

or:

Make this better.

That forces the model to infer intent, inspect unnecessary context, make assumptions, and often generate broad changes that require another round of correction.

A better prompt has structure:

Context:
Goal:
Files involved:
Constraints:
Expected output:
Definition of done:
What not to change:

For example:

Context: This is a Python FastAPI service deployed to Azure Container Apps.
Goal: Add structured logging to the upload endpoint.
Files involved: app/api/upload.py, app/core/logging.py.
Constraints: Do not change the API contract. Do not introduce new dependencies.
Expected output: Minimal code changes and a short explanation.
Definition of done: Existing tests pass, new logging is included, and no secrets are logged.
What not to change: Do not modify authentication or request validation logic.

This kind of prompt reduces ambiguity. It also reduces the number of retries. That is good for cost, but more importantly, it is good for engineering quality.
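
A team could encode the structure above as a small template so that incomplete prompts are caught before they are sent. This is a minimal sketch; the `ScopedPrompt` class and its field names are hypothetical and simply mirror the seven headings listed earlier.

```python
from dataclasses import dataclass

# Field name -> heading used in the rendered prompt.
LABELS = {
    "context": "Context",
    "goal": "Goal",
    "files_involved": "Files involved",
    "constraints": "Constraints",
    "expected_output": "Expected output",
    "definition_of_done": "Definition of done",
    "what_not_to_change": "What not to change",
}


@dataclass
class ScopedPrompt:
    context: str
    goal: str
    files_involved: str
    constraints: str
    expected_output: str
    definition_of_done: str
    what_not_to_change: str

    def render(self):
        """Return the prompt text, refusing to render if any section is blank."""
        missing = [label for name, label in LABELS.items()
                   if not getattr(self, name).strip()]
        if missing:
            raise ValueError("Missing prompt sections: " + ", ".join(missing))
        return "\n".join(f"{label}: {getattr(self, name)}"
                         for name, label in LABELS.items())


prompt = ScopedPrompt(
    context="This is a Python FastAPI service deployed to Azure Container Apps.",
    goal="Add structured logging to the upload endpoint.",
    files_involved="app/api/upload.py, app/core/logging.py.",
    constraints="Do not change the API contract. Do not introduce new dependencies.",
    expected_output="Minimal code changes and a short explanation.",
    definition_of_done="Existing tests pass, new logging is included, and no secrets are logged.",
    what_not_to_change="Do not modify authentication or request validation logic.",
)
print(prompt.render())
```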

Agent mode needs governance

Agentic workflows are where the economics change most dramatically.

A normal chat request might be one model interaction. An agentic workflow can involve many model calls, file reads, shell commands, searches, edits, test runs, and follow-up reasoning. That is the point of agent mode, but it also means agent mode needs stronger usage discipline.

Organisations should define explicit agent rules:

Rule	Policy
Start with a plan	Agent must propose a plan before modifying files for non-trivial work
Bound the task	The prompt must define scope, files, and expected outcome
Limit blast radius	Avoid broad instructions like “refactor the project” without decomposition
Require checkpoints	Developer reviews diffs after each meaningful step
Use tests deliberately	Agent should run targeted tests before broad test suites
Stop on uncertainty	Agent should ask when requirements are unclear instead of guessing
Use premium models selectively	Premium models are allowed for complex agentic work, but not for every task
Track consumption	Teams should monitor usage by user, repo, workflow, and model

The key point is not to block agentic development. The key point is to make it auditable and repeatable.
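
Some of these rules can be checked mechanically before an agent session starts. The sketch below is one hypothetical way to do that: the `AgentTask` fields, the list of overly broad phrases, and the idea of a pre-flight check are all assumptions for illustration, not features of Copilot agent mode.

```python
from dataclasses import dataclass, field

# Assumed examples of instructions that are too broad to run without decomposition.
BROAD_PHRASES = ("fix everything", "refactor the project")


@dataclass
class AgentTask:
    instruction: str
    files_in_scope: list = field(default_factory=list)
    expected_outcome: str = ""
    plan_approved: bool = False
    premium_model: bool = False
    complex_work: bool = False


def preflight_violations(task):
    """Return the governance rules this task breaks; an empty list means it may proceed."""
    violations = []
    if not task.plan_approved:
        violations.append("Start with a plan: no approved plan")
    if not task.files_in_scope or not task.expected_outcome.strip():
        violations.append("Bound the task: scope or expected outcome missing")
    if any(phrase in task.instruction.lower() for phrase in BROAD_PHRASES):
        violations.append("Limit blast radius: instruction is too broad")
    if task.premium_model and not task.complex_work:
        violations.append("Use premium models selectively: routine task on premium model")
    return violations


task = AgentTask(instruction="Fix everything", premium_model=True)
for violation in preflight_violations(task):
    print(violation)
```

Rules such as checkpoints and consumption tracking need process and telemetry rather than a pre-flight check, so they are left out of this sketch.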

Why companies are underprepared

Many organisations have invested in Copilot licences but have not invested enough in operating practices.

That gap is now visible.

A licence gives access. It does not teach:

how to choose the right model
how to manage context windows
how to write scoped prompts
how to use agent mode safely
how to evaluate AI-generated code
how to avoid leaking sensitive data into prompts
how to measure productivity against AI spend
how to decide when a premium model is justified
how to use AI for review rather than just generation

This is where engineering leadership needs to step in.

What happens next

The 1 June billing change is not just a pricing event. It is a maturity test.

Organisations that treat it only as a finance problem will respond with blunt restrictions. Organisations that treat it as an engineering operating model problem will build better habits.

The next post covers the practical framework: how to implement model routing, train developers, govern agent mode, and measure ROI properly.

The question is no longer “Should we use AI?”

The question is: “Are we structured enough to use it well?”

Ready for Part 2?

Continue to Part 2: Building an AI Operating Model →

Part 2 covers the implementation roadmap: model routing tables, enablement programmes, prompt templates, agent governance rules, and how to measure AI ROI properly.

Author’s note

This post was co-written with AI assistance. I used GitHub Copilot to help structure the argument, develop the tables and examples, and refine the prose. The core thesis and governance concerns are my own, but AI was valuable in articulating the problems clearly.
