TL;DR

A May 2026 Google whitepaper argues that software teams should focus less on model choice and more on the systems wrapped around AI coding agents. The paper says the shift from casual prompting to structured agentic engineering depends on tests, evals, CI gates, context and human judgment.

Google researchers Addy Osmani, Shubham Saboo and Sokratis Kartakis have published a whitepaper arguing that AI-assisted software development is moving from writing code toward directing machines through specs, tests and agent systems. The shift matters, the paper says, because most agent performance comes from the engineering harness around the model rather than from the model itself.

The paper, “The New SDLC With Vibe Coding,” says that as of early 2026, 85% of professional developers regularly use AI coding agents, 51% use them daily and roughly 41% of all new code is AI-generated. The paper presents those figures as evidence that AI coding tools are no longer a side experiment for many software teams.

The central claim is that a working coding agent is not just a model. It is a model plus prompts, tools, context, hooks, sandboxes, rule files, observability, sub-agents and deployment checks. The authors describe the model as roughly 10% of the system, with the surrounding harness accounting for most of the agent’s behavior.

The paper distinguishes casual “vibe coding” from what it calls agentic engineering. In that framing, vibe coding means loose prompts, minimal review and acceptance of whatever appears to work. Agentic engineering means formal specifications, automated tests, evals, CI/CD gates and human review of architecture and risk.
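That distinction can be made concrete with a small merge gate. The sketch below is illustrative only: the `GateInput` and `gate` names and the 0.9 eval threshold are assumptions made for this example, not anything the paper specifies. It treats deterministic tests as pass or fail, treats evals as a scored threshold, and still requires a human sign-off.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateInput:
    """Hypothetical summary of one AI-generated change awaiting merge."""
    tests_passed: int
    tests_failed: int
    eval_score: float     # 0.0-1.0, e.g. share of eval cases judged acceptable
    human_approved: bool  # reviewer signed off on architecture and risk


def gate(change: GateInput, min_eval_score: float = 0.9) -> tuple[bool, list[str]]:
    """Return (allowed, reasons). The default threshold is an assumption."""
    reasons = []
    if change.tests_passed + change.tests_failed == 0:
        reasons.append("no automated tests ran")
    if change.tests_failed > 0:
        reasons.append(f"{change.tests_failed} test(s) failed")
    if change.eval_score < min_eval_score:
        reasons.append(
            f"eval score {change.eval_score:.2f} below {min_eval_score:.2f}"
        )
    if not change.human_approved:
        reasons.append("missing human review")
    # The change is allowed only when no blocking reason was recorded.
    return (not reasons, reasons)


allowed, why = gate(GateInput(tests_passed=42, tests_failed=0,
                              eval_score=0.81, human_approved=True))
print(allowed, why)
```

In this framing, a change that "appears to work" but ran no tests or evals is blocked by construction, which is roughly the line the paper draws between the two ends of the spectrum.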

AI Dispatch · Field Notes

Google · Osmani, Saboo & Kartakis · May 2026

The model is only 10%

A Google whitepaper argues software’s biggest shift is from writing code to expressing intent. Its sharpest claim: the model you obsess over is the smallest part of the system — the scaffolding around it does the real work.

A spectrum, not a binary — the differentiator is how outputs get verified

Vibe Coding: casual prompts · “does it seem to work?” · disposable code · high risk

Structured AI-Assisted: detailed prompts + constraints · manual testing · features in real codebases

Agentic Engineering: formal specs · automated tests + evals + CI gates · production scale · low risk

Tests verify the deterministic; evals verify the rest. Without both, it’s vibe coding — however clever the prompt.

The idea worth building your strategy around

Agent = Model + Harness

MODEL ~10%

HARNESS ~90%: prompts · tools · context · hooks · sandboxes · observability. This is your surface area, not the provider’s.

Outside Top 30 → Top 5 on Terminal Bench 2.0 by changing only the harness — same model.

“Most agent failures, examined honestly, are configuration failures” — a missing tool, a vague rule, a noisy context.

The economics: it’s a token-cost problem (CapEx vs OpEx)

Vibe Coding: Low CapEx · High OpEx. Looks free, hides debt: token burn (fix-it loops), maintenance tax (AI spaghetti), security remediation. Crosses over to 3–10× more per feature.

Agentic Engineering: High CapEx · Low OpEx. Pay upfront (specs, evals, context), then ship cheaply. Levers: context engineering for first-pass success + intelligent model routing, with cheap models for the easy work.

85% of devs use AI coding agents (51% daily)

41% of all new code is AI-generated

~90% of agent behavior is the harness, not the model

+19% longer on some tasks (METR); verification is the cost

The read

The clearest map yet of how serious AI development works — and mostly tool-agnostic. But it’s a Google funnel: the concepts are neutral, the on-ramps point to Gemini, Jules & the ADK. If the harness is 90% and it’s yours, your moat and your costs both live there — so own your scaffolding, route across models, and remember: AI amplifies whatever engineering culture it lands in.

Source: Osmani, Saboo & Kartakis, “The New SDLC With Vibe Coding,” Google (May 2026). Figures are the paper’s own, incl. METR & LangChain. Analysis is the author’s.

thorstenmeyerai.com

Verification Becomes The Bottleneck

The paper’s main practical message is that companies may misread AI coding failures if they focus only on model upgrades. According to the authors, many failures come from weak configuration: missing tools, vague rules, noisy context or absent checks.
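One way to picture these configuration failures is as a checklist a team could run against its own agent setup. The sketch below is a rough illustration, not a method from the paper: `HarnessConfig`, `audit`, the token budget and the word-count test for "vague" rules are all hypothetical, and the word count in particular is a crude stand-in for real review.

```python
from dataclasses import dataclass, field


@dataclass
class HarnessConfig:
    """Hypothetical description of an agent harness; field names are illustrative."""
    tools: set[str] = field(default_factory=set)
    rules: list[str] = field(default_factory=list)
    context_tokens: int = 0
    checks: set[str] = field(default_factory=set)


def audit(cfg: HarnessConfig,
          required_tools: set[str],
          max_context_tokens: int = 50_000,
          min_rule_words: int = 5) -> list[str]:
    """Return findings for the four failure types; limits are assumed defaults."""
    findings = []
    missing = sorted(required_tools - cfg.tools)
    if missing:
        findings.append("missing tools: " + ", ".join(missing))
    # Very short rules are flagged as possibly vague (a crude heuristic).
    vague = [rule for rule in cfg.rules if len(rule.split()) < min_rule_words]
    if vague:
        findings.append(f"{len(vague)} rule(s) may be too vague")
    if cfg.context_tokens > max_context_tokens:
        findings.append("context may be noisy: over token budget")
    if not cfg.checks:
        findings.append("no checks configured")
    return findings


cfg = HarnessConfig(tools={"shell"}, rules=["be careful"], context_tokens=80_000)
for finding in audit(cfg, required_tools={"shell", "test_runner"}):
    print(finding)
```

None of these findings would be fixed by swapping in a stronger model, which is the paper's point about where to look first.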

That has budget and governance consequences. A team that treats AI code as cheap output may face higher downstream costs from repeated fix loops, maintenance work, security review and unclear ownership. The whitepaper argues that teams can lower long-term costs by investing earlier in specs, evals, context design and model routing.
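The trade-off between upfront investment and per-feature cost can be sketched as a simple break-even calculation. All figures below are invented for illustration and do not come from the paper; the function names are likewise hypothetical. The model assumes cost grows linearly with the number of features, which real projects may not follow.

```python
import math
from typing import Optional


def cumulative_cost(upfront: float, per_feature: float, features: int) -> float:
    """Total cost after shipping `features` features under a linear model."""
    return upfront + per_feature * features


def break_even_features(upfront_low: float, per_feature_high: float,
                        upfront_high: float, per_feature_low: float) -> Optional[int]:
    """Smallest feature count at which the high-upfront approach is strictly cheaper.

    Returns None when the high-upfront approach never saves money per feature.
    """
    saving_per_feature = per_feature_high - per_feature_low
    if saving_per_feature <= 0:
        return None
    extra_upfront = upfront_high - upfront_low
    return max(0, math.floor(extra_upfront / saving_per_feature) + 1)


# Hypothetical numbers: casual prompting costs nothing upfront but 500 per feature;
# the structured approach costs 4000 upfront and 100 per feature.
n = break_even_features(upfront_low=0, per_feature_high=500,
                        upfront_high=4000, per_feature_low=100)
print(n)  # 11
```

With these made-up inputs the two approaches cost the same at ten features, and the structured approach is cheaper from the eleventh onward. A team shipping only a handful of throwaway features would never reach that point, which is why the calculation depends on how long the code is expected to live.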

The paper also points to a near-term management question: how engineering leaders measure AI productivity. It cites METR research indicating that some AI-assisted tasks took 19% longer, which supports the authors’ warning that verification can consume the time saved by generation.
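The arithmetic behind that warning is simple enough to sketch. The inputs below are made up for illustration and are not METR's data; the second case is merely chosen so the total lands on the same 19% figure. The function name is hypothetical.

```python
def relative_time_change(baseline_min: float,
                         generation_min: float,
                         verification_min: float) -> float:
    """Fractional change in task time with AI versus without (positive = slower)."""
    if baseline_min <= 0:
        raise ValueError("baseline_min must be positive")
    with_ai = generation_min + verification_min
    return (with_ai - baseline_min) / baseline_min


# Hypothetical task that takes 100 minutes unaided.
light_review = relative_time_change(100, generation_min=40, verification_min=30)
heavy_review = relative_time_change(100, generation_min=40, verification_min=79)
print(f"light review: {light_review:+.0%}")  # -30%
print(f"heavy review: {heavy_review:+.0%}")  # +19%
```

The generation step is equally fast in both cases. Only the verification time differs, which is why measuring generation speed alone can overstate the productivity gain.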

From Vibe Coding To Agents

The term “vibe coding” was popularized by Andrej Karpathy in February 2025 to describe a loose style of prompting where developers accept AI-generated code and feed errors back until something works. The Google paper argues that the phrase has since been stretched too far and should be treated as one end of a spectrum.

At the other end, the authors describe “agentic engineering,” where AI systems act as implementation engines within stronger software-development controls. The difference, according to the paper, is not whether a team uses AI, but how much structure surrounds the AI output.

The paper cites benchmark examples to support the harness argument. It says one agent moved from outside the Top 30 to the Top 5 on Terminal Bench 2.0 after changes to the harness while using the same model. It also cites a LangChain experiment that improved an agent’s score by 13.7 points through prompt, tool and middleware changes.

“Generation is solved; verification, judgment, and direction are the new craft.”
— Osmani, Saboo and Kartakis, Google whitepaper

Adoption Numbers Need Scrutiny

Several points remain unsettled. The adoption figures cited in the paper depend on the paper’s sources and definitions, including what counts as regular agent use and what qualifies as AI-generated code. The source material does not provide enough detail to independently evaluate the sampling behind those numbers.

The 10% model and 90% harness split is also presented as a rough framing, not a universal measurement. It may vary by task, team maturity, codebase quality, language, tooling and risk level.

It is also unclear how quickly most organizations can move from casual AI assistance to agentic engineering. Building specs, evals, routing systems and observability requires time, expertise and discipline that many teams may not yet have.

Teams Test The Harness Thesis

The next test is whether engineering teams change their AI budgets and workflows in response. If the paper’s argument holds, buyers may spend less energy chasing a single preferred model and more on internal tooling, test coverage, eval systems, context pipelines and governance.

Vendors are also likely to compete on the surrounding platform. The source analysis notes that while the paper’s concepts are broadly tool-agnostic, Google’s own on-ramps point toward Gemini, Jules and the Agent Development Kit. Readers should watch whether future benchmarks separate model capability from harness design more clearly.

Key Questions

What is the actual news development?

Google published a May 2026 whitepaper arguing that AI-assisted software development is shifting from writing code to directing and verifying AI agents through structured systems.

Does the paper say models no longer matter?

No. The paper argues that model choice is only one part of agent performance. It says prompts, tools, context, sandboxes, tests, evals and observability often determine whether an agent succeeds in real software work.

What is the difference between vibe coding and agentic engineering?

Vibe coding refers to casual prompting and minimal review. Agentic engineering refers to AI coding inside a more formal system of specs, automated tests, evals, CI/CD gates and human oversight.

Why does this matter for software teams?

If the paper is right, teams that skip verification may create hidden costs through rework, security issues and maintenance debt. Teams that invest in harnesses may get more reliable output from the same or cheaper models.

What remains unclear?

The exact strength of the 10% model claim is uncertain. The split is a useful framing, but the real balance likely depends on the task, codebase, team practices and evaluation method.

Source: Thorsten Meyer AI

The Model Is Only 10%: The Real Lesson of the New SDLC

Author

The Happy Loved Life Team
