May 5, 2026 · 11 min read · Cadence Editorial

What changes when you write code with AI

Photo by [Lukas Blazek](https://www.pexels.com/@goumbik) on [Pexels](https://www.pexels.com/photo/person-using-macbook-pro-574077/)


Writing code with AI in 2026 changes the daily loop more than it changes the output. Pull requests get smaller, reviews shift from correctness to intent, planning takes longer, naming gets pickier, and the things that still matter (taste, system design, debugging weird bugs) matter more than they did three years ago.

This is the working engineer's version: here is what your PRs look like now, here is what your repo looks like, here is where you spend your time, and here is what the interview rubric should test for.

What writing code with AI actually means in 2026

The baseline. In early 2026, GitHub reported that 51% of code committed to its platform was generated or substantially assisted by an AI tool. The Stack Overflow Developer Survey put weekly AI-tool use at 65% of professional developers. Microsoft says 25% of its internal code comes from AI. The headline numbers stopped being interesting in 2024; the question now is what the working loop looks like.

"Writing code with AI" in 2026 means a five-step loop. Write a prompt that doubles as a spec. Generate a draft (30 to 400 lines). Verify it (run tests, read the diff, hit the failure case). Iterate. Ship. The toolkit settled around four products: Cursor for multi-file edits, Claude Code for architecture-grade work, GitHub Copilot for inline, and Aider or Continue for terminal-native workflows. Most senior engineers use two or three daily.

This is distinct from "vibe coding," the trend of describing an app in 45 words and shipping whatever the model returns. Vibe coding works for prototypes and demos. It does not work for systems you have to maintain, and the difference shows up the moment something breaks at 2am.

PRs get smaller and more frequent

The first thing that changes when a team adopts AI tools seriously is the shape of the pull request graph. Booking.com piloted AI tools across 700 developers in early 2026 and reported a 30% increase in merge requests within a quarter. Other teams with heavier rollouts have seen 40-60%.

The cause is mechanical. Each AI conversation produces one logical change. It is easier to generate three small PRs than one big one, because each small PR is a clean prompt with a clean verification step. Big PRs accumulate intent drift halfway through generation.

The tradeoff is review fatigue. Three PRs to review is more context-switching than one PR with three changes. Teams that handle this well adopt a single rule: one intent per PR, even when the AI happily generated 400 lines that touch four areas. If the prompt was "fix the webhook retry plus refactor the queue plus add a metric," split it into three before opening the PR.

If you are a founder, your velocity metric should change. PR throughput went up; feature throughput did not 3x. Use feature milestones, not PR counts.

Code review shifts from correctness to intent

In 2023, code review was largely a correctness check. Is there an off-by-one? Is the null case handled? Does this race? AI rarely makes those mistakes anymore. It makes intent mistakes.

A GitHub analysis published in late 2025 found AI-generated code was 1.7x more likely to contain major logic errors and 2.74x more prone to security flaws than human-written equivalents. The errors look correct. They compile, they pass the tests the model wrote, they read fluently. They just do something subtly different from what you asked for.

The new review checklist:

  1. Does this solve the actual problem, or a problem one prompt-hop adjacent?
  2. Are the system's invariants preserved? (Idempotency, ordering guarantees, transaction boundaries.)
  3. Are the tests honest, or did the model write tests that always pass?
  4. Did it hallucinate any APIs, libraries, or function signatures?
  5. What is the failure case, and what happens then?

Skim the diff, read the tests, run the failure case. That is the new loop. Line-by-line review of AI output is largely wasted effort; the quality of each line is high, the alignment between the diff and your intent is the thing that varies.

Planning becomes the expensive step

Generation got cheap. Thinking did not. The center of gravity in a feature shifted from implementation to planning, and most teams have not adjusted their estimates.

The METR study from mid-2025 is the most useful data point. Experienced developers on complex tasks felt 20% faster with AI. They were actually 19% slower. Part of the cause: AI made early stages feel frictionless, so developers skipped the planning they would have done by hand and paid for it in regen cycles.

The fix is to treat the prompt as the spec. The same artifact serves both: a function signature, three example inputs and outputs, and one edge case. Thirty minutes of writing that down saves four hours of "regenerate and pray." This is the discipline behind the AI-native engineering working style: planning got more important, not less.
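In code form, the same artifact can be a stub whose docstring is the spec, with the example inputs and outputs written as doctests. The function and the phone-number policy here are hypothetical, purely for illustration; the body shows what "done" means so regeneration can be checked against the same examples.

```python
import re

def normalize_phone(raw: str) -> str:
    """Normalize a US phone number to E.164 format.

    Examples (the verification step reuses these as doctests):
    >>> normalize_phone("(415) 555-2671")
    '+14155552671'
    >>> normalize_phone("+1 415 555 2671")
    '+14155552671'
    >>> normalize_phone("415.555.2671")
    '+14155552671'

    Edge case: anything that is not 10 digits after stripping
    punctuation and a leading 1 raises ValueError.
    """
    digits = re.sub(r"\D", "", raw)               # keep digits only
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]                       # drop the country code
    if len(digits) != 10:
        raise ValueError(f"expected 10 digits, got {len(digits)}")
    return "+1" + digits
```

The docstring is the prompt, the doctests are the verification step, and "iterate" means re-running the same examples against the regenerated body.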

| Loop step | 2023 default | 2026 with AI |
| --- | --- | --- |
| Spec | Ticket plus Slack thread | Prompt with signature, 3 examples, 1 edge case |
| First draft | 1 to 3 days hand-written | 20 min generated, 40 min reviewed |
| PR size | 400 to 800 lines | 100 to 250 lines, 3x more PRs |
| Review focus | Correctness | Intent, invariants, tests |
| Hardest step | Implementation | Planning and verification |

Naming, comments, and docs become load-bearing

This is the change most teams underestimate. AI reads your code to write the next chunk. Bad names produce bad output, immediately and visibly.

A function named `processData(input)` gets the next prompt wrong half the time. A function named `processStripeWebhookRetry(event, attemptNumber)` carries enough context that the model writes correct adjacent code on the first try. Comments that explain why (not what) suddenly pull their weight, because they are the cheapest way to inject context into a prompt.
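A sketch of the difference, with a hypothetical Stripe-flavored event type and retry policy (both invented for illustration):

```python
from dataclasses import dataclass

@dataclass
class WebhookEvent:
    id: str
    type: str

# Vague: nothing in the signature says what the data is or what
# "processing" means, so the next generated chunk is a guess.
def processData(input):
    ...

# Specific: the signature alone carries the domain (Stripe webhooks),
# the unit of work (one event), and the retry semantics.
def processStripeWebhookRetry(event: WebhookEvent, attemptNumber: int) -> bool:
    """Return True if the event should be retried, False if exhausted."""
    MAX_ATTEMPTS = 5  # hypothetical retry policy
    return attemptNumber < MAX_ATTEMPTS
```

The second signature is also the one that makes the model's next suggestion (logging, metrics, dead-letter handling) land in the right domain.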

The repo-level version of this is the instructions file. `CLAUDE.md`, `.cursorrules`, `AGENTS.md`, whatever the format. These files describe the codebase's conventions, the things the model should never do, the test commands, the style rules. Booking.com credited explicit instruction files with most of their 30% merge-request gain. Teams without them get the average AI experience: fast on the first prompt, frustrating by the fifth.
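A minimal instructions file might look like the sketch below. Every command, path, and rule here is illustrative, not a standard; the point is that a dozen concrete lines beat zero.

```markdown
# CLAUDE.md (illustrative sketch)

## Commands
- Run tests: `npm test` (single file: `npm test -- path/to/file.test.ts`)
- Lint before committing: `npm run lint`

## Conventions
- TypeScript strict mode; no `any` without a comment explaining why
- All external calls go through `src/lib/http.ts`, never raw `fetch`

## Never
- Never commit secrets; config comes from environment variables
- Never write to the database outside the transaction helper
- Never add a dependency without flagging it in the PR description
```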

The side effect is that repos quietly become better-documented. Nobody wrote a memo declaring "we now document our codebase." It happened because undocumented code produces worse AI output, and the engineers got tired of the worse output.

The junior-to-mid task collapse

This is the change with the biggest economic implication, and the one founders feel first.

Tasks that used to take two weeks of junior engineering (cleanup, dependency hygiene, integrations with good docs, scaffolding a CRUD admin) now take two days of mid-level engineering. The junior tier did not disappear, but it shrank. The work that justified a full-time junior in 2022 is now an afternoon for someone with judgment.

You can see this in the labor market. Software developer employment in the 22 to 25 age bracket fell roughly 20% between 2022 and 2025 (Stanford analysis cited by MIT Technology Review). The entry-level squeeze is real, and it is not a recession story; it is a productivity story.

For founders, this changes how you book engineering work. The economics:

| Tier | Weekly rate | What it does well in 2026 |
| --- | --- | --- |
| Junior | $500 | Cleanup, dependency upgrades, doc-writing, integrations with strong public docs |
| Mid | $1,000 | Standard features end-to-end, refactors, test coverage, reasonable judgment |
| Senior | $1,500 | Owns scope, architecture, complex refactors, performance, edge cases unprompted |
| Lead | $2,000 | Architectural decisions, complex systems design, fractional CTO, scale |

The old advice ("hire a junior for the cheap stuff first") inverts. Book a mid for the first feature. The mid will use AI to do the junior work in an hour and have time left to make actual judgment calls. Cadence's pricing tiers are locked at these levels precisely because we measure where AI moved the floor of each role.

If you want to think this through for your own project, our Build/Buy/Book decision tool walks through which scope belongs at which tier.

New failure modes you have to watch for

AI introduced a fresh set of bugs that did not exist in 2023. They are worth naming, because the pattern repeats.

Hallucinated APIs. The model invents a function that looks right and would be useful, but does not exist in the library. Common with newer SDKs and obscure ones. Fix: run the code before merging. Linters and type checkers catch most of it.
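Beyond linters and type checkers, one cheap pre-merge habit is an import-level smoke test that confirms every symbol a diff references actually exists before anything runs. A sketch (the helper name is ours, not a standard tool):

```python
import importlib

def api_exists(module_name: str, attr_path: str) -> bool:
    """Return True if module.attr actually exists, without calling it."""
    try:
        obj = importlib.import_module(module_name)
    except ImportError:
        return False  # the model invented the whole library
    for part in attr_path.split("."):
        if not hasattr(obj, part):
            return False  # the model invented this function
        obj = getattr(obj, part)
    return True

# A real stdlib function passes the check:
assert api_exists("json", "dumps")
# A plausible-sounding invention fails it:
assert not api_exists("json", "dumps_pretty")
```

This catches the "looks right, does not exist" class in seconds; it does not catch wrong signatures, which is what the type checker is for.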

Security drift. Roughly 45% of AI-generated code contains vulnerabilities (hardcoded secrets, command injection, missing input validation). A 2026 audit of 1,645 Lovable-generated apps found 170 with critical security flaws. The model defaults to working code, not secure code; the verification step has to include a security pass.
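Two of the most common drift patterns, hardcoded secrets and shell injection, have mechanical fixes once the security pass looks for them. A sketch with invented names (`PAYMENTS_API_KEY`, the `convert` command) standing in for your own:

```python
import os

# The AI default often looks like this (do not ship either line):
#   API_KEY = "sk_live_abc123"                                 # secret in git
#   subprocess.run(f"convert {filename} out.png", shell=True)  # injection

# Safer equivalents:
API_KEY = os.environ.get("PAYMENTS_API_KEY", "")  # secret from the environment

def thumbnail_cmd(filename: str) -> list[str]:
    """Build the command as an argv list so the shell never parses filename."""
    return ["convert", filename, "out.png"]

# Even a hostile filename stays one literal argument, not a shell command:
assert thumbnail_cmd("a.jpg; rm -rf /") == ["convert", "a.jpg; rm -rf /", "out.png"]
```

The pattern generalizes: anywhere the model interpolates user input into a string that something else will parse (shell, SQL, HTML), make it a parameter instead.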

Convention violations. AI ignores your repo's idioms unless told to follow them. A new developer joins the codebase and writes "the AI way," which is not your way. Fix: instructions file plus reviews that catch drift early.

The three-month black box. A feature shipped fast in week one becomes unreviewable in week twelve. Prompts are gone, the original engineer moved on, no human ever fully read the diff. Fix: PR descriptions that capture the prompt and the reasoning, not just the change.

Test theater. The model writes tests that always pass, especially when asked to "add tests" without specifying what they should prove. Read the assertions. If they have no negative case, they are decoration.
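The pattern is easy to see side by side. `is_valid_coupon` is a hypothetical function under test; the first test passes no matter what the function does, the second pins real behavior:

```python
def is_valid_coupon(code: str) -> bool:
    """Hypothetical rule: coupons are 8 uppercase alphanumeric characters."""
    return len(code) == 8 and code.isalnum() and code.isupper()

# Test theater: the assertion is true for any bool, so it proves nothing.
def test_coupon_theater():
    result = is_valid_coupon("SAVE2026")
    assert result is not None

# Honest test: a positive case plus a negative case that pins the rule.
def test_coupon_honest():
    assert is_valid_coupon("SAVE2026") is True
    assert is_valid_coupon("save2026") is False  # lowercase must be rejected

test_coupon_theater()
test_coupon_honest()
```

A quick heuristic while reading AI-written tests: if you could break the function on purpose and every assertion would still pass, it is decoration.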

The teams that handle these well treat every AI commit the way you would treat a junior PR: read the diff, run the failure case, never trust silence.

What hasn't changed (and won't)

A lot, actually. The discourse oversells the shift. These four things are still entirely human work in 2026:

Taste. Knowing which abstraction is the right one for the next two years of your codebase. AI will happily generate a clever pattern that becomes a maintenance disaster in nine months. Picking the boring, durable choice is judgment.

System design. AI does not know your scaling story, your billing model, your team's skill distribution, your latency budget, or your compliance posture. It can implement an architecture once you have one. It cannot give you one.

Debugging hard bugs. Race conditions in production. Distributed state corruption. A memory leak that only shows up on Tuesday. AI is genuinely bad at this, because the bug is rarely visible in the code; it is in the interaction between the code and the runtime. Senior debugging skill is more valuable now, not less.

Code review judgment. Knowing when to push back, when the prompt was wrong, when the abstraction is leaking, when the test is theater. This is the skill that separates an engineer from a prompt operator.

These are also the four skills Cadence's voice interview actually measures. Every engineer who unlocks the platform has demonstrated all of them, plus daily fluency in Cursor, Claude Code, and Copilot. We do not have an "AI-native add-on tier"; it is the baseline of the platform.

What to do this week

Three concrete moves, in order of cost.

  1. Audit one repo for an instructions file. If you do not have a `CLAUDE.md`, `.cursorrules`, or `AGENTS.md`, write one. Twenty lines is enough to start: how to run tests, what the lint rules are, what the model should never do. Measure the difference in PR quality over the next two weeks.

  2. Run one PR through "review for intent only." Skip line-by-line. Read the description, read the tests, run the failure case. See if you catch more or less than your usual review caught. Most teams catch more.

  3. Re-time-box your next feature. Put 30 minutes into a written spec before opening Cursor. Note how long the generation phase takes. Compare against your last similar feature. The numbers will surprise you.

If your team does not have an AI-native engineer to model these habits from, that is a fixable problem. Every engineer on Cadence is vetted on Cursor, Claude Code, and Copilot fluency before they unlock bookings, with a 48-hour free trial so you can verify the working style on your own codebase before the first invoice. Founders who are sizing the next sprint can run their scope through our Build/Buy/Book decision tool to get a per-feature recommendation.

Try it: Decide your next feature with a 2-minute Build/Buy/Book recommendation, or book a vetted AI-native engineer with a 48-hour free trial. Weekly billing, replace any week, no notice period. Get the recommendation.

FAQ

Is writing code with AI faster than writing it by hand?

On average, yes for simple tasks (20 to 55% faster in early studies from GitHub, Google, and Microsoft). But the METR 2025 study found experienced developers were actually 19% slower on complex work despite feeling 20% faster. Speed depends heavily on task type and how disciplined the verification loop is. Skip planning and you pay for it in regen cycles.

What is the biggest risk of writing code with AI?

Security drift. Roughly 45% of AI-generated code contains vulnerabilities: hardcoded secrets, missing input validation, command injection. The fix is treating every AI commit like a junior PR: read the diff, run the failure case, never trust silence. A 2026 audit of 1,645 Lovable-generated apps found 170 with critical security flaws.

Should juniors still learn to code without AI?

Yes. Fundamentals (data structures, debugging, system design, networking, SQL) are how you recognize when AI is wrong. The engineers who only know how to prompt cannot debug what they shipped. AI is a force multiplier on judgment, not a substitute for it. If you want a deeper take on the labor question, our piece on whether AI will replace software developers walks through the data.

What is the best AI coding tool in 2026?

There is no single answer, and that is the point. Cursor handles multi-file refactors well. Claude Code is the right choice for long-running architectural tasks and complex codebases. GitHub Copilot is the strongest inline assistant. Most senior engineers use two or three daily, picking the right tool per task instead of forcing one everywhere. Tool fluency means knowing which one to reach for.

How do I know if a developer is actually AI-native?

Ask them to walk through a recent PR they generated with AI. What was the prompt? What did they verify? What did they reject? AI-native engineers answer in specifics: the model invented a function, I caught it in the type check, I rewrote the test because it was tautological. Non-AI-native engineers describe the process as magic, or refuse to talk about it. The gap is obvious within five minutes of a real conversation.
