How to use AI for legacy code understanding

AI for legacy code understanding works best as a guided tour, not an oracle. Point Cursor or Claude Code at a single function, ask "explain this, then map what calls it and what it touches," and verify every claim against the AST. The model gives you a 70% sketch in minutes; you spend the next hour confirming the parts that matter. That is the whole job.

The failure mode is treating the LLM as ground truth on a codebase it has never seen. Legacy code with weak types, dynamic dispatch, reflection, and config-driven behavior is exactly where models invent APIs that sound plausible and do not exist. The discipline below is how AI-native engineers actually work through a million-line monolith without shipping a regression.

The four questions that map any legacy function

Ninety percent of legacy archaeology reduces to four prompts. Memorize them.

1. "Explain this function in plain English. What is its contract, what does it return on the unhappy path, and what are its non-obvious side effects?" The contract question forces the model to distinguish what the function promises from what it accidentally does.
2. "What calls this function? Search the repo and list each caller with a one-line reason." Cursor's codebase index and Claude Code's repo navigation both answer this well; without an index, you are guessing.
3. "What does this function depend on? Trace the imports, the global state it reads, the env vars, the database tables, and any feature flags." This is the side-effects question, expanded.
4. "What would break if I removed this function tomorrow? Be specific about runtime errors, silent data drift, and downstream contracts." This is the regression budget. If the model cannot answer concretely, you do not understand the function well enough to change it.

You run these in sequence, not in parallel. Each answer feeds the next prompt. A model that has just traced the callers will give you a much better blast-radius answer than one starting cold.
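
The chaining can be sketched in a few lines of Python. This is a minimal illustration, not a real Cursor or Claude Code API: `ask` is a hypothetical callable standing in for whatever sends a prompt to your assistant and returns its reply, and the question wording is abbreviated from the list above.

```python
from typing import Callable, List, Tuple

# The four questions, abbreviated. {target} is filled in per function.
QUESTIONS = [
    "Explain {target} in plain English: contract, unhappy-path returns, non-obvious side effects.",
    "What calls {target}? List each caller with a one-line reason.",
    "What does {target} depend on? Imports, global state, env vars, tables, feature flags.",
    "What would break if {target} were removed tomorrow? Be specific.",
]


def run_four_questions(ask: Callable[[str], str], target: str) -> List[Tuple[str, str]]:
    """Ask the questions in order, feeding every earlier answer into the next prompt.

    `ask` is a hypothetical stand-in for your assistant; swap in your own client.
    """
    transcript: List[Tuple[str, str]] = []
    for template in QUESTIONS:
        question = template.format(target=target)
        # Prior Q&A pairs become context, so the blast-radius prompt sees the caller map.
        context = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in transcript)
        prompt = f"{context}\n\nQ: {question}" if context else f"Q: {question}"
        transcript.append((question, ask(prompt)))
    return transcript
```

The returned transcript is a list of (question, answer) pairs, which is also a convenient shape for saving the session to a notes file later.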

Why legacy code breaks AI assistants in specific ways

AI models trained on public GitHub have seen a lot of React, Django, and Rails. They have not seen your 2014 Symfony 1.x app, your CFML report generator, or the in-house ORM your fourth engineer wrote and never documented. Three failure modes show up over and over.

Hallucinated internal APIs. Ask Claude what UserRepository.findActiveByCohort does in a typed Java repo and you get a real answer. Ask the same question in a PHP repo where the method is never declared and is resolved at runtime via __call, and you get a fluent fabrication. The model pattern-matches against public idioms and hands you a function signature that does not exist.
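
For readers who have not met this pattern, here is a rough Python analogue of the PHP __call case, using `__getattr__`. The class, the finder naming scheme, and the sample rows are all hypothetical; the point is only that the method works at runtime while no definition exists for an index or a model to read.

```python
class UserRepository:
    """Hypothetical repository whose finders are never written out as methods."""

    def __init__(self, rows):
        self._rows = rows

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, much like PHP's __call.
        prefix = "find_active_by_"
        if not name.startswith(prefix):
            raise AttributeError(name)
        field = name[len(prefix):]

        def finder(value):
            return [r for r in self._rows if r.get("active") and r.get(field) == value]

        return finder


repo = UserRepository([
    {"id": 1, "cohort": "2014-q1", "active": True},
    {"id": 2, "cohort": "2014-q1", "active": False},
])

# Works at runtime, yet there is no `def find_active_by_cohort` anywhere to search for.
matches = repo.find_active_by_cohort("2014-q1")
defined_statically = "find_active_by_cohort" in vars(UserRepository)
```

Asked to describe `find_active_by_cohort`, a model has nothing to read except the dispatch logic, so any signature it gives you beyond that is a guess.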

Confident wrong tracing in dynamic languages. Python, Ruby, JavaScript pre-TypeScript, and old PHP all let the receiver of a method call be unknowable at static analysis time. Cursor's symbol index gives up; the model falls back to text matching and tells you process_payment is called from three places when it is actually called from thirty via a registry.
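
A small Python sketch shows why registry dispatch hides callers. It uses the standard `ast` module to list calls written literally as `process_payment(...)`; the sample source, including the `HANDLERS` registry, is made up for illustration.

```python
import ast

SOURCE = '''
def process_payment(order):
    return f"charged {order}"

HANDLERS = {"card": process_payment, "invoice": process_payment}

def checkout(order):
    return process_payment(order)          # direct call: visible to static tools

def dispatch(kind, order):
    return HANDLERS[kind](order)           # registry call: the callee name never appears
'''


def direct_call_lines(source: str, func_name: str) -> list:
    """Return line numbers of calls written literally as func_name(...)."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id == func_name:
                lines.append(node.lineno)
    return sorted(lines)


# Only the call inside checkout() is reported; dispatch() reaches the same
# function through the dict and is invisible to this kind of analysis.
found = direct_call_lines(SOURCE, "process_payment")
```

A text search does slightly better here because it would at least hit the `HANDLERS` line, but it still cannot tell you how many code paths go through `dispatch`.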

Context-window starvation. A 50,000-line file (yes, those exist) does not fit. The model summarizes the first 8,000 tokens, hallucinates the rest, and produces an "overview" that omits the actual entry point. The fix is chunking by symbol, not by line range.
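
Chunking by symbol can be sketched with Python's standard `ast` module. This version only handles Python source and only top-level functions and classes; a legacy codebase in another language would need that language's own parser, and the sample input is invented.

```python
import ast


def chunk_by_symbol(source: str) -> dict:
    """Map each top-level function or class name to its exact source text.

    Decorator lines are not included in the returned segment.
    """
    chunks = {}
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks[node.name] = ast.get_source_segment(source, node)
    return chunks


SAMPLE = '''import os

def load_config(path):
    return open(path).read()

class ReportJob:
    def run(self):
        return "done"
'''

# Each chunk is a complete definition, so a prompt never starts or ends mid-function.
chunks = chunk_by_symbol(SAMPLE)
```

Each chunk can then be sent on its own, with the symbol name as a label, instead of slicing the file into arbitrary line ranges that cut definitions in half.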

A 2025 GitClear analysis of 211 million changed lines found that "code churn" (lines revised or reverted within two weeks of being written) roughly doubled compared to 2020 baselines as AI assistants spread. A plausible reading is that much of that churn comes from AI-suggested edits to code the model did not actually understand. Legacy archaeology is the front line of that problem.

The three-tool stack for legacy work in 2026

There is no single tool for this. The good news is the three workhorses each have a clear niche.

Tool	Best for	Where it struggles	Cost
Cursor	Mid-size repos (≤500k LOC), tight refactor loops, file-level edits with AST awareness	Monorepos over a few million LOC; the index gets slow and stale	$20/mo Pro, $40/mo Business
Claude Code	Whole-repo navigation, multi-file reasoning, architectural Q&A, scripted exploration via the CLI	No persistent IDE state; you re-prime context each session	$20/mo Pro, $200/mo Max
Sourcegraph Cody	Huge monorepos (10M+ LOC), cross-repo search, enterprise compliance	Pricier; setup overhead; less polished edit loop than Cursor	From $9/user/mo, enterprise quote-based

For most teams the practical answer is Cursor as the daily driver and Claude Code as the second opinion when something feels off. Sourcegraph enters the picture when your repo is large enough that Cursor's local index can no longer keep up, which is a real but specific problem.

Aider and Continue both deserve a mention for self-hosted workflows; both are excellent if you want to run a local model against a private codebase. Neither has matched Cursor's editing UX yet, but the gap is closing.

The CLAUDE.md pattern: persistent context that pays back forever

The single highest-leverage habit for legacy work is writing a CLAUDE.md (or .cursor/rules/ equivalent) at the repo root that captures the architectural shape of the code in 200 to 500 lines.

What goes in it:

The runtime entry points (which file does the server actually start from)
The folder map and what each top-level directory owns
The "weird parts" (the 2014 ORM, the homegrown queue, the feature-flag service that lies)
Naming conventions and the reason they exist (often historical, often non-obvious)
The boundaries between modules: what is a public API and what is internal
Known traps: "never edit lib/legacy/* without running the full integration suite, period"

Every prompt session starts richer because the model reads this file first. A senior engineer on a complex refactor reports a 30 to 40 percent reduction in "the model went down the wrong path" episodes after introducing a CLAUDE.md. The file pays for itself in a week and keeps paying forever.

You do not write it by hand. You ask Claude Code to draft it after spending an afternoon exploring the repo with you, then you edit. That is the right division of labor.
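
The mechanical part of that draft, the section headings and the folder map, can be seeded before the exploration session. The sketch below is one possible layout, not an official CLAUDE.md format: the section names are taken from the list above, and `draft_skeleton` is a hypothetical helper.

```python
from pathlib import Path

SECTIONS = [
    "Runtime entry points",
    "Folder map",
    "Weird parts",
    "Naming conventions",
    "Module boundaries",
    "Known traps",
]


def draft_skeleton(repo_root: Path) -> str:
    """Return a CLAUDE.md skeleton with the folder map pre-filled from disk."""
    top_level = sorted(
        p.name for p in repo_root.iterdir() if p.is_dir() and not p.name.startswith(".")
    )
    lines = ["# CLAUDE.md", ""]
    for section in SECTIONS:
        lines += [f"## {section}", ""]
        if section == "Folder map":
            # Ownership notes stay as TODOs until someone has verified them in the code.
            lines += [f"- {name}/: TODO what this directory owns" for name in top_level]
        else:
            lines.append("TODO")
        lines.append("")
    return "\n".join(lines)
```

Everything marked TODO is where the afternoon of exploration goes; the script only saves you from typing directory names.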

A worked example: tracing a 12-year-old payment function

Suppose you inherit ProcessCharge.run, a service method in a Ruby on Rails 4 monolith. The original author left. The method is 380 lines, calls into three other services, and is the suspected cause of a duplicate-charge bug that hits about once a week.

The bad approach: paste the function into ChatGPT and ask "what does this do." You get a confident summary that is 80% right, and the 20% wrong part is exactly the bug.

The good approach, in order:

1. Index first. Open the repo in Cursor or run Claude Code with the repo as the working directory. Wait for the symbol index to finish.
2. Ask for the contract. "Read app/services/billing/process_charge.rb. State the function's pre-conditions, post-conditions, and every error path. Be explicit about what happens on each return."
3. Map the callers. "Now find every call site for ProcessCharge.run in this repo. For each, tell me what triggers that code path."
4. Trace the side effects. "List every database write, external API call, log statement, and metric this function emits. Group them by which branch they happen on."
5. Diff the suspicion. "I suspect a duplicate-charge bug. Given what you just mapped, where could a charge happen twice? Walk me through the concurrency model."
6. Verify against the AST. Open the call-graph view in your IDE. Cross-check at least three of the model's claimed call sites. If any are wrong, throw out the trace and re-prompt with a smaller scope.

That sequence takes 45 to 90 minutes and produces a real understanding of a function you were ready to declare cursed. Compare with the old way (printf-debugging in staging for a week) and the ROI is obvious.
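
The cross-check of claimed call sites can be partly scripted. The sketch below assumes you have copied the model's claims into (file, line, symbol) tuples by hand; `Claim` and `check_claims` are hypothetical names. It only confirms that the symbol text appears on the cited line, which is a cheap first filter before the IDE check, not a substitute for it.

```python
from pathlib import Path
from typing import List, NamedTuple


class Claim(NamedTuple):
    path: str      # file the model named, relative to the repo root
    line: int      # 1-based line number the model cited
    symbol: str    # text expected on that line, e.g. "ProcessCharge.run"


def check_claims(repo_root: Path, claims: List[Claim]) -> List[Claim]:
    """Return the claims that do NOT hold: missing file, bad line, or symbol absent."""
    failed = []
    for claim in claims:
        file_path = repo_root / claim.path
        if not file_path.is_file():
            failed.append(claim)
            continue
        lines = file_path.read_text(errors="replace").splitlines()
        in_range = 1 <= claim.line <= len(lines)
        if not in_range or claim.symbol not in lines[claim.line - 1]:
            failed.append(claim)
    return failed
```

If the returned list is non-empty, that is the signal to discard the trace and re-prompt with a smaller scope.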

When AI is wrong, and how to catch it fast

Three verification habits separate engineers who ship from engineers who debug their own AI.

Run the tests, always. Before you trust any model summary of behavior, find or write a test that exercises the path. If the test still passes with the function commented out, either the test does not actually exercise that path or the model overstated what the function does; in both cases you have learned something the summary did not tell you. This sounds obvious. It is the most-skipped step in the workflow.
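
The "comment it out and rerun" check can be done without editing the file, by patching the function to a no-op. A minimal sketch using the standard library's `unittest.mock`; the `Billing` class and both tests are hypothetical stand-ins for your legacy module and its suite.

```python
from unittest import mock


class Billing:
    """Hypothetical stand-in for a legacy module under investigation."""

    def __init__(self):
        self.ledger = []

    def record_charge(self, amount):
        self.ledger.append(amount)

    def checkout(self, amount):
        self.record_charge(amount)
        return "ok"


def weak_test(billing):
    # Only checks the return value, so it never notices the ledger write.
    return billing.checkout(10) == "ok"


def strong_test(billing):
    # Asserts on the side effect the model claimed the function has.
    billing.checkout(10)
    return billing.ledger == [10]


def passes_without(test, method_name):
    """Run `test` with Billing.<method_name> replaced by a no-op."""
    with mock.patch.object(Billing, method_name, lambda self, *a, **k: None):
        return test(Billing())
```

Here `passes_without(weak_test, "record_charge")` still passes, which tells you that test proves nothing about the ledger write, while `strong_test` fails as soon as the function is disabled.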

Click through the references. Every IDE has "go to definition" and "find all references." When the model says "this is called from OrderController#create," put your cursor on the symbol and check. Five seconds. Catches 80% of hallucinations.

Ask the same question two ways. Prompt Claude with "list the side effects of this function," then ask Cursor "what does this function modify in the database." Compare the answers. Disagreement is a signal; investigate.
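
Comparing the two answers is easier if you reduce each to a short list of claimed side effects first. A minimal sketch, assuming you have done that reduction by hand; the sample entries are invented.

```python
def normalize(items):
    """Lowercase and strip so 'Charges table ' and 'charges table' compare equal."""
    return {item.strip().lower() for item in items if item.strip()}


def compare_answers(answer_a, answer_b):
    """Split two lists of claimed side effects into agreed and disputed sets."""
    a, b = normalize(answer_a), normalize(answer_b)
    return {
        "agreed": sorted(a & b),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
    }


report = compare_answers(
    ["charges table", "audit_log table", "payment gateway call"],
    ["Charges table", "audit_log table", "invoices table"],
)
```

Anything under `only_a` or `only_b` is where to spend your verification time; agreement between two tools is weaker evidence than a passing test, but disagreement is a reliable prompt to go read the code.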

Teams that pair AI-assisted legacy work with rigorous PR review catch the remaining mistakes. Our perspective on AI-native PR review workflows is the natural complement to this archaeology habit.

Comparison of approaches: what works at what scale

Different repos call for different tactics. Here is the honest matrix.

Approach	Best when	Avoid when
Cursor + manual verification	10k to 500k LOC, modern stack, one team, fast iteration	Monorepo with 20+ services; the index will not keep up
Claude Code + CLAUDE.md	Any size, when you want scripted exploration and a persistent shared context	You need rich IDE features; the CLI is intentionally minimal
Sourcegraph Cody + LLM	True monorepo (5M+ LOC), enterprise security review needs	Small repos; the overhead is not worth it
AST tools + LLM second-pass	Statically typed languages (TS, Java, Go, Rust), high-stakes refactors	Dynamic codebases where the AST does not capture runtime behavior
No AI, just grep and a notebook	Truly small functions, or codebases where the model is known to hallucinate heavily	Anything over 200 lines of cross-cutting concern

The "no AI" row is not a joke. For a 30-line utility function in a typed codebase, opening Cursor and prompting is slower than just reading the code. Use the right tool.

The economics: what legacy AI work actually costs

A common founder question: is it cheaper to pay an AI-native engineer to map a legacy module, or to rewrite it from scratch?

Almost always the mapping. Concretely, a mid-tier Cadence engineer at $1,000/week can produce a thorough architectural map of a 100k-line module (CLAUDE.md, dependency graph, risk register, test gap analysis) in three to five days, so about $1,000. Rewriting that same module typically runs four to eight weeks at the same rate, or $4,000 to $8,000.

The mapping does not just save money. It tells you whether a rewrite is actually warranted, or whether targeted refactors are the right call. We have written separately about the discipline of AI-assisted migration projects, which is the natural follow-on when mapping reveals a true rewrite candidate.

Where Cadence fits

Three honest statements about the platform.

First, every engineer on Cadence is AI-native by default. The voice interview specifically scores Cursor and Claude Code fluency, including the prompt-as-spec discipline and verification habits described above. There is no non-AI-native option, and the 12,800-engineer pool is filtered against that bar before bookings unlock.

Second, legacy archaeology is a real fit for the mid and senior tiers. A mid engineer at $1,000/week handles standard tracing and CLAUDE.md drafting; a senior at $1,500/week owns scope on a million-line refactor and tells you which parts to leave alone. A lead at $2,000/week is the right call when the question is "should we rewrite this entire system."

Third, you can verify the fit before committing. The 48-hour free trial exists precisely because legacy work is hard to scope until someone is in the code. Book a senior on Monday, get a draft architectural map by Wednesday, decide whether to continue. If you want a structured way to decide between rewriting, refactoring, or pausing, our build/buy/book tool walks you through the trade-offs in a few minutes.

What to do next

Pick one legacy function you have been afraid to touch. Open Cursor or Claude Code. Run the four-question sequence (contract, callers, dependencies, blast radius). Write the answers into a docs/legacy/ markdown file. Repeat for five functions over a week.
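
To keep the atlas entries uniform, the write-up step can be scripted. The sketch below is one possible file layout, not a standard: `write_atlas_entry` is a hypothetical helper, the four headings mirror the four questions, and the unchecked "Verified" box is an added convention to flag entries nobody has cross-checked yet.

```python
from pathlib import Path

HEADINGS = ["Contract", "Callers", "Dependencies", "Blast radius"]


def write_atlas_entry(docs_dir: Path, function_name: str, answers: dict) -> Path:
    """Write one function's four answers to <docs_dir>/<function_name>.md."""
    missing = [h for h in HEADINGS if h not in answers]
    if missing:
        raise ValueError(f"missing answers for: {', '.join(missing)}")
    docs_dir.mkdir(parents=True, exist_ok=True)
    body = [f"# {function_name}", ""]
    for heading in HEADINGS:
        body += [f"## {heading}", "", answers[heading].strip(), ""]
    # Starts unchecked: nothing in the entry is trusted until a human confirms it.
    body += ["## Verified", "", "- [ ] call sites cross-checked in the IDE", ""]
    out = docs_dir / f"{function_name}.md"
    out.write_text("\n".join(body))
    return out
```

Calling it with `Path("docs/legacy")` as `docs_dir` gives one file per function, and refusing to write an entry with a missing answer keeps half-finished traces out of the atlas.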

By the end of the week you have a small legacy atlas, a working prompt template, and a clear sense of where the AI lies in your specific codebase. That is the foundation for any larger refactor. The companion read on reducing AI coding mistakes in production is worth keeping open while you work.

Ready to stop guessing about your legacy code? Book a Cadence engineer for a 48-hour trial and have a real architectural map in three days. Weekly billing, no notice, replace any week.

FAQ

Can AI fully understand a legacy codebase without human help?

No. Current models can produce a strong first-pass summary for any well-typed codebase under a few million lines, but they hallucinate internal APIs in dynamically typed legacy code and lose track of runtime behavior they cannot see. Treat AI as a guided-tour generator that needs a human verifier on every claim that affects a production change.

What is the best AI tool for understanding legacy code in 2026?

Cursor for mid-size repos (up to about 500k lines) thanks to its strong codebase index and edit loop; Claude Code for whole-repo reasoning and scripted exploration via the CLI; Sourcegraph Cody when the repo exceeds a few million lines and the local index strategy breaks down. Most teams use Cursor daily and Claude Code as a second opinion.

How do I prevent AI from hallucinating internal APIs in my legacy repo?

Three habits: index the repo with a real symbol indexer before prompting (do not let the model text-match), maintain a CLAUDE.md file that names the weird parts of your codebase up front, and click through every claimed reference in your IDE before trusting it. Statically typed languages with strict mode also dramatically reduce hallucination rates.

What is a CLAUDE.md file and do I need one?

A CLAUDE.md is a 200 to 500 line markdown file at your repo root that captures architectural notes, weird parts, naming conventions, and known traps. The model reads it first in every session and gives meaningfully better answers because of it. If you have a codebase older than three years, yes, you need one. Have an AI-native engineer draft it after an afternoon of exploration; edit by hand.

How long does it take to map a legacy module with AI?

A mid-tier engineer working with Cursor or Claude Code can produce a thorough architectural map (CLAUDE.md, dependency graph, risk register, test gap analysis) of a 100,000-line module in three to five working days. Without AI assistance, the same map historically took two to four weeks. With AI but without verification habits, the map arrives just as fast but is unreliable, and unreliable maps cause shipped regressions.

Akashdeep Singh

Senior Frontend Developer

Senior frontend developer at withRemote. Writes on React, Next.js, performance budgets, and modern web tooling.
