
AI for legacy code understanding works best as a guided tour, not an oracle. Point Cursor or Claude Code at a single function, ask "explain this, then map what calls it and what it touches," and verify every claim against the AST. The model gives you a 70% sketch in minutes; you spend the next hour confirming the parts that matter. That is the whole job.
The failure mode is treating the LLM as ground truth on a codebase it has never seen. Legacy code with weak types, dynamic dispatch, reflection, and config-driven behavior is exactly where models invent APIs that sound plausible and do not exist. The discipline below is how AI-native engineers actually work through a million-line monolith without shipping a regression.
Ninety percent of legacy archaeology reduces to four prompts: contract, callers, dependencies, and blast radius. Memorize them.
You run these in sequence, not in parallel. Each answer feeds the next prompt. A model that has just traced the callers will give you a much better blast-radius answer than one starting cold.
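The sequencing can be sketched in a few lines of Python. This is a minimal sketch, not a prescribed implementation: the prompt wording, the `run_sequence` helper, and the `fake_ask` stand-in are all hypothetical, and `ask` is whatever function you use to send a prompt to your model of choice.

```python
from typing import Callable

# Hypothetical prompt wording; only the four step names come from the article.
PROMPTS = [
    ("contract", "State the pre-conditions, post-conditions, and error paths of {target}."),
    ("callers", "List every caller of {target} and what triggers each code path."),
    ("dependencies", "List everything {target} reads, writes, or calls."),
    ("blast_radius", "Given the answers so far, what could break if {target} changes?"),
]


def run_sequence(target: str, ask: Callable[[str], str]) -> dict:
    """Run the four prompts in order, feeding earlier answers into later ones."""
    answers = {}
    for name, template in PROMPTS:
        question = template.format(target=target)
        # Prepend every prior answer so the model never starts cold.
        context = "\n".join(f"[{step}] {text}" for step, text in answers.items())
        prompt = f"{context}\n\n{question}" if context else question
        answers[name] = ask(prompt)
    return answers


def fake_ask(prompt: str) -> str:
    """Stand-in for a real model call; reports how many prior answers it was shown."""
    return f"saw {prompt.count('[')} prior answers"


result = run_sequence("ProcessCharge.run", fake_ask)
for step, answer in result.items():
    print(step, "->", answer)
```

With the stand-in, the blast-radius step reports three prior answers, which is the point: by the time you ask about impact, the model is already holding the contract, the callers, and the dependencies.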
AI models trained on public GitHub have seen a lot of React, Django, and Rails. They have not seen your 2014 Symfony 1.x app, your CFML report generator, or the in-house ORM your fourth engineer wrote and never documented. Three failure modes show up over and over.
Hallucinated internal APIs. Ask Claude what UserRepository.findActiveByCohort does in a typed Java repo and you get a real answer. Ask the same question in a PHP repo where the method is created at runtime via __call, and you get a fluent fabrication. The model pattern-matches against public idioms and ships you a function signature that does not exist.
Confident wrong tracing in dynamic languages. Python, Ruby, JavaScript pre-TypeScript, and old PHP all let the receiver of a method call be unknowable at static analysis time. Cursor's symbol index gives up; the model falls back to text matching and tells you process_payment is called from three places when it is actually called from thirty via a registry.
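A toy Python version of the registry problem makes the gap concrete. Everything here is hypothetical (the `HANDLERS` registry, the `register` decorator, the `checkout` function); it is a sketch of the pattern, not code from any real system.

```python
import re

# A hypothetical legacy module, held as text so it can be both searched and run.
SOURCE = '''
HANDLERS = {}

def register(name):
    def wrap(fn):
        HANDLERS[name] = fn
        return fn
    return wrap

@register("card")
def process_payment(amount):
    return f"charged {amount}"

def checkout(kind, amount):
    # The call target is looked up at runtime, so its name never appears here.
    return HANDLERS[kind](amount)
'''

# What text matching sees: lines that call process_payment directly,
# ignoring the line that defines it.
direct_calls = [
    line for line in SOURCE.splitlines()
    if re.search(r"(?<!def )\bprocess_payment\(", line)
]

# What actually happens at runtime.
namespace = {}
exec(SOURCE, namespace)
runtime_result = namespace["checkout"]("card", 10)

print(len(direct_calls))  # 0
print(runtime_result)     # charged 10
```

Text search reports zero call sites, yet the function runs on every card checkout. Any caller count a model produces by matching names has the same blind spot, so when a repo uses registries, search for the registry key as well as the function name.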
Context-window starvation. A 50,000-line file (yes, those exist) does not fit. The model summarizes the first 8,000 tokens, hallucinates the rest, and produces an "overview" that omits the actual entry point. The fix is chunking by symbol, not by line range.
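Chunking by symbol can be sketched with Python's standard `ast` module. This assumes the file is Python; for PHP, Ruby, or CFML you would need a parser for that language, and the `chunk_by_symbol` helper and sample source below are illustrative only.

```python
import ast


def chunk_by_symbol(source: str) -> dict:
    """Split a module into one chunk per top-level function or class."""
    tree = ast.parse(source)
    chunks = {}
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # get_source_segment returns the exact text the node spans.
            chunks[node.name] = ast.get_source_segment(source, node)
    return chunks


SAMPLE = (
    "import os\n"
    "\n"
    "def load(path):\n"
    "    return open(path).read()\n"
    "\n"
    "class Report:\n"
    "    def render(self):\n"
    "        return 'ok'\n"
)

chunks = chunk_by_symbol(SAMPLE)
for name, text in chunks.items():
    print(f"--- {name} ({len(text.splitlines())} lines)")
```

Each chunk is a complete function or class, so the model never receives half a definition the way it can with a fixed line range. Send one chunk per prompt and keep the symbol name alongside it so the answers can be stitched back together.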
A 2025 GitClear analysis of 211 million changed lines of code reported that "code churn" (lines revised or reverted within two weeks of being written) rose sharply against 2020 baselines as AI assistants spread. A plausible reading is that much of that churn comes from AI-suggested edits to code the model did not actually understand. Legacy archaeology is the front line of that problem.
There is no single tool for this. The good news is the three workhorses each have a clear niche.
| Tool | Best for | Where it struggles | Cost |
|---|---|---|---|
| Cursor | Mid-size repos (≤500k LOC), tight refactor loops, file-level edits with AST awareness | Monorepos over a few million LOC; the index gets slow and stale | $20/mo Pro, $40/mo Business |
| Claude Code | Whole-repo navigation, multi-file reasoning, architectural Q&A, scripted exploration via the CLI | No persistent IDE state; you re-prime context each session | $20/mo Pro, $200/mo Max |
| Sourcegraph Cody | Huge monorepos (10M+ LOC), cross-repo search, enterprise compliance | Pricier; setup overhead; less polished edit loop than Cursor | From $9/user/mo, enterprise quote-based |
For most teams the practical answer is Cursor as the daily driver and Claude Code as the second opinion when something feels off. Sourcegraph enters the picture when your repo is large enough that Cursor's local index can no longer keep up, which is a real but specific problem.
Aider and Continue both deserve a mention for self-hosted workflows; both are excellent if you want to run a local model against a private codebase. Neither has matched Cursor's editing UX yet, but the gap is closing.
The single highest-leverage habit for legacy work is writing a CLAUDE.md (or .cursor/rules/ equivalent) at the repo root that captures the architectural shape of the code in 200 to 500 lines.
What goes in it:
lib/legacy/* without running the full integration suite, period"
Every prompt session starts richer because the model reads this file first. A senior engineer on a complex refactor reports a 30 to 40 percent reduction in "the model went down the wrong path" episodes after introducing a CLAUDE.md. The file pays for itself in a week and keeps paying forever.
You do not write it by hand. You ask Claude Code to draft it after spending an afternoon exploring the repo with you, then you edit. That is the right division of labor.
Suppose you inherit ProcessCharge.run in a Ruby on Rails 4 monolith. The original author has left. The method is 380 lines, calls into three other services, and is suspected of causing a duplicate-charge bug that hits about once a week.
The bad approach: paste the function into ChatGPT and ask "what does this do." You get a confident summary that is 80% right, and the 20% wrong part is exactly the bug.
The good approach, in order:
app/services/billing/process_charge.rb. State the function's pre-conditions, post-conditions, and every error path. Be explicit about what happens on each return."
ProcessCharge.run in this repo. For each, tell me what triggers that code path."
That sequence takes 45 to 90 minutes and produces a real understanding of a function you were ready to declare cursed. Compare with the old way (printf-debugging in staging for a week) and the ROI is obvious.
Three verification habits separate engineers who ship from engineers who debug their own AI.
Run the tests, always. Before you trust any model summary of behavior, find or write a test that exercises the path. If the test still passes with the function commented out, either the test never reaches that path or the model overstated what the function does. This sounds obvious. It is the most-skipped step in the workflow.
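The "comment it out and rerun" check can be automated with `unittest.mock`. This is a minimal sketch under stated assumptions: the `billing` namespace, `checkout`, and the `fails_without` helper are hypothetical stand-ins, not part of any real codebase.

```python
from types import SimpleNamespace
from unittest import mock

# Hypothetical stand-in for a legacy module.
billing = SimpleNamespace(apply_discount=lambda total: total - 10)


def checkout(total):
    return billing.apply_discount(total)


def test_checkout():
    assert checkout(100) == 90


def test_unrelated():
    assert 1 + 1 == 2


def fails_without(test, owner, attr):
    """Return True if `test` fails once `owner.attr` is replaced by a no-op."""
    with mock.patch.object(owner, attr, lambda *args, **kwargs: None):
        try:
            test()
        except Exception:
            return True
    return False


print(fails_without(test_checkout, billing, "apply_discount"))   # True
print(fails_without(test_unrelated, billing, "apply_discount"))  # False
```

A `False` result means the test tells you nothing about that function, whatever the model's summary said. The patch is undone when the `with` block exits, so the original function is back in place for the next check.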
Click through the references. Every IDE has "go to definition" and "find all references." When the model says "this is called from OrderController#create," put your cursor on the symbol and check. Five seconds. Catches 80% of hallucinations.
Ask the same question two ways. Prompt Claude with "list the side effects of this function," then ask Cursor "what does this function modify in the database." Compare the answers. Disagreement is a signal; investigate.
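Comparing the two answers is easier once each is reduced to a list of named things, such as tables or services. A rough sketch follows; the table names are invented for illustration, and `compare_claims` is a hypothetical helper.

```python
def normalize(items):
    """Lowercase and trim so 'Payments' and 'payments ' count as the same claim."""
    return {item.strip().lower() for item in items}


def compare_claims(first, second):
    """Split two lists of claims into agreement and disagreement."""
    a, b = normalize(first), normalize(second)
    return {"agreed": a & b, "only_first": a - b, "only_second": b - a}


# Hypothetical answers to "list the side effects" and "what does this modify in the database".
side_effects = ["orders", "Payments", "audit_log"]
db_writes = ["payments", "orders", "ledger_entries"]

report = compare_claims(side_effects, db_writes)
for bucket in ("agreed", "only_first", "only_second"):
    print(bucket, sorted(report[bucket]))
```

Anything in the two "only" buckets is where to start reading code by hand.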
Teams that pair AI-assisted legacy work with rigorous PR review catch the remaining mistakes. Our perspective on AI-native PR review workflows is the natural complement to this archaeology habit.
Different repos call for different tactics. Here is the honest matrix.
| Approach | Best when | Avoid when |
|---|---|---|
| Cursor + manual verification | 10k to 500k LOC, modern stack, one team, fast iteration | Monorepo with 20+ services; the index will not keep up |
| Claude Code + CLAUDE.md | Any size, when you want scripted exploration and a persistent shared context | You need rich IDE features; the CLI is intentionally minimal |
| Sourcegraph Cody + LLM | True monorepo (5M+ LOC), enterprise security review needs | Small repos; the overhead is not worth it |
| AST tools + LLM second-pass | Statically typed languages (TS, Java, Go, Rust), high-stakes refactors | Dynamic codebases where the AST does not capture runtime behavior |
| No AI, just grep and a notebook | Truly small functions, or codebases where the model is already known to hallucinate heavily | Anything over 200 lines that cuts across several concerns |
The "no AI" row is not a joke. For a 30-line utility function in a typed codebase, opening Cursor and prompting is slower than just reading the code. Use the right tool.
A common founder question: is it cheaper to pay an AI-native engineer to map a legacy module, or to rewrite it from scratch?
Almost always the mapping. Concretely, a mid-tier Cadence engineer at $1,000/week can produce a thorough architectural map of a 100k-line module (CLAUDE.md, dependency graph, risk register, test gap analysis) in three to five days. Rewriting that same module typically runs four to eight weeks at the same rate.
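The arithmetic behind that comparison, as a quick sketch. It assumes the three-to-five-day mapping is billed as one full week, which is an assumption for illustration and not a stated billing rule.

```python
WEEKLY_RATE = 1_000        # mid-tier rate quoted above, in dollars

# Assumption: three to five days of mapping is billed as one full week.
MAPPING_WEEKS = 1
REWRITE_WEEKS = (4, 8)     # low and high estimates quoted above

mapping_cost = WEEKLY_RATE * MAPPING_WEEKS
rewrite_cost = tuple(WEEKLY_RATE * weeks for weeks in REWRITE_WEEKS)
ratio = tuple(cost // mapping_cost for cost in rewrite_cost)

print(f"mapping: ${mapping_cost:,}")                              # mapping: $1,000
print(f"rewrite: ${rewrite_cost[0]:,} to ${rewrite_cost[1]:,}")   # rewrite: $4,000 to $8,000
print(f"rewrite costs {ratio[0]}x to {ratio[1]}x the mapping")    # rewrite costs 4x to 8x the mapping
```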
The mapping does not just save money. It tells you whether a rewrite is actually warranted, or whether targeted refactors are the right call. We have written separately about the discipline of AI-assisted migration projects, which is the natural follow-on when mapping reveals a true rewrite candidate.
Three honest statements about the platform.
First, every engineer on Cadence is AI-native by default. The voice interview specifically scores Cursor and Claude Code fluency, including the prompt-as-spec discipline and verification habits described above. There is no non-AI-native option, and the 12,800-engineer pool is filtered against that bar before bookings unlock.
Second, legacy archaeology is a real fit for the mid and senior tiers. A mid engineer at $1,000/week handles standard tracing and CLAUDE.md drafting; a senior at $1,500/week owns scope on a million-line refactor and tells you which parts to leave alone. Lead at $2,000/week is the right call when the question is "should we rewrite this entire system."
Third, you can verify the fit before committing. The 48-hour free trial exists precisely because legacy work is hard to scope until someone is in the code. Book a senior on Monday, get a draft architectural map by Wednesday, decide whether to continue. If you want a structured way to decide between rewriting, refactoring, or pausing, our build/buy/book tool walks you through the trade-offs in a few minutes.
Pick one legacy function you have been afraid to touch. Open Cursor or Claude Code. Run the four-question sequence (contract, callers, dependencies, blast radius). Write the answers into a docs/legacy/ markdown file. Repeat for five functions over a week.
By the end of the week you have a small legacy atlas, a working prompt template, and a clear sense of where the AI lies in your specific codebase. That is the foundation for any larger refactor. The companion read on reducing AI coding mistakes in production is worth keeping open while you work.
Ready to stop guessing about your legacy code? Book a Cadence engineer for a 48-hour trial and have a real architectural map in three days. Weekly billing, no notice, replace any week.
No. Current models can produce a strong first-pass summary for any well-typed codebase under a few million lines, but they hallucinate internal APIs in dynamically typed legacy code and lose track of runtime behavior they cannot see. Treat AI as a guided-tour generator that needs a human verifier on every claim that affects a production change.
Cursor for mid-size repos (up to about 500k lines) thanks to its strong codebase index and edit loop; Claude Code for whole-repo reasoning and scripted exploration via the CLI; Sourcegraph Cody when the repo exceeds a few million lines and the local index strategy breaks down. Most teams use Cursor daily and Claude Code as a second opinion.
Three habits: index the repo with a real symbol indexer before prompting (do not let the model text-match), maintain a CLAUDE.md file that names the weird parts of your codebase up front, and click through every claimed reference in your IDE before trusting it. Statically typed languages with strict mode also dramatically reduce hallucination rates.
A CLAUDE.md is a 200 to 500 line markdown file at your repo root that captures architectural notes, weird parts, naming conventions, and known traps. The model reads it first in every session and gives meaningfully better answers because of it. If you have a codebase older than three years, yes, you need one. Have an AI-native engineer draft it after an afternoon of exploration; edit by hand.
A mid-tier engineer working with Cursor or Claude Code can produce a thorough architectural map (CLAUDE.md, dependency graph, risk register, test gap analysis) of a 100,000-line module in three to five working days. The same work without AI assistance historically took two to four weeks; the same work without verification habits produces unreliable maps that cause shipped regressions.
Senior frontend developer at withRemote. Writes on React, Next.js, performance budgets, and modern web tooling.