May 8, 2026 · 10 min read · Cadence Editorial

Prompt engineering for senior software engineers

Photo by [Godfrey Atima](https://www.pexels.com/@godiatima) on [Pexels](https://www.pexels.com/photo/programming-code-on-computer-screen-5223887/)


Prompt engineering at the senior level is prompt-as-spec discipline: writing a prompt precise enough that an agent can ship from it without you watching. It is a different skill from IDE autocomplete prompting. The core moves are bounding context, choosing plan mode versus immediate execution, chaining with verification, and treating the prompt itself as a reviewable artifact.

If you ship production code in 2026, you already prompt daily. This post is about the discipline that makes those prompts good enough that Cursor, Claude Code, or GitHub Copilot can run unattended on them and produce code you would actually merge.

What prompt engineering means at the senior level

Junior prompting looks like this: ask Claude to write a function, paste the result into a file, run it, paste the error back, repeat. It works for small scope. It collapses the moment the work spans more than one file.

Senior prompting looks different. The prompt is a spec. It names the files the agent should touch, the patterns it should follow, the tests that must pass, and the behaviors that must be preserved. The prompt is the unit of work, not the diff. The diff is a byproduct.

This shift matters because context windows have grown to over 2 million tokens (Gemini 2.5, Claude with extended context), but the engineering output does not scale linearly with the input. Bounded scopes still ship better than dumped repos. Signal-to-noise wins.

The senior craft is producing the same artifact for the human reviewer and the model. A good prompt reads like a small design doc. It reads like the kind of ticket you wish your PM wrote. If you can hand it to a mid-level engineer and they ship without questions, you can also hand it to an agent.

The prompt-as-spec template

Here is a template that survives real codebases. Steal it.

Goal: One sentence describing what we are building.

Context:
- File: path/to/main.ts (the function we are extending)
- File: path/to/test.ts (existing tests, must keep passing)
- Pattern: We use repository pattern with Drizzle ORM. See path/to/example.ts.

Constraints:
- Do not add new dependencies.
- Do not modify the public API of exportedFn.
- Use existing error class AppError, not throw new Error.

Acceptance:
- pnpm test src/auth passes.
- Add 2 new tests covering the success and 401 cases.
- Function signature: async function authorize(userId: string, scope: Scope): Promise<Result<User, AppError>>.

Stop conditions:
- If you find conflicting patterns in the codebase, stop and ask.
- If a test fails after 2 fix attempts, stop and report.

Notice what is missing: tone, role-playing ("you are a senior engineer"), reassurance. The agent does not need a pep talk. It needs concrete constraints, named files, and acceptance criteria.

Also notice the stop conditions. A senior prompt anticipates the failure modes the agent will hit and tells it what to do then, instead of letting it spiral.
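
The acceptance section is where the spec turns into verification. A minimal sketch of those two tests, assuming a Vitest-style runner, a Result type with an ok discriminant, and hypothetical authorize and AppError implementations:

// src/auth/authorize.test.ts — illustrative names matching the spec above
import { describe, expect, it } from "vitest";
import { AppError } from "../../lib/errors";
import { authorize } from "./authorize";

describe("authorize", () => {
  it("returns the user on success", async () => {
    const result = await authorize("user_123", "read:profile");
    expect(result.ok).toBe(true);
  });

  it("returns a 401 AppError for an unknown user", async () => {
    const result = await authorize("user_missing", "read:profile");
    expect(result.ok).toBe(false);
    if (!result.ok) {
      expect(result.error).toBeInstanceOf(AppError);
      expect(result.error.status).toBe(401);
    }
  });
});

If the agent can make these pass without touching exportedFn or adding dependencies, the spec did its job.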

Plan mode versus immediate: a decision rule

Claude Code has a plan mode. Cursor has a planning custom mode you can configure. Both produce a written plan before any code is written. The plan is itself reviewable.

Use plan mode when the scope is over 200 lines or touches more than 2 files. Use immediate execution when the change is under 50 lines and lives in one file. Between those, taste applies.

The reason is economic, not stylistic. A bad agent run on a multi-file refactor takes 5 to 15 minutes to revert cleanly. A plan-mode round trip takes 30 to 60 seconds. Plan mode catches the misunderstanding before the tokens are spent generating the wrong code.

Senior engineers read the plan critically. They look for the agent inventing a file that does not exist, missing an obvious dependency, or proposing a refactor pattern that conflicts with the codebase. The plan is the cheapest place to correct the agent. The diff is the most expensive.

| Workflow | When to use | Tools | What separates a senior |
| --- | --- | --- | --- |
| Immediate prompt | Under 50 LOC, single file | Cursor inline, Copilot | Tight constraints, named file path |
| Plan mode | Multi-file or refactor | Claude Code plan, Cursor planning mode | Reads the plan critically before approving |
| Chained agent loop | Feature with tests | Claude Code, Aider | Verification step baked into the prompt |
| Rules + minimal prompt | Repeatable patterns | .cursorrules, CLAUDE.md | Knows what is invariant versus task-specific |

Bounding context: the most undervalued senior skill

Cursor's @-mentions and Claude Code's folder reads let you point at exactly what the agent should consider. Use them aggressively.

A weak prompt says "fix the auth bug." A strong prompt says: @src/auth/middleware.ts @src/auth/middleware.test.ts @docs/auth-design.md and then describes the bug. Three named files beat the entire repo every time, because the model is not searching, it is reasoning.

Bigger context window does not equal better output. We have seen most teams shipping with Claude or Gemini learn this the hard way. Dumping 1.5M tokens of repo and hoping retrieval surfaces the right thing produces vague code. Naming three files produces precise code.

The pattern: identify the 3 to 5 files the change actually depends on. Name them. Exclude everything else. Point at the test file you care about. If you do this, the prompt itself often shrinks to two sentences, because the constraints are encoded in the files you chose.
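
A sketch of what that looks like in practice, with hypothetical file paths (the @-mention syntax is Cursor's; in Claude Code you would name the same paths in plain text):

@src/auth/middleware.ts @src/auth/middleware.test.ts @docs/auth-design.md

Requests with an expired refresh token currently return 500 instead of 401.
Fix the token-expiry branch in middleware.ts so it returns AppError with status 401,
and keep every existing test in middleware.test.ts passing.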

This is the same skill as scoping a ticket for a junior. It is a 2026 craft because the failure mode (ambiguous scope, agent invents files, agent over-reaches) is exactly the same failure mode juniors hit, just at agent speed.

Prompt chains and agent loops with verification

Single-shot prompts fail for anything over 100 lines. Chain instead.

A working chain for a real feature looks like this:

  1. Spec prompt. Write the prompt-as-spec. Run it in plan mode.
  2. Generate prompt. "Implement the plan above. Stop after each file." This gives you natural review checkpoints.
  3. Verify prompt. "Run pnpm test src/auth. Report failures." The agent runs the suite, returns the output.
  4. Fix prompt. "Fix the failing tests without changing behavior." Two attempts max.
  5. Human review. You read the diff and the prompt history together.

The verification step is non-optional. It does not live in your memory. It lives in the prompt. Senior engineers bake "run the tests and report the output" into every agent loop, because they have learned that otherwise the agent will silently produce code that does not run.

This is closer to writing a small CI pipeline than writing a chat message. That is the right mental model. You are orchestrating an agent, not chatting with one. Tools like Aider, Claude Code, and Cursor's agent mode all support this loop. The discipline is yours to bring.
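
To make the CI-pipeline analogy concrete, here is a minimal sketch of the verify-and-fix half of the loop as a script. runAgentStep is a hypothetical stand-in for whatever agent CLI you drive; the pnpm test command and the two-attempt limit come straight from the chain above:

// verify-loop.ts — a sketch, not a drop-in tool
import { execSync } from "node:child_process";

// Hypothetical: wire this to your agent (Claude Code, Aider, Cursor agent, ...).
async function runAgentStep(prompt: string): Promise<void> {
  console.warn("[stub] would send to agent:\n", prompt);
}

function runTests(): { passed: boolean; output: string } {
  try {
    return { passed: true, output: execSync("pnpm test src/auth", { encoding: "utf8" }) };
  } catch (err: unknown) {
    const e = err as { stdout?: string; stderr?: string };
    return { passed: false, output: `${e.stdout ?? ""}\n${e.stderr ?? ""}` };
  }
}

async function verifyAndFix(): Promise<void> {
  for (let attempt = 1; attempt <= 2; attempt++) {
    const { passed, output } = runTests();
    if (passed) return;
    // The fix prompt from the chain: bounded, behavior-preserving, failure pasted in.
    await runAgentStep(`Fix the failing tests without changing behavior.\n\n${output}`);
  }
  if (!runTests().passed) {
    // Stop condition: two fix attempts exhausted, hand back to the human.
    console.error("Two fix attempts failed. Stopping for human review.");
    process.exit(1);
  }
}

void verifyAndFix();

Whether this lives in a script or in your head, the shape is the same: run, check, bounded retry, stop.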

For deeper coverage of the verification habits that separate AI-native engineers, our breakdown of AI-powered debugging workflows in 2026 maps closely to how senior ICs structure these chains in production.

Rules files: Cursor and Claude Code

Rules files are always-on context. They are prepended to every prompt the agent sees, so anything that should be true across the entire codebase belongs there, not in the prompt.

Cursor reads .cursorrules (or the newer .cursor/rules/ directory) at the project root. Claude Code reads CLAUDE.md, with hierarchy support: a root file plus per-package files. The same file format is increasingly portable across tools, and patterns from Claude Code for production engineering port cleanly to Cursor with minor edits.
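
The hierarchy earns its keep in a monorepo. One common shape, with hypothetical package names: the root file carries repo-wide invariants, and each package file adds only what differs.

CLAUDE.md                  # repo-wide: stack, conventions, banned patterns
packages/api/CLAUDE.md     # API package only: repository pattern, error contract
packages/web/CLAUDE.md     # web package only: RSC defaults, component conventions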

What goes in rules: invariants. What goes in prompts: the task.

A working .cursorrules looks like this:

# Stack
Next.js 15, TypeScript, Drizzle ORM, Postgres, pnpm.

# Conventions
- pnpm not npm, never yarn.
- No `any` in TypeScript. Use `unknown` and narrow.
- Tests live next to source as `*.test.ts`.
- Errors: throw `AppError` from `lib/errors.ts`, never `new Error()`.
- React Server Components by default; mark client components explicitly.

# Banned
- Default exports for components (named exports only).
- Direct database access from route handlers (use repositories in `lib/db/`).
- Inline SQL strings (use Drizzle query builder).

Three things to notice. It is short. It is concrete. It is written for the agent, not the team wiki. A good rules file is under 100 lines.

Anti-pattern: dumping your entire engineering handbook into CLAUDE.md. The agent skims long context. Keep rules to the things you want enforced on every diff.

Prompt-as-PR-description and prompt review

A pattern emerging on senior teams: paste the prompt that produced the code into the PR description.

This is not a vanity practice. It serves three purposes. First, reviewers learn what context the agent had, which makes the diff easier to read. Second, prompts become reproducible: anyone can rerun the agent with the same prompt and check the output. Third, the prompt itself becomes a reviewable artifact. If the prompt was vague, that is a bug in the work, even if the diff happens to look fine.

Some teams version prompts in a /prompts/ directory next to the code. Repeatable workflows ("scaffold a CRUD route", "write tests for this hook") become slash commands or reusable specs. Whether to commit prompts is a team choice. The discipline of treating them as artifacts is not.
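
As a concrete example, Claude Code picks up project slash commands from markdown files under .claude/commands/, with $ARGUMENTS standing in for whatever follows the command. A committed .claude/commands/test-hook.md for the "write tests for this hook" workflow might look like this (file name, paths, and contents are illustrative):

Write tests for the React hook at $ARGUMENTS.

Context: tests live next to source as *.test.ts. Follow the existing hook tests in src/hooks/ for structure.

Constraints: do not modify the hook itself. Do not add new dependencies.

Acceptance: pnpm test on the new file passes. Cover the initial state, one state transition, and the cleanup path.

Stop: if the hook has no clear public contract, stop and ask.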

In code review, this changes what reviewers ask. "What did you tell the agent?" becomes a normal question. "The constraint section was missing" becomes a normal review comment. Reviewers reject for prompt clarity, not only diff quality.

This connects to the broader question of when to fine-tune versus prompt-engineer for production LLM features. The same logic that says "prompt + RAG handles 95% of LLM use cases" also says "prompt + rules + verification handles 95% of code generation use cases." Get the prompt right first.

How Cadence vets this

Every engineer on Cadence is AI-native by default. There is no non-AI-native option, no premium tier, no opt-in checkbox. AI-native is the baseline of the platform.

The voice interview specifically scores four things in this space: prompt-as-spec discipline, plan-mode judgment, verification habits, and rules-file fluency. Engineers walk through a real prompt they wrote in the last week, against a real codebase. The interviewer probes scope, constraints, and how the engineer handled the agent failing. A score of 50 out of 100 unlocks bookings.

This matters because pricing tiers map to scope, not just years of experience:

  • Junior, $500/week. Executes well-bounded prompts with senior-written specs. Cleanup, dependency hygiene, integrations.
  • Mid, $1,000/week. Writes shippable specs for standard features. Owns the verification chain. Reasonable judgment on plan mode versus immediate.
  • Senior, $1,500/week. Owns prompt-as-spec discipline at scope. Reviews prompts in code review. Sets up rules files for the codebase. Mentors mid engineers on chain design.
  • Lead, $2,000/week. Designs the agent workflows for the team. Picks the tooling stack. Sets the rules-file standards across packages.

If you are evaluating an engineer (Cadence or otherwise) for senior prompt discipline, the signal is whether they reach for plan mode unprompted on a multi-file change, and whether they describe verification as a step in their own workflow before you ask. That is a 30-second tell.

What to do this week

Pick one prompt you ran today that produced messy output. Rewrite it as a spec: goal, files, constraints, acceptance, stop conditions. Run it. Compare.

Then audit your .cursorrules or CLAUDE.md. If it is over 200 lines, cut it in half. If it does not exist, write 30 lines covering stack, conventions, and banned patterns. Commit it.

If you want a structured way to evaluate where AI tooling fits versus a managed engineer for a given scope, our build / buy / book decision tool walks through the trade-offs in three minutes.

If you would rather skip the audit and book an engineer who already has prompt-as-spec discipline, get a Build / Buy / Book recommendation and see what fits the scope. Every Cadence engineer is vetted on Cursor, Claude Code, and Copilot fluency before they unlock bookings, with a 48-hour free trial and weekly billing.

FAQ

Is prompt engineering still a job in 2026?

Not as a standalone title for most teams. It is an IC skill embedded in senior software engineering, alongside testing, code review, and deployment. The pure prompt-engineer role survives only at frontier labs and a handful of vertical AI startups.

What is the difference between prompt engineering and AI engineering?

AI engineering builds systems that use models: RAG pipelines, agent frameworks, model routing, evaluation harnesses. Prompt engineering is the daily craft of writing prompts that ship code or power a feature. Senior engineers do both, but the prompt-as-spec skill is the one that scales their daily output.

How long should a production prompt be?

Long enough to be unambiguous, short enough that constraints do not conflict. 50 to 300 words is a healthy range for an agent-shippable spec. Under 50 is usually too vague. Over 500 usually means you are doing the agent's thinking inside the prompt instead of letting it reason.

Should prompts live in source control?

Yes for repeatable workflows: custom slash commands, /prompts/ directories, rules files like .cursorrules and CLAUDE.md. Throwaway exploratory prompts do not need to ship, but the patterns behind them do. Pasting the prompt into the PR description is a useful middle ground.

Do I need to learn prompt engineering if I am already a strong engineer?

If you ship production code in 2026, you prompt daily already. The question is not whether to learn it, but whether your prompts are good enough that an agent can run unattended on them. The discipline is what turns AI assistance from a mild speedup into a real multiplier on shippable scope.
