
To mock external APIs in tests, intercept HTTP at the boundary (not inside your business logic), use Mock Service Worker (MSW) as the default for both frontend and Node, and pin every fixture to a specific upstream API version. Then run one nightly integration test against the real sandbox so contract drift fails loud, not silent.
That is the whole playbook. The rest of this post covers why each of those choices wins in 2026, what to reach for when MSW isn't the right tool, and the five pitfalls that turn a green CI into a red production incident.
A mock replaces a network call with a deterministic stand-in. When your code says "POST to api.stripe.com/v1/charges," the mock returns a canned response without leaving the test runner. Tests become faster, more isolated, and reproducible.
The mistake we see most often: teams mock at the wrong layer. They stub their own StripeClient class and assert that client.createCharge was called with the right arguments. That tests the wrapper, not the integration. The mock never exercises serialization, retries, auth headers, or error parsing, so the first real call in production is also the first time those code paths run.
The fix is to intercept closer to the wire. MSW intercepts at the fetch/XHR layer. nock intercepts at Node's http module. Both run your actual HTTP client end to end, then catch the request before it leaves the process. That extra inch of realism catches an entire class of bugs the wrapper-stub approach hides.
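Here is a minimal, library-free sketch of what "closer to the wire" means. It temporarily swaps `globalThis.fetch` for a stub. A hypothetical `createCharge` client still runs its real form-encoding and auth-header code before the request is caught. MSW does this far more robustly, and in Node it also covers the http module. The names `createCharge`, `withFetchStub`, and the `sk_test_dummy` key are illustrative assumptions. The sketch assumes Node 18+ for the global `fetch`, `Request`, and `Response`.

```typescript
// Hypothetical client under test. Its real serialization (form encoding)
// and auth-header code run during the test; nothing here is stubbed.
async function createCharge(
  apiKey: string,
  amount: number,
  currency: string,
): Promise<{ id: string; status: string }> {
  const res = await fetch("https://api.stripe.com/v1/charges", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/x-www-form-urlencoded",
    },
    body: new URLSearchParams({ amount: String(amount), currency }).toString(),
  });
  if (!res.ok) throw new Error(`Stripe error: HTTP ${res.status}`);
  return (await res.json()) as { id: string; status: string };
}

type FetchStub = (req: Request) => Response | Promise<Response>;

// Swap the global fetch for the duration of `run`, and always restore it.
// The stub receives a real Request: exactly what would have gone on the wire.
async function withFetchStub<T>(stub: FetchStub, run: () => Promise<T>): Promise<T> {
  const g = globalThis as { fetch: typeof fetch };
  const original = g.fetch;
  g.fetch = async (...args: Parameters<typeof fetch>) => stub(new Request(...args));
  try {
    return await run();
  } finally {
    g.fetch = original;
  }
}
```

The stub sees the final Request, so a test can assert on the exact headers and form body. A wrapper-level stub never sees either.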
A typical SaaS app today depends on 6 to 10 external APIs out of the box: an auth provider, a payment processor, a search index, an LLM, an email vendor, an object store, an analytics pipeline, and usually two or three more. Every one of those is a network call that can flake, rate-limit, or rewrite its response shape.
AI-pair-programmed code ships faster than humans can write integration tests. A senior engineer in Cursor or Claude Code can land 4 to 6 PRs in a day. If each PR triggers a CI run that hits real third-party APIs, you're paying per token, per webhook, and per minute of wall clock. Mocks are the only way to keep CI under 5 minutes at that throughput.
Third-party APIs also version more aggressively now. Stripe rolls a new API version every 3 to 6 months. OpenAI deprecates models on rolling 6-month windows. If your mock fixtures were captured against last year's payload, your tests are lying to you with a smile.
Not every external call deserves the same treatment. Here's the four-way decision we use:
Most posts on this topic pretend mocking is the only answer. It isn't. Mocking is the fast lane; the other three are guardrails that catch what the fast lane misses.
Mock Service Worker v2 (released in late 2023) is the cleanest default for almost every team. The same handlers work in the browser (via a service worker, for local dev and Storybook) and in Node (via setupServer, for component tests in Vitest or Jest, API routes, and integration tests). One source of truth, two environments. You write a handler once, and your frontend tests, backend tests, and Storybook all see the same mocked Stripe.
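```typescript
// mocks/handlers/stripe.ts
import { http, HttpResponse } from 'msw'

export const stripeHandlers = [
  http.post('https://api.stripe.com/v1/charges', async ({ request }) => {
    // Stripe takes form-encoded bodies. Parse them instead of substring-matching,
    // which would also accept e.g. "application_fee_amount=".
    const params = new URLSearchParams(await request.text())
    if (!params.has('amount')) {
      return HttpResponse.json(
        { error: { type: 'invalid_request_error', code: 'parameter_missing', param: 'amount' } },
        { status: 400 },
      )
    }
    return HttpResponse.json({ id: 'ch_test_123', object: 'charge', status: 'succeeded' })
  }),
]
```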
Organize handlers by upstream domain, not by feature: mocks/handlers/stripe.ts, mocks/handlers/openai.ts, mocks/handlers/slack.ts. Each file owns the contract for one vendor. When Stripe ships a new API version, you change one file. When a feature uses three vendors, it composes three handler sets.
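With MSW itself, composition just means spreading the per-vendor arrays into `setupServer(...stripeHandlers, ...openaiHandlers)` from `msw/node`. Starting it with `server.listen({ onUnhandledRequest: 'error' })` makes unmatched requests fail loudly. The sketch below strips that idea down to plain TypeScript so the moving parts are visible. The `MockHandler` shape, `composeHandlers`, and `resolve` are hypothetical stand-ins, not MSW's API.

```typescript
// Hypothetical, simplified handler shape. MSW's real http.post(...) handlers
// are richer, but the composition idea is the same.
interface MockHandler {
  method: string;
  url: string;
  respond: () => { status: number; body: unknown };
}

// One array per upstream vendor, mirroring mocks/handlers/stripe.ts, openai.ts, ...
const stripeHandlers: MockHandler[] = [
  {
    method: "POST",
    url: "https://api.stripe.com/v1/charges",
    respond: () => ({ status: 200, body: { id: "ch_test_123", status: "succeeded" } }),
  },
];
const openaiHandlers: MockHandler[] = [
  {
    method: "POST",
    url: "https://api.openai.com/v1/chat/completions",
    respond: () => ({ status: 200, body: { choices: [{ message: { role: "assistant", content: "ok" } }] } }),
  },
];

// A feature that touches several vendors composes their handler sets.
function composeHandlers(...sets: MockHandler[][]): MockHandler[] {
  return sets.flat();
}

// First match wins; anything unmatched throws instead of reaching the network.
function resolve(handlers: MockHandler[], method: string, url: string) {
  const match = handlers.find((h) => h.method === method && h.url === url);
  if (!match) throw new Error(`Unhandled request: ${method} ${url}`);
  return match.respond();
}

const checkoutHandlers = composeHandlers(stripeHandlers, openaiHandlers);
```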
Every recorded fixture or mock response should be tagged with the upstream version it came from. For Stripe, that's the Stripe-Version header (e.g., 2024-06-20). For OpenAI, it's the model and API path. Store the version in the filename or a sidecar JSON. This is the single biggest lever against silent contract drift, and it's what the Stripe webhook handler playbook hammers on too.
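Here is one way to make the version pin enforceable rather than a convention. The sketch assumes fixtures are stored as JSON with the capture version next to the payload. The `vendor/endpoint@version.json` filename scheme, `VersionedFixture`, and `assertPinned` are hypothetical names, not a standard.

```typescript
// Hypothetical fixture record: the payload plus the upstream version it was
// captured against (for Stripe, the Stripe-Version header value).
interface VersionedFixture<T> {
  vendor: string; // e.g. "stripe"
  endpoint: string; // e.g. "charges.create"
  upstreamVersion: string; // e.g. "2024-06-20"
  body: T;
}

// Encode the version in the filename so drift shows up in a directory listing,
// e.g. "stripe/charges.create@2024-06-20.json".
function fixturePath(f: VersionedFixture<unknown>): string {
  return `${f.vendor}/${f.endpoint}@${f.upstreamVersion}.json`;
}

// Fail fast when a fixture was captured against a different version than the
// one the client is pinned to, instead of letting the test pass on stale data.
function assertPinned(f: VersionedFixture<unknown>, pinnedVersion: string): void {
  if (f.upstreamVersion !== pinnedVersion) {
    throw new Error(
      `${fixturePath(f)} was captured against ${f.upstreamVersion}, but the client pins ${pinnedVersion}; re-record it.`,
    );
  }
}
```

Calling `assertPinned` when a handler loads its fixture turns a version bump into a loud test failure rather than a silent mismatch.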
The 200 OK case is the easy one. The bugs hide in 429s, 401s, 500s, and the partial-success cases. Generate at least one handler per common error shape: rate limits, expired tokens, malformed payloads, server errors. Bonus points: include a handler that returns a 502 once, then a 200, so you exercise your retry logic.
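Here is a minimal sketch of the 502-then-200 case, assuming a hypothetical `fetchWithRetry` helper that retries 5xx responses. The stub is stateful: its first call returns 502 and later calls return 200. That lets the test prove the retry path actually ran (two calls, final 200). In MSW, registering a handler through `server.use()` with the `{ once: true }` option is one way to get the same one-shot failure. The fixed 10 ms backoff is an assumption to keep the test fast.

```typescript
// Hypothetical retry helper under test: retries 5xx responses with a fixed
// backoff. Production clients usually add jitter and honor Retry-After.
// Assumes maxAttempts >= 1.
async function fetchWithRetry(
  url: string,
  init: RequestInit = {},
  maxAttempts = 3,
  backoffMs = 10,
): Promise<Response> {
  let last: Response | undefined;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    last = await fetch(url, init);
    if (last.status < 500) return last; // below 500 is returned as-is (429s are not retried here)
    if (attempt < maxAttempts) {
      await new Promise<void>((done) => setTimeout(() => done(), backoffMs));
    }
  }
  return last!; // the final 5xx after exhausting every attempt
}

// Stateful stub: the first call returns 502, every later call returns 200.
function flakyOnceThenOk(body: unknown) {
  let calls = 0;
  const stub = async (): Promise<Response> => {
    calls += 1;
    return calls === 1
      ? new Response("Bad Gateway", { status: 502 })
      : new Response(JSON.stringify(body), { status: 200, headers: { "Content-Type": "application/json" } });
  };
  return { stub, callCount: () => calls };
}

// Drive the retry path end to end with the stub standing in for the network.
async function runRetryScenario(): Promise<{ status: number; calls: number }> {
  const g = globalThis as { fetch: typeof fetch };
  const original = g.fetch;
  const { stub, callCount } = flakyOnceThenOk({ id: "ch_test_123", status: "succeeded" });
  g.fetch = stub;
  try {
    const res = await fetchWithRetry("https://api.stripe.com/v1/charges", { method: "POST" });
    return { status: res.status, calls: callCount() };
  } finally {
    g.fetch = original;
  }
}
```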
CI runs against mocks. A separate nightly job hits the real sandbox (Stripe test mode, OpenAI's free tier, Auth0 dev tenant) and diffs the response schema against your mock fixtures. When the diff fails, regenerate the fixture before the next merge. This is the same pattern that makes a good integration test setup in CI trustworthy week over week.
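Here is a minimal sketch of the diff step. It assumes the nightly job has already fetched the live sandbox response and loaded the stored fixture, both as parsed JSON. It compares structure (keys and value types), not values, because IDs and timestamps legitimately differ between runs. `shapeOf` and `diffShapes` are hypothetical helpers. Validating against a JSON Schema or the vendor's OpenAPI spec is a common alternative.

```typescript
// A JSON value reduced to its structure: primitives become their type name,
// arrays keep the shape of their first element, objects keep their keys.
type Shape = string | Shape[] | { [key: string]: Shape };

function shapeOf(value: unknown): Shape {
  if (value === null) return "null";
  if (Array.isArray(value)) return value.length > 0 ? [shapeOf(value[0])] : [];
  if (typeof value === "object") {
    const out: { [key: string]: Shape } = {};
    for (const [key, v] of Object.entries(value as Record<string, unknown>)) out[key] = shapeOf(v);
    return out;
  }
  return typeof value; // "string", "number", "boolean", ...
}

function kindOf(shape: Shape): string {
  if (typeof shape === "string") return shape;
  return Array.isArray(shape) ? "array" : "object";
}

// Report every path where the stored fixture and the live response disagree.
function diffShapes(fixture: Shape, live: Shape, path = "$"): string[] {
  if (kindOf(fixture) !== kindOf(live)) {
    return [`${path}: fixture is ${kindOf(fixture)}, live is ${kindOf(live)}`];
  }
  if (Array.isArray(fixture) && Array.isArray(live)) {
    // Compare element shapes only when both sides have a sample element.
    return fixture.length > 0 && live.length > 0 ? diffShapes(fixture[0], live[0], `${path}[0]`) : [];
  }
  if (typeof fixture === "object" && typeof live === "object") {
    const a = fixture as { [key: string]: Shape };
    const b = live as { [key: string]: Shape };
    const problems: string[] = [];
    for (const key of new Set([...Object.keys(a), ...Object.keys(b)])) {
      if (!(key in b)) problems.push(`${path}.${key}: missing from live response`);
      else if (!(key in a)) problems.push(`${path}.${key}: new field not in fixture`);
      else problems.push(...diffShapes(a[key], b[key], `${path}.${key}`));
    }
    return problems;
  }
  return []; // same primitive type on both sides
}
```

In the nightly job, a non-empty result should fail the run. Flagging new fields is a deliberate choice so the fixture gets refreshed; some teams downgrade that case to a warning.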
This is the 2026 addition nobody else writes about. The marginal cost of writing a high-fidelity mock handler used to be 30 minutes per endpoint. With Cursor or Claude Code, you can:
The economics flip. Mock coverage that wasn't worth the engineering hours last year is now a 60-second task.
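The mechanical half of that scaffolding is simple enough to sketch. The sketch assumes the vendor publishes an OpenAPI 3 document with inline `example` values; it ignores `examples` and does no `$ref` resolution. It walks `paths` and emits one stub per documented numeric status code, error responses included. `scaffoldStubs` and `HandlerStub` are hypothetical names. Turning each stub into an MSW handler is left to you or your AI pair.

```typescript
// Just enough of the OpenAPI 3 document structure for this sketch.
interface OpenApiResponse {
  content?: Record<string, { example?: unknown }>;
}
interface OpenApiDoc {
  servers?: { url: string }[];
  paths: Record<string, Record<string, { responses?: Record<string, OpenApiResponse> }>>;
}

interface HandlerStub {
  method: string;
  url: string;
  status: number;
  example: unknown; // undefined when the spec has no inline JSON example
}

// Operation keys a Path Item Object can hold; other keys such as
// "parameters" or "summary" are skipped.
const HTTP_METHODS = ["get", "put", "post", "delete", "options", "head", "patch", "trace"];

function scaffoldStubs(doc: OpenApiDoc): HandlerStub[] {
  const base = doc.servers?.[0]?.url ?? "";
  const stubs: HandlerStub[] = [];
  for (const [path, item] of Object.entries(doc.paths)) {
    for (const [method, op] of Object.entries(item)) {
      if (!HTTP_METHODS.includes(method)) continue;
      for (const [code, response] of Object.entries(op.responses ?? {})) {
        const status = Number(code);
        if (!Number.isInteger(status)) continue; // skip "default" and ranges like "5XX"
        stubs.push({
          method: method.toUpperCase(),
          url: base + path,
          status,
          example: response.content?.["application/json"]?.example,
        });
      }
    }
  }
  return stubs;
}
```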
| Tool | Best for | Layer | Trade-off |
|---|---|---|---|
| MSW | Default for frontend + Node | fetch / XHR | Slight setup cost; pays back because handlers are shared across test, dev, and Storybook |
| nock | Pure Node, low-level control | Node http module | Brittle when libraries use fetch instead of http (a growing problem in 2026) |
| Polly.js / VCR | Recording and replaying real responses | HTTP | Fixtures rot fast; you need explicit rerecord discipline |
| Pact | Multi-team microservices in your own org | Contract layer | Needs a Pact Broker and buy-in from the upstream team; not a 1-day install |
| WireMock | Polyglot teams; shared standalone mock server | HTTP server | Heaviest setup; shines as a shared dev environment, not as a per-test mock |
If you have to pick one, pick MSW. If you're on a Java- or Go-heavy stack and want a language-agnostic shared mock, WireMock is the better default.
These are the five mistakes we see most often, paired with what they look like at 3am.
If you want a no-BS audit of your own test infrastructure (mocks, fixtures, CI flake rate, contract coverage), our Ship or Skip tool grades it in about 90 seconds. It's free and tells you the truth.
Best practices have an ROI curve. Respect it. There are cases where mocking external APIs is overengineering:
This is also a good frame for managing technical debt in a startup: every test infrastructure decision is a tradeoff against the speed of shipping. The right answer is rarely "mock everything."
Mock infrastructure is cross-cutting: it touches CI, fixtures, error envelopes, retry logic, and contract drift. It's the wrong place to send a junior. It's also too important to leave to whoever last touched a test.
On Cadence, this kind of work usually lands with the Senior tier ($1,500/week). Senior engineers own scope and ship the playbook end to end: pick MSW, scaffold handlers per vendor, wire up the nightly contract check, document the rerecord process, and hand it back to the team. Every engineer on Cadence is AI-native by baseline (Cursor, Claude Code, and Copilot vetted in the voice interview before they unlock bookings), so scaffolding 12 MSW handlers from an OpenAPI spec is a one-hour task, not a one-day task. Across our 12,800-engineer pool the median time to first commit on a test-infra spec is under 24 hours.
For a smaller scope (writing handlers for one or two endpoints, adding fixtures for an existing setup), the Mid tier ($1,000/week) is the right call. Cleanup of stale fixtures, dependency hygiene around test libraries, and doc-writing for the rerecord process fit cleanly in the Junior tier ($500/week).
If you want a senior to own test infrastructure for a sprint, book a senior engineer on Cadence with a 48-hour free trial. Replace any week. No notice period.
1. Add msw to your dev dependencies, run npx msw init public/ for browser support, and set up a server file at mocks/server.ts for Node.
2. Tag every fixture with the upstream API version it was captured against (e.g., Stripe-Version: 2024-06-20) so contract drift is visible at the file level.
3. Start the MSW server in beforeAll and reset handlers between tests so state doesn't bleed.

Try this: book a Cadence engineer for one week. Use the 48-hour free trial to scope the mock-infrastructure work, and get an MSW handler set scaffolded for your top 5 vendors plus a working nightly contract check before the trial ends. If it doesn't ship, you pay nothing.
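Resetting handlers between tests is what keeps one test's override from leaking into the next. In MSW that lifecycle is `server.listen()` in `beforeAll`, `server.resetHandlers()` in `afterEach`, and `server.close()` in `afterAll`, with `server.use()` adding per-test overrides. The hypothetical `MiniMockServer` below is a plain-TypeScript miniature of just the override-and-reset part, so the behavior is easy to see and test. It is not MSW's implementation.

```typescript
type Reply = { status: number; body: unknown };
type RouteTable = Map<string, () => Reply>;

// Hypothetical miniature of the override/reset lifecycle: base handlers are
// fixed at construction, per-test overrides sit on top of them.
class MiniMockServer {
  private readonly base: RouteTable;
  private readonly overrides: RouteTable = new Map();

  constructor(base: RouteTable) {
    this.base = base;
  }

  // Per-test override, analogous to server.use(): takes precedence over base.
  use(route: string, reply: () => Reply): void {
    this.overrides.set(route, reply);
  }

  // Drop every override, analogous to server.resetHandlers() in afterEach.
  resetHandlers(): void {
    this.overrides.clear();
  }

  handle(route: string): Reply {
    const reply = this.overrides.get(route) ?? this.base.get(route);
    if (!reply) throw new Error(`Unhandled request: ${route}`);
    return reply();
  }
}
```

Without the reset, a 429 override from a rate-limit test would still be active when the next test expects a successful charge.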
Databases are the exception: don't mock them. Use a real database running in a Docker container or a transactional test harness that rolls back after each test. Database mocks lie about constraints, indexes, query planner behavior, and SQL dialect quirks more often than they save you time. Postgres in a container starts in under 3 seconds and gives you 100% behavioral fidelity.
MSW is a better default than nock for most teams. MSW v2 works in both the browser and Node with the same handler files, so frontend tests, backend integration tests, and Storybook share one source of truth. nock is still the right call for pure Node libraries built directly on the http module, and for cases where you need very low-level request matching.
Pin every fixture to a known upstream API version, and run one nightly job that hits the real sandbox and diffs the response schema against your fixtures. When the diff fails, regenerate the fixture before the next merge. This is the single highest-leverage discipline against silent contract drift.
Always run at least one PR-gating test against Stripe test mode for any payment flow. Mock the happy path so CI stays fast, but let the real sandbox catch contract changes and parameter typos. The same rule applies to Auth0 (dev tenant), AWS SES (sandbox mode), and any third party where money or auth is on the line.
Mock infrastructure should be owned by the most senior backend engineer on the team. It touches CI, fixtures, error envelopes, retry logic, and contract drift, so it's the wrong place to send someone who's still learning the stack. On Cadence, the Senior tier ($1,500/week) typically owns this work end to end and hands a documented playbook back to the team.