How to mock external APIs in tests

To mock external APIs in tests, intercept HTTP at the boundary (not inside your business logic), use Mock Service Worker (MSW) as the default for both frontend and Node, and pin every fixture to a specific upstream API version. Then run one nightly integration test against the real sandbox so contract drift fails loud, not silent.

That is the whole playbook in two sentences. The rest of this post is why each of those choices wins in 2026, what to reach for when MSW isn't the right tool, and the five pitfalls that turn a green CI into a red production incident.

What it means to mock an external API (and what most teams get wrong)

A mock replaces a network call with a deterministic stand-in. When your code says "POST to api.stripe.com/v1/charges," the mock returns a canned response without leaving the test runner. Tests get faster, isolated, and reproducible.

The mistake we see most often: teams mock at the wrong layer. They stub their own StripeClient class and assert that client.createCharge was called with the right arguments. That tests the wrapper, not the integration. The mock never exercises serialization, retries, auth headers, or error parsing, so the first real call in production is also the first time those code paths run.

The fix is to intercept closer to the wire. MSW intercepts at the fetch/XHR layer. nock intercepts at Node's http module. Both run your actual HTTP client end to end, then catch the request before it leaves the process. That extra inch of realism catches an entire class of bugs the wrapper-stub approach hides.
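To make the difference concrete, here is a dependency-free sketch of a test that asserts on the outgoing request instead of on a wrapper call. The names createCharge, Transport, and recordingTransport are hypothetical, and the injected transport only stands in for the point where MSW or nock would catch the request in a real setup.

```typescript
// Hypothetical names throughout: createCharge, Transport, recordingTransport.
// The transport marks the boundary where MSW or nock would intercept.

interface OutgoingRequest {
  url: string;
  method: string;
  headers: Record<string, string>;
  body: string;
}

interface TransportResponse {
  status: number;
  body: string;
}

type Transport = (req: OutgoingRequest) => Promise<TransportResponse>;

// Code under test: builds the real wire format (form encoding, auth header).
async function createCharge(
  send: Transport,
  apiKey: string,
  amount: number,
  currency: string,
): Promise<{ id: string; status: string }> {
  const res = await send({
    url: "https://api.stripe.com/v1/charges",
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/x-www-form-urlencoded",
    },
    body: new URLSearchParams({ amount: String(amount), currency }).toString(),
  });
  if (res.status >= 400) {
    throw new Error(`charge failed with HTTP ${res.status}`);
  }
  return JSON.parse(res.body) as { id: string; status: string };
}

// Test double at the boundary: records what would have gone over the wire.
function recordingTransport(seen: OutgoingRequest[]): Transport {
  return async (req) => {
    seen.push(req);
    return {
      status: 200,
      body: JSON.stringify({ id: "ch_test_123", status: "succeeded" }),
    };
  };
}
```

A wrapper stub would keep passing if the body were sent as JSON or the Authorization header went missing. A check on the captured request would not.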

Why this matters more in 2026

A typical SaaS app today depends on 6 to 10 external APIs out of the box: an auth provider, a payment processor, a search index, an LLM, an email vendor, an object store, an analytics pipeline, and usually two or three more. Every one of those is a network call that can flake, rate-limit, or rewrite its response shape.

AI-pair-programmed code ships faster than humans can write integration tests. A senior engineer in Cursor or Claude Code can land 4 to 6 PRs in a day. If each PR triggers a CI run that hits real third-party APIs, you're paying per token, per webhook, and per minute of wall clock. Mocks are the only way to keep CI under 5 minutes at that throughput.

Third-party APIs also version more aggressively now. Stripe rolls a new API version every 3 to 6 months. OpenAI deprecates models on rolling 6-month windows. If your mock fixtures were captured against last year's payload, your tests are lying to you with a smile.

The decision tree: mock, sandbox, record-replay, or contract test

Not every external call deserves the same treatment. Here's the four-way decision we use:

Unit + fast feedback loop: mock at the HTTP layer (MSW or nock).
Money or auth involved: mock for speed during dev, but run at least one PR-gating test against the real sandbox (Stripe test mode, Auth0 dev tenant, AWS SES sandbox).
Multi-team microservices in your own org: consumer-driven contract tests with Pact, so both sides of the integration are checked against the same agreement.
Real responses you can't realistically hand-author (think a 4KB GitHub webhook payload or a Mapbox geocoding response): record-and-replay with Polly.js or Ruby's VCR.
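The four branches above can be sketched as a small function. The Dependency fields and the order in which the checks run are our assumptions for illustration; the list itself does not say which rule wins when two apply.

```typescript
type Strategy =
  | "http-mock"
  | "mock-plus-sandbox-gate"
  | "contract-test"
  | "record-replay";

// Hypothetical description of one external dependency.
interface Dependency {
  name: string;
  ownedByYourOrg: boolean; // another team's service inside your own org
  touchesMoneyOrAuth: boolean;
  hardToHandAuthor: boolean; // large or intricate real payloads
}

// Assumed precedence: internal ownership first, then money/auth, then payload size.
function chooseStrategy(dep: Dependency): Strategy {
  if (dep.ownedByYourOrg) return "contract-test";
  if (dep.touchesMoneyOrAuth) return "mock-plus-sandbox-gate";
  if (dep.hardToHandAuthor) return "record-replay";
  return "http-mock";
}
```

Running every vendor in your dependency list through a function like this is a quick way to spot the ones that are getting a plain mock when they deserve a guardrail.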

Most posts on this topic pretend mocking is the only answer. It isn't. Mocking is the fast lane; the other three are guardrails that catch what the fast lane misses.

The 6-step playbook for mocking external APIs

1. Pick MSW as your default

Mock Service Worker v2 (released October 2023) is the cleanest default for almost every team. The same handlers work in the browser (through a Service Worker, for local development and Storybook) and in Node (for Vitest or Jest component tests, API-route tests, and integration tests). One source of truth, two environments. You write a handler once and your frontend, backend, and Storybook all see the same mocked Stripe.

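// mocks/handlers/stripe.ts
import { http, HttpResponse } from 'msw'

export const stripeHandlers = [
  // Intercept the charge call at the network layer, before it leaves the process.
  http.post('https://api.stripe.com/v1/charges', async ({ request }) => {
    // Stripe accepts form-encoded bodies, so read the raw text rather than JSON.
    const body = await request.text()
    if (!body.includes('amount=')) {
      // Error path: return an error envelope for a missing parameter.
      return HttpResponse.json({ error: { code: 'parameter_missing' } }, { status: 400 })
    }
    // Happy path: a minimal canned charge.
    return HttpResponse.json({ id: 'ch_test_123', status: 'succeeded' })
  }),
]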

2. Define handlers per third-party domain

Organize handlers by upstream domain, not by feature: mocks/handlers/stripe.ts, mocks/handlers/openai.ts, mocks/handlers/slack.ts. Each file owns the contract for one vendor. When Stripe ships a new API version, you change one file. When a feature uses three vendors, it composes three handler sets.

3. Pin every fixture to an upstream API version

Every recorded fixture or mock response should be tagged with the upstream version it came from. For Stripe, that's the Stripe-Version header (e.g., 2024-06-20). For OpenAI, it's the model and API path. Store the version in the filename or a sidecar JSON. This is the single biggest lever against silent contract drift, and it's what the Stripe webhook handler playbook hammers on too.
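One way to make the pin enforceable is a sidecar record plus a check that runs before the tests do. This is a minimal sketch: the field names, the pinnedVersions map, and the file-naming scheme are illustrative assumptions, not a standard.

```typescript
// Sidecar metadata stored next to each fixture. Field names are illustrative.
interface FixtureMeta {
  vendor: string;
  apiVersion: string; // e.g. the Stripe-Version the response was captured under
  capturedAt: string; // ISO date of the capture
}

interface VersionedFixture<T> {
  meta: FixtureMeta;
  body: T;
}

// The upstream version the codebase currently targets, one entry per vendor.
const pinnedVersions: Record<string, string> = { stripe: "2024-06-20" };

// Puts the version in the filename so drift is visible in a directory listing.
function fixtureFileName(meta: FixtureMeta, name: string): string {
  return `${meta.vendor}/${name}.${meta.apiVersion}.json`;
}

// Throws when a fixture was captured under a version other than the pinned one.
function assertPinned(
  fixture: VersionedFixture<unknown>,
  pins: Record<string, string>,
): void {
  const expected = pins[fixture.meta.vendor];
  if (expected === undefined) {
    throw new Error(`no pinned version for vendor "${fixture.meta.vendor}"`);
  }
  if (fixture.meta.apiVersion !== expected) {
    throw new Error(
      `fixture for ${fixture.meta.vendor} was captured under ` +
        `${fixture.meta.apiVersion}, but the pinned version is ${expected}`,
    );
  }
}
```

Bumping the pinned version then fails every stale fixture at once, which turns a silent mismatch into a to-do list.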

4. Mock error responses, not just the happy path

The 200 OK case is the easy one. The bugs hide in 429s, 401s, 500s, and the partial-success cases. Generate at least one handler per common error shape: rate limits, expired tokens, malformed payloads, server errors. Bonus points: include a handler that returns a 502 once, then a 200, so you exercise your retry logic.
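The "502 once, then 200" idea is easy to sketch without any library. Below, scriptedResponses plays back a fixed sequence of responses and requestWithRetry is a hypothetical retry helper standing in for whatever your client actually uses; the injected sleep keeps the test from really waiting.

```typescript
interface MockResponse {
  status: number;
  body: string;
}

// Returns each scripted response once, then keeps returning the last one.
function scriptedResponses(script: MockResponse[]): () => MockResponse {
  if (script.length === 0) {
    throw new Error("script must contain at least one response");
  }
  let calls = 0;
  return () => {
    const next = script[Math.min(calls, script.length - 1)];
    calls += 1;
    return next;
  };
}

// Hypothetical retry helper: retries on 429 and 5xx with exponential backoff.
async function requestWithRetry(
  send: () => Promise<MockResponse>,
  options: {
    maxAttempts: number;
    baseDelayMs: number;
    sleep: (ms: number) => Promise<void>; // injected so tests do not wait
  },
): Promise<{ response: MockResponse; attempts: number }> {
  let attempt = 0;
  for (;;) {
    attempt += 1;
    const response = await send();
    const retryable = response.status === 429 || response.status >= 500;
    if (!retryable || attempt >= options.maxAttempts) {
      return { response, attempts: attempt };
    }
    // Delay doubles on each retry: base, 2x base, 4x base, ...
    await options.sleep(options.baseDelayMs * 2 ** (attempt - 1));
  }
}
```

In MSW the same effect can be had by registering a failing handler with the once: true handler option ahead of the normal one, so the first request fails and the retry falls through to the happy path.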

5. Run one nightly contract check against the real sandbox

CI runs against mocks. A separate nightly job hits the real sandbox (Stripe test mode, OpenAI's free tier, Auth0 dev tenant) and diffs the response schema against your mock fixtures. When the diff fails, regenerate the fixture before the next merge. This is the same pattern that makes a good integration test setup in CI trustworthy week over week.
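The diff itself does not need to be clever. A minimal sketch, assuming both the fixture and the sandbox response are already parsed JSON: compare keys and value types, ignore the values. The function name diffShape and the choice to sample only the first array element are our simplifications.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

function typeOf(value: Json): string {
  if (value === null) return "null";
  if (Array.isArray(value)) return "array";
  return typeof value;
}

// Compares shapes (keys and value types) and ignores the concrete values.
function diffShape(fixture: Json, live: Json, path = "$"): string[] {
  const expected = typeOf(fixture);
  const actual = typeOf(live);
  if (expected !== actual) {
    return [`${path}: expected ${expected}, got ${actual}`];
  }

  if (Array.isArray(fixture) && Array.isArray(live)) {
    // Simplification: the first element stands in for the whole array.
    if (fixture.length === 0 || live.length === 0) return [];
    return diffShape(fixture[0], live[0], `${path}[0]`);
  }

  if (expected === "object") {
    const a = fixture as { [key: string]: Json };
    const b = live as { [key: string]: Json };
    const drift: string[] = [];
    for (const key of Object.keys(a)) {
      if (!(key in b)) {
        drift.push(`${path}.${key}: missing in live response`);
      } else {
        drift.push(...diffShape(a[key], b[key], `${path}.${key}`));
      }
    }
    for (const key of Object.keys(b)) {
      if (!(key in a)) {
        drift.push(`${path}.${key}: new field in live response`);
      }
    }
    return drift;
  }

  return [];
}
```

An empty result means the fixture still matches the sandbox. Anything else is the message for the nightly job to report, with the JSON path of each drifted field.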

6. Add the AI-pair-programming step

This is the 2026 addition nobody else writes about. The marginal cost of writing a high-fidelity mock handler used to be 30 minutes per endpoint. With Cursor or Claude Code, you can:

Paste an OpenAPI spec or a captured cURL response.
Ask for an MSW handler that covers the happy path plus three error envelopes.
Get back a working handler in under a minute.

The economics flip. Mock coverage that wasn't worth the engineering hours last year is now a 60-second task.

Tool comparison: MSW, nock, Polly.js, Pact, WireMock

Tool	Best for	Layer	Trade-off
MSW	Default for frontend + Node	fetch / XHR	Slight setup cost; pays back because handlers are shared across test, dev, and Storybook
nock	Pure Node, low-level control	Node `http` module	Brittle when libraries use `fetch` instead of `http` (a growing problem in 2026); nock releases before v14 do not intercept Node's built-in `fetch`
Polly.js / VCR	Recording and replaying real responses	HTTP	Fixtures rot fast; you need explicit rerecord discipline
Pact	Multi-team microservices in your own org	Contract layer	Needs a Pact Broker and buy-in from the upstream team; not a 1-day install
WireMock	Polyglot teams; shared standalone mock server	HTTP server	Heaviest setup; shines as a shared dev environment, not as a per-test mock

If you have to pick one, pick MSW. If you're on a Java or Go-heavy stack and want a language-agnostic shared mock, WireMock is the better default.

Common pitfalls (and the production symptom you'll see)

These are the five mistakes we see most often, paired with what they look like at 3am.

Mocking your wrapper, not the network. Symptom: production 500s the first time a header is wrong, an auth token expires, or a content-type is off. Your tests never exercised the real HTTP path.
Drifting fixtures. Symptom: tests pass green, but your Stripe webhook endpoint returns 400s in prod because the payload shape changed three months ago and your fixtures never caught up.
Over-mocking your own internal services. Symptom: integration bugs at every seam. Internal services should be tested with the real service running in a container, not a mock. This is the same logic that drives a real E2E testing setup for a SaaS: mock the third party, run the rest.
No error-path coverage. Symptom: a 429 from OpenAI takes down your checkout flow because nobody mocked it. The retry-with-backoff code path has zero test coverage.
Treating the mock as the spec. Symptom: a bug ships, somebody updates the mock to match the buggy behavior so the test passes, and the bug is now permanent. The mock should mirror the upstream API, not your code.

If you want a no-BS audit of your own test infrastructure (mocks, fixtures, CI flake rate, contract coverage), our Ship or Skip tool grades it in about 90 seconds. It's free and tells you the truth.

When you can skip mocking entirely

Best practices have an ROI curve. Respect it. There are cases where mocking external APIs is overengineering:

Two-founder, pre-revenue startup, one external dependency. Ship the feature, watch the logs, fix what breaks. You'll have time for MSW after PMF.
Internal services where one team owns both sides. Prefer an in-process test double or a contract test. A network-layer mock is more machinery than the problem deserves.
Read-only public APIs with stable contracts (think a GitHub gist fetch or a public weather endpoint). A single cached recording, refreshed quarterly, is usually enough.
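If you do go with a cached recording, a small staleness check keeps the quarterly refresh honest. This sketch assumes each recording carries an ISO recordedAt date and treats "quarterly" as 90 days; both are our assumptions, and the names are hypothetical.

```typescript
// Hypothetical record of when a cached response was captured.
interface Recording {
  name: string;
  recordedAt: string; // ISO 8601 date, e.g. "2026-03-01"
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Returns the names of recordings older than maxAgeDays as of `now`.
function staleRecordings(
  recordings: Recording[],
  now: Date,
  maxAgeDays: number,
): string[] {
  return recordings
    .filter((r) => {
      const ageDays = (now.getTime() - new Date(r.recordedAt).getTime()) / MS_PER_DAY;
      return ageDays > maxAgeDays;
    })
    .map((r) => r.name);
}
```

Passing the current date in as an argument, rather than reading the clock inside the function, keeps the check itself deterministic in tests.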

This is also a good frame for managing technical debt in a startup: every test infrastructure decision is a tradeoff against the speed of shipping. The right answer is rarely "mock everything."

Who should own the mock layer on your team

Mock infrastructure is cross-cutting: it touches CI, fixtures, error envelopes, retry logic, and contract drift. It's the wrong place to send a junior. It's also too important to leave to whoever last touched a test.

On Cadence, this kind of work usually lands with the Senior tier ($1,500/week). Senior engineers own scope and ship the playbook end to end: pick MSW, scaffold handlers per vendor, wire up the nightly contract check, document the rerecord process, and hand it back to the team. Every engineer on Cadence is AI-native by baseline (Cursor, Claude Code, and Copilot vetted in the voice interview before they unlock bookings), so scaffolding 12 MSW handlers from an OpenAPI spec is a one-hour task, not a one-day task. Across our 12,800-engineer pool the median time to first commit on a test-infra spec is under 24 hours.

For a smaller scope (writing handlers for one or two endpoints, adding fixtures for an existing setup), the Mid tier ($1,000/week) is the right call. Cleanup of stale fixtures, dependency hygiene around test libraries, and doc-writing for the rerecord process fits cleanly in the Junior tier ($500/week).

If you want a senior to own test infrastructure for a sprint, book a senior engineer on Cadence with a 48-hour free trial. Replace any week. No notice period.

Steps

Install MSW. Add msw to your dev dependencies, run npx msw init public/ for browser support, and set up a server file at mocks/server.ts for Node.
Create a handler per third-party domain. One file per vendor (Stripe, OpenAI, Slack). Cover the happy path first, then add 1-2 error envelopes per endpoint.
Pin fixtures to an upstream API version. Tag every fixture with the version header (e.g., Stripe-Version: 2024-06-20) so contract drift is visible at the file level.
Wire the mock server into your test runner. In Vitest or Jest, start the server in beforeAll, reset handlers in afterEach so state doesn't bleed between tests, and close the server in afterAll.
Add a nightly contract check. Spin up a GitHub Actions cron job that hits the real sandbox, diffs the response schema against your fixtures, and posts to Slack on drift.
Document the rerecord workflow. A 10-line README so the next engineer (or AI agent) regenerates fixtures the same way every time.

Try this: book a Cadence engineer for one week. Use the 48-hour free trial to scope the mock-infrastructure work, get an MSW handler set scaffolded for your top 5 vendors, and a working nightly contract check before the trial ends. If it doesn't ship, you pay nothing.

FAQ

Should I mock my database in tests?

No. Use a real database running in a Docker container or a transactional test harness that rolls back after each test. Database mocks lie about constraints, indexes, query planner behavior, and SQL dialect quirks more often than they save you time. Postgres in a container typically starts in a few seconds and, as long as the version matches production, behaves the way your production database does.

Is MSW better than nock in 2026?

For most teams, yes. MSW v2 works in both the browser and Node with the same handler files, so frontend tests, backend integration tests, and Storybook share one source of truth. nock is still the right call for pure Node libraries built directly on the http module, and for cases where you need very low-level request matching.

How do I keep mocks in sync with the real API?

Pin every fixture to a known upstream API version, and run one nightly job that hits the real sandbox and diffs the response schema against your fixtures. When the diff fails, regenerate the fixture before the next merge. This is the single highest-leverage discipline against silent contract drift.

When should I use a real Stripe sandbox instead of a mock?

Always run at least one PR-gating test against Stripe test mode for any payment flow. Mock the happy path so CI stays fast, but let the real sandbox catch contract changes and parameter typos. The same rule applies to Auth0 (dev tenant), AWS SES (sandbox mode), and any third party where money or auth is on the line.

Who owns mock infrastructure on a small team?

The most senior backend engineer on the team. It touches CI, fixtures, error envelopes, retry logic, and contract drift, so it's the wrong place to send someone who's still learning the stack. On Cadence, the Senior tier ($1,500/week) typically owns this work end to end and hands a documented playbook back to the team.
