
To run integration tests in CI, spin up your real dependencies (Postgres, Redis, the queue) inside service containers or testcontainers, seed test data with factories, run the suite in parallel against isolated schemas, then tear down. In 2026 the default stack for a Node or TypeScript app is GitHub Actions plus testcontainers-node plus Vitest sharding, with a path filter that skips the job entirely on docs-only PRs.
That sentence is the whole answer. The rest of this post is the stuff the top SERP results skip: which of the three CI patterns actually wins, how to isolate the database without rewriting your test runner, how to handle flakes without quarantining your team's morale, and what the real CI bill looks like.
Three test types, three different reasons to run them. The line between them confuses new teams, and the confusion shows up as either a 40-minute CI run or a green build that ships a broken checkout flow.
| Type | What it tests | Real deps? | Speed per test |
|---|---|---|---|
| Unit | Pure functions, isolated logic | No (mocked) | <50ms |
| Integration | Modules talking to real db / cache / queue | Yes, no browser | 50ms-2s |
| E2E | Full user journeys via browser | Yes, plus browser | 5-30s |
Unit tests catch logic bugs in milliseconds. Integration tests catch the bugs that only appear when your code talks to Postgres for real: the missing index, the wrong transaction isolation, the JSONB query that worked in a mock and explodes in production. End-to-end tests, usually written in Playwright, cover the seams between frontend and backend.
A healthy 2026 split for a SaaS startup looks like 70% unit, 25% integration, 5% E2E. If integration tests are missing entirely, you have a green CI and a Friday rollback. If integration is your only layer, your CI takes 25 minutes and engineers stop running it locally.
There are exactly three serious patterns for booting real services in CI. Most teams reach for the wrong one because the docs nudge them toward it.
| Approach | Setup | Dev parity | Multi-service | Isolation |
|---|---|---|---|---|
| Docker Compose | docker-compose.yml + wait script | High | Great | Shared db |
| GitHub Actions services | YAML services: block | Low | Awkward | Shared db |
| testcontainers-node | import { PostgreSqlContainer } ... | High | Great | Per-test container |
Docker Compose is what most engineers reach for first because they already use it locally. It works, but on CI you pay for the cold pull every run, and tests share one database, so parallelism corrupts state.
GitHub Actions service containers are the official answer in the docs. Define a postgres service, set the env vars, add a health check. Fine for a single Postgres instance. Painful when you need Postgres + Redis + a fake S3 + a Kafka broker, because each service definition is a YAML wall and the wait-for-ready dance is fragile.
testcontainers-node is the modern winner. You import PostgreSqlContainer, call .start() in a setup hook, and the library boots a real Postgres in Docker, returns the connection string, and shuts it down on teardown. It works identically on your laptop and in GitHub Actions because Docker is available in both. You get full multi-service support, per-test or per-suite isolation, and zero YAML acrobatics.
The trade-off: testcontainers adds 1-3 seconds of container start-up per suite. With container reuse enabled, that drops to ~200ms warm. The dev-CI parity it buys is worth the cost for any team past the toy-project stage.
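Here is roughly what that setup hook looks like. A minimal sketch assuming the @testcontainers/postgresql and @testcontainers/redis packages and Vitest's beforeAll/afterAll hooks; the DATABASE_URL and REDIS_URL names are this example's convention, not something the library dictates.

```ts
// tests/setup/containers.ts (illustrative path): boot real Postgres and Redis
// once per suite, expose their connection strings, tear both down afterwards.
import { beforeAll, afterAll } from 'vitest';
import { PostgreSqlContainer, type StartedPostgreSqlContainer } from '@testcontainers/postgresql';
import { RedisContainer, type StartedRedisContainer } from '@testcontainers/redis';

let pg: StartedPostgreSqlContainer;
let redis: StartedRedisContainer;

beforeAll(async () => {
  pg = await new PostgreSqlContainer('postgres:16').start();
  redis = await new RedisContainer('redis:7').start();

  // Hand the app real connection strings, exactly as production would.
  process.env.DATABASE_URL = pg.getConnectionUri();
  process.env.REDIS_URL = `redis://${redis.getHost()}:${redis.getMappedPort(6379)}`;
}, 60_000); // generous timeout: the first run pays for the image pull

afterAll(async () => {
  await redis.stop();
  await pg.stop();
});
```

Point the app's config at those env vars and the same code path runs on your laptop, in CI, and in production.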
Postgres in a container is the standard. SQLite-in-memory is a tempting shortcut and a guaranteed footgun: half your production queries (window functions, JSONB operators, RLS policies) behave differently on SQLite, and you only find out in staging.
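For a concrete example of the gap, here is the kind of query that sails through a mocked repository or a SQLite stand-in and only gets exercised for real against Postgres. The events table and payload shape are hypothetical; the point is the JSONB containment operator, which has no SQLite equivalent.

```ts
// Illustrative only: the table and payload shape are made up for this example.
import { Pool } from 'pg';

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

export async function purchaseEventsFor(userId: string) {
  // The @> containment operator (and any GIN index behind it) is Postgres-specific;
  // a SQLite-backed test cannot tell you whether this query is correct or indexed.
  const { rows } = await pool.query(
    `SELECT id, payload, created_at
       FROM events
      WHERE user_id = $1
        AND payload @> '{"type": "purchase"}'::jsonb
      ORDER BY created_at DESC`,
    [userId],
  );
  return rows;
}
```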
Once you have Postgres, the question is isolation. Three patterns:
- Transaction per test: open a transaction in beforeEach, roll back in afterEach. Fast (~5ms overhead). Breaks down if your code under test commits its own transactions or fires triggers that depend on commit.
- Schema per test: a setup hook runs CREATE SCHEMA test_${id}, runs migrations into it, runs the tests, drops it on teardown. ~50ms setup cost, true isolation, parallelism-safe.
- Schema per worker: the same idea, but one schema per parallel worker instead of per test, shared by every test that worker runs.

For most apps, schema-per-worker (not per-test) is the sweet spot. You pay the schema-create cost once per worker, run hundreds of tests inside it with transactional rollback for cheap cleanup between tests, and parallelism scales linearly with worker count.
If you use Drizzle for migrations, the schema-per-worker pattern is six lines: read process.env.VITEST_WORKER_ID, suffix the schema name, run migrate() against it, set the search path, done.
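A sketch of that pattern, assuming node-postgres as the driver; the migrationsFolder path, the migrationsSchema option, and the search_path trick all assume a recent Drizzle release and migrations that use unqualified table names.

```ts
// tests/setup/schema.ts (illustrative): one schema per Vitest worker.
import { Pool } from 'pg';
import { drizzle } from 'drizzle-orm/node-postgres';
import { migrate } from 'drizzle-orm/node-postgres/migrator';

const workerId = process.env.VITEST_WORKER_ID ?? '0';
const schema = `test_${workerId}`;

export async function createWorkerDb() {
  // Create the worker's schema with a short-lived admin connection.
  const admin = new Pool({ connectionString: process.env.DATABASE_URL });
  await admin.query(`CREATE SCHEMA IF NOT EXISTS ${schema}`);
  await admin.end();

  // Every connection this worker opens puts its own schema first on the search path.
  const pool = new Pool({
    connectionString: process.env.DATABASE_URL,
    options: `-c search_path=${schema}`,
  });
  const db = drizzle(pool);
  await migrate(db, { migrationsFolder: './drizzle', migrationsSchema: schema });
  return { db, pool, schema };
}

export async function dropWorkerDb(pool: Pool) {
  // Teardown: drop the schema and close the pool so the runner exits cleanly.
  await pool.query(`DROP SCHEMA IF EXISTS ${schema} CASCADE`);
  await pool.end();
}
```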
Three ways to get data into the test database, and only one of them scales.
JSON fixtures (a fixtures/users.json file you load before each test) are quick to start with and impossible to maintain. Add a non-null column and every fixture file breaks. Avoid past the prototype phase.
Factories with @faker-js/faker are the right default. Define a userFactory({ overrides }) that returns a fully-valid User with random sensible defaults, override only the fields the test cares about. The pattern came from Ruby's FactoryBot a decade ago and it works because it pushes the schema-knowledge into one place.
import { faker } from '@faker-js/faker';
export const userFactory = (overrides: Partial<User> = {}) => ({
id: faker.string.uuid(),
email: faker.internet.email(),
name: faker.person.fullName(),
createdAt: new Date(),
...overrides,
});
Snapshot data (a captured production-like dump, scrubbed of PII) is useful for the 5% of tests that need realistic shape, like a query plan test or a CSV-export regression. Don't use it as the default; the load time alone tanks your suite.
A common mistake: using factories that hit the database directly inside the factory. Keep factories pure (returning plain objects) and let the test decide whether to insert. This makes them composable and stops one slow factory from poisoning a thousand tests.
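In a test, that division of labor looks something like the sketch below. The db handle, the users table definition, and the file paths are placeholders for whatever your app already exposes.

```ts
import { it, expect } from 'vitest';
import { eq } from 'drizzle-orm';
import { db } from './setup/db';            // hypothetical shared test db handle
import { users } from '../src/db/schema';   // hypothetical Drizzle table definition
import { userFactory } from './factories/user';

it('finds a user by email', async () => {
  // The factory only builds the object; the test decides whether to insert it.
  const alice = userFactory({ email: 'alice@example.com' });
  await db.insert(users).values(alice);

  const [found] = await db.select().from(users).where(eq(users.email, alice.email));
  expect(found.id).toBe(alice.id);
});
```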
The single biggest win on CI wall time is correct parallelism. The single most common mistake is enabling it without isolating state, then watching tests fail randomly.
Vitest 3.x supports sharding out of the box: run vitest --shard=1/4 across 4 GitHub Actions matrix jobs and you cut wall time roughly 60-75%. Jest equivalents: --maxWorkers=50% for in-process workers, plus --shard=1/4 across CI matrix entries for cross-runner sharding.
GitHub-hosted Linux runners ship with 4 cores, so --maxWorkers=4 (or Vitest's default os.cpus().length) is the right baseline. Larger runners are available; for most teams, scaling matrix shards horizontally is cheaper than scaling runner size.
Two rules to make parallelism actually safe:

- Every worker (or shard) owns its own state: its own schema or its own container, never a shared mutable database.
- No test depends on another test's data or on execution order; each test creates exactly what it needs and cleans up after itself.
If you've done this right, you can prove it: run the suite with --shard=1/4 four times, in random order, and watch all four shards stay green. Any test that depends on order is a flake waiting to fire.
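You can bake that discipline into the config so it is checked on every run, not just when someone remembers. A sketch of a vitest.config.ts; the worker count matches the 4-core runner baseline above, and the setup file path is illustrative.

```ts
// vitest.config.ts: shuffle execution order so order-dependent tests surface
// immediately, and cap workers at the runner's core count.
import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    maxWorkers: 4,                                // matches a 4-core GitHub-hosted runner
    sequence: { shuffle: true },                  // randomize order on every run
    setupFiles: ['./tests/setup/containers.ts'],  // the testcontainers hook from earlier
  },
});
```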
Every team gets flakes. Google's published baseline is ~1.6% of tests flaking on any given run. The question is whether you have a policy or a vibe.
The policy that works:
- Retry once at the framework level. Vitest: retry: 1. Jest: jest.retryTimes(1). This eats the genuinely random failures (network blip, container slow to warm).
- Quarantine repeat offenders. A test that fails twice in a row gets moved to a quarantined.txt file that runs on a non-blocking job (see the config sketch below), and its owner gets a ticket.

The wrong policy: retry three times silently. That just hides the flake until it's a 30% failure rate and engineers ignore the red CI.
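If you want the quarantine list to be more than a convention, one lightweight way to wire it in is a config fragment like the sketch below, which assumes a quarantined.txt with one test-file path or glob per line. The non-blocking quarantine job is then just a second vitest invocation that runs only those files and is allowed to fail.

```ts
// Vitest config fragment: retry once, and keep quarantined files out of the blocking run.
import { readFileSync, existsSync } from 'node:fs';
import { defineConfig, configDefaults } from 'vitest/config';

const quarantined = existsSync('quarantined.txt')
  ? readFileSync('quarantined.txt', 'utf8').split('\n').map((l) => l.trim()).filter(Boolean)
  : [];

export default defineConfig({
  test: {
    retry: 1,                                              // absorb one-off random failures
    exclude: [...configDefaults.exclude, ...quarantined],  // blocking job skips known flakes
  },
});
```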
This is the gap in every other article on the SERP. They show you a ci.yml and never tell you what the bill looks like.
GitHub Actions on Linux runners is currently $0.008/minute past the free tier. A startup with 8 engineers, each pushing 5 PRs/day, each PR re-running the 6-minute integration suite about three times as commits land, burns roughly 14,400 billable minutes over a 20-working-day month (8 × 5 × 3 × 6 × 20). That's ~$115/month at list price, more if you're on macOS runners ($0.08/minute, 10x).
Three optimizations cut the bill 40-60% with zero quality trade-off:
- Cache dependencies: node_modules via actions/setup-node with cache: 'npm'. Saves 30-90 seconds per run.
- Cache Docker layers: docker/setup-buildx-action + cache-from: type=gha. Container rebuilds drop from 2 minutes to 15 seconds.
- Skip what didn't change: dorny/paths-filter or actions/changed-files to detect docs-only / README / static-image changes and skip the test matrix. On a typical product team, 30-40% of PRs touch nothing testable.

Layer testcontainers reuse on top (TESTCONTAINERS_REUSE_ENABLE=true in your CI env) and you pay the container start cost once per shard, not once per suite.
When the suite still creeps past 10 minutes, that's the moment to bring in someone who has been through this before. A senior engineer on Cadence ($1,500/week) typically cuts a bloated CI pipeline by 50-70% in a single week, because the patterns are the same across stacks: cache, shard, skip, quarantine. The 27-hour median time to first commit on the platform means they're shipping the fix in the same sprint, not the next quarter.
Here's a working .github/workflows/ci.yml for a Node + Postgres + Redis stack using testcontainers-node, Vitest sharding, and the path-filter skip pattern.
name: CI

on:
  pull_request:
  push:
    branches: [main]

jobs:
  changes:
    runs-on: ubuntu-latest
    outputs:
      code: ${{ steps.filter.outputs.code }}
    steps:
      - uses: actions/checkout@v4
      - uses: dorny/paths-filter@v3
        id: filter
        with:
          filters: |
            code:
              - 'src/**'
              - 'tests/**'
              - 'package.json'
              - 'pnpm-lock.yaml'
              - '.github/workflows/ci.yml'

  test:
    needs: changes
    if: needs.changes.outputs.code == 'true'
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        shard: [1, 2, 3, 4]
    steps:
      - uses: actions/checkout@v4
      - uses: pnpm/action-setup@v4
        with:
          version: 9
      - uses: actions/setup-node@v4
        with:
          node-version: 22
          cache: 'pnpm'
      - uses: docker/setup-buildx-action@v3
      - run: pnpm install --frozen-lockfile
      - name: Run integration tests
        env:
          TESTCONTAINERS_REUSE_ENABLE: 'true'
        run: pnpm vitest run --shard=${{ matrix.shard }}/4
The changes job runs in <10 seconds and gates the expensive test matrix. The matrix runs four shards in parallel, each with its own testcontainers-managed Postgres and Redis. Cache hits keep cold-start under 30 seconds. If you're running a Next.js app, the same workflow shape composes cleanly with the GitHub Actions setup for Next.js builds and previews.
Step by step, each shard follows the recipe from the opening paragraph:

- Spin up real dependencies: boot a PostgreSqlContainer and a RedisContainer from testcontainers-node. Export the connection strings as env vars so your app code reads them like production.
- Seed with factories: run migrations (pnpm db:migrate), then call your factory functions inside beforeEach to insert only the records the test needs.
- Run in parallel: vitest run --shard=${SHARD}/${TOTAL} (or jest --shard) so each CI matrix job owns a slice of the suite. Each shard talks to its own container, so there is no cross-shard interference.
- Tear down: an afterAll hook drops the test schema and closes pool connections so the runner exits cleanly and the next run starts from zero.
- Cache and skip: use actions/setup-node with cache: 'pnpm' (or 'npm'), add docker/setup-buildx-action with cache-from: type=gha, and gate the test job behind a dorny/paths-filter check. These three cache layers turn a 6-minute cold run into a 90-second warm run.

Two founders, no users, three days from first invoice. You don't need integration tests. You need a deployed app and a customer. Add the suite the week after your first paying customer cancels because of a regression. The ROI curve on integration tests starts at roughly the same point as the ROI curve on having a proper CI/CD pipeline: the moment two people are committing to the same repo daily.
Past that point, integration tests are the cheapest insurance you'll buy. A single one-hour outage from a missed regression costs more than three months of CI minutes plus the engineer-week to set this up.
If you'd rather not build the harness yourself, the cheapest version of "ship this in week one" is to book a senior engineer for a single week, hand them this post, and let them adapt the ci.yml to your stack. That's how most of the playbooks on this blog get implemented inside Cadence customer codebases.
Want an honest read on whether your CI pipeline is helping or hurting? Audit your stack with Ship or Skip and get a graded report in 5 minutes, including which testing layer to fix first.
Aim for under 5 minutes wall time for a 200-test suite using 4-shard parallelism. Anything over 10 minutes destroys your feedback loop and engineers stop running tests locally, which means broken code lands in main between green-build notifications.
Testcontainers for tests, Docker Compose for the local dev environment. They solve different problems. Docker Compose gives you a long-lived stack you bring up once a day; testcontainers gives you ephemeral, isolated, per-suite containers that match production. Pairing them keeps dev-CI parity high without slowing CI.
Three changes: cache node_modules via actions/setup-node, cache Docker layers via docker/setup-buildx-action with cache-from: type=gha, and gate the test job behind a dorny/paths-filter check that skips entirely on docs-only PRs. Together they cut billed minutes 40-60% with no quality trade-off.
Retry once at the framework level (retry: 1 in Vitest, jest.retryTimes(1) in Jest). If a test fails twice in a row across three runs, move it to a quarantined-tests file that runs on a non-blocking job and open a P1 ticket on the owner. Cap the quarantine list at 10 tests; when you hit the cap, fixing flakes becomes the whole team's problem.
Probably not. A single GitHub Actions postgres service is fine for a one-database app. The break point is multi-service stacks (Postgres + Redis + an S3 mock + a queue) or any need for true per-test isolation. Once you cross either line, testcontainers pays back its setup cost in the first week.