May 14, 2026 · 12 min read · Cadence Editorial

How to run integration tests in CI

Photo by [Wolfgang Weiser](https://www.pexels.com/@wolfgang-weiser-467045605) on [Pexels](https://www.pexels.com/photo/view-of-pipelines-in-a-forest-18784617/)


To run integration tests in CI, spin up your real dependencies (Postgres, Redis, the queue) inside service containers or testcontainers, seed test data with factories, run the suite in parallel against isolated schemas, then tear down. In 2026 the default stack for a Node or TypeScript app is GitHub Actions plus testcontainers-node plus Vitest sharding, with a path filter that skips the job entirely on docs-only PRs.

That sentence is the whole answer. The rest of this post is the stuff the top SERP results skip: which of the three CI patterns actually wins, how to isolate the database without rewriting your test runner, how to handle flakes without quarantining your team's morale, and what the real CI bill looks like.

Integration tests vs unit tests vs E2E: where the line sits

Three test types, three different reasons to run them. The line confuses new teams and the confusion shows up as either a 40-minute CI run or a green build that ships a broken checkout flow.

| Type | What it tests | Real deps? | Speed per test |
| --- | --- | --- | --- |
| Unit | Pure functions, isolated logic | No (mocked) | <50ms |
| Integration | Modules talking to real db / cache / queue | Yes, no browser | 50ms-2s |
| E2E | Full user journeys via browser | Yes, plus browser | 5-30s |

Unit tests catch logic bugs in milliseconds. Integration tests catch the bugs that only appear when your code talks to Postgres for real: the missing index, the wrong transaction isolation, the JSONB query that worked in a mock and explodes in production. End-to-end tests, usually written in Playwright, cover the seams between frontend and backend.

A healthy 2026 split for a SaaS startup looks like 70% unit, 25% integration, 5% E2E. If integration tests are missing entirely, you have a green CI and a Friday rollback. If integration is your only layer, your CI takes 25 minutes and engineers stop running it locally.

The three CI patterns and which one wins in 2026

There are exactly three serious patterns for booting real services in CI. Most teams reach for the wrong one because the docs nudge them toward it.

| Approach | Setup | Dev parity | Multi-service | Isolation |
| --- | --- | --- | --- | --- |
| Docker Compose | docker-compose.yml + wait script | High | Great | Shared db |
| GitHub Actions services | YAML services: block | Low | Awkward | Shared db |
| testcontainers-node | import { PostgreSqlContainer } ... | High | Great | Per-test container |

Docker Compose is what most engineers reach for first because they already use it locally. It works, but on CI you pay for the cold pull every run, and tests share one database, so parallelism corrupts state.

GitHub Actions service containers are the official answer in the docs. Define a postgres service, set the env vars, add a health check. Fine for a single Postgres instance. Painful when you need Postgres + Redis + a fake S3 + a Kafka broker, because each service definition is a YAML wall and the wait-for-ready dance is fragile.

testcontainers-node is the modern winner. You import PostgreSqlContainer, call .start() in a setup hook, and the library boots a real Postgres in Docker, returns the connection string, and shuts it down on teardown. It works identically on your laptop and in GitHub Actions because Docker is available in both. You get full multi-service support, per-test or per-suite isolation, and zero YAML acrobatics.
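
Here's a minimal sketch of that setup hook under Vitest, using the @testcontainers/postgresql package; the postgres:16 image tag and the DATABASE_URL variable are assumptions to adapt to your app, and a Redis container follows the same pattern via @testcontainers/redis.

import { beforeAll, afterAll } from 'vitest';
import { PostgreSqlContainer, StartedPostgreSqlContainer } from '@testcontainers/postgresql';

let pg: StartedPostgreSqlContainer;

beforeAll(async () => {
  // Boots a throwaway Postgres in Docker and hands back a ready-to-use connection string.
  pg = await new PostgreSqlContainer('postgres:16').start();
  process.env.DATABASE_URL = pg.getConnectionUri();
}, 60_000); // generous timeout for the cold image pull

afterAll(async () => {
  await pg.stop();
});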

The trade-off: testcontainers adds 1-3 seconds of container start-up per suite. With container reuse enabled, that drops to ~200ms warm. The dev-CI parity it buys is worth the cost for any team past the toy-project stage.

Picking and isolating your test database

Postgres in a container is the standard. SQLite-in-memory is a tempting shortcut and a guaranteed footgun: half your production queries (window functions, JSONB operators, RLS policies) behave differently on SQLite, and you only find out in staging.

Once you have Postgres, the question is isolation. Three patterns:

  1. Shared database, run tests serially. Slowest, but simple. Fine for a 50-test suite.
  2. Transactional rollback per test. Begin a transaction in the beforeEach, roll back in afterEach. Fast (~5ms overhead). Breaks down if your code under test commits its own transactions or fires triggers that depend on commit.
  3. Ephemeral schema per test (or per worker). Each test (or each Vitest/Jest worker) gets a fresh CREATE SCHEMA test_${id}, runs migrations into it, runs the tests, drops it on teardown. ~50ms setup cost, true isolation, parallelism-safe.

For most apps, schema-per-worker (not per-test) is the sweet spot. You pay the schema-create cost once per worker, run hundreds of tests inside it with transactional rollback for cheap cleanup between tests, and parallelism scales linearly with worker count.

If you use Drizzle for migrations, the schema-per-worker pattern is six lines: read process.env.VITEST_WORKER_ID, suffix the schema name, run migrate() against it, set the search path, done.
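
As a sketch, assuming node-postgres, a ./drizzle migrations folder, and a drizzle-orm version whose migrator accepts a migrationsSchema option (the function and env-var names are illustrative):

import { Client, Pool } from 'pg';
import { drizzle } from 'drizzle-orm/node-postgres';
import { migrate } from 'drizzle-orm/node-postgres/migrator';

export async function setupWorkerDb(connectionString: string) {
  const schema = `test_${process.env.VITEST_WORKER_ID ?? '0'}`;

  // One-off connection to carve out this worker's namespace.
  const admin = new Client({ connectionString });
  await admin.connect();
  await admin.query(`DROP SCHEMA IF EXISTS ${schema} CASCADE`);
  await admin.query(`CREATE SCHEMA ${schema}`);
  await admin.end();

  // Every pooled connection resolves unqualified table names into the worker schema.
  const pool = new Pool({ connectionString, options: `-c search_path=${schema}` });
  const db = drizzle(pool);

  // Run migrations into the schema; migrationsSchema keeps the tracking table per-worker too.
  await migrate(db, { migrationsFolder: './drizzle', migrationsSchema: schema });

  return { db, pool, schema };
}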

Seeding test data without making your suite a nightmare

Three ways to get data into the test database, and only one of them scales.

JSON fixtures (a fixtures/users.json file you load before each test) are quick to start with and impossible to maintain. Add a non-null column and every fixture file breaks. Avoid past the prototype phase.

Factories with @faker-js/faker are the right default. Define a userFactory(overrides) that returns a fully-valid User with random sensible defaults, and override only the fields the test cares about. The pattern came from Ruby's FactoryBot a decade ago and it works because it pushes the schema-knowledge into one place.

import { faker } from '@faker-js/faker';
import type { User } from '../src/types'; // adjust to wherever your User type lives

// Fully valid by default; tests override only the fields they care about.
export const userFactory = (overrides: Partial<User> = {}) => ({
  id: faker.string.uuid(),
  email: faker.internet.email(),
  name: faker.person.fullName(),
  createdAt: new Date(),
  ...overrides,
});

Snapshot data (a captured production-like dump, scrubbed of PII) is useful for the 5% of tests that need realistic shape, like a query plan test or a CSV-export regression. Don't use it as the default; the load time alone tanks your suite.

A common mistake: factories that insert into the database as a side effect. Keep factories pure (returning plain objects) and let the test decide whether to insert. This makes them composable and stops one slow factory from poisoning a thousand tests.
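
For instance, a test might build two users from the same factory and only persist one; the pool import here is a hypothetical helper that exposes the worker's connection pool:

import { test, expect } from 'vitest';
import { userFactory } from './factories/user';
import { pool } from './test-db'; // hypothetical module exposing the worker's pg pool

test('only persisted users show up in queries', async () => {
  // The factory builds plain objects; the test decides which ones hit Postgres.
  const saved = userFactory({ email: 'saved@example.com' });
  await pool.query(
    'INSERT INTO users (id, email, name, created_at) VALUES ($1, $2, $3, $4)',
    [saved.id, saved.email, saved.name, saved.createdAt],
  );

  // This one never touches the database: handy for request payloads or pure assertions.
  const unsaved = userFactory({ email: 'unsaved@example.com' });

  const { rows } = await pool.query('SELECT email FROM users');
  expect(rows.map((r) => r.email)).toEqual([saved.email]);
});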

Parallelism: how to actually use it

The single biggest win on CI wall time is correct parallelism. The single most common mistake is enabling it without isolating state, then watching tests fail randomly.

Vitest 3.x supports sharding out of the box. Run vitest --shard=1/4 across 4 GitHub Actions matrix jobs and you cut wall time roughly 60-75%. Jest equivalents: --maxWorkers=50% for in-process workers, plus --shard=1/4 across CI matrix entries for cross-runner sharding.

GitHub-hosted Linux runners ship with 4 cores, so --maxWorkers=4 (or Vitest's default os.cpus().length) is the right baseline. Larger runners are available; for most teams, scaling matrix shards horizontally is cheaper than scaling runner size.

Two rules to make parallelism actually safe:

  • One database namespace per worker. Schema-per-worker, as above, or a separate testcontainers container per worker. Don't share.
  • No globals between tests in the same worker. Reset module mocks, clear in-memory caches, never write to a shared file.

If you've done this right, you can prove it: run the suite with --shard=1/4 four times, in random order, and watch all four shards stay green. Any test that depends on order is a flake waiting to fire.
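
If you want the random-order part without touching test files, Vitest can shuffle execution order from config; a minimal sketch:

import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    // Shuffle test order inside each file so order-dependent tests fail loudly instead of passing by accident.
    sequence: { shuffle: true },
  },
});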

Flaky tests: retry once, quarantine repeat offenders

Every team gets flakes. Google has published that roughly 1.5% of its test runs come back flaky. The question is whether you have a policy or a vibe.

The policy that works:

  1. Retry once at the framework level. Vitest: retry: 1. Jest: jest.retryTimes(1). This eats the genuinely random failures (network blip, container slow to warm); a minimal config sketch follows this list.
  2. Track failures. A retried-and-passed test logs a warning. Three warnings in a week and the test moves to a quarantined.txt file.
  3. Quarantined tests run on a separate non-blocking job. They still run; they just don't block merges. This stops flakes from holding the team hostage while the owner investigates.
  4. Block merges if the quarantine list grows past N. Pick a number (10 is reasonable for a 500-test suite). When you hit it, fixing flakes becomes the whole team's problem until you're back under the cap.
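
A minimal Vitest config for step 1, with machine-readable output that a follow-up script could scan for retried-but-passed tests (the tracking script itself is up to you):

import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    // One retry absorbs genuinely random failures; anything that needs two is a real flake.
    retry: 1,
    // JSON output gives the quarantine tooling something to parse after each run.
    reporters: ['default', 'json'],
    outputFile: './test-results.json',
  },
});

In practice these options live in the same vitest.config.ts as the shuffle setting shown earlier.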

The wrong policy: retry three times silently. That just hides the flake until it's a 30% failure rate and engineers ignore the red CI.

CI cost: the part nobody writes about

This is the gap in every other article on the SERP. They show you a ci.yml and never tell you what the bill looks like.

GitHub Actions on Linux runners is currently $0.008/minute past the free tier. A startup with 8 engineers, each pushing 5 PRs a day, with CI re-running the 6-minute integration suite on roughly one follow-up push per PR, burns roughly 14,400 billable minutes per month. That's ~$115/month at list price, more if you're on macOS runners ($0.08/minute, 10x).

Three optimizations cut the bill 40-60% with zero quality trade-off:

  • Cache dependencies via actions/setup-node with cache: 'npm' (it caches the package-manager store, so installs stop re-downloading everything). Saves 30-90 seconds per run.
  • Cache Docker layers via docker/setup-buildx-action plus cache-from: type=gha on the build step. Container rebuilds drop from 2 minutes to 15 seconds.
  • Skip the job entirely on path-only PRs. Use dorny/paths-filter or tj-actions/changed-files to detect docs-only / README / static-image changes and skip the test matrix. On a typical product team, 30-40% of PRs touch nothing testable.

Layer testcontainers reuse on top (call .withReuse() on each container and keep TESTCONTAINERS_REUSE_ENABLE=true in your CI env) and you pay the container start cost once per shard, not once per suite.

When the suite still creeps past 10 minutes, that's the moment to bring in someone who's done this rodeo before. A senior engineer on Cadence ($1,500/week) typically cuts a bloated CI pipeline by 50-70% in a single week, because the patterns are the same across stacks: cache, shard, skip, quarantine. The 27-hour median time to first commit on the platform means they're shipping the fix in the same sprint, not the next quarter.

A real working ci.yml for Postgres + Redis

Here's a working .github/workflows/ci.yml for a Node + Postgres + Redis stack using testcontainers-node, Vitest sharding, and the path-filter skip pattern.

name: CI

on:
  pull_request:
  push:
    branches: [main]

jobs:
  changes:
    runs-on: ubuntu-latest
    outputs:
      code: ${{ steps.filter.outputs.code }}
    steps:
      - uses: actions/checkout@v4
      - uses: dorny/paths-filter@v3
        id: filter
        with:
          filters: |
            code:
              - 'src/**'
              - 'tests/**'
              - 'package.json'
              - 'pnpm-lock.yaml'
              - '.github/workflows/ci.yml'

  test:
    needs: changes
    if: needs.changes.outputs.code == 'true'
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        shard: [1, 2, 3, 4]
    steps:
      - uses: actions/checkout@v4
      - uses: pnpm/action-setup@v4
        with:
          version: 9
      - uses: actions/setup-node@v4
        with:
          node-version: 22
          cache: 'pnpm'
      - uses: docker/setup-buildx-action@v3
      - run: pnpm install --frozen-lockfile
      - name: Run integration tests
        env:
          TESTCONTAINERS_REUSE_ENABLE: 'true'
        run: pnpm vitest run --shard=${{ matrix.shard }}/4

The changes job runs in <10 seconds and gates the expensive test matrix. The matrix runs four shards in parallel, each with its own testcontainers-managed Postgres and Redis. Cache hits keep cold-start under 30 seconds. If you're running a Next.js app, the same workflow shape composes cleanly with the GitHub Actions setup for Next.js builds and previews.

Steps

  1. Stand up the service containers. In your test setup file, start a PostgreSqlContainer and a RedisContainer from testcontainers-node. Export the connection strings as env vars so your app code reads them like production.
  2. Seed the test data. Run your migrations against the fresh database (pnpm db:migrate), then call your factory functions inside beforeEach to insert only the records the test needs.
  3. Run the tests. Invoke vitest run --shard=${SHARD}/${TOTAL} (or jest --shard) so each CI matrix job owns a slice of the suite. Each shard talks to its own container, so there is no cross-shard interference.
  4. Tear down cleanly. Testcontainers handles container shutdown automatically when the Node process exits. Add an afterAll hook to drop the test schema and close pool connections so the runner exits cleanly and the next run starts from zero.
  5. Cache the build sequence. Configure actions/setup-node with cache: 'pnpm' (or 'npm'), add docker/setup-buildx-action with cache-from: type=gha, and gate the test job behind a dorny/paths-filter check. These three cache layers turn a 6-minute cold run into a 90-second warm run.
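
For step 4, a hedged sketch of that afterAll hook, reusing the hypothetical pool and schema exported by the worker-setup sketch earlier:

import { afterAll } from 'vitest';
import { pool, schema } from './test-db'; // hypothetical module exposing the worker's pool and schema name

afterAll(async () => {
  // Drop the worker's namespace and release every connection so the Node process can exit cleanly.
  await pool.query(`DROP SCHEMA IF EXISTS ${schema} CASCADE`);
  await pool.end();
});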

Common pitfalls

  • Sharing one Postgres across all parallel workers. Tests pass locally, fail in random order on CI. Always isolate at the worker level.
  • Mocking the database in integration tests. If you're mocking Postgres, you're writing a unit test. Call it that, run it in the unit suite, and add a real integration test alongside.
  • Letting the suite balloon past 10 minutes. Engineers stop running it locally. PR cycle time doubles. Either shard harder or move slow tests into a nightly job.
  • Snapshotting fixtures from production without scrubbing. GDPR violation waiting to fire. If you need realistic data, build a scrubber that handles deletion-style PII removal and check in the scrubbed snapshot, not the raw dump.
  • Ignoring quarantined tests forever. A quarantine queue is a holding pen, not a graveyard. Set a cap and enforce it.

When you can skip this entirely

Two founders, no users, three days from first invoice. You don't need integration tests. You need a deployed app and a customer. Add the suite the week after your first paying customer cancels because of a regression. The ROI curve on integration tests starts at roughly the same point as the ROI curve on having a proper CI/CD pipeline: the moment two people are committing to the same repo daily.

Past that point, integration tests are the cheapest insurance you'll buy. A single one-hour outage from a missed regression costs more than three months of CI minutes plus the engineer-week to set this up.

If you'd rather not build the harness yourself, the cheapest version of "ship this in week one" is to book a senior engineer for a single week, hand them this post, and let them adapt the ci.yml to your stack. That's how most of the playbooks on this blog get implemented inside Cadence customer codebases.

Want an honest read on whether your CI pipeline is helping or hurting? Audit your stack with Ship or Skip and get a graded report in 5 minutes, including which testing layer to fix first.

FAQ

How long should integration tests take in CI?

Aim for under 5 minutes wall time for a 200-test suite using 4-shard parallelism. Anything over 10 minutes destroys your feedback loop and engineers stop running tests locally, which means broken code lands in main between green-build notifications.

Should I use Docker Compose or testcontainers in 2026?

Testcontainers for tests, Docker Compose for the local dev environment. They solve different problems. Docker Compose gives you a long-lived stack you bring up once a day; testcontainers gives you ephemeral, isolated, per-suite containers that match production. Pairing them keeps dev-CI parity high without slowing CI.

What's the cheapest way to run integration tests in GitHub Actions?

Three changes: cache node_modules via actions/setup-node, cache Docker layers via docker/setup-buildx-action with cache-from: type=gha, and gate the test job behind a dorny/paths-filter check that skips entirely on docs-only PRs. Together they cut billed minutes 40-60% with no quality trade-off.

How do I handle a flaky integration test?

Retry once at the framework level (retry: 1 in Vitest, jest.retryTimes(1) in Jest). If a test fails twice in a row across three runs, move it to a quarantined-tests file that runs on a non-blocking job and open a P1 ticket on the owner. Cap the quarantine list at 10 tests; when you hit the cap, fixing flakes becomes the whole team's problem.

Do I need testcontainers if I'm only testing one Postgres database?

Probably not. A single GitHub Actions postgres service is fine for a one-database app. The break point is multi-service stacks (Postgres + Redis + an S3 mock + a queue) or any need for true per-test isolation. Once you cross either line, testcontainers pays back its setup cost in the first week.
