
To set up E2E testing for a SaaS in 2026, pick Playwright, write tests for five critical flows (signup, onboarding, billing, core happy path, account deletion), seed a fresh tenant per CI run, save auth state once with storageState, shard across 4-8 workers, and retry flaky tests once before quarantining them. Five reliable tests gating every PR catch about 80% of the bugs that used to escape to production.
The rest of this post is the playbook. Real code, real tradeoffs, and an honest section on when to skip E2E entirely.
Generic E2E advice (record a happy path, run it on every commit) was written for content sites and marketing pages. A SaaS has three things those products do not.
First, multi-tenancy. Tests cannot pollute each other's data, and "the same email signing up twice" is a real bug class you have to test. Second, billing. A broken upgrade flow is invisible until a customer hits it, and by then you have already paid the CAC. Third, async webhooks. Stripe sends checkout.session.completed 200ms to 4 seconds after the redirect. Your test has to wait for it or it asserts against an empty subscription row.
For a SaaS spending $5k/month on ads at a 3% trial-start rate, every day of broken signup wastes about $165 in ad spend ($5,000 / 30) and loses every trial that traffic would have started. E2E exists to make sure that day never happens.
In 2026 there are three real choices, and one is the default.
| Tool | Best for | Cost | Cross-browser | Verdict |
|---|---|---|---|---|
| Playwright | New SaaS in 2026 | Free | Chromium + Firefox + WebKit | Default |
| Cypress | Existing Cypress suites | Free; Cypress Cloud $75-300/mo | Chromium + Firefox + WebKit (slower) | Migrate gradually |
| Puppeteer | Headless scraping | Free | Chromium only | Wrong tool for user-flow E2E |
Playwright is the answer for any new SaaS this year. The State of JS 2024 survey showed it overtaking Cypress in both satisfaction and usage growth. The reasons: free parallel execution via --shard, three real browser engines bundled in, multi-language bindings, and an auto-waiting expect API that eliminates most timing-related flake.
Cypress is still good software. If you already have a 200-test Cypress suite, do not throw it away. Migrate new flows to Playwright and let the Cypress suite shrink by attrition. The one place Cypress still wins is the interactive test runner, a better debugging experience than Playwright's UI mode, though the gap is closing.
Puppeteer is for headless Chromium scraping and PDF generation. Not a user-flow E2E framework. Skip.
For deeper coverage of the Playwright API itself, see our Playwright E2E testing deep dive. This post stays at the SaaS-application level.
You do not need 200 tests on day one. You need five, and they all have to actually work. Twenty reliable tests beat 200 flaky ones every time.
The five flows:
- Signup.
- Onboarding.
- Billing upgrade: in Stripe test mode, pay with card 4242 4242 4242 4242, complete checkout, wait for the webhook, assert the subscription row flipped to active and the feature gate unlocked.
- Core feature happy path.
- Account deletion.

That is the floor. Everything else (admin flows, multi-user collaboration, edge cases on permissions) gets added one test per sprint as the product surfaces real bugs.
How you give a test its starting state is the second-biggest decision after tool choice. Three patterns, three real tradeoffs.
Fresh tenant per CI run is the default and what you should reach for first. Before each test (or each worker), hit an API endpoint or a seed script that provisions a clean tenant: a new org, a new admin user, a known set of feature flags. After the test, soft-delete it. This costs 2-5 seconds of setup per test but gives you full isolation, which is what makes parallelism safe. Pair this pattern with a robust multi-tenancy schema so that creating and tearing down tenants is a database operation, not a re-deploy.
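The provision-then-soft-delete cycle described above can be sketched as a small wrapper. This is a minimal sketch, not a definitive implementation: `TenantApi`, `buildTenantSeed`, and `withFreshTenant` are hypothetical names, and the seed fields (org slug, admin email, feature flags) are assumptions standing in for whatever your own seed endpoint or script accepts.

```typescript
// Hypothetical seed shape; replace the fields with what your seed script needs.
interface TenantSeed {
  orgSlug: string;
  adminEmail: string;
  featureFlags: Record<string, boolean>;
}

// Hypothetical client for your own test-only seed endpoint or script.
interface TenantApi {
  create(seed: TenantSeed): Promise<string>; // resolves to the new tenant id
  softDelete(tenantId: string): Promise<void>;
}

// Build a seed that is unique per CI run and per worker, so parallel
// workers never collide on org slug or admin email.
function buildTenantSeed(
  runId: string,
  workerIndex: number,
  featureFlags: Record<string, boolean> = {},
): TenantSeed {
  const suffix = `${runId}-w${workerIndex}`
    .toLowerCase()
    .replace(/[^a-z0-9-]/g, "-");
  return {
    orgSlug: `e2e-${suffix}`,
    adminEmail: `test+${suffix}@example.com`,
    featureFlags: { ...featureFlags },
  };
}

// Provision a tenant, run the test body against it, and soft-delete the
// tenant afterwards even if the body throws.
async function withFreshTenant<T>(
  api: TenantApi,
  seed: TenantSeed,
  body: (tenantId: string) => Promise<T>,
): Promise<T> {
  const tenantId = await api.create(seed);
  try {
    return await body(tenantId);
  } finally {
    await api.softDelete(tenantId);
  }
}
```

In a real suite the `body` callback would be the test itself, or the wrapper would live in a worker-scoped fixture so the 2-5 seconds of setup is paid once per worker rather than once per test.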
Shared staging environment is what most teams accidentally end up with. It works for the first three tests and breaks the moment you parallelize. Two workers signing up "test+ci@example.com" simultaneously hit your unique-email constraint and one fails for reasons unrelated to the actual code under test. Avoid.
Production mock with msw + Stripe test mode runs the real React app against mocked API responses and real Stripe test webhooks. Fastest to run (no round trips to a real backend), hardest to keep in sync with the real backend. Use this for the happy-path PR gate and run a smaller fresh-tenant suite on merge.
For most teams: fresh tenant for the regression suite, msw + Stripe test mode for the PR smoke suite. That gives you sub-5-minute PR gates and full-fidelity regression on main.
Logging in through the UI in every test is the most common cause of slow E2E suites. Do it once and replay the cookie.
Playwright's storageState pattern: a global setup file logs in via the API (or once via the UI), saves the cookies and localStorage to a JSON file, and every other test loads that file as its starting context.
// global-setup.ts
import { chromium, type FullConfig } from '@playwright/test';

// Runs once before the suite: log in through the UI and save the session.
export default async function globalSetup(config: FullConfig) {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  await page.goto(`${process.env.BASE_URL}/login`);
  await page.fill('[data-test="email"]', process.env.TEST_USER_EMAIL!);
  await page.fill('[data-test="password"]', process.env.TEST_USER_PASSWORD!);
  await page.click('[data-test="login-submit"]');
  // Wait for the post-login redirect so the session exists before saving it.
  await page.waitForURL('**/dashboard');
  // Persist cookies and localStorage for every other test to reuse.
  await page.context().storageState({ path: 'auth/admin.json' });
  await browser.close();
}
Then in playwright.config.ts:
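import { defineConfig } from '@playwright/test';

export default defineConfig({
  globalSetup: require.resolve('./global-setup'),
  use: { storageState: 'auth/admin.json' }, // every test starts logged in
  fullyParallel: true,
  retries: process.env.CI ? 1 : 0, // one retry in CI, none locally
  workers: process.env.CI ? 4 : undefined, // undefined keeps Playwright's default
  reporter: [['html'], ['github']],
});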
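For tests that mutate user state (changing the password, deleting the account), use per-worker accounts: assign worker 0 to test+w0@example.com, worker 1 to test+w1@example.com, and so on. Playwright exposes testInfo.parallelIndex for exactly this: it stays between 0 and workers - 1, whereas testInfo.workerIndex gets a new value every time a worker process restarts after a failure, so it can run past the end of a fixed account pool. Test the actual UI login flow exactly once, in a dedicated test file that does not load storageState.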
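As a rough illustration of the per-worker mapping, the helper below turns a worker slot into an account and a storage-state path. `accountForWorker` and the `auth/worker-N.json` path are hypothetical. The sketch assumes you pass it Playwright's `testInfo.parallelIndex`, which stays within 0 to workers - 1 even when a worker restarts, rather than `workerIndex`, which keeps growing after restarts.

```typescript
interface WorkerAccount {
  email: string;
  storageStatePath: string; // hypothetical location for this worker's saved session
}

// Map a worker slot to a dedicated account so tests that mutate user state
// (password change, account deletion) never touch another worker's user.
function accountForWorker(parallelIndex: number): WorkerAccount {
  if (!Number.isInteger(parallelIndex) || parallelIndex < 0) {
    throw new RangeError(`invalid worker slot: ${parallelIndex}`);
  }
  return {
    email: `test+w${parallelIndex}@example.com`,
    storageStatePath: `auth/worker-${parallelIndex}.json`,
  };
}
```

Inside a test or fixture this would be called as `accountForWorker(testInfo.parallelIndex)`, with the accounts themselves provisioned ahead of time, one per worker slot.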
If your auth provider supports it, programmatic login via API is faster and more stable than UI login. We covered the broader picture in implementing authentication in 2026; if you are on Clerk, Supabase Auth, or Auth.js, all three expose a server-side helper for minting a session token directly.
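If your provider can mint a session token server-side, one way to skip the UI entirely is to write the storageState file yourself. The sketch below builds an object in the shape Playwright's storageState JSON uses (a `cookies` array plus an `origins` array). Everything provider-specific is an assumption: the cookie name, the single-cookie session, and the `Lax` setting. Some providers use several cookies or localStorage instead, so check what a real logged-in session contains before relying on this.

```typescript
interface StoredCookie {
  name: string;
  value: string;
  domain: string;
  path: string;
  expires: number; // -1 marks a session cookie
  httpOnly: boolean;
  secure: boolean;
  sameSite: "Strict" | "Lax" | "None";
}

interface StorageState {
  cookies: StoredCookie[];
  origins: { origin: string; localStorage: { name: string; value: string }[] }[];
}

// Build a storage state containing one session cookie for the app's host.
// cookieName and token come from your auth provider (assumed here).
function buildStorageState(
  baseUrl: string,
  cookieName: string,
  token: string,
): StorageState {
  const url = new URL(baseUrl);
  return {
    cookies: [
      {
        name: cookieName,
        value: token,
        domain: url.hostname,
        path: "/",
        expires: -1,
        httpOnly: true,
        secure: url.protocol === "https:",
        sameSite: "Lax",
      },
    ],
    origins: [],
  };
}
```

Global setup would then write `JSON.stringify(buildStorageState(...))` to auth/admin.json instead of driving the login form.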
A 200-test suite running serially is a 25-minute CI job. The same suite sharded across 4 workers runs in 7 minutes. Across 8, in under 4. This is the single largest CI speedup available and Playwright supports it natively.
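Those runtimes follow from a simple model: test time divides across shards, and each shard pays a fixed setup cost for checkout, install, and browser download. The sketch below assumes an even split and a 0.75-minute overhead, a figure chosen only because it reproduces the numbers above. Measure your own before trusting it.

```typescript
// Estimate wall-clock minutes for a suite split evenly across shards.
// overheadMinutes is an assumed fixed per-shard setup cost.
function estimateShardedMinutes(
  serialMinutes: number,
  shards: number,
  overheadMinutes = 0.75,
): number {
  if (!Number.isInteger(shards) || shards < 1) {
    throw new RangeError("shards must be a positive integer");
  }
  return serialMinutes / shards + overheadMinutes;
}

console.log(estimateShardedMinutes(25, 4)); // 7
console.log(estimateShardedMinutes(25, 8)); // 3.875
```

The fixed overhead is why returns diminish: past a certain shard count, setup time dominates and extra shards mostly cost CI minutes.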
# .github/workflows/e2e.yml
on: pull_request
jobs:
  e2e:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false # let every shard finish so all failures are reported
      matrix:
        # Numeric indices keep "/" out of the artifact name, which upload-artifact rejects.
        shardIndex: [1, 2, 3, 4]
        shardTotal: [4]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: 20 }
      - run: npm ci
      - run: npx playwright install --with-deps chromium
      - run: npx playwright test --shard=${{ matrix.shardIndex }}/${{ matrix.shardTotal }}
        env:
          BASE_URL: ${{ secrets.STAGING_URL }}
          TEST_USER_EMAIL: ${{ secrets.TEST_USER_EMAIL }}
          TEST_USER_PASSWORD: ${{ secrets.TEST_USER_PASSWORD }}
      - if: failure()
        uses: actions/upload-artifact@v4
        with:
          name: playwright-report-${{ matrix.shardIndex }}
          path: playwright-report/
          retention-days: 7
GitHub Actions bills each job in whole minutes, so a four-shard run costs at least 4 minutes. On the free tier (2,000 minutes/month for private repositories) that caps you at about 500 PR runs per month, roughly 16 per day, which is still more than most small teams ship; at 7 minutes per shard the budget drops to about 70 runs per month. If you outgrow that, Currents.dev orchestrates Playwright (and Cypress) test runs across machines with a unified dashboard, costing roughly $75-200/month depending on parallelism. It also stitches the per-shard HTML reports into one view, which is worth the price the first time you debug a flake at 2 AM.
For multi-browser coverage, add browser: [chromium, firefox, webkit] to the matrix. In practice, run all three on main and Chromium-only on PR. Cross-browser bugs are real but rare, and tripling your PR runtime to catch one a quarter is a bad trade.
Every E2E suite gets flaky. The question is whether you have a process for it.
The rule we use: retries: 1 in CI. A test that fails twice in a row is genuinely broken; a test that fails once and passes on retry is flaky. The first retry catches transient network blips and Stripe webhook lag. Anything beyond two retries hides real bugs.
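The rule is easy to state as code. This is a minimal sketch that classifies one test from its ordered attempt results under a one-retry policy; `classifyAttempts` and its types are hypothetical names, not part of Playwright's API.

```typescript
type Attempt = "passed" | "failed";
type Verdict = "passed" | "flaky" | "broken";

// attempts holds the result of each run of one test, in order.
// With retries: 1 there are at most two entries.
function classifyAttempts(attempts: Attempt[]): Verdict {
  if (attempts.length === 0) {
    throw new Error("no attempts recorded");
  }
  const last = attempts[attempts.length - 1];
  // Still failing on the final attempt: treat it as genuinely broken.
  if (last === "failed") return "broken";
  // Passed, but only after an earlier failure: flaky.
  return attempts.length > 1 ? "flaky" : "passed";
}
```

A "flaky" verdict is the trigger for the quarantine process, not a reason to add more retries.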
When a test flakes, mark it test.fixme() and open a ticket. Playwright skips fixme tests and lists them as skipped in the report, so the flake stops blocking merge while the marker keeps it visible. The ticket has a one-week deadline: fix the flake or delete the test. No third option. Tests that limp along for three months erode trust in the entire suite, and a suite people do not trust gets ignored.
The three root causes of flake:
- Hard-coded waits. Replace page.waitForTimeout(2000) with await expect(page.locator('...')).toBeVisible(). Auto-waiting is the single best feature Playwright shipped.
- Webhook lag. Use a waitForWebhook helper that polls the database for the expected state with a 30-second timeout.
- Shared test data. Isolate data per worker with per-worker accounts or fresh tenants.

For test selection on the application side, use data-test attributes, not CSS classes. Classes change for styling reasons. data-test attributes only change when someone deliberately edits the test contract.
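A waitForWebhook helper of the kind mentioned above can be a thin wrapper over a generic poller. This is a sketch under assumptions: `pollUntil` is a hypothetical name, the `fetchStatus` callback stands in for your own database or webhook-log query, and the 500 ms interval is arbitrary. The 30-second default timeout matches the figure above.

```typescript
interface PollOptions {
  timeoutMs?: number;
  intervalMs?: number;
}

// Call check() repeatedly until it returns a non-null value or time runs out.
async function pollUntil<T>(
  check: () => Promise<T | null | undefined>,
  options: PollOptions = {},
): Promise<T> {
  const timeoutMs = options.timeoutMs ?? 30000;
  const intervalMs = options.intervalMs ?? 500;
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const result = await check();
    if (result !== null && result !== undefined) return result;
    // Give up if the next poll would land after the deadline.
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`condition not met within ${timeoutMs}ms`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Wait until the tenant's subscription reaches the expected status.
// fetchStatus is a placeholder for a query against your own database.
function waitForWebhook(
  fetchStatus: (tenantId: string) => Promise<string | null>,
  tenantId: string,
  expected = "active",
  options: PollOptions = {},
): Promise<string> {
  return pollUntil(async () => {
    const status = await fetchStatus(tenantId);
    return status === expected ? status : null;
  }, options);
}
```

Polling the stored state rather than sleeping means the test proceeds as soon as the webhook lands and fails with a clear message when it never does.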
The matrix file above is the production version. Two more details worth pinning down.
Video on failure, screenshots, traces. In playwright.config.ts:
use: {
  trace: 'on-first-retry',
  screenshot: 'only-on-failure',
  video: 'retain-on-failure',
}
Trace files are gold. Open one in npx playwright show-trace and you get a frame-by-frame replay of the failed test with network logs, console output, and the DOM snapshot at every action. Most flake debugs that used to take an hour now take five minutes.
Report as artifact. Upload playwright-report/ on failure (the matrix YAML above does this). Reviewers click through to a static HTML report from the PR check, which is faster than rerunning locally to reproduce.
Caching the browser binaries. Playwright bundles Chromium, Firefox, and WebKit, which is about 300MB. Cache them between runs:
- uses: actions/cache@v4
  with:
    path: ~/.cache/ms-playwright
    key: playwright-${{ hashFiles('package-lock.json') }}
This shaves 60-90 seconds off every CI run.
If you are still figuring out the bigger CI/CD picture, our CI/CD pipeline for startups post covers PR gates, deploy previews, and the right place E2E sits in the chain.
Solo project: Playwright's built-in HTML reporter is enough. Run npx playwright show-report and you get a searchable per-test view with traces, screenshots, and timing.
Team of 3+: install Currents.dev. The killer feature is parallel-run aggregation, which the standalone HTML reporter does not handle well across sharded jobs. You also get historical pass/fail rates per test, which makes flake quarantine objective ("this test failed 11 of the last 100 runs, into the freezer it goes") instead of vibes-based.
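If you only have raw pass/fail history, the same objective rule is a few lines of code. The sketch below is illustrative: the 10% threshold and the 20-run minimum are assumptions, not figures from any dashboard, and `shouldQuarantine` is a hypothetical name.

```typescript
// history holds one entry per recent run of a single test: true = passed.
function failureRate(history: boolean[]): number {
  if (history.length === 0) return 0;
  const failures = history.filter((passed) => !passed).length;
  return failures / history.length;
}

// Quarantine when the failure rate exceeds the threshold, but only once
// there are enough runs for the rate to mean something.
function shouldQuarantine(
  history: boolean[],
  threshold = 0.1,
  minRuns = 20,
): boolean {
  if (history.length < minRuns) return false;
  return failureRate(history) > threshold;
}
```

With these defaults, a test that failed 11 of its last 100 runs is quarantined and one that failed 2 of 100 is not.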
Slack integration: post on failure only. Teams that post on every pass quickly learn to ignore the channel. We use a GitHub Actions step that pings #engineering only when the matrix job fails on main.
E2E has an ROI curve. It bends sharply down for these cases:
If your unit and integration tests already gate signup, payment, and the core action, you are 80% of the way to E2E's protection. We covered the broader testing-tool tradeoff in Jest vs Vitest 2026, and if you want a single number for how much your test suite actually catches, see code coverage in 2026. A solid unit suite beats an absent E2E suite every time.
E2E rollouts are a typical Senior tier ($1,500/week) project on Cadence. The work is well-scoped, has a clear definition of done (5 flows green in CI, sub-10-minute runtime, retry+quarantine workflow documented), and benefits from an engineer who has done it before and will not relitigate Playwright vs Cypress for the third time.
Every engineer on Cadence is AI-native by baseline, vetted on Cursor and Claude Code fluency in the voice interview before they unlock bookings. Out of our 12,800-engineer pool, the median time to first commit on a new booking is 27 hours. For a contained piece of work like an E2E pipeline, that means tests in CI by the end of week one.
If you want to know whether your current testing setup is worth keeping, audit your stack with Ship or Skip. It will give you an honest grade on what to keep, what to replace, and what to delete.
- Run npm init playwright@latest.
- Use data-test attributes, not CSS classes, for selectors.
- Set retries: 1 in CI. When a test flakes, mark it test.fixme() and open a one-week ticket: fix or delete.

Most SaaS testing rollouts stall not on tool choice but on the second engineer to touch them. If your team is one founder and one contractor, book a senior engineer on Cadence for a week. The 48-hour free trial covers writing the first three flows; if it is not in CI by Friday, you do not pay.
A senior engineer can ship the first 5 working tests against a fresh tenant in about 3-5 days. Hardening for CI parallelism, flake quarantine, and per-worker auth state adds another week. Most teams reach a stable, trusted suite in two sprints.
Five. One per critical flow: signup, onboarding, billing upgrade, core feature happy path, and account deletion. Twenty reliable tests beat 200 flaky ones every time. Add one test per sprint as new features ship, not as a separate testing initiative.
Playwright if you are starting fresh. Cypress if you have an existing Cypress suite worth keeping and migration cost is real. Playwright leads the State of JS 2024 satisfaction and usage scores, has Microsoft backing, ships free parallel sharding, and bundles three browser engines without paid add-ons.
Use Stripe test mode with the card number 4242 4242 4242 4242. Click your real upgrade button, complete the Stripe-hosted checkout, then poll your database (or a webhook-receipt log) for the checkout.session.completed event. Assert that the subscription row flipped to active and the feature gate unlocked in the UI.
Three rules. Never use waitForTimeout, always use Playwright's auto-waiting expect (toBeVisible, toHaveText). Isolate test data per worker via per-worker accounts or fresh tenants. Allow one retry in CI; if a test fails twice, quarantine it with test.fixme() and fix or delete within a sprint. Tests that limp along for months erode trust in the whole suite.