How to design webhooks for SaaS in 2026

To design webhooks for SaaS in 2026, start from the consumer's perspective: ship a stable event taxonomy with versioned payloads, sign every request with a rotatable secret, retry on an exponential curve with a hard cap, and give every customer a self-serve dashboard showing recent deliveries, failures, and a replay button. Everything else is detail.

Most webhook systems are built once, in a hurry, by a backend engineer who has never had to debug their own webhooks at 2 a.m. The result is the pattern you see across half the SaaS market: a POST /your-webhook endpoint, no signing, no retries, no dashboard, and a support inbox full of "did the event fire?" tickets. This guide is the producer-side playbook for getting it right the first time.

If you want the receiving side instead (verifying signatures, idempotency, queue handoff), we covered that separately in our Stripe webhook best practices guide. This post is about the side that emits the events.

Why webhook design matters more in 2026

Three shifts have raised the bar.

First, every modern SaaS now sits inside a workflow graph. Zapier, n8n, Make, and a generation of AI agent platforms consume your webhooks. A flaky webhook fan-out doesn't just annoy one customer; it breaks every automation downstream of them. The blast radius of a missed event is larger than it was three years ago.

Second, customers compare you to Stripe. Stripe's webhook UX (the event log, the resend button, the test mode, the typed payloads) has become the industry baseline. If your dashboard ships fewer features than Stripe's circa 2019, your prospects notice in the trial.

Third, AI-generated integrations break differently than human-written ones. When a customer's Claude or Cursor session writes a webhook handler, it expects predictable event names, stable payload shapes, and a published JSON schema it can paste into a prompt. Vague or undocumented payloads cost you integration velocity in a market where shipping a webhook handler should take 20 minutes, not two days.

The default approach (and why it breaks)

Most teams ship something like this:

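// Naive fan-out: fire-and-forget POSTs with no signature, retries, or delivery log.
async function onUserCreated(user) {
  for (const url of user.org.webhookUrls) {
    // Not awaited, so timeouts and failures are silently dropped.
    fetch(url, { method: 'POST', body: JSON.stringify(user) });
  }
}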

It works in the small case. It breaks the moment any of these happen: the customer's endpoint times out, the customer adds a second webhook URL, you change a field name, a payload contains PII you didn't realize, or the customer asks "did event X actually fire on 2026-05-11?" and you have no log to point to.

The fix is not a smarter fetch call. It's a producer architecture: a separate webhook service with a typed event taxonomy, a delivery queue, a signing layer, a retry policy, and a dashboard. Each of these is a small decision; together they are the difference between a feature and a liability.

The producer-side playbook

1. Define an event taxonomy you can live with for two years

The event names you ship today are the API you will be supporting in 2028. Get the naming right before you ship a single event.

The convention we recommend, and that Stripe, Shopify, and GitHub all use in some form, is noun.verb. Examples:

invoice.paid
invoice.payment_failed
subscription.updated
customer.deleted

Past tense, always. Webhooks describe things that have happened, not things that are about to. user.creating is a bug; user.created is correct.

Group events under stable nouns even if the underlying tables change. If your users table becomes accounts next year, the event should still be user.created. Customers wrote handlers against the public name, not the schema.

Reserve a small set of top-level nouns at launch (5 to 10), and resist the urge to expand. Stripe ships roughly 250 event types across about 40 nouns. Shopify ships about 100 across 20. Slack ships about 80 across 15. Your first version should probably ship 15 to 30 events across 5 to 8 nouns.
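
One way to keep the taxonomy honest is to lint event names before they ship. The sketch below is illustrative only: the `RESERVED_NOUNS` list and the `checkEventType` helper are hypothetical names, and the "ends in ing" test is a crude stand-in for a real past-tense check.

```javascript
// Hypothetical registry of the nouns reserved at launch (illustrative values).
const RESERVED_NOUNS = new Set(["invoice", "subscription", "customer", "user"]);

// noun.verb: lowercase snake_case on both sides of a single dot.
const EVENT_TYPE_PATTERN = /^([a-z]+(?:_[a-z]+)*)\.([a-z]+(?:_[a-z]+)*)$/;

// Returns a list of problems; an empty list means the name passes.
function checkEventType(type) {
  const match = EVENT_TYPE_PATTERN.exec(type);
  if (!match) return ["must look like noun.verb in lowercase snake_case"];

  const [, noun, verb] = match;
  const problems = [];
  if (!RESERVED_NOUNS.has(noun)) {
    problems.push(`unknown noun "${noun}"`);
  }
  // Crude heuristic: a verb ending in "ing" describes something still in progress.
  if (verb.endsWith("ing")) {
    problems.push(`verb "${verb}" is not past tense`);
  }
  return problems;
}
```

Running a check like this in CI against the list of published event types would catch names such as user.creating before a customer writes a handler against them.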

2. Version payloads from day one

Every event carries a payload, and every payload will need to change. Bake the version into the envelope, not the URL.

{
  "id": "evt_01HXYZ...",
  "type": "invoice.paid",
  "api_version": "2026-05-01",
  "created": 1747526400,
  "data": {
    "object": { "id": "in_...", "amount_paid": 4200, "currency": "usd" }
  }
}

Pin every customer to the api_version that was current when they connected the webhook. When you ship a breaking change, customers stay on their pinned version until they explicitly upgrade. Stripe popularized this pattern; it's the single decision that lets you evolve payloads without breaking integrations.

The cost is real: you maintain a payload transformer per active version. The reward is also real: you can ship breaking changes any week without a customer-wide migration project.
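
As a rough sketch of what "a payload transformer per active version" can look like: store events in the latest shape and walk backwards through a change log until you reach the customer's pinned version. The change log entry below (a rename from amount to amount_paid) and the names `VERSION_CHANGES` and `renderForVersion` are assumptions made for illustration.

```javascript
// Hypothetical change log, newest first. Each entry rewrites a payload from
// its own shape back to the shape used by the version before it.
const VERSION_CHANGES = [
  {
    version: "2026-05-01",
    // Assumed breaking change: "amount" was renamed to "amount_paid".
    downgrade: (object) => {
      const { amount_paid, ...rest } = object;
      return { ...rest, amount: amount_paid };
    },
  },
];

// Render an event for an endpoint pinned to `pinnedVersion`.
// Dates in YYYY-MM-DD form compare correctly as plain strings.
function renderForVersion(event, pinnedVersion) {
  let object = event.data.object;
  for (const change of VERSION_CHANGES) {
    if (change.version > pinnedVersion) {
      object = change.downgrade(object);
    }
  }
  return { ...event, api_version: pinnedVersion, data: { object } };
}
```

A customer pinned to an older version keeps receiving amount, while a customer on 2026-05-01 receives amount_paid, and neither has to change their handler when you ship the rename.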

3. Sign every request, and rotate the secret

Every webhook POST must carry a signature header so the consumer can verify the body came from you. The standard pattern:

X-Webhook-Signature: t=1747526400,v1=5257a869e7...

The signature is HMAC-SHA256 over {timestamp}.{raw_body} using a per-endpoint secret. The timestamp lets consumers reject replay attacks (drop anything older than 5 minutes).

Two implementation rules that are easy to get wrong:

Sign the raw body, not the parsed JSON. Re-serializing changes whitespace and key order and breaks the consumer's hash.
Support overlapping secrets during rotation. When the customer clicks "rotate," generate a new secret, sign each request with both the old and new secrets for 24 hours (one v1 value per secret in the header) so the consumer can verify against either, then expire the old one. This is the pattern Shopify and Stripe both ship; it lets customers rotate without downtime.

A solid rotation API has three endpoints: POST /endpoints/{id}/rotate-secret, GET /endpoints/{id}/secrets (returns active and pending), and POST /endpoints/{id}/expire-old-secret (cuts the grace period short if the customer suspects compromise).
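
The signing and rotation rules above fit in a few lines. This is a minimal sketch using Node's built-in crypto module, not a drop-in library: `signPayload` and `verifySignature` are hypothetical helper names, and emitting one v1 value per active secret during the grace window is one reasonable way to implement the overlap.

```javascript
const crypto = require("node:crypto");

// Build the X-Webhook-Signature value. `secrets` holds every secret that is
// currently valid for the endpoint: one normally, two during a rotation.
function signPayload(rawBody, secrets, timestamp) {
  const parts = [`t=${timestamp}`];
  for (const secret of secrets) {
    const digest = crypto
      .createHmac("sha256", secret)
      .update(`${timestamp}.${rawBody}`)
      .digest("hex");
    parts.push(`v1=${digest}`);
  }
  return parts.join(",");
}

// Consumer-side check, included only to show the round trip.
function verifySignature(rawBody, header, secret, nowSeconds, toleranceSeconds = 300) {
  const fields = header.split(",").map((part) => part.split("="));
  const rawTimestamp = fields.find(([key]) => key === "t")?.[1];
  const timestamp = Number(rawTimestamp);
  if (rawTimestamp === undefined || !Number.isFinite(timestamp)) return false;
  // Reject anything outside the replay window (5 minutes by default).
  if (Math.abs(nowSeconds - timestamp) > toleranceSeconds) return false;

  const expected = crypto
    .createHmac("sha256", secret)
    .update(`${rawTimestamp}.${rawBody}`)
    .digest();
  // Accept the request if any v1 value matches, compared in constant time.
  return fields
    .filter(([key]) => key === "v1")
    .some(([, hex]) => {
      const candidate = Buffer.from(hex ?? "", "hex");
      return candidate.length === expected.length &&
        crypto.timingSafeEqual(candidate, expected);
    });
}
```

Note that both functions take the raw body string. If either side parses and re-serializes the JSON first, the digests stop matching.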

4. Retry on a curve, not on a loop

Naive retry policies fall into two failure modes: too few retries and you lose events on transient blips; too many and you DDoS the customer's endpoint when they have a real outage.

The curve we recommend, and that matches what most mature webhook systems ship:

Attempt	Delay after previous attempt	Time since first attempt
1	none	0 (immediate)
2	1 min	1 min
3	5 min	6 min
4	30 min	36 min
5	2 hr	~2.6 hr
6	6 hr	~8.6 hr
7	12 hr	~20.6 hr
8	24 hr	~44.6 hr

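If the eighth attempt fails, roughly 45 hours after the first, give up and mark the event as permanently failed; treat 72 hours after the event was created as the absolute ceiling in case queueing delays stretch the schedule. Email the workspace owner. Surface the failure in the dashboard with a one-click replay.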
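
The curve is small enough to encode as a lookup table. A minimal sketch, assuming attempts are numbered from 1 and that the scheduler calls a helper (here the hypothetical `nextRetryDelayMs`) each time an attempt fails:

```javascript
const MINUTE = 60 * 1000;
const HOUR = 60 * MINUTE;

// Delay before each attempt, indexed by attempt number minus one.
const RETRY_DELAYS_MS = [
  0, 1 * MINUTE, 5 * MINUTE, 30 * MINUTE,
  2 * HOUR, 6 * HOUR, 12 * HOUR, 24 * HOUR,
];

// Given the attempt that just failed (1-based), return the delay before the
// next attempt, or null when the event should be marked permanently failed.
function nextRetryDelayMs(failedAttempt) {
  if (failedAttempt >= RETRY_DELAYS_MS.length) return null;
  return RETRY_DELAYS_MS[failedAttempt];
}
```

A null return is the cue to stop retrying, notify the workspace owner, and surface the event in the dashboard for manual replay.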

Treat any 2xx as success. Treat 410 Gone as a signal to disable the endpoint automatically (the customer's app told you it no longer exists). Treat 5xx, timeouts, and connection errors as retriable. Treat other 4xx responses as permanent failures with no retry, because the consumer's app rejected the request and replaying the same payload won't help. The exceptions are 408 Request Timeout and 429 Too Many Requests, which signal a transient condition: retry them, and honor a Retry-After header when the consumer sends one.
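
Those rules reduce to a small classifier. The sketch below is one possible encoding: `classifyDelivery` and its four outcome labels are hypothetical, retrying 408 and 429 is a common refinement flagged in the comments, and the handling of 1xx and 3xx is an assumption because the rules do not cover them.

```javascript
// Map a delivery outcome to "success", "retry", "fail", or "disable".
// `status` is the HTTP status code, or null for timeouts and connection errors.
function classifyDelivery(status) {
  if (status === null) return "retry";
  if (status >= 200 && status < 300) return "success";
  if (status === 410) return "disable";
  // 408 and 429 describe transient conditions, so this sketch retries them.
  if (status === 408 || status === 429) return "retry";
  if (status >= 400 && status < 500) return "fail";
  if (status >= 500) return "retry";
  // 1xx and 3xx are not covered by the rules; failing them is an assumption.
  return "fail";
}
```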

Cap concurrent in-flight retries per endpoint at something low (we use 4). A flapping customer endpoint shouldn't be able to saturate your delivery workers.
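
A per-endpoint cap can be as simple as a counter keyed by endpoint. This is an in-memory sketch with a hypothetical `EndpointLimiter` class; with more than one delivery worker process, the counter would have to live in shared storage instead.

```javascript
// Tracks in-flight deliveries per endpoint and refuses new ones past the cap.
class EndpointLimiter {
  constructor(maxInFlight = 4) {
    this.maxInFlight = maxInFlight;
    this.inFlight = new Map(); // endpointId -> current in-flight count
  }

  // Reserves a slot and returns true, or returns false if the endpoint is full.
  tryAcquire(endpointId) {
    const current = this.inFlight.get(endpointId) ?? 0;
    if (current >= this.maxInFlight) return false;
    this.inFlight.set(endpointId, current + 1);
    return true;
  }

  // Call when a delivery finishes, whether it succeeded or failed.
  release(endpointId) {
    const current = this.inFlight.get(endpointId) ?? 0;
    if (current <= 1) this.inFlight.delete(endpointId);
    else this.inFlight.set(endpointId, current - 1);
  }
}
```

A worker that gets false from tryAcquire would put the delivery back on the queue with a short delay instead of sending it.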

5. Build the dashboard customers expect

The dashboard is the feature that determines whether your webhook product feels professional or homemade. The bar is set by Stripe's event log: list view, filters by event type and status, click into a single event, see the full payload, see every delivery attempt with response code and body, and a "Resend" button.

A minimum-viable webhook dashboard ships:

A list view of recent events (last 30 days), filterable by type and delivery status.
A detail view per event showing the JSON payload (pretty-printed, copyable) and each delivery attempt's HTTP status, response body, and timestamp.
A "Resend" button on any past event.
An endpoints page where customers add URLs, see signing secrets, rotate secrets, and disable endpoints.
A "test event" generator so customers can fire a sample payload without doing the real action in your app.

Stripe goes further with a CLI (stripe listen, stripe trigger) that forwards live events to localhost. Shopify ships a similar local-tunnel tool. If you have any budget for tooling, this is where it pays back fastest; consumer engineers love it and it shortens integration time from days to hours.

If you're auditing your current webhook stack against this list, our ship-or-skip tool is a one-page grader that scores producer architectures against the same rubric we'd use in an engineering review.

6. Ship a replay tool for support

The single most useful operational tool you can build is "replay all events for endpoint X between timestamp A and B." Customers will misconfigure their endpoint, miss a window of events, fix the config, and then ask you to replay. Without a tool, this is a 30-minute SQL query for an engineer. With a tool, it's a 10-second support action.

Build it as an internal admin page first, then expose a customer-facing version once the abuse model is clear. Rate-limit it (one replay per endpoint per hour is reasonable) so a panicked customer can't trigger a replay storm.
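
The core of the replay tool is a window query plus a cooldown. A minimal sketch, assuming a hypothetical `eventLog` array of { id, endpointId, created } records and an in-memory cooldown map; a real version would query the delivery store and keep the cooldown somewhere durable.

```javascript
const REPLAY_COOLDOWN_MS = 60 * 60 * 1000; // one replay per endpoint per hour
const lastReplayAt = new Map(); // endpointId -> ms timestamp of the last replay

// Select the events to re-deliver for one endpoint between two Unix-second
// timestamps (inclusive), unless the endpoint is still cooling down.
function planReplay(eventLog, endpointId, fromSeconds, toSeconds, nowMs) {
  const previous = lastReplayAt.get(endpointId);
  if (previous !== undefined && nowMs - previous < REPLAY_COOLDOWN_MS) {
    const retryAfterMs = REPLAY_COOLDOWN_MS - (nowMs - previous);
    return { allowed: false, retryAfterMs, events: [] };
  }
  const events = eventLog
    .filter((e) => e.endpointId === endpointId)
    .filter((e) => e.created >= fromSeconds && e.created <= toSeconds)
    .sort((a, b) => a.created - b.created); // oldest first
  lastReplayAt.set(endpointId, nowMs);
  return { allowed: true, retryAfterMs: 0, events };
}
```

The returned events would then go back through the normal delivery queue, so replays get the same signing, retry, and concurrency treatment as first-time deliveries.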

7. Treat PII and payload size as design constraints

Two pitfalls that take down webhook systems in their second year:

The PII pitfall: an event payload contains a customer's email, the customer's account is later deleted under a GDPR request, but the event is still in the consumer's queue or log forever. The producer-side fix is to keep payloads thin (IDs, not denormalized objects) and let consumers fetch the current object via API if they need details. This also keeps your payload schema stable when the underlying object grows fields.

The payload-size pitfall: a customer bulk-imports 50,000 line items into one invoice, the invoice.paid payload balloons past 1 MB, and consumer endpoints start rejecting it. Cap payload size (Stripe caps at 256 KB; Shopify at about 1 MB). When an event exceeds the cap, send a truncated payload with a flag like data_truncated: true and a URL to fetch the full object.
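
Enforcing the cap is a size check at serialization time. In this sketch the 256 KB default, the `serializeWithCap` helper, and the placement of data_truncated and url inside data are all assumptions; the point is only that the oversized object is swapped for its ID plus a pointer.

```javascript
const MAX_PAYLOAD_BYTES = 256 * 1024; // assumed cap; choose your own

// Returns the JSON body to send. Oversized events are replaced by a thin
// version carrying only the object id, a truncation flag, and a fetch URL.
function serializeWithCap(event, fetchUrl, maxBytes = MAX_PAYLOAD_BYTES) {
  const full = JSON.stringify(event);
  if (Buffer.byteLength(full, "utf8") <= maxBytes) return full;

  const thin = {
    ...event,
    data: {
      object: { id: event.data.object.id },
      data_truncated: true,
      url: fetchUrl,
    },
  };
  return JSON.stringify(thin);
}
```

Measuring bytes instead of string length matters here, because payloads with non-ASCII text take more bytes than characters.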

Common pitfalls

A few patterns that look correct but break in production:

Webhooks fire synchronously inside your request handler. The customer creates an invoice, your app POSTs the webhook before responding, the consumer is slow, your app times out. Fix: webhook delivery must be async, dispatched off a queue, never blocking the user-facing request.
At-most-once delivery. If your delivery worker crashes after the POST succeeds but before it marks the event delivered, you'll either lose the event or double-send. Pick double-send (at-least-once) and give every event a stable id so consumers can dedupe.
Ordering guarantees. Don't promise them. Network conditions, retries, and parallel workers make strict ordering effectively impossible. Document this clearly so consumers handle events idempotently and use each event's own created timestamp, not arrival order, to decide which update is newest.
Mixing internal and external events. Your internal pub-sub (Kafka, Redis Streams, Postgres LISTEN/NOTIFY) probably emits a hundred events per public webhook. Don't expose the internal ones. The webhook taxonomy is a public API; the internal one is implementation detail.
No staging environment. Customers want to test integrations without polluting their production webhook log. Ship a test mode (Stripe's pattern) or a sandbox environment so they can iterate safely.
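
At-least-once delivery only works if the stable event id is easy to dedupe on. As an illustration of what you are asking consumers to do, here is a consumer-side sketch; `createIdempotentHandler` is a hypothetical helper, and the in-memory Set stands in for what would normally be a database table with a unique constraint on the event id.

```javascript
// Wrap a handler so that a repeated delivery of the same event id is skipped.
function createIdempotentHandler(handle) {
  const seen = new Set();
  return function onEvent(event) {
    if (seen.has(event.id)) return "duplicate";
    handle(event);
    // Record the id only after the handler succeeds, so a handler that
    // throws will run again when the producer retries the event.
    seen.add(event.id);
    return "processed";
  };
}
```

Including a snippet like this in your docs makes the double-send contract explicit instead of leaving consumers to discover it in production.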

When you can skip most of this

Be honest about scope. If you're two founders pre-revenue and exactly one customer has asked for webhooks, you do not need a versioned event taxonomy and a replay dashboard. Ship the simplest thing: one event, one POST, a shared secret, manual retry from a SQL console. Spend 4 hours, not 4 weeks.

The full playbook starts paying back somewhere around 20 active webhook customers, or the first integration partner who builds a productized connector on top of your events. Before that, you're building infrastructure for users who don't exist.

Who builds the production version

A complete webhook system is roughly 2 to 4 weeks of focused work: queue, signing, retry curve, dashboard UI, replay tool, docs. It's well-defined, unglamorous, and exactly the kind of scope a senior engineer can own end-to-end while your team focuses on the core product.

This is one of the cleanest scope shapes for an on-demand booking. On Cadence, a senior tier engineer ($1,500/week, AI-native by baseline, vetted on Cursor and Claude Code) typically ships a production-grade webhook layer in 2 to 3 weeks. Median time to first commit across our 12,800-engineer pool is 27 hours, and the 48-hour free trial means you can scope the work with them before any billing starts. If you want a Build/Buy/Book recommendation on this specific workload, our decide tool walks the framework.

If you're rebuilding existing technical debt around webhooks rather than greenfield, our guide on managing technical debt in a startup covers how to sequence the rewrite without freezing feature shipping.

What to do next

Audit your current webhook system against this list:

Do you have a documented event taxonomy with stable noun.verb names?
Are payloads versioned and pinned per customer?
Is every request signed, and can customers rotate secrets without downtime?
Do you retry on a curve with a hard cap and a permanent-failure state?
Can customers see recent events, payloads, and delivery attempts in a dashboard?
Can customers replay a failed event in one click?
Can support replay a time-bounded window of events for one endpoint?

A clean sweep means your webhook system is production-ready for the next two years. Any gap is a future support ticket. The fix for most teams is a focused 2 to 4 week build, and it's one of the rare engineering investments that pays back in reduced support load within a quarter.

If you'd rather not own this build internally, a senior engineer on Cadence can scope and ship the full producer-side webhook layer inside three weeks at $1,500/week, with a 48-hour trial before any billing. Brief the spec and you'll see vetted matches in under two minutes.

FAQ

How many event types should we ship at launch?

15 to 30, across 5 to 8 nouns. Resist shipping events for every internal action; you'll regret supporting them. Stripe started with fewer than 30 and now ships about 250. You can always add events; removing them breaks customer integrations.

Should we use webhooks or polling?

Webhooks for state changes consumers need within seconds (payment success, document signed, build completed). Polling for slowly-changing aggregate state where freshness within an hour is fine. Many mature SaaS products ship both: webhooks for the hot path, a GET /events?since=... endpoint as a fallback for consumers who missed deliveries.
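
The fallback endpoint is mostly a filtered, paginated read of the event log. The sketch below shows the data access behind it under stated assumptions: `listEventsSince`, the has_more and next_since fields, and the `eventLog` array (sorted by created) are hypothetical, and cursoring on a one-second timestamp can skip events that share the same second across a page boundary, so a production version would more likely cursor on the event id.

```javascript
// Data access behind GET /events?since=<unix seconds>&limit=<n>.
// `eventLog` is assumed to be sorted by `created`, oldest first.
function listEventsSince(eventLog, since, limit = 100) {
  const matching = eventLog.filter((event) => event.created > since);
  const page = matching.slice(0, limit);
  return {
    data: page,
    has_more: matching.length > page.length,
    // Pass this back as `since` to fetch the next page.
    next_since: page.length > 0 ? page[page.length - 1].created : since,
  };
}
```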

How do we handle webhooks for high-volume customers?

Batch where the semantics allow it. Instead of 5,000 individual order.created events for a bulk import, emit order.batch_created events that each carry an array of up to 100 IDs, so the import produces 50 events instead of 5,000. Document the batching rules clearly. Shopify uses this pattern for inventory updates; it's the difference between "the customer's endpoint stayed up" and "we DDoS'd them at 2 a.m."
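
Batching with a cap of 100 IDs is a chunking loop. In this sketch `buildBatchEvents` and the order_ids field name are assumptions; with these numbers a 5,000-order import becomes 50 events.

```javascript
const MAX_IDS_PER_BATCH = 100;

// Split a bulk operation's order ids into order.batch_created events,
// each carrying at most MAX_IDS_PER_BATCH ids.
function buildBatchEvents(orderIds, created) {
  const events = [];
  for (let i = 0; i < orderIds.length; i += MAX_IDS_PER_BATCH) {
    events.push({
      type: "order.batch_created",
      created,
      data: { order_ids: orderIds.slice(i, i + MAX_IDS_PER_BATCH) },
    });
  }
  return events;
}
```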

What's the minimum signing scheme to ship?

HMAC-SHA256 over {timestamp}.{raw_body} with a per-endpoint secret, sent in X-Webhook-Signature as t=...,v1=.... Support secret rotation with a 24-hour grace window. Don't invent your own scheme; the Stripe header format is well understood by most frameworks and AI code generators.

How do we test webhooks in development?

Ship a local-forwarding CLI (Stripe's pattern: stripe listen --forward-to localhost:3000/webhooks) or rely on tunneling tools like ngrok. Inside your dashboard, ship a "Send test event" button that fires a hand-crafted sample payload of any event type to any endpoint. Both unlock fast iteration for the consumer-side developer and shorten time-to-first-integration from days to hours.

Harsh Shuddhalwar

Fullstack Developer

Fullstack developer at withRemote. Ships across the stack — TypeScript, Node, Postgres, Vercel. Writes on shipping speed and pragmatic architecture.
