
To handle Stripe webhooks correctly, do four things in order: parse the raw request body, verify the Stripe-Signature header with stripe.webhooks.constructEvent, store the event.id in a dedupe table, and push the work to a background queue before returning 200. Skip any one of those layers and your billing breaks the first time Stripe retries.
That is the playbook. The rest of this post is why each layer exists, the code for Next.js, Express, Hono, and Cloudflare Workers, the Postgres schema you can copy, and the bugs that ship anyway. Still picking a billing provider? See the cost to integrate Stripe or our Stripe vs Paddle breakdown first.
Most Stripe integrations ship layer one, sometimes layer two, and call it done. Then the first chargeback or the first 30-second outage exposes layers three and four.
Here are the four layers, in the order they execute on every request:
1. Capture the raw request body before any middleware parses it.
2. Verify the Stripe-Signature header with stripe.webhooks.constructEvent.
3. Store event.id as a unique key so a retried event does not double-charge anyone.
4. Push the work to a background queue and return 200 immediately.

Layer one is documented. Layers two through four are where the bugs live. The Stripe docs describe each in passing, but they never name a queue, never give you a Postgres schema, and never describe what reconciliation should look like in practice. We will.
Subscription products are now most of what gets built. Webhook volume scales linearly with MRR, so the failure modes that did not matter at $5K MRR become outages at $200K MRR. Stripe also enforces a 100 req/s live API limit, which naive reconciliation loops trip immediately.
AI-assisted billing flows ship faster than ever. Founders generate a Stripe integration with Cursor or Claude Code in an afternoon, push to prod, and discover three weeks later that no one downgrades when they cancel. The shortcut AI tools skip is exactly the four-layer pattern above. Pair this with solid API validation using Zod so the events you do receive are typed end to end.
Signature verification needs the byte-for-byte raw request body. The moment any middleware parses the JSON, the signature breaks. This is the single most common Stripe webhook bug.
```typescript
// app/api/webhooks/stripe/route.ts
import { NextRequest, NextResponse } from "next/server";
import Stripe from "stripe";

const stripe = new Stripe(process.env.STRIPE_SECRET_KEY!);
const webhookSecret = process.env.STRIPE_WEBHOOK_SECRET!;

export const runtime = "nodejs";
export const dynamic = "force-dynamic";

export async function POST(req: NextRequest) {
  const sig = req.headers.get("stripe-signature");
  if (!sig) return NextResponse.json({ error: "no sig" }, { status: 400 });

  const rawBody = await req.text(); // do NOT call req.json()

  let event: Stripe.Event;
  try {
    event = stripe.webhooks.constructEvent(rawBody, sig, webhookSecret);
  } catch (err) {
    return NextResponse.json({ error: "bad sig" }, { status: 400 });
  }

  await enqueue(event); // returns immediately
  return NextResponse.json({ received: true }, { status: 200 });
}
```
Note req.text(), not req.json(). App Router does not pre-parse the body, but if you reach for req.json() to inspect the payload, the request stream is consumed and the raw bytes are gone, so signature verification fails.
```typescript
import express from "express";
import Stripe from "stripe";

const app = express();
const stripe = new Stripe(process.env.STRIPE_SECRET_KEY!);
const secret = process.env.STRIPE_WEBHOOK_SECRET!;

// IMPORTANT: raw parser scoped to this route only
app.post(
  "/webhooks/stripe",
  express.raw({ type: "application/json" }),
  async (req, res) => {
    const sig = req.headers["stripe-signature"] as string;
    let event: Stripe.Event;
    try {
      event = stripe.webhooks.constructEvent(req.body, sig, secret);
    } catch (err) {
      return res.status(400).send("bad sig");
    }
    await enqueue(event);
    res.status(200).json({ received: true });
  }
);

app.use(express.json()); // global JSON parser AFTER the webhook route
```
Order matters. If express.json() registers globally before the webhook route, req.body is a parsed object and constructEvent throws.
```typescript
import { Hono } from "hono";
import Stripe from "stripe";

const app = new Hono();
const stripe = new Stripe(process.env.STRIPE_SECRET_KEY!);
const secret = process.env.STRIPE_WEBHOOK_SECRET!;

app.post("/webhooks/stripe", async (c) => {
  const sig = c.req.header("stripe-signature");
  if (!sig) return c.json({ error: "no sig" }, 400);

  const raw = await c.req.text();
  let event: Stripe.Event;
  try {
    event = stripe.webhooks.constructEvent(raw, sig, secret);
  } catch {
    return c.json({ error: "bad sig" }, 400);
  }

  await enqueue(event);
  return c.json({ received: true });
});
```
Workers do not ship the Node crypto module Stripe's SDK reaches for. Use constructEventAsync with the Web Crypto API:
```typescript
import Stripe from "stripe";

export default {
  async fetch(req: Request, env: Env): Promise<Response> {
    const stripe = new Stripe(env.STRIPE_SECRET_KEY, {
      httpClient: Stripe.createFetchHttpClient(),
    });

    const sig = req.headers.get("stripe-signature");
    if (!sig) return new Response("no sig", { status: 400 });

    const raw = await req.text();
    let event: Stripe.Event;
    try {
      event = await stripe.webhooks.constructEventAsync(
        raw,
        sig,
        env.STRIPE_WEBHOOK_SECRET,
        undefined,
        Stripe.createSubtleCryptoProvider()
      );
    } catch {
      return new Response("bad sig", { status: 400 });
    }

    await env.QUEUE.send({ id: event.id });
    return new Response("ok", { status: 200 });
  },
};
```
constructEventAsync is the edge-compatible version. The synchronous one will throw on Workers and most edge runtimes.
The Stripe-Signature header looks like t=1715800000,v1=abc123.... Stripe's SDK does the work for you: it extracts the timestamp, recomputes the HMAC-SHA256 over ${timestamp}.${rawBody} using your STRIPE_WEBHOOK_SECRET, and compares it in constant time. It also enforces a 5-minute tolerance so an attacker cannot replay an old captured request a week later.
If verification fails, return 400 and log the failure with no event details (the payload is untrusted at that point). Failed verifications in production usually mean one of three things: the wrong whsec_ is in your env (staging secret in prod is depressingly common), a middleware is mutating the body, or someone is probing your endpoint.
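For intuition, the check can be re-implemented in a few lines. This is a hypothetical sketch of what constructEvent does internally, assuming a Node runtime with node:crypto; keep using the SDK in production:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Illustrative re-implementation of Stripe's signature check. Use
// stripe.webhooks.constructEvent in production, not this sketch.
function verifyStripeSignature(
  rawBody: string,
  sigHeader: string, // e.g. "t=1715800000,v1=abc123..."
  secret: string, // your whsec_... endpoint secret
  toleranceSeconds = 300, // Stripe's default 5-minute replay window
  now = Math.floor(Date.now() / 1000)
): boolean {
  const parts = Object.fromEntries(
    sigHeader.split(",").map((kv) => kv.split("=") as [string, string])
  );
  const timestamp = Number(parts["t"]);
  if (!parts["v1"] || Number.isNaN(timestamp)) return false;

  // Reject anything outside the tolerance window to block replays
  if (Math.abs(now - timestamp) > toleranceSeconds) return false;

  // HMAC-SHA256 over "<timestamp>.<rawBody>" with the endpoint secret
  const expected = createHmac("sha256", secret)
    .update(`${timestamp}.${rawBody}`)
    .digest("hex");

  const a = Buffer.from(expected);
  const b = Buffer.from(parts["v1"]);
  // Constant-time comparison so attackers cannot learn the digest byte by byte
  return a.length === b.length && timingSafeEqual(a, b);
}
```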
Stripe will send you the same event more than once. It is not unusual; it is part of the contract. Their retry rules guarantee at-least-once delivery, never exactly-once. The only way to handle that correctly is to dedupe on event.id.
A Set in memory will not survive a deploy or a multi-instance setup. Use Postgres:
```sql
create table stripe_events (
  id text primary key, -- the event.id from Stripe (evt_...)
  type text not null,
  received_at timestamptz not null default now(),
  processed_at timestamptz,
  payload jsonb not null
);

create index on stripe_events (received_at);
create index on stripe_events (type, received_at desc);
```
The handler pattern (inside your worker, not the webhook endpoint):
```typescript
const result = await db.execute(sql`
  insert into stripe_events (id, type, payload)
  values (${event.id}, ${event.type}, ${event.data.object})
  on conflict (id) do nothing
  returning id
`);

if (result.rows.length === 0) {
  // already processed (or in-flight from another worker)
  return;
}

await handleEvent(event);

await db.execute(sql`
  update stripe_events set processed_at = now() where id = ${event.id}
`);
```
The key move: the INSERT ... ON CONFLICT DO NOTHING runs first, in its own statement, before any side effect. If two workers grab the same event from the queue at the same time, exactly one of them gets a row back; the other returns immediately. This is the only correct dedup pattern. Storing the event ID after the work runs means a crash mid-handler causes a double-process on the next retry.
Add a TTL job that deletes rows older than 90 days. Your dedup window only needs to cover Stripe's 3-day retry window, but the rows are useful for debugging.
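A minimal sketch of that cleanup, assuming a Postgres client exposing a query(sql) method; the db shape and function name here are hypothetical:

```typescript
// Hypothetical nightly cleanup job; wire it to whatever cron runner you use.
// 90 days comfortably covers Stripe's 3-day retry window plus debugging needs.
async function pruneStripeEvents(db: {
  query: (sql: string) => Promise<unknown>;
}): Promise<void> {
  await db.query(
    "delete from stripe_events where received_at < now() - interval '90 days'"
  );
}
```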
Stripe gives you roughly 20 seconds before it considers the request a failure. Anything that calls another HTTP API, sends an email, or recomputes a derived value should not run inline. Push it to a queue, return 200, let the worker handle it.
Three queues we recommend, depending on stack:
| Queue | Best for | Notes |
|---|---|---|
| BullMQ (Redis) | Self-hosted Node apps with existing Redis | Cheap, fast, full control. You run the workers. |
| Inngest | Vercel/Cloudflare/serverless apps | Durable steps, replay UI, generous free tier. Zero infra. |
| Trigger.dev | Long-running jobs (image gen, AI workflows) | First-class for retries and observability, opinionated SDK. |
Cloudflare Workers users have a fourth option: native Cloudflare Queues, which integrate with Workers in one binding. For most SaaS we see, Inngest hits the right balance of zero-ops and observability.
The job payload only needs the event.id. Refetch the full event inside the worker with stripe.events.retrieve(event.id) so you always work from Stripe's current state, not a stale payload that sat in a queue for 30 minutes.
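The worker-side shape can be sketched with the Stripe client stubbed out. StripeLike and the process callback below are illustrative stand-ins, not real SDK types; in a real worker you would pass the actual Stripe client and your dedup-then-handle logic:

```typescript
// Stub-friendly stand-in for the slice of the Stripe SDK the worker needs
type StripeLike = {
  events: { retrieve: (id: string) => Promise<{ id: string; type: string }> };
};

// The queued job carries only the event id; always refetch so you work
// from Stripe's current state, not a payload that sat in the queue.
async function handleQueuedStripeJob(
  job: { data: { id: string } },
  stripe: StripeLike,
  process: (event: { id: string; type: string }) => Promise<void>
): Promise<void> {
  const event = await stripe.events.retrieve(job.data.id);
  await process(event); // insert-first dedup, then side effects
}
```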
If you are building this from scratch and want a sanity check on the architecture before shipping, audit your stack with Ship or Skip. It will flag things like inline webhook processing or missing dedup before they bite you in production.
Webhooks fail. Your queue goes down for 4 minutes. Cloudflare has an incident. Your DNS provider rotates a cert and your endpoint returns 525 for an hour. Stripe retries for 3 days, but if the failure window outlasts that, the event is gone.
Reconciliation is the cheapest insurance against this. A nightly cron job lists Stripe events in the last 26 hours, compares against your stripe_events table, and replays anything missing.
```typescript
// runs nightly via cron
async function reconcileStripeEvents() {
  const since = Math.floor((Date.now() - 26 * 60 * 60 * 1000) / 1000);

  const seen = new Set(
    (
      await db.select({ id: stripeEvents.id }).from(stripeEvents)
        .where(gte(stripeEvents.receivedAt, new Date(since * 1000)))
    ).map((r) => r.id)
  );

  for await (const event of stripe.events.list({
    created: { gte: since },
    limit: 100,
  })) {
    if (!seen.has(event.id)) {
      await queue.add("stripe-event", { id: event.id, source: "reconcile" });
    }
  }
}
```
Three details that matter. Run it 26 hours back, not 24, so you cover any clock skew or job-start drift. Use stripe.events.list with auto-pagination so a 5,000-event night does not blow up. Tag reconciled events in your queue payload so you can alert if reconciliation is finding more than a tiny number of missing events; that is your early warning that webhooks are failing silently.
For a typical SaaS billing setup, these five events cover 95% of what you care about:
| Event | What it means | What to do | Idempotency risk |
|---|---|---|---|
| invoice.payment_succeeded | Recurring charge cleared | Extend subscription, send receipt | Double-renewal if you skip dedup |
| invoice.payment_failed | Charge declined or retry exhausted | Email user, mark account past_due | Repeated dunning emails |
| customer.subscription.updated | Plan, quantity, or trial changed | Sync plan to your DB, recompute entitlements | Stale entitlements |
| customer.subscription.deleted | Subscription cancelled or expired | Downgrade or revoke access | Paid features stay enabled |
| charge.dispute.created | Chargeback opened | Freeze account, page on-call, gather evidence | Auto-refund loops |
Partial handling is worse than none. If you handle customer.subscription.updated but ignore customer.subscription.deleted, you have customers who pay nothing but keep their access. Pick the events you handle, document them, and ignore the rest cleanly with a default: return; in your switch.
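A sketch of that dispatch as a single switch. The returned action strings are placeholders for your real side effects, so this is illustrative shape, not a drop-in implementation:

```typescript
// Dispatch on event.type; everything unhandled falls through to "ignored".
async function handleEvent(event: { type: string }): Promise<string> {
  switch (event.type) {
    case "invoice.payment_succeeded":
      return "extend-subscription"; // extend the period, send the receipt
    case "invoice.payment_failed":
      return "mark-past-due"; // email the user, flag the account
    case "customer.subscription.updated":
      return "sync-entitlements"; // resync plan and entitlements
    case "customer.subscription.deleted":
      return "revoke-access"; // downgrade or revoke access
    case "charge.dispute.created":
      return "freeze-account"; // freeze, page on-call, gather evidence
    default:
      return "ignored"; // unhandled events exit cleanly
  }
}
```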
The bugs we see over and over:

- app.use(express.json()) registered globally, with the webhook route below it. Verification breaks only on real Stripe events; manual curl tests still pass, which makes it brutal to debug.
- Storing the event.id after the handler runs. The worker crashes mid-handler, Stripe retries, your DB has no record, and the side effect runs again. Always insert into stripe_events first.
- The staging whsec_ in prod. Each endpoint has its own secret. Sharing them silently weakens verification and lets a leaked staging secret spoof prod.
- Treating 4xx responses as transient. A 404 on customer.retrieve means that customer was deleted, not that you should retry. Switch on the error code.

Each looks fine in dev, ships clean, and surfaces only under production traffic. The same playbook applies to rate limiting your API correctly so upstream retries do not knock you over.
Be honest about scope. Pre-revenue with Stripe Checkout for one-time payments only? Skip reconciliation and run a verified handler that stores the event and downgrades plans on cancel. The math shifts as webhook volume and revenue at risk grow: no point reconciling 12 events a day; every point reconciling 12 a minute. Until then, ship layers one and two and revisit when the volume climbs.
If billing reliability is on your roadmap and you have nobody to own it, this is exactly the work a senior engineer at $1,500/week handles end to end on Cadence: schema, queue, reconciliation, monitoring, the lot. Auto-matched in 2 minutes, 48-hour free trial.
Want a second pair of eyes on your billing stack? Audit your stack with Ship or Skip. It grades your webhook handler, dedup pattern, and reconciliation setup honestly. Free, no signup.
The checklist, end to end:

1. Create a POST /webhooks/stripe route in your framework of choice. Use req.text() (Next.js / Hono) or express.raw({ type: "application/json" }) scoped to that route only. Never let a JSON parser run before signature verification.
2. Load STRIPE_WEBHOOK_SECRET from env, pass the raw body and the stripe-signature header to stripe.webhooks.constructEvent, and return 400 on failure. On Cloudflare Workers or other edge runtimes, use constructEventAsync with createSubtleCryptoProvider.
3. Create a stripe_events table with id text primary key. Run INSERT ... ON CONFLICT DO NOTHING inside the worker before any side effects so duplicate deliveries return immediately.
4. Enqueue { id: event.id } to BullMQ, Inngest, Trigger.dev, or Cloudflare Queues. Return 200 from the HTTP handler within the 20-second window so Stripe does not retry.
5. In the worker, refetch with stripe.events.retrieve(event.id), switch on event.type, handle the 5 critical events (invoice.payment_succeeded, invoice.payment_failed, customer.subscription.updated, customer.subscription.deleted, charge.dispute.created), and update processed_at when done.
6. Run a nightly reconciliation job that lists recent Stripe events, diffs against your stripe_events table, and re-enqueues anything missing. Alert if the gap is larger than a handful of events per night.

How fast does my endpoint need to respond?
Stripe expects a 2xx response within roughly 20 seconds. Anything longer is treated as a failure and the event enters Stripe's retry queue, which runs with exponential backoff for up to 3 days in live mode (5 minutes, 30 minutes, 2 hours, 5 hours, 10 hours, then every 12 hours).
Do I really need a background queue?
If your handler does anything beyond a single DB insert (sending email, calling external APIs, recomputing entitlements), push it to a queue. Inline processing is the most common cause of dropped Stripe events at scale, because one slow downstream call cascades into timeouts and Stripe retries that pile up.
What happens if my worker crashes mid-processing?
If your worker crashes mid-processing, Stripe retries the event, and your code does the work twice. Insert the event.id into your dedup table first using INSERT ... ON CONFLICT DO NOTHING, then run the side effect. Update processed_at after success so you can tell the difference between "in flight" and "done" in dashboards.
Can I share one webhook secret across environments?
No. Stripe issues a unique whsec_ for each endpoint you create. Sharing them defeats signature verification's purpose and means a leaked staging secret can spoof production traffic. Configure each environment's secret separately and rotate them via the Workbench when team members leave.
Do I still need reconciliation if everything is verified and queued?
Yes. Webhooks fail for reasons outside your code: DNS issues, certificate expiry, edge incidents, your queue going down, your database failing over. A nightly reconciliation job listing Stripe events and diffing against your local stripe_events table is the cheapest insurance against silently missed events, and it doubles as an alarm when something upstream is broken.