
To monitor a SaaS app in 2026, install Sentry for errors, Better Stack for uptime and on-call, PostHog for product analytics, and lean on your host's built-in metrics (Vercel, Render, Fly). Total cost: $0 to $200 per month until you cross roughly 50 paying customers. Wire alerts against the four golden signals (latency, traffic, errors, saturation). Graduate to Datadog or full OpenTelemetry only when paging volume or revenue justifies the bill.
That's the answer. The rest of this post is the playbook: which vendor does what, how to install each one, what an actual SLO looks like, how to wire alerts, and when to graduate to a heavier stack.
Most founders use the word "monitoring" to mean five different things, and the confusion is what makes them either over-buy or under-buy. The five layers, in priority order:
1. Errors: uncaught exceptions in production. Sentry.
2. Uptime: is `/api/checkout` reachable from Sydney? Better Stack, Pingdom, UptimeRobot.
3. Product analytics: what users actually do inside the product. PostHog.
4. Infra metrics: CPU, memory, and request counts. Your host's built-ins (Vercel, Render, Fly).
5. Logs: the raw record of every request. Better Stack Logs or your host's log viewer.

You need at least three of the five from day one: errors, uptime, and product analytics. The other two come along for free with whatever PaaS you're already on. Skipping any of the first three means you'll learn about your outages from a Slack DM that starts with "hey, is anything broken on your end?"
There are two failure modes, and both are expensive.
Over-buy: You read a Datadog blog post, get excited about distributed tracing, and provision Datadog APM at seed stage. Datadog APM starts at $15 to $23 per host per month, and "host" includes every Render instance, every Fly machine, every preview deploy. A typical seed-stage SaaS with 8 to 12 hosts pays $1,500 to $3,000 per month for capability it cannot yet use, because there are no microservices to trace.
Under-buy: You ship the product with no monitoring beyond the Vercel error log. Your first paying customer hits a bug at 2am Pacific. You find out 14 hours later when they email asking for a refund. The lifetime-value math on that customer is now negative, and all it would have taken to prevent it was one engineering hour and a $25 Better Stack monitor.
The right answer in 2026 is the same as it was in 2022: start cheap, instrument the four golden signals, and let usage drive the upgrade. The tooling has gotten dramatically better at the low end. Sentry, PostHog, and Better Stack all have free tiers that comfortably cover a SaaS doing $0 to $20k MRR.
Here's the full stack a Cadence engineer would ship for a typical Next.js or Node SaaS in 2026:
| Layer | Starter pick | Free tier | Paid tier | When to upgrade |
|---|---|---|---|---|
| Errors | Sentry | 5k errors / month | $26 / month | When errors > 5k/mo |
| Uptime + on-call | Better Stack | 10 monitors | $25 / month | Pay from day one |
| Product analytics | PostHog Cloud | 1M events / month | $0.00045 / event | When events > 1M/mo |
| Infra metrics | Vercel / Render built-ins | Included | Included | When you outgrow PaaS |
| Logs | Better Stack Logs | 30 GB / month | $0.30 / GB | When logs > 30 GB/mo |
Total cost at zero traffic: $25 per month (Better Stack paid plan; everything else on free tier). Total cost at 50 paying customers: $75 to $200 per month, depending on event volume. You will not need Datadog at this stage. You will not need full OpenTelemetry. You will need exactly three vendor accounts and about 40 lines of integration code.
Google's SRE team named the four golden signals back in 2016, and they still hold up. The trick in 2026 is mapping each one onto a real tool in the stack above, with concrete thresholds.
Latency is the time from request received to response sent. Measure P50 (median), P95 (slow user experience), and P99 (worst case). For a typical SaaS API in 2026, a sane starting SLO is the P95-under-500ms target encoded in the slos.yaml later in this post; set P50 and P99 targets once you have real traffic to measure against.
These come from Vercel Analytics, Render's built-in metrics, or Sentry Performance. Don't measure from your own laptop. Measure from the edge, where your users actually live.
Traffic is requests per minute, broken down by route. The point isn't the absolute number; it's the slope. A 3x spike at 4am that doesn't match your usual pattern is either a bot, a customer's broken integration, or a viral tweet. Vercel and Render both expose this for free. PostHog gives you the product-event version.
Errors come in two flavors: HTTP 5xx (server errors) and HTTP 4xx (client errors). 5xx is your problem. 4xx is sometimes your problem (broken auth flow) and sometimes the user's. Sentry handles uncaught exceptions; your host's logs handle 5xx counts. Set the alert threshold at "more than 1% of requests failing for 5 minutes," not "any single error."
Saturation is how full your most-constrained resource is. For a Next.js app on Vercel, this is rarely you (Vercel autoscales). For a Postgres-backed SaaS, it's almost always database connections. For a queue worker on Render, it's queue depth.
The single most useful saturation alert in 2026: "Postgres connection pool > 80% in use for 5 minutes." This catches connection leaks, runaway migrations, and the moment your app outgrows its database tier. Render and Supabase both expose pool metrics in their dashboards. Wire that into Better Stack.
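If your host doesn't expose pool metrics, you can approximate the same check with one query against Postgres itself. A minimal sketch, reusing the same `db` helper as the health endpoint below; the route path and the assumption that `db.execute` returns rows as plain objects are both placeholders to adapt to your client:

```ts
// app/api/db-saturation/route.ts (hypothetical path).
// Assumes db.execute returns rows as plain objects; adapt to your client.
import { db } from "@/lib/db";

export async function GET() {
  // pg_stat_activity lists live backends; max_connections is the hard cap.
  const rows = await db.execute(
    `SELECT
       (SELECT count(*)::int FROM pg_stat_activity) AS used,
       current_setting('max_connections')::int AS max`
  );
  const { used, max } = rows[0];
  const utilization = used / max;

  // Return 503 above 80% so a plain HTTP monitor can page on saturation.
  return Response.json(
    { used, max, utilization },
    { status: utilization > 0.8 ? 503 : 200 }
  );
}
```

If you're behind a pooler like pgbouncer or Supabase's, prefer the dashboard metrics the post mentions; pg_stat_activity counts pooler connections, not your app's.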
Real working snippets for a Next.js 15 app on Vercel. Drop these in and you have errors, uptime, and product analytics live in under an hour.
Sentry init (`sentry.client.config.ts`):

```ts
import * as Sentry from "@sentry/nextjs";

Sentry.init({
  dsn: process.env.NEXT_PUBLIC_SENTRY_DSN,
  // Sample 10% of transactions; see the sample-rate gotcha later in the post.
  tracesSampleRate: 0.1,
  // Skip ambient session replays, but record one whenever an error fires.
  replaysSessionSampleRate: 0.0,
  replaysOnErrorSampleRate: 1.0,
  environment: process.env.NODE_ENV,
});
```
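One hedge worth adding: the file above only captures browser-side errors. Sentry's Next.js SDK also picks up a server-side init (conventionally `sentry.server.config.ts`) so API-route exceptions get captured too. A minimal version mirrors the client file, minus the replay options, which are browser-only; the `SENTRY_DSN` variable here is an assumption and the `NEXT_PUBLIC_` variant works as well:

```ts
// sentry.server.config.ts: same DSN, no replay (replay is a browser feature).
import * as Sentry from "@sentry/nextjs";

Sentry.init({
  dsn: process.env.SENTRY_DSN ?? process.env.NEXT_PUBLIC_SENTRY_DSN,
  tracesSampleRate: 0.1,
  environment: process.env.NODE_ENV,
});
```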
Better Stack health endpoint (`app/api/health/route.ts`):

```ts
import { db } from "@/lib/db";

export async function GET() {
  try {
    // Cheapest query that still proves the database is reachable.
    await db.execute("SELECT 1");
    return Response.json({ status: "ok", ts: Date.now() });
  } catch {
    return Response.json({ status: "degraded" }, { status: 503 });
  }
}
```
Then in Better Stack: create an HTTP monitor pointing at https://yourapp.com/api/health, set "expected status: 200," and add yourself to the on-call escalation policy. Three clicks.
PostHog init (`app/providers.tsx`):

```tsx
"use client";

import posthog from "posthog-js";
import { PostHogProvider } from "posthog-js/react";

if (typeof window !== "undefined") {
  posthog.init(process.env.NEXT_PUBLIC_POSTHOG_KEY!, {
    api_host: "https://us.i.posthog.com",
    // Capture client-side route changes, not just full page loads.
    capture_pageview: "history_change",
  });
}

export function Providers({ children }: { children: React.ReactNode }) {
  return <PostHogProvider client={posthog}>{children}</PostHogProvider>;
}
```
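With the provider mounted, capturing a product event from any client component is one call. The component and event names below are illustrative, not from the stack above:

```tsx
"use client";

import { usePostHog } from "posthog-js/react";

// Illustrative component: fire a product event when a user upgrades.
export function UpgradeButton() {
  const posthog = usePostHog();
  return (
    <button onClick={() => posthog?.capture("plan_upgraded", { plan: "pro" })}>
      Upgrade
    </button>
  );
}
```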
A real SLO definition (drop it in `slos.yaml` for documentation; tools like Nobl9 or Grafana SLO consume the same shape):

```yaml
slos:
  - name: api-availability
    target: 0.995
    window: 30d
    indicator: http_5xx_rate < 0.005
  - name: api-latency-p95
    target: 0.99
    window: 30d
    indicator: http_p95_ms < 500
```
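Those targets read abstract until you convert them into an error budget: 99.5% availability over 30 days allows 216 minutes (about 3.6 hours) of downtime before the SLO is blown. The arithmetic:

```ts
// Error budget: the downtime you can "spend" in the window before the
// SLO is breached. 0.995 over 30 days leaves 216 minutes (~3.6 hours).
const target = 0.995;
const windowMinutes = 30 * 24 * 60; // 43,200 minutes in the window
const budgetMinutes = (1 - target) * windowMinutes; // 216
console.log(`Error budget: ${budgetMinutes.toFixed(0)} minutes per 30 days`);
```

That framing turns "should we page for this?" into a budget question instead of a judgment call made at 3am.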
That is the entire monitoring stack. Roughly 40 lines of code, three vendor accounts, one health endpoint. Many teams overcomplicate this because the Datadog blog makes it sound like you need 14 dashboards before you ship. You don't.
For teams that want a second opinion before locking in vendors, auditing your stack with Ship-or-Skip takes 5 minutes and tells you which layers you're actually missing versus which ones you can defer.
The deliverables stay small: one health endpoint at `/api/health`, about 40 lines of code in total, and one `slos.yaml` in the repo. The number matters less than having one written down.

The starter stack carries you to roughly 50 paying customers or $20k MRR, whichever comes first. Past that, you'll start hitting one of three triggers: paging volume a single escalation policy can't absorb, free tiers exhausted across Sentry, PostHog, or logs, or a third service that needs correlated tracing. When any two of those trigger, graduate. The two clean paths in 2026: consolidate on Datadog (one vendor, one bill, correlated traces, logs, and metrics), or standardize on OpenTelemetry and pick a backend (more setup cost, no vendor lock-in).
Either way, do not graduate before the triggers fire. Premature observability is one of the more expensive mistakes early SaaS teams make.
A few patterns we see repeatedly when we drop a senior engineer into a SaaS codebase to fix monitoring:
- Unscrubbed PII in error payloads. Use Sentry's `beforeSend` to scrub emails, tokens, and credit-card numbers (a sketch follows below). Doing this in month one is much cheaper than doing it during a SOC 2 audit. Pair it with a GDPR data-deletion playbook so logs respect the same retention rules as your primary database.
- An unbounded `tracesSampleRate`. Sentry will happily eat your whole free tier in a week if you set `tracesSampleRate` to 1.0 on a high-traffic page. Start at 0.1 and tune up.

For most seed-stage SaaS teams, this is a half-week of work for one strong full-stack engineer (and yes, the half-week estimate holds up once you sit down with someone who has shipped this stack before). The hard parts are not the SDK installs; they're the SLO definitions, the alert thresholds (judgment calls based on real traffic), and the on-call runbook. A junior engineer can do the installs. A senior engineer is what you want for the SLO and runbook work, because they've been paged at 3am before, and it shows in the runbooks they write.
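A minimal `beforeSend` sketch, extending the client config from earlier; the email regex is illustrative, not exhaustive:

```ts
// Added to sentry.client.config.ts: scrub obvious PII before events leave
// the app. Extend the regex for tokens, card numbers, and whatever
// identifiers your app actually handles.
import * as Sentry from "@sentry/nextjs";

const EMAIL = /[\w.+-]+@[\w-]+\.[\w.-]+/g;

Sentry.init({
  dsn: process.env.NEXT_PUBLIC_SENTRY_DSN,
  beforeSend(event) {
    if (event.message) {
      event.message = event.message.replace(EMAIL, "[redacted-email]");
    }
    // An internal user id is enough to debug; never ship the email itself.
    if (event.user) {
      delete event.user.email;
    }
    return event;
  },
});
```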
If you don't have that engineer in-house, this is exactly the kind of bounded scope a Cadence senior engineer ($1,500/week) ships in one billing week. Every engineer on Cadence is AI-native by default (vetted on Cursor, Claude Code, and Copilot fluency before they unlock bookings), so the SDK install and runbook drafting are dramatically faster than they were three years ago. Cadence's pool of 12,800 engineers means you can usually have someone shortlisted in the time it takes to write the spec.
Skip the recruiter loop. Book a senior engineer in 2 minutes, get a 48-hour free trial, and have your monitoring stack live by Friday. Replace the engineer any week, no notice period.
**How long does it take to ship this monitoring stack?**

A single engineer can ship the full $0 to $200/month stack in 2 to 4 days, including SLOs and on-call rotation. The actual SDK installs take an afternoon; the rest of the time goes to writing thresholds, the on-call runbook, and tuning sample rates against real traffic.
**Do I need Datadog at seed stage?**

No. Datadog is excellent but priced for teams with 5+ services and meaningful traffic. At seed stage you'll pay $1,500 to $3,000 per month for capability you can't yet use, because there's nothing to correlate. Start with Sentry, Better Stack, and PostHog. Graduate to Datadog when you cross 50 paying customers, $20k MRR, or 3+ services that need correlated tracing.
**Should I start with OpenTelemetry?**

Probably not. OpenTelemetry is the long-term standard and where most serious SaaS infrastructure ends up, but it adds real setup cost. Start with vendor SDKs (Sentry, PostHog) and migrate to OTel once you have 3+ services that need correlated tracing across them.
**What should I use for logs?**

Better Stack Logs at $0.30/GB, or your host's built-in log viewer (Vercel, Render, Fly), is enough for the first 12 months. Do not pay for Datadog Logs or Splunk at seed stage; the bill scales with ingest volume and you'll regret enabling debug logs on a hot path.
**Who goes on call when it's just one engineer?**

You do, with one rule: only page yourself for paying-customer-impacting alerts. Configure Better Stack quiet hours (no Slack noise after 11pm for non-SEV-1 alerts) and escalation policies that delay paging by 2 minutes for transient flaps. When you hire your second engineer, split the week.
**If I only set up one alert, which should it be?**

The Postgres connection-pool saturation alert (or the equivalent on whatever database you use). It catches connection leaks, runaway migrations, and the moment your app outgrows its database tier. Set it at 80% pool usage for 5 minutes.