How to set up error tracking and logging for a startup

The day-1 stack for a startup is Sentry for errors, pino for structured logs shipped to Better Stack or Axiom, and a single Slack channel for alerts. Total cost: $0 to $50 a month until you're past 5,000 errors per month or 30 GB of log ingest. Skip PagerDuty until after product-market fit. The rule that matters more than the tools: one alert per failure mode, severity-tagged, no exceptions.

That's the whole answer. The rest of this post is how to ship it without creating noise that your team will mute in three weeks.

Why this matters more in 2026

Three things changed since 2023 that make early observability a much higher priority than it used to be.

First, AI-assisted shipping speed went up. A team of two using Cursor and Claude Code ships features that would have taken eight engineers in 2022. The bug count per shipped feature didn't go down by the same factor. More shipping, same human review bandwidth, means production is where you find issues now. Observability moved from "nice to have post-Series-A" to "you need it on day 1 or you ship blind."

Second, log ingest pricing got sane. Datadog still charges enterprise rates, but Better Stack, Axiom, Highlight, and Baselime exist now. Free tiers cover the first 6 months of a real startup.

Third, AI Overviews and ChatGPT-driven traffic mean a 500 error on a product page now costs you both the conversion and the citation. You need to know within 90 seconds, not the next morning when a user emails support.

The default approach (and why it breaks)

Most founders set up console.log and check Vercel's function logs when something goes wrong. This works until exactly the point where it doesn't, usually around 100 daily active users.

The failure mode looks like this. A user reports the checkout button is broken. You check Vercel logs. You see 400 log lines per minute, none of them structured, none of them correlated to the user's session. You spend 40 minutes grepping. The bug was a third-party webhook timeout that fired once at 2:14am, and the user retried at 9:30am with cached state. You will never find this with console.log.

The fix is structured logs plus an exception tracker. Together they take about 90 minutes to set up and they pay for themselves the first time you debug a production issue without scrolling.

The day-1 stack

Here's the minimum viable observability setup we'd ship into a fresh Next.js or Node service today.

Layer	Tool	Cost at startup scale	Why this one
Error tracking	Sentry	Free up to 5,000 errors/mo	Best-in-category source maps, session replay, release tracking
Structured logs	pino (Node) or structlog (Python)	Free, open source	Fast JSON logging, low overhead
Log aggregation	Better Stack or Axiom	$0 to $50/mo	Generous free tier, fast search, S3-backed cheap retention
Alert routing	Slack incoming webhook	Free	Where the team already lives
Paging	None on day 1	$0	Use PagerDuty only after PMF
Uptime checks	Better Stack Uptime (formerly Better Uptime)	Free for 10 monitors	Pings + status page in one tool

Total day-1 cost: $0 to $50 per month. Total setup time: 90 to 120 minutes if you've done it before.

We'll cover each layer in order.

Sentry for exceptions

Install @sentry/nextjs or @sentry/node, paste in your DSN, deploy. Out of the box you get automatic capture of unhandled exceptions, source maps if you upload them in CI, release tagging if you set SENTRY_RELEASE, and a free tier of 5,000 errors per month and 10,000 performance units.

The configuration decisions that matter:

Set tracesSampleRate: 0.1 in production. Sampling at 10% catches the patterns without burning your performance quota.
Add a beforeSend hook for PII redaction. Strip emails, tokens, and credit card patterns from event payloads before they leave your server. Sentry scrubs some common sensitive fields on its own (server-side data scrubbing, plus the Python SDK's EventScrubber), but write your own regex pass too. It's 20 lines.
Tag every event with release, environment, and userId (hashed). Without these, you can't tell which deploy introduced the regression or which customer hit it.
Add Sentry.captureException(err, { tags: { feature: 'checkout' } }) at every try/catch boundary that owns a business operation. Don't rely on uncaught propagation alone.
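
A minimal sketch of that beforeSend pass in plain JavaScript. It assumes Sentry's standard event shape (request.headers, request.cookies, user.email, message); the function names and regexes are illustrative assumptions, and the card pattern will produce some false positives such as long numeric IDs. Only the beforeSend hook itself comes from Sentry.

```javascript
// Illustrative patterns for values that should never leave the server.
// They are assumptions: expect some false positives (e.g. long numeric IDs).
const EMAIL_RE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
const CARD_RE = /\b\d(?:[ -]?\d){12,15}\b/g; // 13 to 16 digits, optional spaces/dashes
const BEARER_RE = /Bearer\s+[A-Za-z0-9._~+\/=-]+/g;

function scrubString(value) {
  return value
    .replace(EMAIL_RE, '[REDACTED_EMAIL]')
    .replace(CARD_RE, '[REDACTED_CARD]')
    .replace(BEARER_RE, 'Bearer [REDACTED]');
}

// Recursively scrub every string inside a JSON-like value.
function scrubDeep(value) {
  if (typeof value === 'string') return scrubString(value);
  if (Array.isArray(value)) return value.map(scrubDeep);
  if (value && typeof value === 'object') {
    const out = {};
    for (const [key, inner] of Object.entries(value)) out[key] = scrubDeep(inner);
    return out;
  }
  return value;
}

// Pass this as the beforeSend option, e.g.
//   Sentry.init({ dsn, tracesSampleRate: 0.1, beforeSend: scrubEvent });
function scrubEvent(event) {
  if (event.request) {
    delete event.request.cookies; // drop cookies entirely
    const headers = event.request.headers;
    if (headers) {
      for (const name of Object.keys(headers)) {
        const lower = name.toLowerCase();
        if (lower === 'authorization' || lower === 'cookie') delete headers[name];
      }
    }
  }
  if (event.user) delete event.user.email; // keep the (hashed) user.id, never the email
  return scrubDeep(event);
}
```

Returning the scrubbed object keeps the event. Returning null from beforeSend drops the event entirely, which is the escape hatch for payloads you never want to ship.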

What can go wrong: leaving tracesSampleRate: 1.0 from a tutorial and burning your free tier in 11 days. We've seen it three times this year.

pino for structured logs

console.log outputs unstructured text. pino outputs newline-delimited JSON with very low overhead; its own benchmarks show it several times faster than other popular Node loggers. Switch the moment you have more than one service or more than one log stream worth searching.

The pattern that works:

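import pino from 'pino';

export const logger = pino({
  // Defaults to info; set LOG_LEVEL=debug locally when you need more detail.
  level: process.env.LOG_LEVEL || 'info',
  redact: {
    // '*.field' matches one level deep (e.g. body.password); the bare name covers a top-level field.
    paths: [
      'req.headers.authorization', 'req.headers.cookie',
      'password', '*.password', 'token', '*.token',
      'creditCard', '*.creditCard', 'email', '*.email',
    ],
    censor: '[REDACTED]',
  },
  // Attached to every log line automatically.
  base: { service: 'api', env: process.env.NODE_ENV, release: process.env.SENTRY_RELEASE },
});

// hashedId: a pseudonymous user ID computed before logging (never the raw email or user ID).
logger.info({ userId: hashedId, action: 'checkout.start', cartValue: 4900 }, 'checkout started');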

Three things to notice. The redact config censors PII inside the logger, before the line is written to stdout, not in some downstream pipeline. Every log line carries service, env, and release automatically. Each event has a stable action field you can query on (action:"checkout.start"), not a free-text message.

The free-text message is for humans skimming. The structured fields are for machines querying. Always include both.

Ship logs to Better Stack or Axiom

pino writes to stdout. Your hosting platform (Vercel, Fly, Render, Railway) captures stdout. From there you ship to a log aggregator.

We recommend Better Stack or Axiom for a sub-$50/month bill. Both have generous free tiers (Axiom gives 500 GB/month free as of writing, Better Stack gives 1 GB and 3 days of retention free with reasonable paid tiers above that). Both support fast SQL-like queries. Both have Slack integration.

Datadog is technically more powerful, but you'll pay $1,500 to $4,000 a month before you have enough traffic to justify it. The same is true for Splunk. Save the migration for Series A.

Setup is 10 minutes: install the platform's log drain integration (Vercel has a one-click for both Better Stack and Axiom), set LOG_LEVEL=info in production, deploy. Logs appear within a minute.

Slack alerts with severity tagging

One Slack channel called #alerts. Two webhook routes from Sentry: one for level:error, one for level:fatal (so a fatal event doesn't post twice). Tag every Sentry alert with a severity emoji at the start (🟡 warning, 🟠 error, 🔴 fatal). That's it.

The single rule that prevents alert fatigue: one alert per failure mode, not per occurrence. Sentry deduplicates by stack trace fingerprint by default; do not disable this. If Stripe webhook timeout fires 400 times in 10 minutes, that's one alert with a "400 events" counter, not 400 messages.
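
If you relay Sentry alerts through your own small handler instead of Sentry's built-in Slack integration, the emoji prefix and event counter could be added like this. Slack incoming webhooks accept a JSON body with a text field; buildSlackAlert, sendSlackAlert, and SLACK_WEBHOOK_URL are hypothetical names, and fetch is assumed to be global (Node 18+).

```javascript
// Emoji prefixes from the severity convention above.
const SEVERITY_EMOJI = { warning: '🟡', error: '🟠', fatal: '🔴' };

// Build an incoming-webhook payload; Slack needs only a `text` field.
function buildSlackAlert({ level, title, eventCount = 1 }) {
  const emoji = SEVERITY_EMOJI[level] ?? '⚪';
  const counter = eventCount > 1 ? ` (${eventCount} events)` : '';
  return { text: `${emoji} ${title}${counter}` };
}

// SLACK_WEBHOOK_URL is a hypothetical env var; fetch is global in Node 18+.
async function sendSlackAlert(alert, webhookUrl = process.env.SLACK_WEBHOOK_URL) {
  const res = await fetch(webhookUrl, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildSlackAlert(alert)),
  });
  if (!res.ok) throw new Error(`Slack webhook returned ${res.status}`);
}
```

Keeping the payload builder separate from the network call makes the formatting testable without hitting Slack.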

For your first 6 months, this is enough. You'll know within 90 seconds when production breaks, and the channel will stay quiet enough that people still look at it.

Severity tagging that actually works

Most teams over-engineer this. Five levels, mapped to actions:

debug: local development only. Filtered out in production.
info: normal operations. Stored, searchable, not alerted.
warn: unexpected but recoverable. Stored, surfaced in a weekly digest, not paged.
error: something a user noticed or will notice. Slack alert, no page.
fatal: the service is down or money is being lost. Slack alert AND (post-PMF) a page.

The trap is the temptation to add critical, urgent, severe, or service-specific levels. Don't. Engineers can't keep a sprawling severity scale in their heads. The five above map cleanly to "do I act on this now, today, or never."
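
One way to keep the five levels honest is to encode the mapping in a single table that every handler reads, and to reject any level that isn't in it. The object and function names are hypothetical; the actions come straight from the list above, with paging gated on a post-PMF flag.

```javascript
// One row per level; each row is the whole policy for that level.
const SEVERITY_POLICY = {
  debug: { store: false, digest: false, slack: false, page: false }, // dropped in production
  info: { store: true, digest: false, slack: false, page: false },
  warn: { store: true, digest: true, slack: false, page: false },
  error: { store: true, digest: false, slack: true, page: false },
  fatal: { store: true, digest: false, slack: true, page: true },
};

// Paging only switches on once you are past product-market fit.
function policyFor(level, { postPmf = false } = {}) {
  if (!Object.hasOwn(SEVERITY_POLICY, level)) {
    throw new Error(`Unknown level "${level}": use debug, info, warn, error, or fatal`);
  }
  const policy = SEVERITY_POLICY[level];
  return { ...policy, page: policy.page && postPmf };
}
```

Throwing on an unknown level is deliberate: a "critical" level sneaking in fails loudly in code review instead of silently creating a sixth tier.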

Sampling at scale

You won't need this for the first year. After that, sampling stops being optional.

At 10 million log lines a month you'll pay for ingest in real money. Three sampling strategies that work:

Tail-based trace sampling. Keep 100% of traces that include an error or take more than 2 seconds. Keep 5% of the rest. The OpenTelemetry Collector's tail_sampling processor and Honeycomb's Refinery both implement this pattern; you configure the policies.
Log level filtering. Drop info in production, keep warn and above. You lose forensic context for normal flows; you can re-enable for a single service when investigating.
Per-route rate limiting. A noisy health-check endpoint can generate 90% of your logs. Cap that route at 10 logs per minute at the edge.
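
In practice you would configure the collector rather than hand-roll this, but the tail-sampling decision itself is small. This sketch assumes a hypothetical trace shape ({ traceId, spans: [{ startMs, endMs, status }] }) and hashes the trace ID with FNV-1a instead of calling Math.random(), so the same trace always gets the same answer.

```javascript
// Policy from the text: keep every trace with an error or slower than 2 s,
// plus a 5% baseline of everything else.
const SLOW_TRACE_MS = 2000;
const BASELINE_RATE = 0.05;

// FNV-1a (32-bit) maps a trace ID to [0, 1), so the keep/drop decision
// is reproducible for a given trace ID.
function traceIdToUnit(traceId) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < traceId.length; i++) {
    hash ^= traceId.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash / 0x100000000;
}

// Tail-based: called only after every span of the trace has finished.
function keepTrace(trace) {
  const hasError = trace.spans.some((span) => span.status === 'error');
  const start = Math.min(...trace.spans.map((span) => span.startMs));
  const end = Math.max(...trace.spans.map((span) => span.endMs));
  if (hasError || end - start > SLOW_TRACE_MS) return true;
  return traceIdToUnit(trace.traceId) < BASELINE_RATE;
}
```

Waiting for the whole trace is what makes this tail-based: a head-based sampler decides at the first span and cannot know the request will later fail or run slow.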

The pattern is: sample noisy, keep rare, never sample errors. An error that gets sampled out is the one that turns into a customer-discovered bug at 4am.
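
A per-route cap that still follows "never sample errors" might look like the in-process fixed-window limiter below. It only approximates edge rate limiting: the counters live in one process's memory, so each instance enforces its own cap. shouldLog is a hypothetical name; the 10-per-minute limit comes from the text.

```javascript
const LIMIT_PER_MINUTE = 10;
const WINDOW_MS = 60_000;
const ALWAYS_KEEP = new Set(['warn', 'error', 'fatal']);
const windows = new Map(); // route -> { start, count }, per process

// Fixed-window cap for noisy routes. Warnings and errors always pass.
function shouldLog(route, level, now = Date.now()) {
  if (ALWAYS_KEEP.has(level)) return true;
  const current = windows.get(route);
  if (!current || now - current.start >= WINDOW_MS) {
    windows.set(route, { start: now, count: 1 }); // open a new minute
    return true;
  }
  if (current.count < LIMIT_PER_MINUTE) {
    current.count += 1;
    return true;
  }
  return false; // over the cap for this minute: drop the line
}
```

Guard the noisy handler with it, e.g. if (shouldLog('/healthz', 'info')) logger.info(...).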

PII redaction is non-negotiable

If you ship a single log line with a customer email, password, or full credit card number to a third-party log service, you've created a compliance problem. SOC 2 auditors will flag it, and under GDPR it can become a reportable incident if EU users' data is involved.

The fixes:

Redact at the source. Your logger config (the redact block in pino) strips known PII fields before serialization. Don't trust downstream redaction.
Allowlist, don't blocklist, where possible. Log a hashed user ID, not the email. Log a Stripe customer ID, not the card. Log a session ID, not the cookie.
Audit quarterly. Run a regex over a day of logs looking for @.*\.com, 16-digit numbers, common token prefixes (sk_, pk_, Bearer ). Anything that shows up is a redaction miss.
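
The quarterly audit can be a short script over a day of exported NDJSON logs. The patterns below are assumptions built from the checklist: a general email pattern instead of only .com, 16-digit card numbers with optional separators, Stripe-style sk_/pk_ keys, and bearer tokens. auditLogLines is a hypothetical name.

```javascript
// Patterns from the audit checklist; the exact regexes are assumptions.
// No /g flag, so .test() carries no lastIndex state between lines.
const AUDIT_PATTERNS = {
  email: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/,
  card16: /\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b/,
  stripeKey: /\b(?:sk|pk)_(?:live|test)_[A-Za-z0-9]+/,
  bearer: /Bearer\s+[A-Za-z0-9._~+\/=-]{8,}/,
};

// Report every (line number, pattern) pair that looks like a redaction miss.
function auditLogLines(lines) {
  const misses = [];
  lines.forEach((line, index) => {
    for (const [name, re] of Object.entries(AUDIT_PATTERNS)) {
      if (re.test(line)) misses.push({ line: index + 1, pattern: name });
    }
  });
  return misses;
}
```

For a file export, something like auditLogLines(readFileSync('day.ndjson', 'utf8').split('\n')), with readFileSync from node:fs, gives you the list to work through. Already-redacted values such as "[REDACTED]" don't match any pattern.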

This connects directly to broader compliance work, including the basics in our guide on how to design a SaaS for HIPAA from day 1 if you're handling health data, and the patterns in implementing OWASP Top 10 mitigations for the rest of the surface area.

The "one alert per failure mode" rule

This is the single rule that separates teams who trust their alerts from teams who mute them.

A failure mode is a category of breakage, not an instance of it. "Stripe webhook handler throws" is a failure mode. "Stripe webhook handler threw at 14:32:18 for user X" is an instance. You want one alert per failure mode, with a counter of instances attached.

Three concrete rules that enforce this:

Group by stack trace fingerprint. Sentry does this by default. Don't disable it. Don't override it unless you have a very specific reason.
Suppress duplicate alerts within a window. If the same fingerprint fires three times in five minutes, send one Slack message with (3x) appended.
Auto-resolve when the fingerprint hasn't fired in 24 hours. Stale alerts crowd the channel and train people to ignore the unread badge.
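
Sentry's issue grouping and alert rules cover most of this, but if you forward alerts through your own relay, the three rules reduce to a small state machine keyed by fingerprint. This sketch makes one design choice the text leaves open: the first event in a window posts immediately (so you still hear within 90 seconds), and repeats in the same five minutes fold into one follow-up line such as "Stripe webhook timeout (3x)". recordEvent and flush are hypothetical names, and the state lives in memory.

```javascript
const SUPPRESS_WINDOW_MS = 5 * 60 * 1000;     // fold repeats within 5 minutes
const AUTO_RESOLVE_MS = 24 * 60 * 60 * 1000;  // resolve after 24 quiet hours

// fingerprint -> { title, windowStart, count, lastSeen }
const alerts = new Map();

// Record one event. Returns a message to post now, or null if it was folded in.
function recordEvent(fingerprint, title, now) {
  const existing = alerts.get(fingerprint);
  if (existing && existing.windowStart !== null) {
    existing.count += 1; // inside an open window: count it, don't post
    existing.lastSeen = now;
    return null;
  }
  alerts.set(fingerprint, { title, windowStart: now, count: 1, lastSeen: now });
  return title; // first event of a window posts immediately
}

// Run on a timer (say, every minute). Returns follow-up messages to post.
function flush(now) {
  const messages = [];
  for (const [fingerprint, alert] of alerts) {
    if (alert.windowStart !== null && now - alert.windowStart >= SUPPRESS_WINDOW_MS) {
      if (alert.count > 1) messages.push(`${alert.title} (${alert.count}x)`);
      alert.windowStart = null; // window closed; the next event opens a new one
    }
    if (alert.windowStart === null && now - alert.lastSeen >= AUTO_RESOLVE_MS) {
      alerts.delete(fingerprint); // quiet for 24 hours: auto-resolve
      messages.push(`Resolved: ${alert.title}`);
    }
  }
  return messages;
}
```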

When alert fatigue sets in (and you'll know because someone says "I muted #alerts last week"), the cause is almost always a violation of one of these three rules.

Paging discipline (PagerDuty / Incident.io comes later)

You do not need PagerDuty before product-market fit. The single biggest waste of money we see in pre-PMF startups is a $25-per-user-per-month PagerDuty bill protecting a service nobody is paying for yet.

The honest threshold is: page on-call only when (a) you have paying customers whose contract implies uptime, (b) you have at least two engineers who can actually respond, and (c) you've already shipped the "one alert per failure mode" discipline above. Without (c), you'll page someone every 40 minutes for the first week and they'll quit.

When you do hit that threshold, the stack is PagerDuty or Incident.io, integrated to Sentry's fatal-level webhook. Two on-call engineers, weekly rotation, one runbook per failure mode. Pair this with the practices in our post on writing a postmortem after an incident so you actually learn from every page.

What can you skip entirely?

Honestly: a lot, for the first 90 days.

If you are two founders pre-revenue, you do not need OpenTelemetry, you do not need a service mesh, you do not need distributed tracing across microservices (you do not have microservices). Ship Sentry, pino, Better Stack, and a Slack webhook. That covers 95% of the incidents you'll see in year one.

The things you can safely defer until you have actual scale or actual customers:

Custom metrics dashboards (use Sentry's built-in performance tab)
APM beyond Sentry traces
A status page until you have B2B customers asking for one
SOC 2 compliance tooling until a deal requires it
A dedicated SRE
Self-hosted anything

If you're choosing the underlying data layer at the same time, our guides on designing a multi-tenant Postgres schema and using Prisma in 2026 cover the patterns that pair best with the logging stack above.

Who should own this on a small team

For a 1-to-3 engineer team, observability setup is a half-day of work for a Mid or Senior engineer. The Mid tier ($1,000/week on Cadence) ships the day-1 stack: Sentry, pino, Better Stack, Slack webhooks, severity tags, the redaction layer. The Senior tier ($1,500/week) is the right call when you're adding tail-based sampling, OpenTelemetry, or migrating from a noisy legacy logger to structured logs without losing data.

If you're spending more than a day per week chasing production issues because you can't see what's happening, that's the signal to bring in help. Every engineer on Cadence is AI-native by default: they're vetted on Cursor, Claude Code, and Copilot fluency before they unlock bookings, so they ship observability setup at AI-assisted speed. The median time-to-first-commit on the platform is 27 hours from booking. You can audit your current stack if you're not sure what you actually need versus what a tutorial told you to install.

Common pitfalls

Logging request bodies wholesale. Includes passwords, tokens, credit cards. Always redact at the source.
One alert per error instance. Channel gets muted in 72 hours.
Sentry tracesSampleRate: 1.0 left from a tutorial. Free tier gone in under two weeks.
No release tag on Sentry events. You can't tell which deploy caused the regression.
Logging at debug level in production. 50x the volume, 50x the cost.

Each is a 10-minute fix once you spot it. Run a quarterly audit of log volume by route, alert volume by fingerprint, and Sentry quota burn rate.

What to do this week

If you have nothing in place today:

Install Sentry. 20 minutes. Free tier, tracesSampleRate: 0.1, PII redaction on.
Switch from console.log to pino with a redact block. 30 minutes.
Hook up Better Stack or Axiom log drain. 10 minutes.
Create #alerts in Slack. Route Sentry error and fatal webhooks to it. 10 minutes.
Add one runbook line per failure mode you can think of, in a single doc. 30 minutes.

Total: under two hours. You will catch the next outage in 90 seconds instead of 9 hours.

If you'd rather have someone else ship this end-to-end (with the redaction, sampling, and Slack routing dialed in for your stack), book a Mid or Senior engineer on Cadence for a one-week sprint. Weekly billing, 48-hour free trial, replace any week if it isn't working.

FAQ

How much does error tracking cost for a startup?

Zero to $50 per month for the first 6 months on Sentry's free tier (5,000 errors/mo) plus Better Stack or Axiom's free log tier. You'll start paying real money around 10,000 active users or 50 GB of monthly log ingest, typically $50 to $200 per month at that scale.

Should I use Datadog from day 1?

No. Datadog is excellent at scale but starts at roughly $15 per host per month for infrastructure monitoring and stacks up fast with APM, logs, and synthetics. Pre-Series-A startups burn $1,500 to $4,000 per month before getting value. Sentry plus Better Stack covers 95% of the same surface for under $50.

When do I need PagerDuty?

When you have paying customers whose contracts imply uptime, at least two engineers who can respond, and you've already enforced one-alert-per-failure-mode discipline so on-call isn't waking up every 40 minutes. Usually post-PMF, around Series A.

How do I avoid alert fatigue?

Three rules: group alerts by stack-trace fingerprint, suppress duplicates within a 5-minute window, auto-resolve fingerprints that haven't fired in 24 hours. Together these cut alert volume by 80 to 95% without losing signal.

What's the minimum PII redaction setup?

Strip Authorization headers, cookies, password, token, creditCard, and email fields at the logger level (not downstream). Log hashed user IDs and Stripe customer IDs instead of raw PII. Audit quarterly with regex over a day of logs to catch redaction misses. Pair this with the safety practices in our guide on rolling out feature flags safely so new code paths don't leak through.

Deeksha Durgesh

Senior Automation Developer

Senior automation engineer at withRemote. Writes on CI/CD, test pyramids, and removing toil from engineering pipelines.
