How to add observability to a Next.js app

To add observability to a Next.js app, instrument three pillars in this order: traces via OpenTelemetry through instrumentation.ts (stable since Next.js 15, behind the experimental.instrumentationHook flag before that), metrics via Vercel Speed Insights plus a custom counter for business events, and logs via pino writing structured JSON to stdout. Wire all three before you ship to production, because debugging an RSC waterfall or a Server Action timeout without spans is guesswork.

That is the punchline. The rest of this guide is the playbook: which hook to use, how the edge runtime breaks half the SDKs, where Vercel quietly drops your logs, and what an actual instrumentation.ts looks like in 2026.

Why observability in Next.js is harder than it looks

Next.js is no longer one runtime. A single request might fan out across the Node.js server, the edge runtime, the React Server Component renderer, a Server Action, and a client component hydrating in the browser. Each of those layers has its own concept of a request, its own console, and its own crash mode.

Two things broke the old "wrap Express in a tracer" pattern. First, React Server Components stream HTML in chunks, so a 200 response can still contain a 500-level error buried inside a suspense boundary. Second, the edge runtime runs on V8 isolates, not Node, so any SDK that depends on the file system, process.on('uncaughtException'), or most of async_hooks either fails to load or silently no-ops.

If you cannot answer "which RSC rendered this slow page, on which runtime, with which database call" in under 60 seconds, you do not have observability. You have logs you grep at 2am.

The three pillars, mapped to Next.js primitives

Pillar	Question it answers	Next.js mechanism	Default tool
Traces	Where did the time go for this request?	`instrumentation.ts` + OpenTelemetry	`@vercel/otel` or `@opentelemetry/sdk-node`
Metrics	How is the whole app behaving over time?	Vercel Speed Insights + custom counters	Speed Insights, Prometheus, Datadog
Logs	What exactly happened in this one execution?	`console.*` piped to a structured logger	`pino`, Axiom, Better Stack

You need all three. Traces tell you which call was slow, metrics tell you whether slowness is a trend or a blip, and logs give you the variables to reproduce a single bug. Skip one and you will rebuild it three months in.

Pillar 1: traces with the `instrumentation.ts` hook

Next.js exposes a top-level instrumentation.ts file (sibling of next.config.js) that runs once per server process before any request handler. It has been stable since Next.js 15; on 13.x and 14.x it sits behind the experimental.instrumentationHook flag. This is the only safe place to register OpenTelemetry, because it runs in both the Node and edge runtimes and Next guarantees it executes before route handlers.

The minimum viable file looks like this:

// instrumentation.ts
export async function register() {
  if (process.env.NEXT_RUNTIME === 'nodejs') {
    await import('./instrumentation.node');
  }
  if (process.env.NEXT_RUNTIME === 'edge') {
    await import('./instrumentation.edge');
  }
}

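export async function onRequestError(
  err: Error & { digest?: string },
  request: { path: string; method: string; headers: Record<string, string | string[] | undefined> },
  context: { routerKind: 'Pages Router' | 'App Router'; routePath: string; routeType: 'render' | 'route' | 'action' | 'middleware' }
) {
  // Forward to Sentry, Axiom, Honeycomb, whatever.
  // Send only path and method: request.headers carries cookies and auth tokens.
  await fetch(process.env.ERROR_INGEST_URL!, {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify({
      err: { message: err.message, stack: err.stack, digest: err.digest },
      request: { path: request.path, method: request.method },
      context,
    }),
  });
}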

onRequestError is the hook, added in Next.js 15, that catches errors inside Server Components, Server Actions, and route handlers without you wrapping every function. Use it. The old try/catch per route handler does not see RSC render errors.
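If you do want some header context in the error payload, one option is to redact before forwarding. The sketch below is a hypothetical helper, not part of Next.js: the name redactHeaders and the list of sensitive header names are assumptions you would adapt to your own app.

```typescript
// Header names treated as sensitive are an assumption; extend the list for your app.
const SENSITIVE_HEADERS = new Set(['authorization', 'cookie', 'set-cookie', 'x-api-key']);

type HeaderBag = Record<string, string | string[] | undefined>;

// Returns a copy of the headers with lowercase keys and sensitive values masked.
function redactHeaders(headers: HeaderBag): Record<string, string> {
  const safe: Record<string, string> = {};
  for (const [name, value] of Object.entries(headers)) {
    if (value === undefined) continue;
    const key = name.toLowerCase();
    if (SENSITIVE_HEADERS.has(key)) {
      safe[key] = '[redacted]';
    } else {
      // Multi-value headers are joined so every value is a plain string.
      safe[key] = Array.isArray(value) ? value.join(', ') : value;
    }
  }
  return safe;
}
```

Inside onRequestError you would then send redactHeaders(request.headers) instead of the raw headers object.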

For the Node-runtime side, the boring choice is @vercel/otel:

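// instrumentation.node.ts
import { registerOTel } from '@vercel/otel';
import { HttpInstrumentation } from '@opentelemetry/instrumentation-http';
import { PgInstrumentation } from '@opentelemetry/instrumentation-pg';
import { FsInstrumentation } from '@opentelemetry/instrumentation-fs';

registerOTel({
  serviceName: 'cadence-app',
  // @vercel/otel accepts the built-in 'fetch' name plus OpenTelemetry instrumentation
  // instances; the http, pg, and fs instrumentations ship as separate packages.
  instrumentations: ['fetch', new HttpInstrumentation(), new PgInstrumentation(), new FsInstrumentation()],
});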

That gives you spans for outbound fetches, Postgres queries, and file reads. Pipe them to any OTLP-compatible backend (Honeycomb, Grafana Tempo, Datadog, New Relic). For the edge runtime, only the fetch instrumentation works, because edge isolates do not have http or pg modules. Most teams ship a stripped-down edge instrumentation file with only ['fetch'].

RSC tracing

Server Components are async functions that the renderer calls. To trace one, wrap the body in a span:

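import { trace, SpanStatusCode } from '@opentelemetry/api';
const tracer = trace.getTracer('rsc');

// fetchDashboard and Dashboard are your own data loader and component.
export default async function DashboardPage() {
  return tracer.startActiveSpan('DashboardPage', async (span) => {
    try {
      const data = await fetchDashboard();
      span.setAttribute('row_count', data.rows.length);
      return <Dashboard data={data} />;
    } catch (err) {
      // Mark the span as failed so the broken RSC stands out in the trace.
      span.setStatus({ code: SpanStatusCode.ERROR });
      throw err;
    } finally {
      // Runs on success and on error, so the span always exports.
      span.end();
    }
  });
}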

If you do this in every page, you can see a flame graph of which RSC stalled. That is the single most useful view when a page feels slow but the network tab looks fine.
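Repeating that wrapper in every page gets tedious, so a small helper is one way to keep it consistent. This is a sketch under assumptions: withSpan is a hypothetical name, and SpanLike and TracerLike are local stand-ins for the parts of the OpenTelemetry types it touches, so the block compiles without the package installed.

```typescript
// Structural stand-ins for the OpenTelemetry Span and Tracer types (assumption:
// only the members used below). In an app, pass the tracer from trace.getTracer().
interface SpanLike {
  setAttribute(key: string, value: string | number | boolean): unknown;
  recordException(error: Error): void;
  end(): void;
}
interface TracerLike {
  startActiveSpan<T>(name: string, fn: (span: SpanLike) => T): T;
}

// Hypothetical helper: runs fn inside a span and always ends the span,
// even when fn throws.
async function withSpan<T>(
  tracer: TracerLike,
  name: string,
  fn: (span: SpanLike) => Promise<T>,
): Promise<T> {
  return tracer.startActiveSpan(name, async (span) => {
    try {
      return await fn(span);
    } catch (err) {
      span.recordException(err instanceof Error ? err : new Error(String(err)));
      throw err;
    } finally {
      span.end();
    }
  });
}
```

A page body then shrinks to a single call such as withSpan(tracer, 'DashboardPage', async (span) => { ... }), and the error path is handled in one place.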

Server Action observability

Server Actions are invisible to most APM tools because they look like a POST to /. Tag them yourself:

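'use server';
import { trace } from '@opentelemetry/api';
import { db } from '@/lib/db'; // your ORM client; the import path is a placeholder

const tracer = trace.getTracer('actions');

export async function deletePost(id: string) {
  return tracer.startActiveSpan('action.deletePost', async (span) => {
    try {
      span.setAttribute('post_id', id);
      return await db.posts.delete({ where: { id } });
    } finally {
      // End the span on success and on failure.
      span.end();
    }
  });
}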

Without this tag, every Server Action error in your dashboard shows up as "POST / 500" and you spend an hour bisecting.

Pillar 2: metrics

Two flavors here, both cheap to wire.

Web Vitals. Drop <SpeedInsights /> from @vercel/speed-insights/next into the root layout. You get LCP, INP, CLS, FCP, and TTFB segmented by route, device, and country, with no extra code. Free on Vercel Hobby, paid above 25k data points/month. Independent of Vercel, you can collect the same data with the web-vitals package and POST to your own endpoint.
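For the self-hosted route, here is a minimal sketch of the payload side. The VitalLike interface lists only the metric fields this example relies on, and toVitalPayload, the payload shape, and the /api/vitals endpoint are all assumptions made for illustration.

```typescript
// Subset of the fields on a web-vitals metric object that this sketch uses.
interface VitalLike {
  name: string; // e.g. 'LCP', 'INP', 'CLS'
  value: number;
  id: string;
  rating: 'good' | 'needs-improvement' | 'poor';
}

// Hypothetical payload shape for your own collection endpoint.
function toVitalPayload(metric: VitalLike, route: string): string {
  return JSON.stringify({
    name: metric.name,
    // CLS is a small unitless score; the timing metrics are in milliseconds.
    value: metric.name === 'CLS' ? Number(metric.value.toFixed(4)) : Math.round(metric.value),
    id: metric.id,
    rating: metric.rating,
    route,
  });
}

// In a client component, the wiring would look roughly like this:
//   import { onLCP, onINP, onCLS } from 'web-vitals';
//   const send = (m) => navigator.sendBeacon('/api/vitals', toVitalPayload(m, location.pathname));
//   onLCP(send); onINP(send); onCLS(send);
```

Rounding on the client keeps payloads small, and tagging each one with the route gives you the same per-route segmentation the hosted dashboard offers.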

Custom counters. For business events (signups, checkout completions, action latencies), the cheapest path is the OpenTelemetry meter API, which speaks to Prometheus, Datadog, and OTLP backends without code changes:

import { metrics } from '@opentelemetry/api';
const meter = metrics.getMeter('cadence-app');
const signups = meter.createCounter('signups_total');

export async function POST(req: Request) {
  const body = await req.json();
  // ...
  signups.add(1, { plan: body.plan });
  return Response.json({ ok: true });
}

A useful pattern from our experience helping teams ship on Vercel: track p95 latency on the five highest-traffic routes, and alert when it doubles week-over-week. Most outages show up there first, before customers email you. Feature flag rollouts also benefit from a per-flag latency metric, so you can tell whether a rollout regressed performance.
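To make the "doubles week-over-week" rule concrete, here is a small sketch of the arithmetic. It assumes you can pull raw latency samples in milliseconds for each week; most backends compute percentiles for you, and the function names are illustrative only.

```typescript
// Nearest-rank percentile over raw latency samples (milliseconds).
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1]!;
}

// True when this week's p95 is at least `factor` times last week's p95.
function p95Doubled(lastWeek: number[], thisWeek: number[], factor = 2): boolean {
  return percentile(thisWeek, 95) >= factor * percentile(lastWeek, 95);
}
```

Comparing against the same route's own baseline, rather than a fixed threshold, means a naturally slow report page and a fast landing page can share one alert rule.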

Pillar 3: structured logs with pino

The Vercel runtime captures anything you write to console.log, but the default format is unstructured strings, which makes them painful to query. Use pino to emit one JSON object per line:

// lib/log.ts
import pino from 'pino';
export const log = pino({
  level: process.env.LOG_LEVEL ?? 'info',
  base: { service: 'cadence-app', env: process.env.VERCEL_ENV },
  formatters: { level: (label) => ({ level: label }) },
  timestamp: pino.stdTimeFunctions.isoTime,
});

Now every log line is {"level":"info","service":"cadence-app","env":"production","time":"...","msg":"order created","orderId":"..."}, which is grep-friendly and ingest-friendly. Pipe Vercel's log drain into Axiom, Logtail, or Datadog.

Edge runtime gotcha

pino's transports run in worker threads on Node, which the edge runtime does not have. In edge functions, fall back to a tiny console.log(JSON.stringify({...})) helper. Detect the runtime with process.env.NEXT_RUNTIME === 'edge' and branch.
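The fallback helper can be only a few lines. The sketch below assumes you want lines shaped like the pino output above (level, time, msg, plus extra fields); formatLogLine and edgeLog are hypothetical names.

```typescript
type Level = 'debug' | 'info' | 'warn' | 'error';

// Builds one JSON line per call: level, ISO timestamp, extra fields, message.
function formatLogLine(level: Level, msg: string, fields: Record<string, unknown> = {}): string {
  return JSON.stringify({ level, time: new Date().toISOString(), ...fields, msg });
}

// Hypothetical edge-safe logger with console as the only sink.
const edgeLog = {
  info: (msg: string, fields?: Record<string, unknown>) => console.log(formatLogLine('info', msg, fields)),
  error: (msg: string, fields?: Record<string, unknown>) => console.error(formatLogLine('error', msg, fields)),
};
```

Keeping the field names aligned with the pino config means the log drain sees one schema regardless of which runtime produced the line.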

Vercel's hidden log limit

Vercel's runtime logs cap at 4KB per line and 256KB per invocation on the Hobby and Pro tiers. Logs above that are silently truncated. Two consequences: do not stringify a full request body, and use a log drain rather than relying on the dashboard for anything beyond debugging. The dashboard retains 1 hour on Hobby, 1 day on Pro, 3 days on Enterprise. If you need real history, ship to Axiom or Better Stack from day one.
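One way to stay under a per-line cap is to shorten oversized fields yourself, so the truncation is explicit rather than silent. This is a sketch, not a Vercel API: fitLogLine is a hypothetical helper, and the 4096-byte default simply mirrors the per-line figure quoted above.

```typescript
const byteLength = (s: string): number => new TextEncoder().encode(s).length;

// Hypothetical helper: keeps a JSON log line under a byte budget by repeatedly
// halving its longest string field and flagging the record as truncated.
function fitLogLine(record: Record<string, unknown>, maxBytes = 4096): string {
  const copy: Record<string, unknown> = { ...record };
  let line = JSON.stringify(copy);
  while (byteLength(line) > maxBytes) {
    let longestKey: string | undefined;
    let longestLen = 32; // leave short fields alone
    for (const [key, value] of Object.entries(copy)) {
      if (typeof value === 'string' && value.length > longestLen) {
        longestKey = key;
        longestLen = value.length;
      }
    }
    if (longestKey === undefined) {
      // Nothing left to shorten: emit a small marker record instead.
      return JSON.stringify({ truncated: true, originalBytes: byteLength(JSON.stringify(record)) });
    }
    copy[longestKey] = (copy[longestKey] as string).slice(0, Math.floor(longestLen / 2)) + '[truncated]';
    copy['truncated'] = true;
    line = JSON.stringify(copy);
  }
  return line;
}
```

The truncated flag makes it easy to query the drain for lines that lost data, which the platform's own truncation does not give you.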

A full `instrumentation.ts` example

Here is what we ship for a typical App Router project. Same idea as our guide on writing production-grade tests in 2026: boring tools, wired carefully, instead of clever ones half-installed.

// instrumentation.ts
export async function register() {
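  if (process.env.NEXT_RUNTIME === 'nodejs') {
    const { registerOTel } = await import('@vercel/otel');
    const { HttpInstrumentation } = await import('@opentelemetry/instrumentation-http');
    const { PgInstrumentation } = await import('@opentelemetry/instrumentation-pg');
    registerOTel({
      serviceName: 'cadence-app',
      // 'fetch' is a built-in name; http and pg are passed as instrumentation instances.
      instrumentations: ['fetch', new HttpInstrumentation(), new PgInstrumentation()],
    });
  }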
  if (process.env.NEXT_RUNTIME === 'edge') {
    const { registerOTel } = await import('@vercel/otel');
    registerOTel({
      serviceName: 'cadence-app-edge',
      instrumentations: ['fetch'],
    });
  }
}

export async function onRequestError(
  err: Error,
  request: { path: string; method: string },
  context: { routerKind: string; routePath: string; routeType: string }
) {
  const payload = {
    err: { message: err.message, stack: err.stack, name: err.name },
    request: { path: request.path, method: request.method },
    context: { routerKind: context.routerKind, routePath: context.routePath, routeType: context.routeType },
  };
  // Guard on the same variable the fetch uses, so a missing URL skips the call.
  if (process.env.SENTRY_INGEST_URL) {
    await fetch(process.env.SENTRY_INGEST_URL, {
      method: 'POST',
      headers: { 'content-type': 'application/json' },
      body: JSON.stringify(payload),
    }).catch(() => {});
  }
  console.error(JSON.stringify({ level: 'error', ...payload }));
}

That single file gives you traces in your APM, errors in Sentry, and structured stderr for the log drain. No middleware, no per-route wrappers.

Edge vs Node runtime: what works where

Capability	Node runtime	Edge runtime
`@vercel/otel` fetch tracing	yes	yes
`@vercel/otel` http/pg/fs tracing	yes	no
`pino` with worker threads	yes	no
`async_hooks` (full API; edge polyfills only `AsyncLocalStorage`)	yes	partial
`process.on('uncaughtException')`	yes	no
`console.log` to Vercel runtime logs	yes	yes
`onRequestError` hook	yes	yes
Sentry SDK (@sentry/nextjs)	full	partial (no profiling)

The practical rule: put your performance-critical RSC and Server Actions on the Node runtime so you get full instrumentation, and reserve the edge runtime for middleware, auth checks, and geographically-routed redirects where you do not need deep tracing. Many teams over-adopt the edge runtime for cold-start reasons, then lose visibility and cannot debug.
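In the App Router, a page, layout, or route handler can pin its runtime through the runtime route segment config. A minimal sketch, with a hypothetical file path and handler body:

```typescript
// app/api/reports/route.ts (hypothetical path)

// Pin this route to the Node.js runtime so the full set of instrumentations applies.
export const runtime = 'nodejs';

export async function GET() {
  return Response.json({ ok: true });
}
```

Setting it explicitly on the routes you care about also documents the decision, so a later switch to edge shows up in code review.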

Comparison of approaches

Approach	Setup time	Monthly cost (small app)	Best for
Vercel Speed Insights + console logs	10 min	$0 to $20	Solo founders pre-revenue
`instrumentation.ts` + OTel + Axiom	1 day	$25 to $100	Bootstrapped teams 1-5 engineers
Sentry + Datadog + LogRocket	2-3 days	$400+	Series A and up
Full self-hosted (Grafana + Tempo + Loki)	1-2 weeks	infra cost only	Compliance-bound, 20+ engineers

For the first two, a senior engineer on Cadence at $1,500/week can ship the entire instrumentation stack including dashboards in under a week. Above $400/month in vendor spend, you usually want someone who has built the same setup before, and the booking flow matches the same kind of work we walked through for multi-tenant Postgres schema design.

If you want a second opinion on which observability stack matches your scale, our companion piece on the best monitoring tools for Next.js in 2026 compares Axiom, Sentry, Datadog, Honeycomb, and Grafana Cloud across pricing tiers and integration depth.

Common pitfalls

Logging the full request object. Headers contain auth tokens and cookies. Log a redacted subset, or pipe to a drain that scrubs.
Creating a tracer per request. Call trace.getTracer() once at module top level, not inside the handler. Otherwise you repeat the lookup on every request for no benefit; batching itself happens in the span processor, not the tracer.
Sampling at 100% in production. OTel sampling at 100% on a 1M-request/day app will cost you $300+/month at Honeycomb pricing. Start at 10% and only bump on incident.
Forgetting span.end() in error paths. Spans without end() leak memory and never export. Use a try/finally.
Mixing runtimes in one trace. A request that hits middleware (edge) then a page (node) creates two traces unless you propagate traceparent headers. @vercel/otel handles this if you let it.
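When middleware and page spans land in separate traces, a quick check is whether the request reaching the page carries a valid traceparent header at all. The parser below follows the W3C Trace Context layout (version 00, 32-hex trace id, 16-hex parent id, 2-hex flags); parseTraceparent is a hypothetical debugging helper, not something @vercel/otel exposes.

```typescript
interface TraceParent {
  traceId: string;
  parentId: string;
  sampled: boolean;
}

// version "00" - 32 hex trace id - 16 hex parent id - 2 hex flags
const TRACEPARENT = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

function parseTraceparent(header: string | null | undefined): TraceParent | null {
  const match = header ? TRACEPARENT.exec(header.trim()) : null;
  if (!match) return null;
  const traceId = match[1] ?? '';
  const parentId = match[2] ?? '';
  const flags = match[3] ?? '00';
  // All-zero ids are invalid per the spec.
  if (/^0+$/.test(traceId) || /^0+$/.test(parentId)) return null;
  // The lowest flag bit marks the trace as sampled.
  return { traceId, parentId, sampled: (parseInt(flags, 16) & 1) === 1 };
}
```

Logging the parsed trace id in both the middleware and the page makes it obvious whether the two halves of a request share a trace or started fresh.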

When you can skip this entirely

Two founders, pre-launch, no users. You do not need OpenTelemetry. Use Speed Insights for free, console.log from your handlers, and the Vercel dashboard for the first month. The hour you would spend wiring spans is better spent shipping the feature that gets you a first customer. Add the full stack the day you cross 1,000 daily active users or the first paying customer files a "page is slow" ticket, whichever comes first.

What to do next

If you are already in production and feeling blind, here is the order of operations: install @vercel/otel and instrumentation.ts today, point traces at a free-tier Honeycomb or Axiom account, add pino for structured logs by the end of the week, and only then evaluate Sentry or Datadog for higher-fidelity error tracking. That sequence keeps you cheap until you know what you actually need. The same staged approach applies when you roll out a feature flag safely or write a postmortem after an incident; both depend on the trace and log data this guide bootstraps.

If you would rather have someone else wire all three pillars by next Friday, every engineer on Cadence is AI-native by default (vetted on Cursor, Claude Code, and Copilot fluency), and you can book a senior on a 48-hour free trial for the observability rollout. Weekly billing, replace any week, no notice period.

FAQ

How long does it take to add observability to a Next.js app?

A working baseline (traces, metrics, structured logs, error reporting) takes one senior engineer about 6 to 10 hours, assuming you already have a Vercel deployment and an account with one APM vendor. Production-grade dashboards and alerting add another 2 to 3 days.

Should I use Sentry or OpenTelemetry?

Both, for different jobs. Sentry is best for error grouping, stack traces, and session replay. OpenTelemetry is best for distributed traces across services. Most teams under 20 engineers run Sentry plus OTel pointed at Honeycomb or Axiom, and skip a full Datadog rollout until they cross $20M ARR.

Does `instrumentation.ts` work in the edge runtime?

Yes, but with caveats. The file itself loads in both runtimes; the SDKs you import inside it may not. Always branch on process.env.NEXT_RUNTIME and import edge-safe modules conditionally. @vercel/otel supports edge for fetch instrumentation only.

Why are my Vercel logs disappearing?

Three common reasons: lines over 4KB are truncated, invocations over 256KB total are truncated, and the dashboard only retains 1 hour on Hobby tier. Set up a log drain to Axiom, Better Stack, or Datadog if you need history beyond debugging the last few minutes.

How much should observability cost a Next.js startup?

For a small app (under 1M requests/month), expect $0 to $50/month using Vercel Speed Insights plus a free-tier APM like Axiom or Honeycomb. Mid-stage (1M to 10M requests/month) usually lands at $200 to $600/month across Sentry, an APM, and a log drain. Above that, vendor consolidation onto Datadog or Grafana Cloud Pro starts to pay off.

Harsh Shuddhalwar

Fullstack Developer

Fullstack developer at withRemote. Ships across the stack — TypeScript, Node, Postgres, Vercel. Writes on shipping speed and pragmatic architecture.
