How to use server-sent events vs WebSockets

Use Server-Sent Events (SSE) when the server needs to push updates to the client and the client rarely sends data back (AI token streaming, live dashboards, notifications, progress bars). Use WebSockets when both sides talk continuously with low latency (multiplayer, chat with typing indicators, collaborative editors, trading). SSE is one-way over plain HTTP; WebSockets are bidirectional over their own protocol after an HTTP upgrade.

That rule of thumb covers 80% of real production decisions. The other 20% is where SSE quietly wins (HTTP/2 multiplexing, native reconnect, CDN-friendliness, no separate auth path) and where WebSockets earn their complexity (sub-50ms round-trips, binary frames, presence). This guide walks the trade-offs, the proxy and edge runtime gotchas that bite teams in production, and a decision matrix you can drop into a design doc.

Why this decision matters more in 2026

Two shifts pushed real-time transport from a niche topic to a default question on every greenfield app.

The first is LLM streaming. Every product with an AI feature now pushes tokens to the browser in real time, and almost all of them use SSE. OpenAI, Anthropic, Vercel AI SDK, LangChain streaming, the Claude web app: all SSE under the hood. If you are building anything that wraps a model, you are picking SSE whether you know it or not.

The second is HTTP/2 and HTTP/3 going mainstream at the edge. The old objection to SSE was "you only get 6 concurrent connections per origin." HTTP/2 killed that by multiplexing streams over a single TCP connection. Most modern hosting (Vercel, Cloudflare, Fly, Render) speaks HTTP/2 or HTTP/3 by default, so the connection-limit argument against SSE is mostly a 2018 talking point.

That changes the math. SSE used to be the "good enough but limited" option. Now it is often the right default unless you actually need bidirectional traffic.

The default approach (and why teams pick wrong)

The default failure mode: a team needs to push notifications, an engineer says "we need real-time, let's add a WebSocket server," and three months later they are debugging sticky sessions in their load balancer, writing custom reconnect logic, and routing auth tokens through query strings.

None of that was needed. A 40-line SSE endpoint with the browser's EventSource would have shipped in an afternoon, reconnected automatically, and worked through nearly every corporate proxy.

The opposite mistake exists too. Teams pick SSE for chat, bolt on a POST endpoint to send messages, then want typing indicators and read receipts, and end up with a half-duplex Frankenstein that should have been a WebSocket.

Pick by traffic shape, not by hype.

SSE vs WebSockets: the decision matrix

Dimension	Server-Sent Events (SSE)	WebSockets
Direction	One-way (server to client)	Bidirectional
Protocol	Plain HTTP (`text/event-stream`)	Custom WS frames after HTTP/1.1 upgrade
Browser API	`new EventSource(url)`	`new WebSocket(url)`
Auto-reconnect	Built in. `Last-Event-ID` header on reconnect	You write it yourself
HTTP/2 multiplexing	Yes, multiple streams on one connection	No, each WS is its own TCP socket
Auth	Cookies, headers, anything HTTP does	Tokens often have to ride query string or first message
Proxy and CDN friendly	Very. It is just chunked HTTP	Mixed. Many CDNs need explicit WS support
Message format	UTF-8 text only	Text or binary frames
Server load (1k clients)	Comparable for push (one long-lived HTTP response per client); every client-to-server message is a separate request	One socket per client, no per-message HTTP overhead in either direction
Round-trip latency	Server to client only	20-50ms typical, both directions
Browser support	Every modern browser; IE11 needs an XHR-based polyfill	Universal since 2012
Edge runtime support (Vercel, Cloudflare)	Native, well-supported	Cloudflare yes, Vercel Edge has caveats
Best for	AI streaming, dashboards, notifications, build logs	Chat, multiplayer, collab editing, trading, IoT
Worst for	Anything where the client streams back	Anything one-way (you waste a full duplex socket)
Time to ship MVP	~1 day	~3-5 days plus reconnect logic

The honest summary: WebSockets are more powerful and lower-latency. SSE is simpler, friendlier to existing HTTP infrastructure, and good enough for most read-heavy use cases.

How SSE actually works

SSE is shockingly small. The server keeps an HTTP response open, sets Content-Type: text/event-stream, and writes lines like:

data: {"token": "Hello"}

data: {"token": " world"}

A blank line (two consecutive newlines) ends each event. The browser's EventSource parses this stream, fires message events, and reconnects automatically if the connection drops. If you set an id: field, the browser sends it back as the Last-Event-ID header on reconnect, so your server can resume from where it left off.
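As a rough sketch of that wire format, the helper below serializes a single event. The name formatSseEvent and its option names are illustrative and not part of any library; it assumes data is either a string or something JSON-serializable.

```javascript
// Hypothetical helper: serialize one event in text/event-stream format.
function formatSseEvent({ data, id, event }) {
  const lines = [];
  if (event !== undefined) lines.push(`event: ${event}`);
  if (id !== undefined) lines.push(`id: ${id}`);
  const payload = typeof data === 'string' ? data : JSON.stringify(data);
  // A payload containing newlines must be sent as one "data:" line per line.
  for (const line of payload.split('\n')) lines.push(`data: ${line}`);
  // The trailing blank line is what terminates the event.
  return lines.join('\n') + '\n\n';
}
```

Setting id on every event is what lets the browser report Last-Event-ID on reconnect; omit it and the browser has nothing to send back.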

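The core of a minimal Node.js handler is only a few lines:

// Inside an async http or Express request handler; llmStream is any async iterable of chunks.
res.setHeader('Content-Type', 'text/event-stream');
res.setHeader('Cache-Control', 'no-cache, no-transform');
res.flushHeaders(); // send headers immediately so the client sees the stream open

for await (const chunk of llmStream) {
  // One event per chunk; the trailing blank line terminates the event.
  res.write(`data: ${JSON.stringify(chunk)}\n\n`);
}
res.end();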

The client side:

const es = new EventSource('/api/stream');
es.onmessage = (e) => render(JSON.parse(e.data));

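If you need auth, cookies travel automatically on same-origin requests because it is a plain HTTP request (cross-origin streams need new EventSource(url, { withCredentials: true })). That single property eliminates an entire category of WebSocket pain. The one gap: EventSource cannot set custom request headers, so header-based tokens need a fetch-based client instead.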
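One detail worth sketching for finite streams such as an LLM response: when the server ends the response, EventSource treats it as a dropped connection and reconnects. A common workaround is for the server to emit a final named event and for the client to close on it. The sketch below assumes the server sends an event named done; that name and the consumeTokenStream helper are hypothetical.

```javascript
// Hypothetical wiring; `source` is an EventSource (or any object with the same interface).
function consumeTokenStream(source, onToken, onDone) {
  source.onmessage = (e) => onToken(JSON.parse(e.data));
  // "done" is an assumed event name, sent by the server as `event: done`.
  source.addEventListener('done', () => {
    source.close(); // without this, EventSource reconnects after the server ends the response
    onDone();
  });
  return source;
}

// Browser usage (assumes a render function exists):
// consumeTokenStream(new EventSource('/api/stream'), render, () => console.log('finished'));
```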

How WebSockets actually work

WebSockets start as an HTTP/1.1 request with Upgrade: websocket. The server responds with 101 Switching Protocols and the TCP connection then speaks the WebSocket frame protocol: small headers, payload, optional masking, ping/pong control frames.
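To make the handshake concrete, here is a minimal sketch of the server's side of the upgrade as defined in RFC 6455: the client sends a random Sec-WebSocket-Key header, and the server proves it understood the request by returning a hash of that key in Sec-WebSocket-Accept. The function names are illustrative; real servers such as ws do this for you.

```javascript
const { createHash } = require('node:crypto');

// Fixed GUID that RFC 6455 defines for the opening handshake.
const WS_GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';

// Sec-WebSocket-Accept = base64(SHA-1(Sec-WebSocket-Key + GUID))
function websocketAccept(secWebSocketKey) {
  return createHash('sha1').update(secWebSocketKey + WS_GUID).digest('base64');
}

// Hypothetical helper: the raw response head a server writes before switching to WS frames.
function upgradeResponseHead(secWebSocketKey) {
  return [
    'HTTP/1.1 101 Switching Protocols',
    'Upgrade: websocket',
    'Connection: Upgrade',
    `Sec-WebSocket-Accept: ${websocketAccept(secWebSocketKey)}`,
    '',
    '',
  ].join('\r\n');
}
```

A proxy that drops the Upgrade or Connection header breaks exactly this exchange, which is why the handshake is the usual failure point on locked-down networks.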

You usually do not write this yourself. You reach for ws on Node.js, socket.io if you want rooms and fallbacks, Phoenix Channels on Elixir, or Cloudflare Durable Objects for stateful per-room sockets at the edge.

The browser API is symmetric:

const ws = new WebSocket('wss://api.example.com/room/42');
ws.onopen = () => ws.send(JSON.stringify({ type: 'join' }));
ws.onmessage = (e) => render(JSON.parse(e.data));
ws.onclose = () => scheduleReconnect();

Two things to notice. No onerror recovery: you write reconnect with exponential backoff yourself, including stale state on reconnect. And the URL is wss://, not https://: it is a different protocol, and some firewalls treat it differently.
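A minimal sketch of what that reconnect logic tends to look like, assuming exponential backoff with full jitter. The names backoffDelay and connectWithRetry, the 500 ms base, and the 30-second cap are all illustrative choices, not values from any library.

```javascript
// Delay before reconnect attempt N (0-based): exponential growth, capped, with jitter.
function backoffDelay(attempt, { baseMs = 500, maxMs = 30000, random = Math.random } = {}) {
  const ceiling = Math.min(maxMs, baseMs * 2 ** attempt);
  // Full jitter: a random delay in [0, ceiling) so clients do not reconnect in lockstep.
  return Math.floor(random() * ceiling);
}

// Hypothetical wrapper; `createSocket` returns a WebSocket (or an object with the same interface).
function connectWithRetry(createSocket, onMessage, schedule = setTimeout) {
  let attempt = 0;
  let stopped = false;
  let socket;
  function open() {
    socket = createSocket();
    socket.onopen = () => { attempt = 0; }; // a successful connect resets the backoff
    socket.onmessage = onMessage;
    socket.onclose = () => {
      if (stopped) return;
      schedule(open, backoffDelay(attempt));
      attempt += 1;
    };
  }
  open();
  return function stop() { stopped = true; socket.close(); };
}

// Browser usage (assumes a render function exists):
// const stop = connectWithRetry(() => new WebSocket('wss://api.example.com/room/42'),
//   (e) => render(JSON.parse(e.data)));
```

This only restores the connection. Anything the client missed while disconnected still has to be refetched or replayed after onopen, which is the stale-state problem mentioned above.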

HTTP/2 multiplexing changes the SSE story

The classic objection to SSE was the browser's 6-connections-per-origin limit on HTTP/1.1. Open one EventSource for a dashboard, another for notifications, and a third for a long-running AI request, and you have eaten half your connection budget. Open three tabs and the app stalls.

HTTP/2 effectively removes the limit by multiplexing every request as a stream over one TCP connection. Servers still cap concurrent streams per connection (commonly around 100), so you can run dozens of concurrent SSE streams to the same origin without exhausting anything. HTTP/3 (QUIC) extends this with better recovery from packet loss.

The catch: the hop the browser connects to has to speak HTTP/2, because that is where the per-origin connection limit is enforced. Vercel, Cloudflare, Fly, Render, and Netlify all do. Nginx does if you set listen 443 ssl http2; (or http2 on; in 1.25.1 and later). Old hardware load balancers that terminate client connections sometimes do not, and they will downgrade you to HTTP/1.1 without telling you. If you are seeing weird connection-limit behavior with SSE in production, that is the first thing to check.
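One way to check which protocol the browser actually negotiated is the Resource Timing API, whose entries expose a nextHopProtocol field (values such as h2, h3, or http/1.1). The helper below is a small sketch; findHttp1Resources is a made-up name, and cross-origin entries may report an empty string unless the server sends Timing-Allow-Origin.

```javascript
// Return the URLs of resources the browser fetched over HTTP/1.x.
// `entries` is the array returned by performance.getEntriesByType('resource').
function findHttp1Resources(entries) {
  return entries
    .filter((e) => typeof e.nextHopProtocol === 'string' && e.nextHopProtocol.startsWith('http/1'))
    .map((e) => e.name);
}

// In the browser console:
// findHttp1Resources(performance.getEntriesByType('resource'));
```

If your SSE endpoint shows up in that list, something in front of it is serving the browser over HTTP/1.1.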

Proxies, CDNs, and the buffering trap

The single most common SSE bug is buffering. Some proxies (Nginx without configuration, certain CDN setups, antivirus middleboxes) buffer the response until they see "enough" bytes or the connection closes. To the client, it looks like the stream is broken: nothing arrives, then everything arrives at once when you close the socket.

Three fixes, in order:

Send Cache-Control: no-cache, no-transform and X-Accel-Buffering: no (the Nginx-specific header). This disables buffering in most proxies.
Send a small comment line every 15-30 seconds: : ping\n\n. This keeps load balancers from idle-killing the connection (AWS ALB has a 60-second default, Cloudflare has 100 seconds free / 300 seconds paid).
Flush after every write. In Node, res.flush() if you are behind compression middleware. In Python ASGI, await send({"type": "http.response.body", ..., "more_body": True}).
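Put together, the three fixes might look like the sketch below for a Node response object. The helper name openSseStream and the 20-second default are assumptions; res.flush only exists when compression middleware adds it, so the code checks for it first.

```javascript
// Hypothetical helper around a Node http.ServerResponse (or Express res).
function openSseStream(res, { heartbeatMs = 20000 } = {}) {
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache, no-transform');
  res.setHeader('X-Accel-Buffering', 'no'); // asks Nginx not to buffer this response
  res.flushHeaders();

  // res.flush is only present behind compression middleware; skip it otherwise.
  const flush = () => { if (typeof res.flush === 'function') res.flush(); };

  // Lines starting with ":" are comments: EventSource ignores them, but they keep the connection active.
  const timer = setInterval(() => { res.write(': ping\n\n'); flush(); }, heartbeatMs);

  return {
    send(data) { res.write(`data: ${JSON.stringify(data)}\n\n`); flush(); },
    close() { clearInterval(timer); res.end(); },
  };
}

// Usage inside a handler:
// const stream = openSseStream(res);
// req.on('close', () => stream.close()); // stop the heartbeat when the client goes away
```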

WebSockets dodge the buffering problem because they are not chunked HTTP, but they hit a different one: many corporate proxies and older CDNs strip the Upgrade header and break the handshake. If you serve a B2B app to enterprise users behind locked-down networks, SSE will get through where WebSockets will not. This is genuinely a tiebreaker for some teams.

Edge runtime caveats (Vercel, Cloudflare, others)

Edge runtimes are where the SSE-vs-WebSockets choice gets sharp.

Vercel Edge Functions support SSE via ReadableStream and have published examples for AI streaming. WebSockets on Vercel are not supported in Edge or Serverless functions directly; you need a separate hosted WS server (Ably, Pusher, PartyKit, or your own Render service). For the ChatGPT-style AI streaming use cases that dominate 2026 apps, this nudges Vercel apps toward SSE almost unconditionally.
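For orientation, an SSE response built on the web-standard ReadableStream could look roughly like this. It is a sketch, not Vercel's published example: sseResponse is a made-up name, and it assumes a runtime that exposes ReadableStream, TextEncoder, and Response as globals (edge runtimes and Node 18+ do).

```javascript
// Hypothetical helper: wrap any async iterable of chunks in a streaming SSE Response.
function sseResponse(chunks) {
  const encoder = new TextEncoder();
  const iterator = chunks[Symbol.asyncIterator]();
  const body = new ReadableStream({
    // pull() is called whenever the consumer is ready for more data.
    async pull(controller) {
      const { value, done } = await iterator.next();
      if (done) {
        controller.close();
        return;
      }
      controller.enqueue(encoder.encode(`data: ${JSON.stringify(value)}\n\n`));
    },
    // cancel() runs when the client disconnects; stop the upstream iterator too.
    async cancel() {
      if (typeof iterator.return === 'function') await iterator.return();
    },
  });
  return new Response(body, {
    headers: {
      'Content-Type': 'text/event-stream',
      'Cache-Control': 'no-cache, no-transform',
    },
  });
}
```

Using pull rather than pushing everything in start means the upstream iterator is only advanced as fast as the client reads, which keeps memory flat for slow consumers.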

Cloudflare Workers support both. SSE is straightforward via TransformStream. WebSockets are first-class via WebSocketPair and Durable Objects, and Durable Objects make per-room state simpler than building it yourself on a long-running Node process. If you are picking a stack for a multiplayer or chat product in 2026, Cloudflare plus Durable Objects is the cleanest answer.

Render, Fly.io, Railway, AWS, and any container host run long-lived processes and support both equally well. You pay for always-on processes, but you avoid cold starts and stop thinking about runtime constraints. For teams already using these (deploying Next.js on Render is a common pattern, see how to deploy Next.js on Render for a walkthrough), WebSockets are no harder to operate than SSE.

If you are designing fresh, this is one of many decisions worth working through up front. Our guide on how to design a serverless backend in 2026 covers the runtime trade-off in more depth.

When to use SSE

Pick SSE when:

Token streaming from an LLM. This is the canonical use case. OpenAI, Anthropic, Google Vertex all stream via SSE. Your wrapper should too.
Server-rendered dashboards updating in near real time. Stock tickers (read-only), system status, build logs, deploy progress. The client reads; it does not write.
Notifications and toasts. New comments, mentions, "your export is ready" alerts.
Progress bars for long jobs. Background renders, video transcoding, server-side processing of large uploads.
You want to ride existing HTTP infrastructure. Auth, rate limiting, CORS, observability tooling, all of it works the same as your REST endpoints because SSE is just chunked HTTP.

The "what to do" version: if your endpoint name is GET /api/stream-something, you almost certainly want SSE.

If you are auditing a stack and not sure whether your real-time layer is correctly sized for the use case, run it through Ship or Skip and get an honest grade on whether you are over- or under-engineered for the load.

When to use WebSockets

Pick WebSockets when:

Multiplayer games or collaborative editors. Figma, Linear, Notion-style co-editing. Low-latency bidirectional updates are the product.
Chat with typing indicators, presence, read receipts. SSE plus POST works for basic chat. Once you want "who is online right now," you want WS.
Trading, betting, auctions. Sub-50ms round-trips matter for the business.
IoT telemetry where devices both report and receive commands. Binary frames save bandwidth, full-duplex matches the data model.
Multi-region pub/sub with sticky rooms. Cloudflare Durable Objects or hosted services like Ably and Pusher exist precisely because building this on raw infra is hard.

For these workloads, the right answer is usually "use a managed real-time service or Durable Objects, not raw ws." The complexity of operating WebSockets at scale (presence, reconnection state, multi-region) is what most teams underestimate.

Common pitfalls

A few patterns that look right but break in production.

Forgetting the keep-alive ping. Your SSE stream works in dev, deploys fine, then mysteriously drops every 60 seconds in production. That is AWS ALB or another idle-timeout middlebox killing the connection. Send : heartbeat\n\n every 20-30 seconds.

Not handling reconnect state. EventSource will reconnect automatically, but if your event stream is incremental ("add token X"), reconnecting from scratch sends every token again. Use the id: field and Last-Event-ID header to resume from a checkpoint.
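On the server, resuming comes down to reading the Last-Event-ID request header and replaying only what came after it. The sketch below assumes events are kept in an in-memory array with ascending numeric ids; eventsToReplay and that buffer shape are hypothetical.

```javascript
// Hypothetical replay helper; `buffer` holds { id: number, data: any } in ascending id order.
function eventsToReplay(buffer, lastEventIdHeader) {
  // First connection: the browser sends no Last-Event-ID, so replay everything.
  if (lastEventIdHeader === undefined || lastEventIdHeader === '') return buffer;
  const lastId = Number(lastEventIdHeader);
  // Unparseable id: fall back to a full replay rather than silently dropping events.
  if (!Number.isInteger(lastId)) return buffer;
  return buffer.filter((evt) => evt.id > lastId);
}

// In a Node handler (header names are lowercased):
// const missed = eventsToReplay(buffer, req.headers['last-event-id']);
```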

Putting auth tokens in WebSocket query strings. This puts the token in server logs, proxy logs, and any monitoring that records URLs. Use cookies (works with same-origin WS) or send the token in the first message after connect, before joining any rooms.
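The first-message approach can be sketched as a small per-connection gate on the server. Everything here is illustrative: createAuthGate, the { type: 'auth', token } message shape, and the returned action objects are assumptions, and verifyToken stands in for whatever session check you already have. Close code 1008 is the standard "policy violation" code from RFC 6455.

```javascript
// Hypothetical gate: no message is dispatched until a valid auth message has arrived.
function createAuthGate(verifyToken) {
  let user = null;
  return function handle(rawMessage) {
    let msg;
    try {
      msg = JSON.parse(rawMessage);
    } catch {
      msg = null;
    }
    if (msg === null || typeof msg !== 'object') return { action: 'close', code: 1008 };

    if (user === null) {
      // Before auth, the only acceptable message is the auth message itself.
      if (msg.type !== 'auth') return { action: 'close', code: 1008 };
      const verified = verifyToken(msg.token);
      if (!verified) return { action: 'close', code: 1008 };
      user = verified;
      return { action: 'ready', user };
    }
    return { action: 'dispatch', user, msg };
  };
}
```

In practice you would also close connections that never send the auth message within a few seconds, so unauthenticated sockets do not sit open.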

Compressing SSE responses. With Content-Encoding: gzip, the compressor (and any proxy that recompresses) holds small writes back until it has enough data to emit a block, so events arrive late or in bursts. Skip compression for SSE endpoints, or flush after every write and test very carefully.

Trusting browser idle behavior. Browsers can throttle or suspend background tabs, and that affects EventSource and WebSocket alike. If "live" matters when the tab is hidden, you need a service worker or a server push (Web Push API), not a connection.

When you can skip both

If your update cadence is 30 seconds or slower, polling with setInterval(() => fetch(url), 30000) is simpler than either option, costs less to operate, and works through every network. Real-time is a budget; spend it where the user notices.

For long-running backend jobs where the user comes back later, push the result into a queue and email or notify when done. You do not need a live connection to a tab the user closed.

Picking a transport: what to do next

The condensed decision tree:

Does the client stream data back to the server, not just send occasional commands? If yes, WebSockets.
Are you on Vercel Edge or behind a strict corporate proxy? Lean SSE.
Building chat, multiplayer, or collab? WebSockets, ideally via a managed service or Durable Objects.
Building AI streaming, dashboards, or notifications? SSE.
Updates every 30 seconds or slower? Polling.
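If it helps to drop the checklist into a design doc or a lint script, it can be written as a function. This is a simplification under stated assumptions: the questions are read as a first-match chain from top to bottom, the input field names are invented, and the final fallback to SSE reflects the article's "right default" stance rather than an explicit rule in the list.

```javascript
// Hypothetical encoding of the checklist above; the first matching rule wins.
function pickTransport({ clientStreamsData, constrainedRuntimeOrProxy, workload, slowUpdates }) {
  if (clientStreamsData) return 'websockets';
  if (constrainedRuntimeOrProxy) return 'sse'; // e.g. Vercel Edge or a strict corporate proxy
  if (['chat', 'multiplayer', 'collab'].includes(workload)) return 'websockets';
  if (['ai-streaming', 'dashboard', 'notifications'].includes(workload)) return 'sse';
  if (slowUpdates) return 'polling';
  return 'sse'; // assumed default when nothing else applies
}

// Example: pickTransport({ workload: 'ai-streaming' }) returns 'sse'.
```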

If you are deciding this for a feature on your roadmap and want a second opinion before committing engineering time, run it through Decide for a Build / Buy / Book recommendation.

If you do want to ship the implementation but the in-house team is buried, every engineer on Cadence is AI-native by default, vetted on Cursor, Claude Code, and Copilot fluency before they unlock bookings. A mid-tier engineer ($1,000/week) can ship a production SSE endpoint with reconnect, heartbeats, and proper proxy headers inside the 48-hour free trial. A senior ($1,500/week) is the right call for a WebSocket layer with presence, rooms, and multi-region failover. Founders book in 2 minutes and we have a 12,800-engineer pool with a 27-hour median time to first commit.

Want a real-time layer shipped this week without writing the reconnect logic yourself? Audit your stack with Ship or Skip, then book a Cadence engineer with a 48-hour free trial. Weekly billing, no notice period, daily ratings.

For the related architectural decisions that often come up alongside real-time transport (state, retries, observability), see our guides on running integration tests in CI and writing a technical specification engineers actually follow.

FAQ

Are WebSockets always faster than SSE?

For round-trip latency, yes: WebSockets save the per-message HTTP overhead and run over a persistent socket. For server-to-client throughput, the difference is negligible on HTTP/2, because SSE streams ride a multiplexed connection without re-establishing TCP per message. If you do not need the client to send data back, SSE is not meaningfully slower.

Can I use SSE for chat?

Yes, with a separate POST endpoint for sending messages. This works fine for low-frequency chat (customer support, comment threads). For high-frequency chat with typing indicators, read receipts, and presence, WebSockets are a better fit because you stop maintaining two parallel channels.

Why does my SSE stream work locally but fail in production?

The top three causes, in order: a proxy or load balancer buffering the response (set X-Accel-Buffering: no and skip compression), an idle timeout killing the connection (send heartbeats every 20-30 seconds), or a client-facing proxy or load balancer speaking HTTP/1.1 only and capping you at 6 concurrent connections per origin (check which protocol the browser actually negotiates).

Does SSE work on Vercel Edge Functions?

Yes. Vercel publishes first-party examples of SSE in Edge Functions using ReadableStream, and the AI SDK from Vercel uses SSE under the hood. WebSockets are not supported in Vercel Edge or Serverless; for WS you need a separate hosted service (Ably, Pusher, PartyKit) or a long-running container on Render, Fly, or AWS.

How do I auth a WebSocket connection without putting tokens in the URL?

Best option: cookies, if the WS endpoint is same-origin. Second best: connect first, then send the token as the first message before joining any rooms or subscribing to channels. Avoid query-string tokens because they end up in server logs, proxy logs, and any monitoring that records URLs.

How many concurrent SSE connections can a single server handle?

On modern Node.js, a small VM can typically hold on the order of 10,000 to 50,000 idle SSE connections, limited by file descriptors and memory (a few KB each, more with TLS and larger buffers). The bottleneck is rarely the connection count; it is the work per message. You usually run out of CPU long before you run out of connections.
