
Use Server-Sent Events (SSE) when the server needs to push updates to the client and the client rarely sends data back (AI token streaming, live dashboards, notifications, progress bars). Use WebSockets when both sides talk continuously with low latency (multiplayer, chat with typing indicators, collaborative editors, trading). SSE is one-way over plain HTTP; WebSockets are bidirectional over their own protocol after an HTTP upgrade.
That rule of thumb covers roughly 80% of real production decisions. The other 20% is where SSE quietly wins (HTTP/2 multiplexing, native reconnect, CDN-friendliness, no separate auth path) and where WebSockets earn their complexity (sub-50ms round-trips, binary frames, presence). This guide walks through the trade-offs, the proxy and edge runtime gotchas that bite teams in production, and a decision matrix you can drop into a design doc.
Two shifts pushed real-time transport from a niche topic to a default question on every greenfield app.
The first is LLM streaming. Every product with an AI feature now pushes tokens to the browser in real time, and almost all of them use SSE. OpenAI, Anthropic, Vercel AI SDK, LangChain streaming, the Claude web app: all SSE under the hood. If you are building anything that wraps a model, you are picking SSE whether you know it or not.
The second is HTTP/2 and HTTP/3 going mainstream at the edge. The old objection to SSE was "you only get 6 concurrent connections per origin." HTTP/2 killed that by multiplexing streams over a single TCP connection. Most modern hosting (Vercel, Cloudflare, Fly, Render) speaks HTTP/2 or HTTP/3 by default, so the connection-limit argument against SSE is mostly a 2018 talking point.
That changes the math. SSE used to be the "good enough but limited" option. Now it is often the right default unless you actually need bidirectional traffic.
The default failure mode: a team needs to push notifications, an engineer says "we need real-time, let's add a WebSocket server," and three months later they are debugging sticky sessions in their load balancer, writing custom reconnect logic, and routing auth tokens through query strings.
None of that was needed. A 40-line SSE endpoint with the browser's EventSource would have shipped in an afternoon, reconnected automatically, and worked through every corporate proxy.
The opposite mistake exists too. Teams pick SSE for chat, bolt on a POST endpoint to send messages, then want typing indicators and read receipts, and end up with a half-duplex Frankenstein that should have been a WebSocket.
Pick by traffic shape, not by hype.
| Dimension | Server-Sent Events (SSE) | WebSockets |
|---|---|---|
| Direction | One-way (server to client) | Bidirectional |
| Protocol | Plain HTTP (text/event-stream) | Custom WS frames after HTTP/1.1 upgrade |
| Browser API | new EventSource(url) | new WebSocket(url) |
| Auto-reconnect | Built in. Last-Event-ID header on reconnect | You write it yourself |
| HTTP/2 multiplexing | Yes, multiple streams on one connection | No, each WS is its own TCP socket |
| Auth | Cookies, headers, anything HTTP does | Tokens often have to ride query string or first message |
| Proxy and CDN friendly | Very. It is just chunked HTTP | Mixed. Many CDNs need explicit WS support |
| Message format | UTF-8 text only | Text or binary frames |
| Server load (1k clients) | Higher (one HTTP request per client, unless multiplexed) | Lower (one socket, no per-message HTTP overhead) |
| Round-trip latency | Server to client only | 20-50ms typical, both directions |
| Browser support | Every modern browser; IE11 needs an XHR-based EventSource polyfill (it has no fetch or streams) | Universal since 2012 |
| Edge runtime support (Vercel, Cloudflare) | Native, well-supported | Cloudflare yes, Vercel Edge has caveats |
| Best for | AI streaming, dashboards, notifications, build logs | Chat, multiplayer, collab editing, trading, IoT |
| Worst for | Anything where the client streams back | Anything one-way (you waste a full duplex socket) |
| Time to ship MVP | ~1 day | ~3-5 days plus reconnect logic |
The honest summary: WebSockets are more powerful and lower-latency. SSE is simpler, friendlier to existing HTTP infrastructure, and good enough for most read-heavy use cases.
SSE is shockingly small. The server keeps an HTTP response open, sets Content-Type: text/event-stream, and writes lines like:
data: {"token": "Hello"}

data: {"token": " world"}

A blank line (two consecutive newlines) ends each event. The browser's EventSource parses this stream, fires message events, and reconnects automatically if the connection drops. If you set an id: field, the browser sends it back as the Last-Event-ID header on reconnect, so your server can resume from where it left off.
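To make the wire format concrete, here is a minimal sketch of a serializer for a single event. The name formatSseEvent and its argument shape are illustrative assumptions, not part of any library; the field names (id, event, data) and the blank-line terminator come from the event-stream format itself.

```javascript
// Serialize one event into the text/event-stream wire format.
// `formatSseEvent` is a hypothetical helper, not a library function.
function formatSseEvent({ id, event, data }) {
  const lines = [];
  if (id !== undefined) lines.push(`id: ${id}`);
  if (event !== undefined) lines.push(`event: ${event}`);
  // A payload that contains newlines must be sent as one `data:` line per
  // line of payload; the browser joins them back together with "\n".
  for (const line of String(data).split(/\r\n|\r|\n/)) {
    lines.push(`data: ${line}`);
  }
  // The trailing blank line is what ends the event.
  return lines.join('\n') + '\n\n';
}

const frame = formatSseEvent({ id: 7, data: JSON.stringify({ token: 'Hello' }) });
```

Writing frame to the open response delivers one message event whose lastEventId is "7".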
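The core of a minimal Node.js handler is only a few lines (res is the http.ServerResponse; llmStream is assumed to be any async iterable of chunks):
// Mark the response as an uncached event stream for the client and any proxy.
res.setHeader('Content-Type', 'text/event-stream');
res.setHeader('Cache-Control', 'no-cache, no-transform');
res.flushHeaders(); // send headers now so the client sees the stream open
for await (const chunk of llmStream) {
  // One event per chunk; the blank line ("\n\n") ends the event.
  res.write(`data: ${JSON.stringify(chunk)}\n\n`);
}
res.end();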
The client side:
const es = new EventSource('/api/stream');
es.onmessage = (e) => render(JSON.parse(e.data));
If you need auth, cookies travel automatically because it is a plain HTTP request. That single property eliminates an entire category of WebSocket pain.
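One caveat worth sketching: cookies ride along automatically only for same-origin streams. For a cross-origin endpoint, EventSource sends them only when withCredentials is set, and the server has to allow credentials through CORS. The wrapper below is an illustrative sketch; subscribe, its options, and the injectable EventSourceImpl parameter are assumptions made so the example can run outside a browser.

```javascript
// `subscribe` is a hypothetical wrapper. In a browser, EventSourceImpl
// defaults to the built-in EventSource.
function subscribe(url, onData, options = {}) {
  const { crossOrigin = false, EventSourceImpl = globalThis.EventSource } = options;
  // Same-origin requests carry cookies automatically. Cross-origin requests
  // carry them only when withCredentials is true.
  const es = new EventSourceImpl(url, { withCredentials: crossOrigin });
  es.onmessage = (e) => onData(JSON.parse(e.data));
  es.onerror = () => {
    // readyState 2 (CLOSED) means the browser has given up; in any other
    // state it is already retrying on its own.
    if (es.readyState === 2) console.warn('SSE stream closed permanently');
  };
  return () => es.close(); // call the returned function to unsubscribe
}
```

In a page this would be called as subscribe('/api/stream', render), where render is the same callback used in the snippet above.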
WebSockets start as an HTTP/1.1 request with Upgrade: websocket. The server responds with 101 Switching Protocols and the TCP connection then speaks the WebSocket frame protocol: small headers, payload, optional masking, ping/pong control frames.
You usually do not write this yourself. You reach for ws on Node.js, socket.io if you want rooms and fallbacks, Phoenix Channels on Elixir, or Cloudflare Durable Objects for stateful per-room sockets at the edge.
The browser API is symmetric:
const ws = new WebSocket('wss://api.example.com/room/42');
ws.onopen = () => ws.send(JSON.stringify({ type: 'join' }));
ws.onmessage = (e) => render(JSON.parse(e.data));
ws.onclose = () => scheduleReconnect();
Two things to notice. There is no built-in recovery: when the socket closes, nothing reconnects, so you write reconnect with exponential backoff yourself and resynchronize any state that went stale while the connection was down. And the URL is wss://, not https://: it is a different protocol, and some firewalls treat it differently.
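The reconnect logic the browser does not give you can be sketched roughly as follows. This is one common approach (exponential backoff with full jitter), not the only one; backoffDelay, connectWithRetry, the default delays, and the injectable WebSocketImpl are all illustrative assumptions.

```javascript
// Exponential backoff with full jitter: pick a random delay below a ceiling
// that doubles with every failed attempt, capped at maxMs.
function backoffDelay(attempt, { baseMs = 500, maxMs = 30000, random = Math.random } = {}) {
  const ceiling = Math.min(maxMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// `connectWithRetry` is a hypothetical wrapper around the WebSocket API.
function connectWithRetry(url, { onOpen = () => {}, onMessage, WebSocketImpl = globalThis.WebSocket }) {
  let attempt = 0;
  let stopped = false;
  let ws;
  const open = () => {
    ws = new WebSocketImpl(url);
    ws.onopen = () => {
      attempt = 0; // a successful connection resets the backoff
      onOpen(ws);  // re-join rooms or re-request state here, on every (re)connect
    };
    ws.onmessage = (e) => onMessage(JSON.parse(e.data));
    ws.onclose = () => {
      if (stopped) return;
      setTimeout(open, backoffDelay(attempt++));
    };
  };
  open();
  return () => { stopped = true; ws.close(); };
}
```

The onOpen hook is where stale state gets handled: because it runs on every reconnect, the client can send its join message again or ask the server for a fresh snapshot.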
The classic objection to SSE was the browser's 6-connections-per-origin limit on HTTP/1.1. Open one EventSource for a dashboard, another for notifications, and a third for a long-running AI request, and you have eaten half your connection budget. Open three tabs and the app stalls.
HTTP/2 removes the limit by multiplexing every request as a stream over one TCP connection. You can run dozens of concurrent SSE streams to the same origin without exhausting anything. HTTP/3 (QUIC) extends this with better recovery from packet loss.
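The catch: whatever terminates the browser's connection has to speak HTTP/2, because the connection limit is enforced in the browser. Vercel, Cloudflare, Fly, Render, and Netlify all do. Nginx does if you set listen 443 ssl http2; (from Nginx 1.25.1 on, the separate http2 on; directive replaces that listen parameter). Old hardware load balancers sometimes do not, and one sitting in front of an HTTP/2-capable server will downgrade the browser to HTTP/1.1 without telling you. If you are seeing weird connection-limit behavior with SSE in production, that is the first thing to check.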
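One way to check what the browser actually negotiated is the Resource Timing API, whose entries expose a nextHopProtocol field ("h2", "h3", or "http/1.1"). The helper below is a sketch; findDowngradedRequests is a hypothetical name. A still-open stream may not have a timing entry yet, so any completed request to the same origin is a reasonable proxy.

```javascript
// Returns the timing entries that were served over something older than HTTP/2.
// In a browser, pass performance.getEntriesByType('resource').
function findDowngradedRequests(entries) {
  return entries
    // nextHopProtocol is an empty string when the browser withholds it
    // (for example, cross-origin responses without Timing-Allow-Origin).
    .filter((e) => e.nextHopProtocol && !['h2', 'h3'].includes(e.nextHopProtocol))
    .map((e) => ({ url: e.name, protocol: e.nextHopProtocol }));
}
```

Running findDowngradedRequests(performance.getEntriesByType('resource')) in the browser console on the production site lists any requests that fell back to HTTP/1.1.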
The single most common SSE bug is buffering. Some proxies (Nginx without configuration, certain CDN setups, antivirus middleboxes) buffer the response until they see "enough" bytes or the connection closes. To the client, it looks like the stream is broken: nothing arrives, then everything arrives at once when you close the socket.
Three fixes, in order:
1. Set the response headers Cache-Control: no-cache, no-transform and X-Accel-Buffering: no (the Nginx-specific header). This disables buffering in most proxies.
2. Send a periodic comment line such as : ping\n\n. This keeps load balancers from idle-killing the connection (AWS ALB has a 60-second default, Cloudflare has 100 seconds free / 300 seconds paid).
3. Flush explicitly: call res.flush() if you are behind compression middleware. In Python ASGI, await send({"type": "http.response.body", ..., "more_body": True}).

WebSockets dodge the buffering problem because they are not chunked HTTP, but they hit a different one: many corporate proxies and older CDNs strip the Upgrade header and break the handshake. If you serve a B2B app to enterprise users behind locked-down networks, SSE will get through where WebSockets will not. This is genuinely a tiebreaker for some teams.
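The header and heartbeat fixes for SSE buffering can be combined in one small setup function. This is a sketch under assumptions: startSse is a hypothetical helper, res is a plain Node.js http.ServerResponse, and the 25-second default is just a value inside the 20-30 second range this guide recommends.

```javascript
// `startSse` is a hypothetical helper; `res` is a Node.js http.ServerResponse.
function startSse(res, { heartbeatMs = 25000 } = {}) {
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache, no-transform');
  res.setHeader('X-Accel-Buffering', 'no'); // asks Nginx not to buffer this response
  res.flushHeaders();

  // Lines that start with ":" are comments. EventSource ignores them, but the
  // bytes still reset idle timers on load balancers along the path.
  const timer = setInterval(() => res.write(': ping\n\n'), heartbeatMs);

  // Stop the heartbeat once the client goes away.
  res.on('close', () => clearInterval(timer));

  // Return a sender so callers do not have to repeat the framing.
  return (data) => res.write(`data: ${JSON.stringify(data)}\n\n`);
}
```

Inside a request handler this would be used as const send = startSse(res); followed by send(chunk) for each update.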
Edge runtimes are where the SSE-vs-WebSockets choice gets sharp.
Vercel Edge Functions support SSE via ReadableStream and have published examples for AI streaming. WebSockets on Vercel are not supported in Edge or Serverless functions directly; you need a separate hosted WS server (Ably, Pusher, PartyKit, or your own Render service). For the AI Overviews and ChatGPT use cases that dominate 2026 apps, this nudges Vercel apps toward SSE almost unconditionally.
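On fetch-style runtimes the same stream is expressed as a Response wrapping a ReadableStream. The sketch below uses only web-standard globals (Response, ReadableStream, TextEncoder) and no platform-specific API; sseResponse is a hypothetical helper, and it assumes its argument is an async iterable.

```javascript
// Builds a streaming SSE Response from an async iterable of chunks.
// `sseResponse` is a hypothetical helper built on web-standard globals only.
function sseResponse(chunks) {
  const encoder = new TextEncoder();
  const iterator = chunks[Symbol.asyncIterator]();
  const body = new ReadableStream({
    // pull() runs only when the consumer wants more, so a slow client
    // applies backpressure to the source instead of filling memory.
    async pull(controller) {
      const { done, value } = await iterator.next();
      if (done) {
        controller.close();
        return;
      }
      controller.enqueue(encoder.encode(`data: ${JSON.stringify(value)}\n\n`));
    },
    // Called when the client disconnects; lets the source clean up.
    async cancel() {
      if (iterator.return) await iterator.return();
    },
  });
  return new Response(body, {
    headers: {
      'Content-Type': 'text/event-stream',
      'Cache-Control': 'no-cache, no-transform',
    },
  });
}
```

A handler would then return sseResponse(llmStream), where llmStream stands for whatever async iterable the model client provides.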
Cloudflare Workers support both. SSE is straightforward via TransformStream. WebSockets are first-class via WebSocketPair and Durable Objects, and Durable Objects make per-room state simpler than building it yourself on a long-running Node process. If you are picking a stack for a multiplayer or chat product in 2026, Cloudflare plus Durable Objects is the cleanest answer.
Render, Fly.io, Railway, AWS, and any container host run long-lived processes and support both equally well. You give up cold-start latency in exchange for not thinking about runtime constraints. For teams already using these (deploying Next.js on Render is a common pattern, see how to deploy Next.js on Render for a walkthrough), WebSockets are no harder to operate than SSE.
If you are designing fresh, this is one of many decisions worth working through up front. Our guide on how to design a serverless backend in 2026 covers the runtime trade-off in more depth.
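Pick SSE when the server does nearly all the talking and the client rarely sends data back: AI token streaming, live dashboards, notifications, progress bars, build logs.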
The "what to do" version: if your endpoint name is GET /api/stream-something, you almost certainly want SSE.
If you are auditing a stack and not sure whether your real-time layer is correctly sized for the use case, run it through Ship or Skip and get an honest grade on whether you are over- or under-engineered for the load.
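Pick WebSockets when both sides talk continuously and latency matters: chat with typing indicators, multiplayer, collaborative editing, trading, IoT.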
For these workloads, the right answer is usually "use a managed real-time service or Durable Objects, not raw ws." The complexity of operating WebSockets at scale (presence, reconnection state, multi-region) is what most teams underestimate.
A few patterns that look right but break in production.
Forgetting the keep-alive ping. Your SSE stream works in dev, deploys fine, then mysteriously drops every 60 seconds in production. That is AWS ALB or another idle-timeout middlebox killing the connection. Send : heartbeat\n\n every 20-30 seconds.
Not handling reconnect state. EventSource will reconnect automatically, but if your event stream is incremental ("add token X"), reconnecting from scratch sends every token again. Use the id: field and Last-Event-ID header to resume from a checkpoint.
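Resuming from a checkpoint needs the server to remember recent events. A minimal in-memory sketch is below; ReplayBuffer is a hypothetical class, and a multi-instance deployment would likely keep this in a shared store rather than process memory.

```javascript
// Keeps the most recent events so a reconnecting client can catch up.
// `ReplayBuffer` is illustrative, not a library class.
class ReplayBuffer {
  constructor(limit = 1000) {
    this.limit = limit;
    this.events = [];
    this.nextId = 1;
  }

  // Store an event and assign it the id that is sent in the `id:` field.
  append(data) {
    const event = { id: this.nextId++, data };
    this.events.push(event);
    if (this.events.length > this.limit) this.events.shift();
    return event;
  }

  // `lastEventId` is req.headers['last-event-id']: a string on reconnect,
  // undefined on the first connection.
  since(lastEventId) {
    const last = Number(lastEventId);
    if (!Number.isInteger(last)) return [...this.events];
    return this.events.filter((e) => e.id > last);
  }
}
```

On each new request the handler would first write everything from buffer.since(req.headers['last-event-id']) and then continue with live events.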
Putting auth tokens in WebSocket query strings. This puts the token in server logs, proxy logs, and browser history. Use cookies (works with same-origin WS) or send the token in the first message after connect, before joining any rooms.
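The "token in the first message" pattern can be sketched as a small gate that every incoming frame passes through. Everything here is an assumption for illustration: createAuthGate, the { type: 'auth', token } message shape, and verifyToken, which stands in for whatever session or JWT check the app already has.

```javascript
// `createAuthGate` is a hypothetical helper. `verifyToken` is a placeholder
// that returns a user object for a valid token and null otherwise.
function createAuthGate(verifyToken, onAuthedMessage) {
  let user = null;
  // Feed every incoming text frame through the returned function and close
  // the socket whenever it reports { close: true }.
  return function handleFrame(raw) {
    let msg;
    try {
      msg = JSON.parse(raw);
    } catch {
      return { close: true, reason: 'bad json' };
    }
    if (!user) {
      // Nothing is processed until the first frame authenticates.
      if (msg.type !== 'auth') return { close: true, reason: 'auth required' };
      user = verifyToken(msg.token);
      return user ? { close: false } : { close: true, reason: 'invalid token' };
    }
    onAuthedMessage(user, msg);
    return { close: false };
  };
}
```

A server would create one gate per connection and typically also close sockets that have not authenticated within a few seconds.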
Compressing SSE responses. Content-Encoding: gzip plus chunked streaming makes proxies want to buffer until they have enough to decompress. Skip compression for SSE endpoints, or test very carefully.
Trusting browser idle behavior. Background tabs can throttle EventSource and WebSocket alike. If "live" matters when the tab is hidden, you need a service worker or a server push (Web Push API), not a connection.
If your update cadence is 30 seconds or slower, polling with setInterval(fetch, 30000) is simpler than either option, costs less to operate, and works through every network. Real-time is a budget; spend it where the user notices.
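The setInterval(fetch, 30000) above is shorthand; a working poller needs a URL, error handling, and a way to stop. A minimal sketch, assuming a JSON endpoint, with startPolling and the injectable fetchImpl as hypothetical names:

```javascript
// `startPolling` is a hypothetical helper. It waits for each request to
// finish before scheduling the next, so slow responses never overlap.
function startPolling(url, onData, { intervalMs = 30000, fetchImpl = globalThis.fetch } = {}) {
  let stopped = false;
  let timer;
  const tick = async () => {
    try {
      const res = await fetchImpl(url);
      if (res.ok) onData(await res.json());
    } catch (err) {
      // A failed poll is not fatal; the next tick simply tries again.
      console.warn('poll failed', err);
    }
    if (!stopped) timer = setTimeout(tick, intervalMs);
  };
  tick();
  return () => {
    stopped = true;
    clearTimeout(timer);
  };
}
```

Chaining setTimeout instead of using setInterval is a deliberate choice here: it keeps requests from piling up when the server is slower than the polling interval.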
For long-running backend jobs where the user comes back later, push the result into a queue and email or notify when done. You do not need a live connection to a tab the user closed.
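The condensed decision tree:
1. Updates arrive every 30 seconds or slower: poll.
2. The user comes back later for the result: queue the job and notify when it is done.
3. The server pushes and the client rarely sends data back: SSE.
4. Both sides talk continuously with low latency: WebSockets.
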
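The guidance in this guide can be written down as a small function for a design doc. It is a sketch of the rules of thumb stated in the text (the 30-second polling threshold, queue-and-notify for jobs the user does not wait on, traffic direction for the rest); chooseTransport and its parameter names are hypothetical.

```javascript
// `chooseTransport` is a hypothetical helper that encodes the rules of thumb
// from this guide. It is a starting point for discussion, not a verdict.
function chooseTransport({ updateIntervalSeconds, userWaitsOnPage = true, clientSendsContinuously = false }) {
  // The user closed the tab or will come back later: no live connection needed.
  if (!userWaitsOnPage) return 'queue + notification';
  // Slow cadence: plain polling is simpler than either streaming option.
  if (updateIntervalSeconds >= 30) return 'polling';
  // Both sides talk continuously: this is what WebSockets are for.
  if (clientSendsContinuously) return 'websocket';
  // Server pushes, client mostly listens.
  return 'sse';
}
```

For example, an AI token stream (updateIntervalSeconds well under 1, client silent after the prompt) lands on 'sse', while a collaborative editor with clientSendsContinuously set to true lands on 'websocket'.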
If you are deciding this for a feature on your roadmap and want a second opinion before committing engineering time, run it through Decide for a Build / Buy / Book recommendation.
For the related architectural decisions that often come up alongside real-time transport (state, retries, observability), see our guides on running integration tests in CI and writing a technical specification engineers actually follow.
Are WebSockets faster than SSE? For round-trip latency, yes: WebSockets save the per-message HTTP overhead of a separate client request and run over a persistent socket. For server-to-client throughput, the difference is negligible, because an SSE stream is a single long-lived response and no message pays for a new request or TCP handshake. If you do not need the client to send data back, SSE is not meaningfully slower.
Can you build chat on SSE? Yes, with a separate POST endpoint for sending messages. This works fine for low-frequency chat (customer support, comment threads). For high-frequency chat with typing indicators, read receipts, and presence, WebSockets are a better fit because you stop maintaining two parallel channels.
Why does an SSE stream stall or drop in production? The top three causes, in order: a proxy or load balancer buffering the response (set X-Accel-Buffering: no and skip compression), an idle timeout killing the connection (send heartbeats every 20-30 seconds), or a component in front of your server speaking HTTP/1.1 only and capping the browser at 6 concurrent connections per origin (check which protocol the browser actually negotiates).
Does SSE work on Vercel? Yes. Vercel publishes first-party examples of SSE in Edge Functions using ReadableStream, and the AI SDK from Vercel uses SSE under the hood. WebSockets are not supported in Vercel Edge or Serverless; for WS you need a separate hosted service (Ably, Pusher, PartyKit) or a long-running container on Render, Fly, or AWS.
How should a WebSocket connection authenticate? Best option: cookies, if the WS endpoint is same-origin. Second best: connect first, then send the token as the first message before joining any rooms or subscribing to channels. Avoid query-string tokens because they end up in proxy logs, browser history, and any monitoring that records URLs.
How many SSE connections can one server hold? On modern Node.js with HTTP/2, a small VM holds 10,000 to 50,000 idle SSE connections, limited by file descriptors and memory (a few KB each). The bottleneck is rarely the connection count; it is the work per message. You run out of CPU long before you run out of connections.