
To optimize Postgres queries in 2026, start with pg_stat_statements to find your slowest queries, run EXPLAIN (ANALYZE, BUFFERS) on the worst offender, then pick the right index type for the access pattern (B-tree, GIN, BRIN, GiST, or HNSW for vectors). Most "slow Postgres" tickets are missing indexes, n+1 ORM patterns, or a missing connection pooler. Postgres itself is rarely the problem.
That is the workflow. Here is how to run it.
The Postgres you tuned in 2022 is not the one you run now. Three things changed.
pg_stat_statements ships pre-loaded on Neon, Supabase, and managed RDS Postgres 17, so you do not have to argue with anyone to enable it. Postgres 18 brought planner improvements and async I/O for sequential scans.
Vector workloads moved into the same database as your transactional data. pgvector 0.8.0 added query-time filtering and faster HNSW builds, so your tuning checklist now includes ANN index parameters, not just B-trees.
Connections stopped being cheap or long-lived. Serverless workers and Edge runtimes hammer Postgres with short bursts, which breaks the old assumption that you size max_connections once and forget it. The pooler decision is now load-bearing.
Every optimization starts with knowing what is actually slow. Guessing wastes hours.
Enable the extension once:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
Then ask it the right question:
SELECT
substring(query, 1, 80) AS q,
calls,
round(mean_exec_time::numeric, 2) AS mean_ms,
round(total_exec_time::numeric, 2) AS total_ms,
rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
Sort by total_exec_time, not mean_exec_time. A 5ms query called 200,000 times per minute hurts more than a 2-second report run nightly. The frequency-weighted view is what matters for user-facing latency.
After every deploy, run SELECT pg_stat_statements_reset(); to compare windows cleanly. This single query, run weekly, surfaces about 90% of optimization opportunities in a typical SaaS app.
Once you have your top offender, run it through EXPLAIN. Do it correctly:
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM orders WHERE user_id = 42 ORDER BY created_at DESC LIMIT 20;
Plain EXPLAIN shows the planner's guess. ANALYZE actually runs the query and returns real timings. BUFFERS adds shared/local cache hit counts, which tells you whether you are hitting disk.
A typical output you want to fix:
Limit  (cost=0.00..1844.00 rows=20 width=128) (actual time=820.331..820.451 rows=20 loops=1)
  ->  Sort  (cost=0.00..18440.00 rows=200000 width=128) (actual time=820.330..820.402 rows=20 loops=1)
        Sort Key: created_at DESC
        Sort Method: external merge  Disk: 24576kB
        ->  Seq Scan on orders  (cost=0.00..3450.00 rows=200000 width=128) (actual time=0.012..142.554 rows=200000 loops=1)
              Filter: (user_id = 42)
              Rows Removed by Filter: 1800000
Planning Time: 0.421 ms
Execution Time: 821.882 ms
Three red flags here. Seq Scan with Rows Removed by Filter: 1800000 means no usable index. Sort Method: external merge Disk means the sort spilled to disk because work_mem is too small for this operation. And the actual time of 820ms on a 20-row result is what your user is waiting for.
The fix is a composite index on (user_id, created_at DESC). After adding it, the plan collapses to an Index Scan returning in under 2ms.
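A sketch of that index (the index name is an assumption; `CONCURRENTLY` avoids blocking writes during the build):

```sql
-- Matches the filter (user_id = 42) and the ORDER BY created_at DESC,
-- so the planner can return the top 20 rows straight off the index
-- with no Sort node at all.
CREATE INDEX CONCURRENTLY orders_user_id_created_at
    ON orders (user_id, created_at DESC);
```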
Compare estimated rows vs actual rows on every node. If the planner thinks it will get 100 rows and gets 100,000, your statistics are stale and the planner picks the wrong join strategy. Run ANALYZE orders; after large data loads. In production, enable the auto_explain module to log any plan that exceeds, say, 500ms, so you find regressions before users do.
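One way to wire up that auto_explain threshold, assuming you have the privileges to run `ALTER SYSTEM` (managed providers sometimes expose these settings through their dashboard instead):

```sql
-- Load auto_explain for new sessions and log any plan slower than 500ms.
ALTER SYSTEM SET session_preload_libraries = 'auto_explain';
ALTER SYSTEM SET auto_explain.log_min_duration = '500ms';
SELECT pg_reload_conf();  -- applies to connections opened after this point
```

Turning on `auto_explain.log_analyze` as well gets you actual timings in the logged plans, at the cost of measurement overhead on every query over the threshold.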
In 2026, Postgres ships five index types worth knowing. Picking the wrong one is the most common quiet performance bug.
| Index type | Best for | Build cost | Query speed | Disk size |
|---|---|---|---|---|
| B-tree | equality, range, ORDER BY | low | fast | baseline |
| GIN | JSONB, arrays, full-text | high | fast for contains | 2-5x B-tree |
| BRIN | ordered time-series | very low | ok on huge tables | ~1/1000 B-tree |
| GiST | geometry, ranges, PostGIS | medium | fast for geo | 1-2x B-tree |
| HNSW | vector similarity (pgvector) | high | 2-6ms | high (RAM-bound) |
| IVFFlat | vectors, faster builds | medium | 2-10ms | lower than HNSW |
A few real examples. For an events table with 500M rows ordered by created_at, a B-tree on created_at is roughly 12GB. A BRIN index on the same column is around 12MB and is fast enough for "last 24 hours" range queries. BRIN works because consecutive heap pages hold consecutive timestamps, so the index only stores min/max per block range.
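A minimal sketch of that BRIN index, assuming an `events` table with a `created_at` column (names are illustrative):

```sql
-- BRIN stores only a min/max summary per block range, so it stays tiny
-- as long as the heap is physically ordered by created_at.
CREATE INDEX events_created_at_brin
    ON events USING brin (created_at);

-- Range queries over recent data can use it:
SELECT count(*) FROM events
WHERE created_at >= now() - interval '24 hours';
```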
For a users table where you query metadata @> '{"plan":"pro"}', you want a GIN index on the JSONB column with jsonb_path_ops:
CREATE INDEX users_metadata_gin
ON users USING gin (metadata jsonb_path_ops);
For embeddings, default to HNSW unless you have a specific reason not to:
CREATE INDEX items_embedding_hnsw
ON items USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
The defaults (m=16, ef_construction=64, hnsw.ef_search=40) are sane for most cases. Raise ef_search at query time when you need higher recall and can afford the latency. Build HNSW indexes with CREATE INDEX CONCURRENTLY and bump maintenance_work_mem to several GB so the graph fits in memory during construction.
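Raising recall for a single query looks like this; `SET LOCAL` keeps the change scoped to the transaction, and the literal vector and table shape are placeholder assumptions:

```sql
BEGIN;
SET LOCAL hnsw.ef_search = 100;  -- default is 40; higher = better recall, more latency
SELECT id
FROM items
ORDER BY embedding <=> '[0.1, 0.2, 0.3]'::vector  -- <=> is cosine distance with vector_cosine_ops
LIMIT 10;
COMMIT;
```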
If you ship a Drizzle, Prisma, or Kysely schema, declare these indexes in code so they live next to the queries that use them. Our Drizzle ORM guide covers the syntax.
Most production tables have a "hot" subset that gets queried 100x more than the cold remainder. Partial indexes only cover that hot subset.
CREATE INDEX orders_active_user
ON orders (user_id, created_at DESC)
WHERE status = 'active' AND deleted_at IS NULL;
A full B-tree on (user_id, created_at) for 100M orders is around 3GB. The partial version covering 5% of rows (the active, non-deleted ones) is closer to 150MB. Smaller indexes mean more of them fit in shared_buffers, which means fewer disk reads, which means faster queries and faster writes.
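Rather than trusting size estimates like these, you can measure your own indexes directly:

```sql
-- On-disk size of every index on the orders table.
SELECT indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS on_disk
FROM pg_stat_user_indexes
WHERE relname = 'orders';
```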
Expression indexes do the same trick for computed values:
CREATE INDEX users_email_lower ON users (LOWER(email));
SELECT * FROM users WHERE LOWER(email) = 'sam@cadence.dev';
Without the expression index, that query forces a sequential scan because the planner cannot match LOWER(email) to a plain index on email. With it, you get an index scan.
The trap: you must write the query exactly as the index expression is declared. LOWER(email) = 'x' uses the index; email ILIKE 'X' does not.
Roughly half of the "slow Postgres" tickets we see at Cadence turn out to be application-level n+1 patterns, not Postgres problems. The shape: one query loads parents, then one query per parent loads children.
A list endpoint that renders 50 orders with their line items can fire 51 queries instead of 2. Each query is fast in isolation. The total round-trip kills you.
Fix it at the ORM layer:
- **Drizzle:** pass `with: { lineItems: true }`. Drizzle compiles this to a single query with `json_agg`.
- **Prisma:** `prisma.order.findMany({ include: { lineItems: true } })`. Watch nested includes; deep trees pull more than you expect.
- **Kysely:** use `jsonArrayFrom` from `kysely/helpers/postgres` to inline the children.
- **ActiveRecord:** `.includes(:line_items)` for separate queries, `.eager_load(:line_items)` to force a single LEFT JOIN.
- **GraphQL resolvers:** wrap child lookups in `dataloader` so concurrent calls in the same tick batch into one `IN (...)` query.

Add a query-count assertion in your tests. A list endpoint that goes from "expected ≤3 queries" to 51 will fail loud, before users notice. This single discipline catches more performance regressions than any database tuning.
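Whichever ORM you use, the efficient shape these eager-load options compile down to is roughly one query with `json_agg`. Table and column names below are assumed from the orders example:

```sql
SELECT o.*,
       -- FILTER + coalesce so orders with no line items get [] instead of [null]
       coalesce(
         json_agg(li.*) FILTER (WHERE li.id IS NOT NULL),
         '[]'
       ) AS line_items
FROM orders o
LEFT JOIN line_items li ON li.order_id = o.id
WHERE o.user_id = 42
GROUP BY o.id          -- legal because o.id is the primary key
ORDER BY o.created_at DESC
LIMIT 50;
```

Two queries (parents, then one batched `IN (...)` for all children) is equally fine; the point is that the query count stays constant as the page size grows.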
If you want a second pair of eyes on whether your stack is actually the bottleneck, you can run Ship or Skip, our free tool that grades your current setup and points at the highest-payoff fix. Useful before you spend a sprint on indexes that may not move the needle.
The planner makes decisions from statistics. Bad stats produce bad plans no matter how many indexes you add.
default_statistics_target defaults to 100. For columns with skewed distributions (think tenant_id where 5% of tenants account for 80% of rows), bump it per column:
ALTER TABLE events ALTER COLUMN tenant_id SET STATISTICS 1000;
ANALYZE events;
This tells the planner to track up to 1000 most-common values (and histogram buckets) for that column instead of 100, which produces dramatically better cardinality estimates for joins and skewed filters.
work_mem is per sort, per hash, per merge, per query. The math gets ugly fast. A work_mem of 64MB on a pool of 200 connections, each running 4 parallel sorts, can request 51GB of RAM at peak. The conservative default (4MB) exists for a reason. Raise it carefully, ideally per-session for analytical queries, not globally.
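The safe pattern for the analytical case, as a sketch (the reporting query is a placeholder):

```sql
BEGIN;
SET LOCAL work_mem = '256MB';  -- applies only until COMMIT/ROLLBACK, only in this session

-- The big reporting query runs with extra sort/hash memory;
-- everything else on the server keeps the conservative default.
SELECT date_trunc('day', created_at) AS day, count(*)
FROM orders
GROUP BY 1
ORDER BY 1;

COMMIT;
```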
JSONB access deserves its own discipline. If you query the same key on every request, do not GIN-index the whole document; add an expression index on that one key:
CREATE INDEX events_user_id_in_payload
ON events ((payload->>'user_id'));
Or, better yet, materialize the hot keys into typed columns and treat JSONB as the cold tail. That is the pattern most mature Postgres shops settle on by year three.
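Materializing a hot key can be done without touching application writes, via a generated column (Postgres 12+; table and key names assumed from the example above):

```sql
-- Postgres computes and stores the typed value on every insert/update.
ALTER TABLE events
    ADD COLUMN user_id bigint
    GENERATED ALWAYS AS ((payload->>'user_id')::bigint) STORED;

-- A plain B-tree on the typed column, instead of a GIN over the whole document.
CREATE INDEX events_user_id ON events (user_id);
```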
Each Postgres backend process eats around 10MB of RAM. Two hundred mostly-idle clients is 2GB gone before you serve a request. A pooler multiplexes thousands of client connections onto a small pool of server connections.
The 2026 lineup:
- **PgBouncer** remains the default self-hosted choice; transaction mode gives the biggest connection savings.
- **RDS Proxy** pins a connection to a client whenever you use session state such as `SET LOCAL`. Worth it for the failover behavior on Aurora, often not worth it on plain RDS.

If you are on Supabase, Supavisor is already there; just point your app at the pooler URL. If you are on Neon, the platform pool runs in front of your branches automatically. Compare hosted options in our Neon Postgres review, our Supabase review, and the broader best Postgres hosting roundup.
One subtle gotcha: transaction-mode pooling breaks prepared statements and session-level features (SET, LISTEN, advisory locks). Most modern ORMs handle this; older drivers do not. Test before you switch.
Read replicas are over-prescribed. They help when read CPU on the primary is the bottleneck. They do nothing for slow writes, and they introduce replication lag that breaks read-after-write UX.
Before adding one, check pg_stat_statements for the read/write split. If reads are 95% of total time and the primary is CPU-bound, a replica helps. If writes are the bottleneck, a replica makes things worse.
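A rough way to eyeball that split from pg_stat_statements. Classifying by leading keyword is a heuristic (it miscounts CTEs that write, for example), but it is good enough for this decision:

```sql
SELECT
  round(sum(total_exec_time) FILTER (WHERE query ILIKE 'select%')::numeric)     AS read_ms,
  round(sum(total_exec_time) FILTER (WHERE query NOT ILIKE 'select%')::numeric) AS write_ms
FROM pg_stat_statements;
```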
When you route to a replica, route by query, not by user. Async lag on Neon, Supabase, and RDS sits in the 50-500ms p99 range, enough to confuse a user who just posted a comment. The simplest pattern: writes to primary, reads from replica, but pin a user's own reads to the primary for 5 seconds after their last write.
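You can measure the lag yourself rather than trusting the provider's dashboard; run this on the replica:

```sql
-- NULL means this server is not a replica.
SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;
```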
A short list of things that feel like optimizations but are not.
- Indexing every column "just in case". Every index taxes every write. Check `pg_stat_user_indexes` and drop the zero-scan ones.
- `VACUUM FULL` during business hours. It takes an `ACCESS EXCLUSIVE` lock. Use `pg_repack` for online rewrites.
- `ORDER BY x LIMIT 1` without a matching index. Forces a full sort to return one row.
- `SELECT *` everywhere. Pulls TOAST columns you do not need, blowing up network and memory cost.
- `pg_dump` against the primary at 9am. Long-running transactions block VACUUM, which causes bloat. Dump from a replica or off-peak.

Best practices have an ROI curve. Respect it.
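Finding the zero-scan indexes is one query; sorting by size tells you which drops actually buy something:

```sql
SELECT schemaname, relname, indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS wasted
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
-- Keep indexes that back UNIQUE or PRIMARY KEY constraints even at zero scans:
-- they enforce correctness, not just speed.
```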
If you are pre-product-market-fit with a 200MB database and 50 daily users, do not pre-tune. Ship the product, watch pg_stat_statements weekly, and fix the top three queries when latency starts mattering. The optimization work below 1GB of data is almost always free; the planner picks reasonable plans for tiny tables regardless of what you do.
If your single-tenant SaaS has fewer than 100 concurrent users, you do not need a pooler. The defaults work. Adding PgBouncer adds operational surface area that costs more than it saves at that scale.
If you are not on Postgres 16 or higher, the single best optimization you can ship this week is a major version upgrade. Each major release brings 5-15% planner gains for free.
When complexity does start mattering, our scale MVP to production playbook covers the operational moves that ride alongside Postgres tuning: error tracking, background jobs, alerting.
If you would rather hand the audit to someone who has done it fifty times, every Cadence engineer is AI-native by default (Cursor, Claude Code, Copilot in daily use, vetted on a voice interview before they unlock bookings). A senior at $1,500/week typically lands a Postgres tuning audit and the top fixes inside one weekly sprint.
Try it. Audit your stack with Ship or Skip for an honest grade, or book a senior on the Cadence founder onboarding flow and try them for 48 hours free. Weekly billing, no notice period, replace any week.
Enable pg_stat_statements and query it ordered by total_exec_time DESC. That single query surfaces about 90% of optimization opportunities. Sort by total time, not mean time, so frequency-weighted offenders surface first.
EXPLAIN shows the planner's estimated plan without running the query. EXPLAIN ANALYZE actually executes it and returns real timings, row counts, and (with BUFFERS) cache hit ratios. Always use ANALYZE for tuning. Be careful with INSERT, UPDATE, or DELETE: ANALYZE runs the mutation for real.
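The standard way to profile a mutation safely is to wrap it in a transaction you roll back; the table and predicate here are illustrative:

```sql
BEGIN;
EXPLAIN (ANALYZE, BUFFERS)
UPDATE orders
SET status = 'archived'
WHERE created_at < now() - interval '1 year';
ROLLBACK;  -- the UPDATE ran for real, but its effects are discarded
```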
Reach for a partial index when most queries filter on the same WHERE clause and only a small fraction of rows match. Common cases: `status = 'active'`, `deleted_at IS NULL`, `tenant_id = $1`. Partial indexes shrink disk size and speed up writes, sometimes by 10x.
Usually not. Supabase ships Supavisor and Neon runs a pooler in front of every branch by default. Connect to the pooler URL your provider gives you. Run your own PgBouncer only if you want transaction-mode pooling at the application edge (rare).
Default to HNSW for production search where query latency matters more than build time. Pick IVFFlat when you rebuild the index frequently or memory pressure is high. Both ship with pgvector; both are good. HNSW is the better default in 2026.