May 8, 2026 · 11 min read · Cadence Editorial

How to optimize Postgres queries in 2026

Photo by [panumas nikhomkhai](https://www.pexels.com/@cookiecutter) on [Pexels](https://www.pexels.com/photo/line-of-pc-towers-17489151/)


To optimize Postgres queries in 2026, start with pg_stat_statements to find your slowest queries, run EXPLAIN (ANALYZE, BUFFERS) on the worst offender, then pick the right index type for the access pattern (B-tree, GIN, BRIN, GiST, or HNSW for vectors). Most "slow Postgres" tickets are missing indexes, n+1 ORM patterns, or a missing connection pooler. Postgres itself is rarely the problem.

That is the workflow. Here is how to run it.

Why Postgres optimization looks different in 2026

The Postgres you tuned in 2022 is not the one you run now. Three things changed.

pg_stat_statements ships pre-loaded on Neon, Supabase, and managed RDS Postgres 17, so you do not have to argue with anyone to enable it. Postgres 18 brought planner improvements and async I/O for sequential scans.

Vector workloads moved into the same database as your transactional data. pgvector 0.8.0 added query-time filtering and faster HNSW builds, so your tuning checklist now includes ANN index parameters, not just B-trees.

Connections stopped being cheap or long-lived. Serverless workers and Edge runtimes hammer Postgres with short bursts, which breaks the old assumption that you size max_connections once and forget it. The pooler decision is now load-bearing.

Step 1: Find the slow queries with pg_stat_statements

Every optimization starts with knowing what is actually slow. Guessing wastes hours.

Enable the extension once:

CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

Then ask it the right question:

SELECT
  substring(query, 1, 80) AS q,
  calls,
  round(mean_exec_time::numeric, 2) AS mean_ms,
  round(total_exec_time::numeric, 2) AS total_ms,
  rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;

Sort by total_exec_time, not mean_exec_time. A 5ms query called 200,000 times per minute hurts more than a 2-second report run nightly. The frequency-weighted view is what matters for user-facing latency.

After every deploy, run SELECT pg_stat_statements_reset(); to compare windows cleanly. This single query, run weekly, surfaces about 90% of optimization opportunities in a typical SaaS app.

Step 2: Read EXPLAIN ANALYZE the right way

Once you have your top offender, run it through EXPLAIN. Do it correctly:

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM orders WHERE user_id = 42 ORDER BY created_at DESC LIMIT 20;

Plain EXPLAIN shows the planner's guess. ANALYZE actually runs the query and returns real timings. BUFFERS adds shared/local cache hit counts, which tells you whether you are hitting disk.

A typical output you want to fix:

Limit  (cost=0.00..1844.00 rows=20 width=128) (actual time=820.331..820.451 rows=20 loops=1)
  ->  Sort  (cost=0.00..18440.00 rows=200000 width=128) (actual time=820.330..820.402 rows=20 loops=1)
        Sort Key: created_at DESC
        Sort Method: external merge  Disk: 24576kB
        ->  Seq Scan on orders  (cost=0.00..3450.00 rows=200000 width=128) (actual time=0.012..142.554 rows=200000 loops=1)
              Filter: (user_id = 42)
              Rows Removed by Filter: 1800000
Planning Time: 0.421 ms
Execution Time: 821.882 ms

Three red flags here. Seq Scan with Rows Removed by Filter: 1800000 means no usable index. Sort Method: external merge Disk means the sort spilled to disk because work_mem is too small for this operation. And the actual time of 820ms on a 20-row result is what your user is waiting for.

The fix is a composite index on (user_id, created_at DESC). After adding it, the plan collapses to an Index Scan returning in under 2ms.
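
A minimal version of that index, built concurrently so it does not block writes (the index name is illustrative):

CREATE INDEX CONCURRENTLY orders_user_id_created_at
  ON orders (user_id, created_at DESC);

Note that CREATE INDEX CONCURRENTLY cannot run inside a transaction block.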

Compare estimated rows vs actual rows on every node. If the planner thinks it will get 100 rows and gets 100,000, your statistics are stale and the planner picks the wrong join strategy. Run ANALYZE orders; after large data loads. In production, enable the auto_explain module to log any plan that exceeds, say, 500ms, so you find regressions before users do.
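
A sketch of the auto_explain setup for a single session; on managed providers the module usually has to be added to shared_preload_libraries or session_preload_libraries instead, and the threshold here is just the example from above:

LOAD 'auto_explain';
SET auto_explain.log_min_duration = '500ms';  -- log any statement slower than this
SET auto_explain.log_analyze = on;            -- include actual timings in the logged plan
SET auto_explain.log_buffers = on;            -- include buffer counts (requires log_analyze)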

Step 3: Pick the right index type

In 2026, Postgres ships five index types worth knowing. Picking the wrong one is the most common quiet performance bug.

| Index type | Best for | Build cost | Query speed | Disk size |
| --- | --- | --- | --- | --- |
| B-tree | equality, range, ORDER BY | low | fast | baseline |
| GIN | JSONB, arrays, full-text | high | fast for contains | 2-5x B-tree |
| BRIN | ordered time-series | very low | ok on huge tables | ~1/1000 B-tree |
| GiST | geometry, ranges, PostGIS | medium | fast for geo | 1-2x B-tree |
| HNSW | vector similarity (pgvector) | high | 2-6ms | high (RAM-bound) |
| IVFFlat | vectors, faster builds | medium | 2-10ms | lower than HNSW |

A few real examples. For an events table with 500M rows ordered by created_at, a B-tree on created_at is roughly 12GB. A BRIN index on the same column is around 12MB and is fast enough for "last 24 hours" range queries. BRIN works because consecutive heap pages hold consecutive timestamps, so the index only stores min/max per block range.
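
A sketch of that BRIN setup on the events table described above (the index name is illustrative):

CREATE INDEX events_created_at_brin
  ON events USING brin (created_at);

-- Typical "last 24 hours" range query that BRIN serves well on append-only data
SELECT count(*)
FROM events
WHERE created_at >= now() - interval '24 hours';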

For a users table where you query metadata @> '{"plan":"pro"}', you want a GIN index on the JSONB column with jsonb_path_ops:

CREATE INDEX users_metadata_gin
  ON users USING gin (metadata jsonb_path_ops);

For embeddings, default to HNSW unless you have a specific reason not to:

CREATE INDEX items_embedding_hnsw
  ON items USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);

The defaults (m=16, ef_construction=64, hnsw.ef_search=40) are sane for most cases. Raise ef_search at query time when you need higher recall and can afford the latency. Build HNSW indexes with CREATE INDEX CONCURRENTLY and bump maintenance_work_mem to several GB so the graph fits in memory during construction.
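
Raising recall at query time looks like this; the SELECT is a sketch against the items table from the index above, with the query embedding passed in as a parameter:

SET hnsw.ef_search = 100;  -- default is 40; higher trades latency for recall

SELECT id
FROM items
ORDER BY embedding <=> $1  -- $1 is the query embedding; <=> is cosine distance
LIMIT 10;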

If you ship a Drizzle, Prisma, or Kysely schema, declare these indexes in code so they live next to the queries that use them. Our Drizzle ORM guide covers the syntax.

Step 4: Use partial and expression indexes for filtered queries

Most production tables have a "hot" subset that gets queried 100x more than the cold remainder. Partial indexes only cover that hot subset.

CREATE INDEX orders_active_user
  ON orders (user_id, created_at DESC)
  WHERE status = 'active' AND deleted_at IS NULL;

A full B-tree on (user_id, created_at) for 100M orders is around 3GB. The partial version covering 5% of rows (the active, non-deleted ones) is closer to 150MB. Smaller indexes mean more of them fit in shared_buffers, which means fewer disk reads, which means faster queries and faster writes.

Expression indexes do the same trick for computed values:

CREATE INDEX users_email_lower ON users (LOWER(email));
SELECT * FROM users WHERE LOWER(email) = 'sam@cadence.dev';

Without the expression index, that query forces a sequential scan because the planner cannot match LOWER(email) to a plain index on email. With it, you get an index scan.

The trap: you must write the query exactly as the index expression is declared. LOWER(email) = 'x' uses the index; email ILIKE 'X' does not.

Step 5: Fix the n+1 query trap (it is usually the ORM)

Roughly half of the "slow Postgres" tickets we see at Cadence turn out to be application-level n+1 patterns, not Postgres problems. The shape: one query loads parents, then one query per parent loads children.

A list endpoint that renders 50 orders with their line items can fire 51 queries instead of 2. Each query is fast in isolation. The total round-trip kills you.

Fix it at the ORM layer:

  • Drizzle: use relational queries with with: { lineItems: true }. Drizzle compiles this to a single query with json_agg (the SQL shape is sketched after this list).
  • Prisma: prisma.order.findMany({ include: { lineItems: true } }). Watch nested includes; deep trees pull more than you expect.
  • Kysely: use jsonArrayFrom from kysely/helpers/postgres to inline the children.
  • ActiveRecord: .includes(:line_items) for separate queries, .eager_load(:line_items) to force a single LEFT JOIN.
  • Generic Node: wrap the child fetcher in dataloader so concurrent calls in the same tick batch into one IN (...) query.
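
For reference, the single-query shape these ORMs generate is roughly the following; the table and column names are illustrative, not what any specific ORM emits verbatim:

SELECT
  o.id,
  o.created_at,
  COALESCE(
    json_agg(li ORDER BY li.id) FILTER (WHERE li.id IS NOT NULL),
    '[]'::json
  ) AS line_items
FROM orders o
LEFT JOIN line_items li ON li.order_id = o.id
GROUP BY o.id, o.created_at
LIMIT 50;

One round trip, fifty parents, children inlined as JSON.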

Add a query-count assertion in your tests. A list endpoint that goes from "expected ≤3 queries" to 51 will fail loud, before users notice. This single discipline catches more performance regressions than any database tuning.

If you want a second pair of eyes on whether your stack is actually the bottleneck, you can run Ship or Skip, our free tool that grades your current setup and points at the highest-payoff fix. Useful before you spend a sprint on indexes that may not move the needle.

Step 6: Tune statistics, work_mem, and JSONB access

The planner makes decisions from statistics. Bad stats produce bad plans no matter how many indexes you add.

default_statistics_target defaults to 100. For columns with skewed distributions (think tenant_id where 5% of tenants account for 80% of rows), bump it per column:

ALTER TABLE events ALTER COLUMN tenant_id SET STATISTICS 1000;
ANALYZE events;

This tells the planner to keep up to 1000 most-common values and histogram buckets for that column instead of 100, which produces dramatically better cardinality estimates for joins and filters.

work_mem is per sort, per hash, per merge, per query. The math gets ugly fast. A work_mem of 64MB on a pool of 200 connections, each running 4 parallel sorts, can request 51GB of RAM at peak. The conservative default (4MB) exists for a reason. Raise it carefully, ideally per-session for analytical queries, not globally.
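
Scoping the bump to a single transaction is the safer pattern; the rollup query here is only an illustration:

BEGIN;
SET LOCAL work_mem = '256MB';  -- reverts automatically at COMMIT or ROLLBACK
SELECT tenant_id, count(*) AS events
FROM events
GROUP BY tenant_id
ORDER BY events DESC
LIMIT 100;
COMMIT;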

JSONB access deserves its own discipline. If you query the same key on every request, do not GIN-index the whole document; add an expression index on that one key:

CREATE INDEX events_user_id_in_payload
  ON events ((payload->>'user_id'));

Or, better yet, materialize the hot keys into typed columns and treat JSONB as the cold tail. That is the pattern most mature Postgres shops settle on by year three.
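
On Postgres 12 and later, one way to materialize a hot key is a stored generated column, sketched here with illustrative names:

-- Adding a stored generated column rewrites the table, so schedule it off-peak
ALTER TABLE events
  ADD COLUMN user_id bigint
  GENERATED ALWAYS AS ((payload->>'user_id')::bigint) STORED;

CREATE INDEX events_user_id ON events (user_id);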

Step 7: Pool connections (PgBouncer, Supavisor, RDS Proxy)

Each Postgres backend process eats around 10MB of RAM. Two hundred mostly-idle clients is 2GB gone before you serve a request. A pooler multiplexes thousands of client connections onto a small pool of server connections.
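
You can see the cost directly; a quick snapshot of backends by state shows how many connections are just sitting idle:

SELECT state, count(*) AS connections
FROM pg_stat_activity
WHERE backend_type = 'client backend'
GROUP BY state
ORDER BY connections DESC;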

The 2026 lineup:

  • PgBouncer. The battle-tested standard. Single binary, ~2MB RAM per 1000 clients, transaction-mode pooling, single-threaded so you cap out at one CPU core per instance. Run multiple instances behind HAProxy if you need more throughput.
  • Supavisor. Built by Supabase in Elixir. Multi-threaded, designed for serverless and Edge. Handles millions of connections by design. The right pick if you ship from Vercel, Cloudflare Workers, or Lambda.
  • RDS Proxy. AWS-managed, IAM auth, Aurora-aware, automatic failover handling. Real production note: it adds roughly 30% latency overhead vs raw Postgres and has session-pinning quirks with SET LOCAL. Worth it for the failover behavior on Aurora, often not worth it on plain RDS.
  • PgCat. Newer, multi-threaded, supports read/write splitting and sharding at the pooler. Compelling if you have outgrown PgBouncer but do not want to rebuild your app for sharding.

If you are on Supabase, Supavisor is already there; just point your app at the pooler URL. If you are on Neon, the platform pool runs in front of your branches automatically. Compare hosted options in our Neon Postgres review, our Supabase review, and the broader best Postgres hosting roundup.

One subtle gotcha: transaction-mode pooling breaks prepared statements and session-level features (SET, LISTEN, advisory locks). Most modern ORMs handle this; older drivers do not. Test before you switch.

Step 8: Decide whether you actually need a read replica

Read replicas are over-prescribed. They help when read CPU on the primary is the bottleneck. They do nothing for slow writes, and they introduce replication lag that breaks read-after-write UX.

Before adding one, check pg_stat_statements for the read/write split. If reads are 95% of total time and the primary is CPU-bound, a replica helps. If writes are the bottleneck, a replica makes things worse.
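
A rough way to eyeball that split from pg_stat_statements; classifying by the query text prefix is a heuristic, not exact accounting:

SELECT
  CASE WHEN query ILIKE 'select%' THEN 'read' ELSE 'write/other' END AS kind,
  round(sum(total_exec_time)::numeric, 0) AS total_ms,
  sum(calls) AS calls
FROM pg_stat_statements
GROUP BY 1
ORDER BY total_ms DESC;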

When you route to a replica, route by query, not by user. Async lag on Neon, Supabase, and RDS sits in the 50-500ms p99 range, enough to confuse a user who just posted a comment. The simplest pattern: writes to primary, reads from replica, but pin a user's own reads to the primary for 5 seconds after their last write.

Common pitfalls that look like wins

A short list of things that feel like optimizations but are not.

  • Adding indexes everywhere. Each index slows every write and bloats disk. Audit unused indexes with pg_stat_user_indexes and drop the zero-scan ones (a query for this follows the list).
  • VACUUM FULL in production. It rewrites the table and takes an ACCESS EXCLUSIVE lock. Use pg_repack for online rewrites.
  • ORDER BY x LIMIT 1 without a matching index. Forces a full sort to return one row.
  • SELECT * everywhere. Pulls TOAST columns you do not need, blowing up network and memory cost.
  • pg_dump against the primary at 9am. Long-running transactions block VACUUM, which causes bloat. Dump from a replica or off-peak.
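
For the unused-index audit in the first item, a starting-point query; confirm an index is not backing a unique or foreign-key constraint before dropping it, and remember the counters reset whenever statistics do:

SELECT schemaname, relname AS table_name, indexrelname AS index_name,
       idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;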

When you can skip most of this

Best practices have an ROI curve. Respect it.

If you are pre-product-market-fit with a 200MB database and 50 daily users, do not pre-tune. Ship the product, watch pg_stat_statements weekly, and fix the top three queries when latency starts mattering. Below 1GB of data, you get acceptable performance essentially for free; the planner picks reasonable plans for tiny tables regardless of what you do.

If your single-tenant SaaS has fewer than 100 concurrent users, you do not need a pooler. The defaults work. Adding PgBouncer introduces operational surface area that costs more than it saves at that scale.

If you are not on Postgres 16 or higher, the single best optimization you can ship this week is a major version upgrade. Each major release brings 5-15% planner gains for free.

When complexity does start mattering, our scale MVP to production playbook covers the operational moves that ride alongside Postgres tuning: error tracking, background jobs, alerting.

If you would rather hand the audit to someone who has done it fifty times, every Cadence engineer is AI-native by default (Cursor, Claude Code, Copilot in daily use, vetted on a voice interview before they unlock bookings). A senior at $1,500/week typically lands a Postgres tuning audit and the top fixes inside one weekly sprint.

Try it. Audit your stack with Ship or Skip for an honest grade, or book a senior on the Cadence founder onboarding flow and try them for 48 hours free. Weekly billing, no notice period, replace any week.

FAQ

How do I find which Postgres queries are slow?

Enable pg_stat_statements and query it ordered by total_exec_time DESC. That single query surfaces about 90% of optimization opportunities. Sort by total time, not mean time, so frequency-weighted offenders surface first.

What is the difference between EXPLAIN and EXPLAIN ANALYZE?

EXPLAIN shows the planner's estimated plan without running the query. EXPLAIN ANALYZE actually executes it and returns real timings, row counts, and (with BUFFERS) cache hit ratios. Always use ANALYZE for tuning. Be careful with INSERT, UPDATE, or DELETE: ANALYZE runs the mutation for real.

When should I use a partial index?

When most queries filter on the same WHERE clause and only a small fraction of rows match. Common cases: status = 'active', deleted_at IS NULL, tenant_id = $1. Partial indexes shrink disk size and speed up writes, sometimes by 10x.

Do I need PgBouncer if I am on Supabase or Neon?

No. Supabase ships Supavisor and Neon runs a pooler in front of every branch by default. Connect to the pooler URL your provider gives you. Run your own PgBouncer only if you want transaction-mode pooling at the application edge (rare).

Should I use HNSW or IVFFlat for pgvector?

Default to HNSW for production search where query latency matters more than build time. Pick IVFFlat when you rebuild the index frequently or memory pressure is high. Both ship with pgvector; both are good. HNSW is the better default in 2026.
