
The best vector database for production in 2026 is pgvector if you are on Postgres under 50M vectors, Turbopuffer if you can tolerate cold queries above 10M vectors, and Pinecone if you want zero ops. Everything else is a niche pick. The real decision is not "which engine wins benchmarks" but "what does this look like at 3am."
Most vector database posts grade engines like a benchmark sheet. That misses the point. Production cost is dominated by three things the benchmark tables hide: who operates the cluster on call, how filtered queries behave at p99, and whether you actually need sub-50ms latency. Here is the short version.
| Scale | Workload | Pick | Why |
|---|---|---|---|
| Under 1M | Anything | pgvector | One database, zero ops, ~$30/month |
| 1M to 10M | Latency-sensitive RAG | pgvector + pgvectorscale | 471 QPS at 99% recall on 50M, still in Postgres |
| 1M to 10M | Managed only | Pinecone Serverless | ~$70/month, no ops, predictable |
| 10M to 100M | Cold-tolerant search | Turbopuffer | $70/TB storage, warm p90 of 10ms |
| 10M to 100M | Hot real-time agents | Qdrant Cloud | p50 of 4ms, predictable filtering |
| 100M+ | Self-hosted, ops capacity | Milvus | Linear scale, but you pay an operator |
| Prototypes | Local dev | Chroma | Fastest to import, do not ship it |
If you take nothing else from this post, take the table. The rest is the reasoning behind it.
The word "production" gets thrown around. For vector databases, it means something specific.
You have a corpus that grows weekly. You have filtered metadata queries (user_id, tenant_id, date_range) that need to return correct results, not just close ones. You have hybrid search because pure semantic search misses exact-match queries like "invoice 4471." You have an SLO. You have a 3am pager.
A tool that ships a clean demo with 100k vectors and unfiltered queries is a prototype. The transition to production breaks most of them in one of three places: filtered recall collapses, p99 latency triples under load, or the cost line item gets escalated to the CFO. The picks below are the ones we have seen survive all three.
If you are already running Postgres, start with pgvector. It hits 8 to 25ms p95 under 10M vectors on a $50/month RDS instance. With pgvectorscale (Tiger Data's extension), it reaches 471 QPS at 99% recall on 50M vectors. That covers 80% of real RAG workloads.
The real win is not speed. It is that your vectors live next to your users table. Filtered queries (WHERE tenant_id = $1 AND embedding <-> $2) are a single SQL statement, not a two-stage retrieval problem. Backups, replication, point-in-time recovery, audit logs: you already have them.
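As a minimal sketch, here is what that single statement looks like, assuming a `documents` table with a `tenant_id` column and a pgvector `embedding` column (all names are illustrative, not from any particular schema):

```python
# Sketch of a single-statement filtered vector query for pgvector.
# Table and column names (documents, tenant_id, embedding) are illustrative.

def filtered_search_sql(k: int = 10) -> str:
    # <-> is pgvector's L2 distance operator; use <=> for cosine distance.
    return (
        "SELECT id, content, embedding <-> %(query_vec)s::vector AS distance "
        "FROM documents "
        "WHERE tenant_id = %(tenant_id)s "
        f"ORDER BY embedding <-> %(query_vec)s::vector LIMIT {k}"
    )

# With psycopg this runs as one round trip (connection setup omitted):
#   cur.execute("SET hnsw.ef_search = 100")  # raise for better filtered recall
#   cur.execute(filtered_search_sql(), {"tenant_id": 42, "query_vec": qvec})

print(filtered_search_sql())
```

The point is that the filter and the nearest-neighbor ordering live in the same query plan, which is exactly the two-stage problem every standalone vector store has to solve some other way.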
Where it breaks: above 50M vectors with high write throughput, HNSW index builds get expensive and VACUUM becomes a problem. If your corpus is approaching 100M and growing, plan the migration before you need it.
Turbopuffer is the most interesting vector database of the last two years because it changes the cost ceiling. Object-storage-first means $70/TB/month versus the $1,600/TB/month incumbent stacks (RAM plus 3x SSD replication). At 100M vectors, that is the difference between a $700/month Pinecone bill and a sub-$100 Turbopuffer bill.
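The raw-storage arithmetic behind that gap is easy to check. Assuming 1536-dimensional float32 embeddings (an assumption; your dimension and index overhead will shift the numbers, and Pinecone's bill also includes reads), 100M vectors is roughly 0.6 TB:

```python
# Back-of-envelope storage cost at 100M vectors.
# Assumes 1536-dim float32 embeddings; index overhead and metadata excluded.
vectors = 100_000_000
dims = 1536
bytes_per_float = 4

tb = vectors * dims * bytes_per_float / 1e12   # ~0.61 TB of raw vector data
object_storage = tb * 70       # object-storage-first, $70/TB/month
ram_plus_ssd = tb * 1600       # RAM + 3x SSD replication, $1,600/TB/month

print(f"{tb:.2f} TB -> ${object_storage:.0f}/mo vs ${ram_plus_ssd:.0f}/mo")
```

Storage alone comes out around $43/month versus roughly $980/month, which is where the order-of-magnitude difference in the bills comes from.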
The trade-off is explicit. Cold queries on a 1M-vector namespace hit 444ms p90 (the namespace has not been queried recently, so its data lives only on object storage). Warm queries hit 10ms p90. You either accept that some queries take half a second, or you build a warming strategy.
Use it for: large doc corpora, infrequent-access search, multi-tenant SaaS where most tenants are quiet. Skip it for: real-time agent retrieval where every query must be fast.
Pinecone Serverless is what you pick when you do not want to think about it. Latency is a consistent ~8ms p50, scaling is automatic, and the developer experience is the cleanest in the category. It is not the cheapest and it is not the fastest, but it is the most predictable.
The cost honesty: at 10M vectors, expect ~$70/month. At 100M vectors with active reads, expect $700+/month. This is fine if your retrieval cost is a small line item. It becomes a problem the moment a CFO asks why the bill ten-x'd.
Pinecone also caps at 20 indexes on standard plans (with up to 100,000 namespaces). For multi-tenant SaaS, this matters: you almost always want one index, many namespaces, not the reverse.
Qdrant delivers p50 of 4ms and p99 of 25ms on standard benchmarks, the lowest among purpose-built vector DBs. The filtering implementation is genuinely better than the others: payload indexes are first-class, so filtered queries do not blow up recall the way they do on naive HNSW.
Qdrant Cloud at 10M vectors runs ~$65/month. Self-hosted is straightforward if you are comfortable with a single Rust binary plus a persistent volume. Use Qdrant when you have an agentic workload with strict latency SLOs and meaningful metadata filtering. The reason most teams do not pick it is not technical: Pinecone's marketing got there first.
Weaviate's bet is that vector search is part of a richer object graph. You define a schema, you get hybrid search (BM25 plus vector) out of the box, and you get modules for embedding generation. This is genuinely useful if your search is multimodal or your schema is complex.
It is also more expensive (~$135/month at 10M on Weaviate Cloud) and the learning curve is real. Pick it when the schema model maps to your domain. Skip it for plain RAG, where simpler tools win.
Milvus is the engine to beat above 100M vectors if you have ops capacity. It scales linearly to billions of vectors, supports multiple index types (IVF, HNSW, DiskANN), and the GPU-acceleration story is real. Zilliz Cloud removes most of the ops, but at scale you pay for it.
The honest take: Milvus is overkill for under 50M vectors. Above 100M, with a team that knows Kubernetes and wants the cost control, it is the right answer. Most teams reading this post will never need it.
Chroma's developer experience is unmatched. pip install chromadb, three lines, you have a working RAG demo. That is the right tool for the first week of a project.
Do not ship it. Persistence, replication, and operational tooling are not at the level of pgvector or Qdrant. We have seen too many teams hit six-figure ARR with a Chroma instance running in a Docker container on a single VM and discover the migration cost the hard way.
Pure semantic search benchmarks are misleading. In production, almost every query is filtered: by tenant, by user, by date, by document type. HNSW, the algorithm under most vector DBs, was not designed for this. Naive implementations either pre-filter (and miss matches because the candidate set is too small) or post-filter (and waste 95% of the search). Both wreck recall.
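A toy brute-force sketch makes the post-filter failure concrete (synthetic data; it models only the post-filter problem, since exact search has none of HNSW's graph-connectivity issues): with a 5%-selective filter, the global top-k rarely contains k matching documents.

```python
import math
import random

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

random.seed(0)
# 10,000 synthetic docs; 5% belong to the tenant we are filtering on.
docs = [([random.gauss(0, 1) for _ in range(16)], i % 20 == 0)
        for i in range(10_000)]
query = [random.gauss(0, 1) for _ in range(16)]
k = 10

# Post-filter: take the global top-k, THEN apply the filter.
top_k = sorted(docs, key=lambda d: -cosine(query, d[0]))[:k]
post = [d for d in top_k if d[1]]

# Pre-filter: restrict to matching docs first, then search the subset.
pre = sorted((d for d in docs if d[1]), key=lambda d: -cosine(query, d[0]))[:k]

print(f"post-filter kept {len(post)}/{k}; pre-filter returned {len(pre)}/{k}")
```

In exact search, pre-filtering is simply correct; under HNSW it fails differently, because the filtered candidate set can leave the graph too sparse to traverse, which is why first-class filter indexes matter.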
The engines that handle this well in 2026: Qdrant (first-class payload indexes, so filters do not wreck recall) and pgvector with pgvectorscale (filters are ordinary SQL predicates evaluated in the same plan as the vector scan). The engines where filtering is rough: Pinecone (filtered queries can drop recall by 20-30% depending on selectivity), Chroma (filtering is post-hoc), and vanilla pgvector with HNSW (it works, but tune ef_search carefully). If your workload is heavily filtered, this single axis should drive your pick more than raw QPS.
For a deeper look at how filtering interacts with chunking, retrieval, and reranking, the production RAG architecture post is the companion piece to this one.
"Supports hybrid search" is in every vendor's marketing. The implementations are not equivalent. Real hybrid search needs a usable BM25 (or SPLADE) implementation, a fusion strategy (Reciprocal Rank Fusion is the standard), and the ability to weight the two signals per query.
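RRF itself is small enough to show in full. This is a generic sketch, not any vendor's implementation; k=60 is the constant from the original RRF paper, and the optional weights cover per-query signal weighting:

```python
def rrf(rankings, weights=None, k=60):
    """Reciprocal Rank Fusion over several ranked lists of doc ids."""
    weights = weights or [1.0] * len(rankings)
    scores = {}
    for w, ranking in zip(weights, rankings):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + w / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical doc ids: one list from BM25, one from vector search.
bm25 = ["inv-4471", "doc-2", "doc-9"]
vector = ["doc-2", "doc-7", "inv-4471"]
fused = rrf([bm25, vector])
print(fused)  # doc-2 ranks first: it places high in both lists
```

Rank-based fusion is the reason exact-match hits like "invoice 4471" survive: BM25 can put them at rank 1 even when the embedding distance is mediocre.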
If you are doing legal, code, or invoice search, hybrid is not optional. If you are doing pure conceptual retrieval (FAQ chatbot, summarization), pure vector is fine.
Marketing pages quote cost at one favorable scale. Production teams need the curve.
| Engine | 1M vectors | 10M vectors | 100M vectors |
|---|---|---|---|
| pgvector on RDS | ~$30/mo | ~$45 to $80/mo | requires sharding, plan migration |
| Pinecone Serverless | ~$15/mo | ~$70/mo | ~$700/mo+ |
| Turbopuffer | ~$64/mo (min) | ~$80/mo | ~$120/mo |
| Qdrant Cloud | ~$30/mo | ~$65/mo | ~$400/mo |
| Weaviate Cloud | ~$45/mo | ~$135/mo | ~$800/mo+ |
| Milvus self-host | infra + ops time | infra + ops time | ~$200/mo infra + on-call |
These numbers reflect publicly listed pricing as of early 2026 and assume moderate read traffic. Your bill will vary with embedding dimension, query rate, and index type. The shape of the curve, not the exact number, is what matters: pgvector wins until 10M, Turbopuffer wins above 10M for cold workloads, and Pinecone is the predictable middle.
Pick by scale and operator capacity, not by benchmark tables.
The migration risk is the highest hidden cost. Get the call right the first time. If the in-house team does not have Postgres extension experience or distributed-systems instincts, this is a place to bring in someone who has done the migration before. Every engineer on Cadence is AI-native by default, and several engineers in our senior tier ($1,500/week) have shipped vector-search migrations from Pinecone to pgvector or Turbopuffer. A 48-hour trial is enough to scope the work.
If you are picking a vector DB for the first time and want a second opinion, the decide tool takes a 90-second spec and gives you a Build / Buy / Book recommendation with rationale. It is free and it does not collect your data.
For most teams, pgvector on managed Postgres (Supabase, Neon, RDS) is the best production choice up to 50M vectors. It removes a dependency, keeps filtered queries cheap, and runs on infrastructure your team already operates. Above 50M, evaluate Turbopuffer (cold-tolerant) or Qdrant (real-time).
Yes. Companies including Supabase, Neon, and Instacart run pgvector at significant scale. With pgvectorscale, it hits 471 QPS at 99% recall on 50M vectors. The "pgvector is for prototypes" claim is two years out of date. The Datadog review for SaaS observability covers how to instrument it once it is in production.
Pick pgvector if you are already on Postgres and under 50M vectors. Pick Pinecone if you want zero ops, predictable latency, and your finance team is comfortable with a usage-based bill that grows with scale. The break-even is operator time, not raw cost.
Turbopuffer at 100M vectors. Its object-storage-first architecture puts storage at $70/TB/month versus $1,600/TB/month for traditional vector stacks. The trade is cold-query latency of up to 444ms p90, so it is the right pick only when your workload tolerates cold starts.
If you are starting today and already use Postgres, install pgvector and ship. You can revisit when (a) the corpus crosses 50M vectors, (b) p99 latency requirements break the index, or (c) the bill becomes a real line item. Picking too soon is a more common mistake than picking too late.