May 7, 2026 · 10 min read · Cadence Editorial

Best vector databases for production

Photo by [panumas nikhomkhai](https://www.pexels.com/@cookiecutter) on [Pexels](https://www.pexels.com/photo/boxes-of-computers-17489152/)


The best vector database for production in 2026 is pgvector if you are on Postgres under 50M vectors, Turbopuffer if you can tolerate cold queries above 10M vectors, and Pinecone if you want zero ops. Everything else is a niche pick. The real decision is not "which engine wins benchmarks" but "what does this look like at 3am."

The honest verdict, by scale

Most vector database posts grade engines like a benchmark sheet. That misses the point. Production cost is dominated by three things the benchmark tables hide: who operates the cluster on call, how filtered queries behave at p99, and whether you actually need sub-50ms latency. Here is the short version.

| Scale | Workload | Pick | Why |
| --- | --- | --- | --- |
| Under 1M | Anything | pgvector | One database, zero ops, ~$30/month |
| 1M to 10M | Latency-sensitive RAG | pgvector + pgvectorscale | 471 QPS at 99% recall on 50M, still in Postgres |
| 1M to 10M | Managed only | Pinecone Serverless | ~$70/month, no ops, predictable |
| 10M to 100M | Cold-tolerant search | Turbopuffer | $70/TB storage, warm p90 of 10ms |
| 10M to 100M | Hot real-time agents | Qdrant Cloud | p50 of 4ms, predictable filtering |
| 100M+ | Self-hosted, ops capacity | Milvus | Linear scale, but you pay an operator |
| Prototypes | Local dev | Chroma | Fastest to import, do not ship it |

If you take nothing else from this post, take the table. The rest is the reasoning behind it.

What "production" actually means for vector DBs

The word "production" gets thrown around. For vector databases, it means something specific.

You have a corpus that grows weekly. You have filtered metadata queries (user_id, tenant_id, date_range) that need to return correct results, not just close ones. You have hybrid search because pure semantic search misses exact-match queries like "invoice 4471." You have an SLO. You have a 3am pager.

A tool that ships a clean demo with 100k vectors and unfiltered queries is a prototype. The transition to production breaks most of them in one of three places: filtered recall collapses, p99 latency triples under load, or the cost line item gets escalated to your CFO. The picks below are the ones we have seen survive all three.

The contenders, ranked by where they actually win

pgvector: the default until proven otherwise

If you are already running Postgres, you should start here. pgvector hits 8 to 25ms p95 under 10M vectors on a $50/month RDS instance. With pgvectorscale (Tiger Data's extension), it hits 471 QPS at 99% recall on 50M vectors. That covers 80% of real RAG workloads.

The real win is not speed. It is that your vectors live next to your users table. Filtered queries (WHERE tenant_id = $1 ORDER BY embedding <-> $2) are a single SQL statement, not a two-stage retrieval problem. Backups, replication, point-in-time recovery, audit logs: you already have them.
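The co-location point can be shown with a toy sketch. Plain Python lists stand in for a Postgres table here, and the function names are illustrative, not a client API: the point is that filtering and ranking happen in one pass over the same rows.

```python
import math

# Toy rows: (tenant_id, embedding). In Postgres these would be two
# columns in one table; here they are tuples, purely for illustration.
ROWS = [
    ("acme", [0.1, 0.9]),
    ("acme", [0.8, 0.2]),
    ("globex", [0.1, 0.8]),
    ("acme", [0.2, 0.7]),
]

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def filtered_knn(tenant_id, query, k=2):
    """Filter and rank in a single pass -- the moral equivalent of
    SELECT ... WHERE tenant_id = $1 ORDER BY embedding <-> $2 LIMIT k."""
    candidates = [(l2(vec, query), vec) for t, vec in ROWS if t == tenant_id]
    return [vec for _, vec in sorted(candidates)[:k]]

print(filtered_knn("acme", [0.0, 1.0]))  # → [[0.1, 0.9], [0.2, 0.7]]
```

A two-stage system has to do the same thing across a network boundary: fetch candidates from the vector store, then discard the ones the application database says are out of scope.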

Where it breaks: above 50M vectors with high write throughput, HNSW index builds get expensive and VACUUM becomes a problem. If your corpus is approaching 100M and growing, plan the migration before you need it.

Turbopuffer: the storage-cost reset

Turbopuffer is the most interesting vector database of the last two years because it changes the cost ceiling. Object-storage-first means $70/TB/month versus the $1,600/TB/month incumbent stacks (RAM plus 3x SSD replication). At 100M vectors, that is the difference between a $700/month Pinecone bill and a sub-$100 Turbopuffer bill.
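The arithmetic behind that claim is worth making explicit. A sketch, assuming 1536-dimensional float32 embeddings (an assumed embedding size, not a vendor figure) and the per-TB prices quoted above:

```python
def monthly_storage_cost(n_vectors, dim, price_per_tb, bytes_per_float=4):
    """Raw vector storage cost per month, ignoring index overhead and
    metadata -- enough to see the shape of the curve."""
    tb = n_vectors * dim * bytes_per_float / 1e12
    return tb * price_per_tb

# 100M vectors at an assumed 1536 dims is ~0.61 TB of raw float32 data.
print(round(monthly_storage_cost(100e6, 1536, 70)))    # object-storage-first: ~$43/mo
print(round(monthly_storage_cost(100e6, 1536, 1600)))  # RAM + 3x SSD stack: ~$983/mo
```

Real bills add query costs and index overhead on top, but the order-of-magnitude gap between the two storage prices is the whole story.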

The trade-off is explicit. Cold queries on a 1M-vector namespace hit 444ms p90 (the namespace has not been touched recently and lives only on S3). Warm queries hit 10ms p90. You either accept that some queries take half a second, or you build a warming strategy.
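A warming strategy can be as small as a scheduled probe per active namespace. A hedged sketch: `query_fn` is a placeholder for whatever cheap top-1 query your client exposes, not a real Turbopuffer API.

```python
import time

def warm_once(namespaces, query_fn):
    """Issue one cheap probe query per namespace so its data stays
    cached off object storage. Returns how many namespaces were touched."""
    touched = 0
    for ns in namespaces:
        query_fn(ns)  # e.g. a top-1 query with any cached vector
        touched += 1
    return touched

def keep_warm(namespaces, query_fn, interval_s=300):
    """Run the probe loop forever; in production this would be a cron
    job or scheduled worker rather than a blocking loop."""
    while True:
        warm_once(namespaces, query_fn)
        time.sleep(interval_s)
```

The hard part is not the loop, it is deciding which namespaces count as "active" -- warming everything recreates the hot-storage bill you were trying to avoid.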

Use it for: large doc corpora, infrequent-access search, multi-tenant SaaS where most tenants are quiet. Skip it for: real-time agent retrieval where every query must be fast.

Pinecone: the no-ops managed bet

Pinecone Serverless is what you pick when you do not want to think about it. Latency is a consistent ~8ms p50, scaling is automatic, and the developer experience is the cleanest in the category. It is not the cheapest and it is not the fastest, but it is the most predictable.

The cost honesty: at 10M vectors, expect ~$70/month. At 100M vectors with active reads, expect $700+/month. This is fine if your retrieval cost is a small line item. It becomes a problem the moment a CFO asks why the bill ten-x'd.

Pinecone also caps at 20 indexes on standard plans (with up to 100,000 namespaces). For multi-tenant SaaS, this matters: you almost always want one index, many namespaces, not the reverse.

Qdrant: the open-core speed pick

Qdrant delivers p50 of 4ms and p99 of 25ms on standard benchmarks, the lowest among purpose-built vector DBs. The filtering implementation is genuinely better than the others: payload indexes are first-class, so filtered queries do not blow up recall the way they do on naive HNSW.

Qdrant Cloud at 10M vectors runs ~$65/month. Self-hosted is straightforward if you are comfortable with a single Rust binary plus a persistent volume. Use Qdrant when you have an agentic workload with strict latency SLOs and meaningful metadata filtering. The reason most teams do not pick it is not technical: Pinecone's marketing got there first.

Weaviate: the schema-first option

Weaviate's bet is that vector search is part of a richer object graph. You define a schema, you get hybrid search (BM25 plus vector) out of the box, and you get modules for embedding generation. This is genuinely useful if your search is multimodal or your schema is complex.

It is also more expensive (~$135/month at 10M on Weaviate Cloud) and the learning curve is real. Pick it when the schema model maps to your domain. Skip it for plain RAG, where simpler tools win.

Milvus: the at-scale workhorse

Milvus is the engine to beat above 100M vectors if you have ops capacity. It scales linearly to billions of vectors, supports multiple index types (IVF, HNSW, DiskANN), and the GPU-acceleration story is real. Zilliz Cloud removes most of the ops, but at scale you pay for it.

The honest take: Milvus is overkill for under 50M vectors. Above 100M, with a team that knows Kubernetes and wants the cost control, it is the right answer. Most teams reading this post will never need it.

Chroma: ship it as a prototype, not as production

Chroma's developer experience is unmatched. pip install chromadb, three lines, you have a working RAG demo. That is the right tool for the first week of a project.

Do not ship it. Its persistence, replication, and operational tooling are not at the level of pgvector or Qdrant. We have seen too many teams hit six-figure ARR with a Chroma instance running in a Docker container on a single VM and discover the migration cost the hard way.

The trade-off most posts skip: filtered queries

Pure semantic search benchmarks are misleading. In production, almost every query is filtered: by tenant, by user, by date, by document type. HNSW, the algorithm under most vector DBs, was not designed for this. Naive implementations either pre-filter (and miss matches because the candidate set is too small) or post-filter (and waste 95% of the search). Both wreck recall.
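The post-filter waste is easy to quantify. Assuming filter matches are spread evenly through the ranking (a simplification), getting k results after the filter means over-fetching roughly k divided by the filter's selectivity:

```python
import math

def overfetch(k, selectivity):
    """Candidates to request so that post-filtering leaves ~k results,
    assuming filter matches are spread evenly through the ranking."""
    return math.ceil(k / selectivity)

# A 5% tenant filter turns a top-10 query into a top-200 fetch,
# discarding 95% of the work -- the "waste 95% of the search" problem.
print(overfetch(10, 0.05))   # 200
# A 0.5% filter turns it into a top-2000 fetch.
print(overfetch(10, 0.005))  # 2000
```

Pre-filtering avoids the waste but shrinks the candidate graph, which is why recall collapses instead. Neither naive strategy survives highly selective filters.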

The engines that handle this well in 2026:

  • Qdrant has payload indexes that prune the HNSW graph during traversal. Filtered recall stays close to unfiltered recall.
  • pgvector with pgvectorscale uses StreamingDiskANN, which respects predicates better than vanilla pgvector.
  • Turbopuffer filters at the storage layer before scoring, which is naturally efficient when storage is object storage.

The engines where filtering is rough: Pinecone (filtered queries can drop recall by 20-30% depending on selectivity), Chroma (filtering is post-hoc), and vanilla pgvector with HNSW (works, but tune ef_search carefully). If your workload is heavily filtered, this single axis should drive your pick more than raw QPS.

For a deeper look at how filtering interacts with chunking, retrieval, and reranking, the production RAG architecture post is the companion piece to this one.

Hybrid search: more than a checkbox feature

"Supports hybrid search" is in every vendor's marketing. The implementations are not equivalent. Real hybrid search needs a usable BM25 (or SPLADE) implementation, a fusion strategy (Reciprocal Rank Fusion is the standard), and the ability to weight the two signals per query.

  • Turbopuffer has BM25 as a first-class index. It is the cleanest hybrid story in the lineup.
  • Weaviate has BM25 built into the schema. Solid, if you are already in Weaviate.
  • pgvector does not have BM25, but Postgres has full-text search. You union the two and apply RRF in SQL. It works, but it is a lot of glue.
  • Pinecone added hybrid search via sparse-dense vectors. Functional, but you generate the sparse vectors yourself.
  • Qdrant supports sparse vectors natively. Bring your own SPLADE.
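If you end up gluing BM25 and vector results together yourself, Reciprocal Rank Fusion is small enough to own. A sketch of the standard formula (each document scores the sum of 1/(k + rank) across rankings, with k = 60 as the conventional constant); the document IDs are made up for illustration:

```python
from collections import defaultdict

def rrf(rankings, k=60, weights=None):
    """Fuse ranked ID lists with Reciprocal Rank Fusion.
    rankings: list of ordered doc-ID lists (best first).
    weights: optional per-ranking weights (defaults to 1.0 each)."""
    weights = weights or [1.0] * len(rankings)
    scores = defaultdict(float)
    for ranking, w in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += w / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25 = ["inv-4471", "doc-9", "doc-2"]   # exact-match signal
dense = ["doc-2", "doc-9", "doc-5"]     # semantic signal
print(rrf([bm25, dense]))  # → ['doc-2', 'doc-9', 'inv-4471', 'doc-5']
```

The `weights` parameter is where "weight the two signals per query" lives: boost the BM25 list for queries that look like identifiers, boost the dense list for conceptual ones.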

If you are doing legal, code, or invoice search, hybrid is not optional. If you are doing pure conceptual retrieval (FAQ chatbot, summarization), pure vector is fine.

Cost honesty at 1M, 10M, 100M

Marketing pages quote cost at one favorable scale. Production teams need the curve.

| Engine | 1M vectors | 10M vectors | 100M vectors |
| --- | --- | --- | --- |
| pgvector on RDS | ~$30/mo | ~$45 to $80/mo | requires sharding, plan migration |
| Pinecone Serverless | ~$15/mo | ~$70/mo | ~$700/mo+ |
| Turbopuffer | ~$64/mo (min) | ~$80/mo | ~$120/mo |
| Qdrant Cloud | ~$30/mo | ~$65/mo | ~$400/mo |
| Weaviate Cloud | ~$45/mo | ~$135/mo | ~$800/mo+ |
| Milvus self-host | infra + ops time | infra + ops time | ~$200/mo infra + on-call |

These numbers reflect publicly listed pricing as of early 2026 and assume moderate read traffic. Your bill will vary with embedding dimension, query rate, and index type. The shape of the curve, not the exact number, is what matters: pgvector wins until 10M, Turbopuffer wins above 10M for cold workloads, and Pinecone is the predictable middle.

What to do this week

Pick by scale and operator capacity, not by benchmark tables.

  1. Under 10M vectors, on Postgres already: ship pgvector. Stop reading.
  2. Above 10M with cold-tolerant queries: prototype Turbopuffer in a side project, measure cold-start in your real query distribution.
  3. Real-time agent workload, strict p99: prototype Qdrant Cloud, validate filtered recall on your real metadata.
  4. No ops capacity, willing to pay for predictability: Pinecone Serverless, accept the cost ceiling.
  5. Already at 100M vectors: you are past this post. Talk to a vector-search specialist.

The migration risk is the highest hidden cost. Get the call right the first time. If the in-house team does not have Postgres extension experience or distributed-systems instincts, this is a place to bring in someone who has done the migration before. Every engineer on Cadence is AI-native by default and several of our senior tier ($1,500/week) have shipped vector-search migrations from Pinecone to pgvector or to Turbopuffer. A 48-hour trial is enough to scope the work.

If you are picking a vector DB for the first time and want a second opinion, the decide tool takes a 90-second spec and gives you a Build / Buy / Book recommendation with rationale. It is free and it does not collect your data.

FAQ

What is the best vector database for production RAG?

For most teams, pgvector on managed Postgres (Supabase, Neon, RDS) is the best production choice up to 50M vectors. It removes a dependency, keeps filtered queries cheap, and runs on infrastructure your team already operates. Above 50M, evaluate Turbopuffer (cold-tolerant) or Qdrant (real-time).

Is pgvector good enough for production?

Yes. Companies including Supabase, Neon, and Instacart run pgvector at significant scale. With pgvectorscale, it hits 471 QPS at 99% recall on 50M vectors. The "pgvector is for prototypes" claim is two years out of date. Read the Datadog review for SaaS observability for how to instrument it once it is in production.

Pinecone vs pgvector: which should I pick?

Pick pgvector if you are already on Postgres and under 50M vectors. Pick Pinecone if you want zero ops, predictable latency, and your finance team is comfortable with a usage-based bill that grows with scale. The break-even is operator time, not raw cost.

What is the cheapest vector database at scale?

Turbopuffer at 100M vectors. Its object-storage-first architecture puts storage at $70/TB/month versus $1,600/TB/month for traditional vector stacks. The trade is cold-query latency of up to 444ms p90, so it is the right pick only when your workload tolerates cold starts.

Do I need a dedicated vector database, or is Postgres good enough?

If you are starting today and already use Postgres, install pgvector and ship. You can revisit when (a) the corpus crosses 50M vectors, (b) p99 latency requirements break the index, or (c) the bill becomes a real line item. Picking too soon is a more common mistake than picking too late.
