
Adding semantic search to your SaaS in 2026 typically costs $10,000 to $120,000 in one-time engineering, plus $50 to $4,000/month in infrastructure. A bare MVP (OpenAI embeddings, pgvector, no re-ranker) runs $10k to $35k to build and under $100/month at small scale. A production-grade system with hybrid search, re-ranking, and proper observability runs $50k to $120k to build and $500 to $4,000/month to operate.
The real cost driver is not the vector store. It is the unglamorous work: chunking strategy, evaluation harness, ingestion pipeline, and the re-ranking layer that turns "okay-ish" results into ones users trust. Most teams underestimate the remaining 60% of the project after the demo works, which is why the gap between MVP and production-grade is 5x, not 1.5x.
Semantic search lets users find documents by meaning, not keyword. A user typing "how do I cancel my plan" matches a help article titled "Subscription termination policy" even though zero keywords overlap.
Under the hood, you turn text into vectors (embeddings), store them in a vector database, then at query time embed the user's question and find the nearest vectors using cosine similarity or dot product. That is the textbook description.
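The textbook version can be sketched in a few lines of plain Python. This is a minimal illustration, not a production design: the three-dimensional vectors and document ids are made up, the query vector stands in for a real embedding call, and a real system would use an approximate index (such as HNSW) instead of scanning every document.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def nearest(query_vec, doc_vecs, k=3):
    """Brute-force search: ids of the k documents most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), doc_id)
              for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical toy embeddings; real ones have hundreds or thousands of dimensions.
docs = {
    "subscription-termination-policy": [0.9, 0.1, 0.0],
    "api-rate-limits": [0.0, 0.2, 0.9],
    "billing-faq": [0.7, 0.6, 0.1],
}
query = [0.8, 0.2, 0.0]  # stands in for the embedding of "how do I cancel my plan"
top = nearest(query, docs, k=2)
print(top)
```

With these toy vectors the cancellation query lands closest to the termination-policy article even though the two share no keywords, which is the behavior described above.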
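The 2026 reality is messier. Pure vector search is rarely enough. Production systems combine three layers: vector retrieval for meaning, lexical (BM25-style) search for exact terms, and a re-ranker that reorders the merged candidates.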
Skip any of these and you ship something that demos well and frustrates users in production.
Here is what actually consumes the engineering budget, roughly in order of effort:
Hybrid search: lexical scoring (tsvector, OpenSearch, or Typesense) combined with vector scores using Reciprocal Rank Fusion.
Never build yourself: the embedding model, the vector index data structure, the re-ranker. All three are commodity APIs.
The line item every founder asks about first. Also the cheapest part of the stack.
OpenAI text-embedding-3-small costs $0.02 per 1M tokens, roughly $0.02 to embed 750,000 words. Embedding an entire SaaS corpus once usually costs under $50.
OpenAI text-embedding-3-large costs $0.13 per 1M tokens. Use for legal, medical, or technical-doc search where small recall gains matter.
Voyage voyage-3 costs $0.06 per 1M tokens and benchmarks slightly above text-embedding-3-large on retrieval tasks.
Cohere embed-english-v3 costs $0.10 per 1M tokens. Strong on English, multilingual variant available.
For 1 million documents averaging 500 words each (about 670M tokens at roughly 750 words per 1,000 tokens), one-time embedding on text-embedding-3-small is about $13. Query-time embedding at 1M queries/month is under $2/month. Stop worrying about embedding cost.
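To size this for your own corpus, the arithmetic fits in one function. This is a rough sketch: the words-to-tokens ratio is the rule of thumb implied by the pricing above (1M tokens is about 750,000 words), and real token counts vary by language and content, so treat the output as an estimate.

```python
# Rule of thumb: 1M tokens is about 750,000 words, so ~4/3 tokens per word.
# This ratio is an approximation; measure with a real tokenizer if it matters.
TOKENS_PER_WORD = 4 / 3


def embedding_cost_usd(num_docs, words_per_doc, price_per_million_tokens):
    """Estimated one-time cost to embed a corpus, in US dollars."""
    tokens = num_docs * words_per_doc * TOKENS_PER_WORD
    return tokens / 1_000_000 * price_per_million_tokens


# 1M documents of 500 words each, at the per-1M-token prices listed above.
small = embedding_cost_usd(1_000_000, 500, 0.02)  # text-embedding-3-small
large = embedding_cost_usd(1_000_000, 500, 0.13)  # text-embedding-3-large
print(f"3-small: ${small:.2f}, 3-large: ${large:.2f}")
```

Under these assumptions the small model comes out around $13 and the large model around $87 for the same corpus. Both are rounding errors next to the engineering bill.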
This is where monthly bills actually live. Pick wrong and you either pay $2,000/month for what should cost $50, or you outgrow your store in month 4 and migrate.
| Vector store | Best for | Monthly cost (1M vectors) | Monthly cost (10M vectors) | Pros | Cons |
|---|---|---|---|---|---|
| pgvector (self-hosted on existing Postgres) | Apps with under 5M vectors | $0 to $50 | $100 to $400 (bigger Postgres instance) | No new infra, transactional with your data, free | Slower at scale, manual HNSW tuning, query planner quirks |
| Pinecone Serverless | Teams that want zero ops | $30 to $200 | $300 to $1,200 | Truly managed, scales to billions, fast | Vendor lock-in, pricing opaque at scale, no on-prem |
| Weaviate Cloud | Multi-tenancy, hybrid built-in | $25 to $300 | $400 to $1,500 | Hybrid search native, GraphQL API, open source option | Steeper learning curve, smaller community than Pinecone |
| Qdrant Cloud | Teams that want self-host option | $50 to $200 | $300 to $1,000 | Fast (Rust), great filtering, excellent docs | Smaller ecosystem, fewer integrations |
| OpenSearch / Elasticsearch | You already run it | $100 to $500 | $500 to $2,500 | Hybrid (BM25 + vector) in one engine, mature | Memory-hungry, more ops than pgvector, slower vector recall |
The honest decision tree:
For most early SaaS, pgvector on a $50/month Render or Neon Postgres instance handles the first year. We have seen teams add semantic search to existing apps for the cost of a slightly bigger DB instance, full stop.
These two layers separate "looks neat in a demo" from "users actually trust it."
Hybrid search combines BM25 (lexical) and vector (semantic) scores. The standard recipe is Reciprocal Rank Fusion: take the top 50 from each, score by 1 / (60 + rank) in each list, sum, sort. It is 20 lines of code and lifts recall by 15 to 30% over pure vector search on most corpora. Skip it and your search will silently fail on product names, SKUs, error codes, and any query where exact terms matter.
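A minimal sketch of that recipe follows. The function name, the document ids, and the two input lists are illustrative; in practice each list would be the top 50 ids returned by your lexical and vector queries, best match first.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60, top_n=10):
    """Fuse several rankings: each list adds 1 / (k + rank) to a document's score."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks start at 1
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the output is deterministic.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in fused[:top_n]]


# Hypothetical results from the two retrievers, best match first.
bm25_hits = ["sku-4471", "cancel-plan", "pricing"]
vector_hits = ["cancel-plan", "refund-policy", "sku-4471"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits], top_n=3)
print(fused)
```

Because RRF uses only ranks, it never has to reconcile BM25 scores with cosine similarities, which live on different scales. Documents that appear in both lists rise to the top, and a strong hit from only one retriever still survives.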
Implementation paths:
Postgres tsvector and ts_rank_cd for BM25-ish lexical scoring, RRF in application code. Cost: $0 in new infrastructure.
Re-ranking is the highest-ROI thing in the entire stack and the most-skipped. After your hybrid search returns the top 50 candidates, you pass them to a cross-encoder model that scores each (query, document) pair directly. The top 10 after re-ranking are dramatically better than the top 10 from pure vector or hybrid.
Cohere Rerank 3.5 is the default. It costs $2.00 per 1,000 searches (each search re-ranks up to 100 documents). For a SaaS doing 100k searches/month, that is $200/month. For 1M searches/month, $2,000/month.
If $2,000/month is too steep, self-host BAAI/bge-reranker-large on a single T4 GPU for about $250/month on Modal or RunPod. Quality is 5 to 10% below Cohere but still beats no re-ranker by a country mile.
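A quick way to decide between the two is to compute the break-even volume. The sketch below uses only the prices quoted above and makes simplifying assumptions: one GPU handles your whole query load, and the cost of operating it yourself is ignored.

```python
def cohere_rerank_monthly_usd(searches_per_month, price_per_1k=2.00):
    """Monthly managed re-ranking bill at a flat per-1,000-search price."""
    return searches_per_month / 1000 * price_per_1k


def break_even_searches(self_host_monthly_usd=250.0, price_per_1k=2.00):
    """Monthly search volume at which a fixed-cost GPU matches the managed price."""
    return self_host_monthly_usd / price_per_1k * 1000


print(cohere_rerank_monthly_usd(100_000))    # 200.0
print(cohere_rerank_monthly_usd(1_000_000))  # 2000.0
print(break_even_searches())                 # 125000.0
```

On price alone, self-hosting starts to pay off above roughly 125,000 searches/month. Below that, the managed API is cheaper and you keep the quality edge.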
Internal benchmark across 8 SaaS retrieval projects: adding Cohere Rerank on top of hybrid search lifted Recall@5 from 72% to 91% on average. That is the difference between users abandoning search and users trusting it.
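Measuring this on your own corpus takes very little code once you have labeled queries. The sketch below assumes one common definition of Recall@k (the share of a query's relevant documents that appear in the top k, averaged over queries); the benchmark above may have used a different variant. The eval set and the canned search function are hypothetical placeholders for your labeled data and your real retrieval pipeline.

```python
def recall_at_k(retrieved, relevant, k=5):
    """Share of the relevant documents that appear in the top k results."""
    relevant = set(relevant)
    if not relevant:
        raise ValueError("each eval query needs at least one relevant document")
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def mean_recall_at_k(eval_set, search_fn, k=5):
    """Average Recall@k over (query, relevant_ids) pairs."""
    scores = [recall_at_k(search_fn(query), relevant, k)
              for query, relevant in eval_set]
    return sum(scores) / len(scores)


# Hypothetical labeled queries and a canned stand-in for the real search call.
eval_set = [
    ("cancel my plan", {"doc-7"}),
    ("export invoices", {"doc-2", "doc-9"}),
]
canned_results = {
    "cancel my plan": ["doc-3", "doc-7", "doc-1"],
    "export invoices": ["doc-2", "doc-4", "doc-5", "doc-6", "doc-8", "doc-9"],
}

score = mean_recall_at_k(eval_set, canned_results.__getitem__, k=5)
print(score)
```

Run the same eval set before and after each change (chunk size, hybrid fusion, re-ranker) so that every tuning decision is backed by a number instead of a spot check.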
This is the actual engineering bill, not the SaaS infrastructure bill. For comparison data on related builds, see our cost-to-build admin dashboard breakdown and the Next.js application end-to-end cost guide.
| Approach | One-time engineering | Timeline | Pros | Cons |
|---|---|---|---|---|
| US full-time hire | $40,000 to $80,000 (loaded comp for 2 months) | 8 to 12 weeks (incl. recruiting) | Owns it long-term, can evolve the system | Hiring cycle alone takes 6 to 10 weeks; overkill if you ship once |
| Dev agency (US/EU) | $60,000 to $150,000 | 10 to 16 weeks | Project management included, accountability | Slow kickoff, change-order tax, generalist team |
| Upwork freelancer | $5,000 to $25,000 | 4 to 10 weeks | Cheap on paper | Quality variance is enormous; eval harness usually absent |
| Toptal | $20,000 to $60,000 | 1 to 3 weeks to start | Vetted, fast staffing | $80 to $150/hour billed monthly, opaque margin |
| Cadence | $2,000 to $24,000 (2 to 12 weeks at mid/senior rates) | 48-hour trial then ship in week 1 | Every engineer is AI-native by default, vetted on Cursor / Claude / Copilot fluency. Weekly billing, replace any week, no notice. | Less suited to enterprise procurement with 30-day NET terms |
A reasonable MVP build on Cadence: 1 senior engineer at $1,500/week for 4 weeks = $6,000, plus $50 to $200/month infra. Full production-grade system with hybrid search, re-ranking, eval harness, observability: 1 senior plus 1 mid for 8 weeks = $20,000.
Concrete numbers for a 5M-document SaaS doing 250k searches/month:
| Component | Monthly cost | Notes |
|---|---|---|
| Embedding generation (OpenAI 3-small, ongoing) | $5 to $30 | Re-embedding ~10% of corpus per month |
| Query-time embeddings (250k queries) | $0.50 | Negligible |
| Vector store (pgvector on Neon scale plan) | $69 | Existing Postgres, just bigger |
| OR Pinecone Serverless | $150 to $400 | Pay per read + storage |
| Hybrid search (Postgres tsvector) | $0 | Included in pgvector setup |
| Cohere Rerank 3.5 (250k searches) | $500 | $2/1k searches |
| LLM for query rewriting (optional, GPT-4o-mini) | $20 to $80 | Improves recall on vague queries |
| Observability (Langfuse or self-rolled) | $0 to $50 | Critical for tuning |
| Total (pgvector path) | ~$595 to $730 | Production-grade |
| Total (Pinecone path) | ~$675 to $1,060 | Production-grade, managed |
A bare MVP without re-ranking on pgvector: $5 to $80/month total. Quality is 30 to 40% lower but you can launch.
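To rerun the production-grade totals with your own numbers, the table reduces to a sum over (low, high) ranges. The dictionary keys below are illustrative labels; the dollar figures are the ones from the table for 5M documents and 250k searches/month.

```python
# (low, high) monthly cost in USD for line items shared by both paths.
SHARED_ITEMS = {
    "embedding_generation": (5, 30),
    "query_embeddings": (0.50, 0.50),
    "hybrid_search": (0, 0),
    "cohere_rerank": (500, 500),
    "query_rewriting_llm": (20, 80),
    "observability": (0, 50),
}

# The vector store is the only line that differs between the two paths.
VECTOR_STORES = {
    "pgvector": (69, 69),
    "pinecone": (150, 400),
}


def monthly_total(store):
    """Return the (low, high) monthly total for the chosen vector store."""
    items = list(SHARED_ITEMS.values()) + [VECTOR_STORES[store]]
    low = sum(lo for lo, _ in items)
    high = sum(hi for _, hi in items)
    return low, high


print(monthly_total("pgvector"))  # (594.5, 729.5)
print(monthly_total("pinecone"))  # (675.5, 1060.5)
```

The re-ranker accounts for $500 of either total, so query volume, not corpus size, is the number to watch as you grow.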
Default to text-embedding-3-small unless you have benchmarked something better on your corpus. Cohere and Voyage are competitive but rarely worth re-embedding for a 2% recall lift.
If you are sizing the budget before committing, run the numbers on our ROI calculator against your specific corpus size and query volume.
A 3-step recommendation that has worked across 40+ Cadence semantic-search bookings in the last 6 months:
If you do not already have an engineer who has shipped a retrieval system, the fastest path is to book a senior engineer on Cadence for 2 to 4 weeks. Every engineer on the platform is AI-native, vetted on Cursor / Claude / Copilot fluency before they unlock bookings, and Cadence's 27-hour median time to first commit means you are shipping vectors to staging by day 2. Trial is 48 hours free. If the engineer is not the right fit, replace them at the end of the week with no notice.
For related infra cost work, our breakdowns on migrating MySQL to Postgres and adding image generation to your app walk through similar build-vs-buy decisions.
A working MVP on pgvector takes 3 to 7 engineering days. Production-grade (hybrid search, re-ranking, eval harness, observability) takes 4 to 8 weeks for one experienced engineer, or 2 to 4 weeks with two engineers working in parallel. Most of the time goes into chunking and evaluation, not into wiring up the vector store.
You do not always need a dedicated vector database. If you have under 100,000 documents and Postgres already, pgvector is fine and effectively free. Dedicated vector databases (Pinecone, Weaviate, Qdrant) become worth the cost above 1 to 5 million vectors, or when query latency matters at high QPS.
You can skip the re-ranker, but you will feel it. Re-ranking is the single biggest quality lever in retrieval. Hybrid search alone gets you to Recall@5 around 70 to 75% on most corpora. Adding Cohere Rerank pushes that to 88 to 92%. For internal tools you can skip it. For user-facing search, the $200 to $2,000/month is the best money you spend.
Start with OpenAI text-embedding-3-small at $0.02/1M tokens. It is cheap, good enough for 95% of corpora, and has the widest set of SDK integrations. Benchmark Voyage voyage-3 or Cohere embed-english-v3 only if your retrieval eval scores are unacceptably low after tuning chunking and adding a re-ranker.
Roughly $25,000 to $60,000 one-time engineering for a production-grade system, plus $800 to $2,500/month in infrastructure (vector store, re-ranking, embeddings, observability). The vector store will run $400 to $1,500/month at that scale; the re-ranker will dominate at high query volume.
Build it yourself if you have an engineer with prior retrieval experience and a frozen 2-month window. Hire if you do not. The eval harness work in particular is unforgiving for first-timers; the second build is always 3x faster than the first.