Cost to add semantic search to your SaaS

Adding semantic search to your SaaS in 2026 typically costs $10,000 to $120,000 in one-time engineering, plus $50 to $4,000/month in infrastructure. A bare MVP (OpenAI embeddings, pgvector, no re-ranker) runs $10k to $35k to build and under $100/month at small scale. A production-grade system with hybrid search, re-ranking, and proper observability runs $50k to $120k to build and $500 to $4,000/month to operate.

The real cost driver is not the vector store. It is the unglamorous work: chunking strategy, evaluation harness, ingestion pipeline, and the re-ranking layer that turns "okay-ish" results into ones users trust. Most teams underestimate the remaining 60% of the project that comes after the demo works, which is why the gap between MVP and production-grade is 3x to 5x, not 1.5x.

What "semantic search" actually means in 2026

Semantic search lets users find documents by meaning, not keyword. A user typing "how do I cancel my plan" matches a help article titled "Subscription termination policy" even though zero keywords overlap.

Under the hood, you turn text into vectors (embeddings), store them in a vector database, then at query time embed the user's question and find the nearest vectors using cosine similarity or dot product. That is the textbook description.
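
To make the textbook description concrete, here is a minimal sketch of nearest-vector lookup by cosine similarity. The three-dimensional vectors and document ids are made up for illustration; real embeddings come from an embedding API and have hundreds or thousands of dimensions, and a vector database replaces the brute-force loop with an approximate index.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_a * norm_b)

def nearest(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int = 3):
    """Return the k (doc_id, score) pairs most similar to query_vec, best first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy vectors standing in for real embeddings (illustrative values only).
docs = {
    "subscription-termination-policy": [0.9, 0.1, 0.0],
    "api-rate-limits": [0.0, 0.2, 0.9],
    "billing-faq": [0.7, 0.6, 0.1],
}
query = [1.0, 0.2, 0.0]  # stands in for the embedding of "how do I cancel my plan"

for doc_id, score in nearest(query, docs, k=2):
    print(f"{doc_id}: {score:.3f}")
```

The help article ranks first even though the query shares no keywords with its title, which is the whole point of the approach.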

The 2026 reality is messier. Pure vector search is rarely enough. Production systems combine three layers:

Lexical search (BM25) for exact terms, product SKUs, error codes
Vector search for meaning and paraphrase
Re-ranking to reorder the top 50 candidates into a final top 10

Skip any of these and you ship something that demos well and frustrates users in production.

What goes into a semantic search build

Here is what actually consumes the engineering budget, roughly in order of effort:

Ingestion pipeline. Pulls content from your DB, S3, Notion, Intercom, normalizes it, feeds the embedder. ~25% of the project.
Chunking strategy. Matters more than the embedding model. Bad chunks (too large, mid-sentence, no overlap) tank recall by 30 to 50%.
Embedding generation. A loop that calls OpenAI, Voyage, or Cohere and writes vectors to your store.
Vector store. Pinecone, pgvector, Weaviate, Qdrant. Pick one and live with the trade-offs for years.
Hybrid search. BM25 (Postgres tsvector, OpenSearch, or Typesense) combined with vector scores using Reciprocal Rank Fusion.
Re-ranking. Cohere Rerank 3.5, Voyage Rerank, or a self-hosted cross-encoder. The biggest quality lever.
Eval harness. A frozen set of 100 to 500 query/expected-result pairs you run on every change. Without it you ship vibes.
Caching, rate limiting, observability. Boring, mandatory, two weeks of work.
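
Since chunking carries so much of the quality, a rough sketch of what "good chunks" means may help: end on sentence boundaries and carry a little overlap across the seam. The function name, the word-count budget (a crude proxy for tokens), and the regex sentence splitter are all assumptions for illustration, not a recommended production splitter.

```python
import re

def chunk_text(text: str, max_words: int = 200, overlap_sentences: int = 1) -> list[str]:
    """Split text into chunks that end on sentence boundaries.

    Each new chunk repeats the last `overlap_sentences` sentences of the
    previous chunk so context is not lost at the seam. A single sentence
    longer than max_words is kept whole rather than cut mid-sentence.
    """
    # Naive sentence splitter: break on whitespace that follows ., ! or ?
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: list[str] = []
    current: list[str] = []
    current_words = 0
    for sentence in sentences:
        n_words = len(sentence.split())
        if current and current_words + n_words > max_words:
            chunks.append(" ".join(current))
            # Seed the next chunk with the tail of this one (the overlap).
            current = current[-overlap_sentences:] if overlap_sentences > 0 else []
            current_words = sum(len(s.split()) for s in current)
        current.append(sentence)
        current_words += n_words
    if current:
        chunks.append(" ".join(current))
    return chunks

sample = "One two three. Four five six. Seven eight nine. Ten eleven twelve."
for chunk in chunk_text(sample, max_words=6, overlap_sentences=1):
    print(chunk)
```

Chunk size and overlap are exactly the knobs you tune against the eval set later, so keep them as parameters rather than constants.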

Never build these yourself: the embedding model, the vector index data structure, the re-ranker. All three are available as commodity APIs or off-the-shelf open-source components.

Embedding generation cost

The line item every founder asks about first. Also the cheapest part of the stack.

OpenAI text-embedding-3-small costs $0.02 per 1M tokens, roughly $0.02 to embed 750,000 words. Embedding an entire SaaS corpus once usually costs under $50.

OpenAI text-embedding-3-large costs $0.13 per 1M tokens. Use for legal, medical, or technical-doc search where small recall gains matter.

Voyage voyage-3 costs $0.06 per 1M tokens and benchmarks slightly above text-embedding-3-large on retrieval tasks.

Cohere embed-english-v3 costs $0.10 per 1M tokens. Strong on English, multilingual variant available.

For 1 million documents averaging 500 words each (roughly 670M tokens at 750,000 words per 1M tokens), one-time embedding on text-embedding-3-small is about $13. Query-time embedding at 1M queries/month is under $2/month. Stop worrying about embedding cost.
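
If you want to plug in your own corpus size, the arithmetic is short enough to sketch. The prices are the per-1M-token figures quoted above; the 0.75 words-per-token ratio is the rule of thumb implied by "1M tokens is roughly 750,000 words" and will vary by language and content. Under that ratio 500 words is closer to 670 tokens, so this lands a few dollars above a flat 500-tokens-per-document estimate; either way the total is small.

```python
WORDS_PER_TOKEN = 0.75  # rule of thumb; an assumption, not a measured value

# USD per 1M tokens, as quoted in this article.
PRICE_PER_MILLION_TOKENS = {
    "text-embedding-3-small": 0.02,
    "text-embedding-3-large": 0.13,
    "voyage-3": 0.06,
    "embed-english-v3": 0.10,
}

def embedding_cost_usd(num_docs: int, words_per_doc: int, model: str) -> float:
    """Estimate the one-time cost of embedding a corpus with the given model."""
    tokens = num_docs * words_per_doc / WORDS_PER_TOKEN
    return tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS[model]

for model in PRICE_PER_MILLION_TOKENS:
    cost = embedding_cost_usd(1_000_000, 500, model)
    print(f"{model}: ${cost:,.2f}")
```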

Vector store choice

This is where monthly bills actually live. Pick wrong and you either pay $2,000/month for what should cost $50, or you outgrow your store in month 4 and migrate.

Vector store	Best for	Monthly cost (1M vectors)	Monthly cost (10M vectors)	Pros	Cons
pgvector (self-hosted on existing Postgres)	Apps with under 5M vectors	$0 to $50	$100 to $400 (bigger Postgres instance)	No new infra, transactional with your data, free	Slower at scale, manual HNSW tuning, query planner quirks
Pinecone Serverless	Teams that want zero ops	$30 to $200	$300 to $1,200	Truly managed, scales to billions, fast	Vendor lock-in, pricing opaque at scale, no on-prem
Weaviate Cloud	Multi-tenancy, hybrid built-in	$25 to $300	$400 to $1,500	Hybrid search native, GraphQL API, open source option	Steeper learning curve, smaller community than Pinecone
Qdrant Cloud	Teams that want self-host option	$50 to $200	$300 to $1,000	Fast (Rust), great filtering, excellent docs	Smaller ecosystem, fewer integrations
OpenSearch / Elasticsearch	You already run it	$100 to $500	$500 to $2,500	Hybrid (BM25 + vector) in one engine, mature	Memory-hungry, more ops than pgvector, slower vector recall

The honest decision tree:

Under 1M vectors and you already use Postgres? pgvector. Stop here. You can always migrate later, and "later" rarely comes.
1M to 100M vectors, small team, no ops appetite? Pinecone Serverless. The price is fair and you will not babysit it.
Need open source for compliance or self-host? Qdrant or Weaviate.
Already on OpenSearch? Use its k-NN plugin. One less system.

For most early SaaS, pgvector on a $50/month Render or Neon Postgres instance handles the first year. We have seen teams add semantic search to existing apps for the cost of a slightly bigger DB instance, full stop.

Hybrid search and re-ranking

These two layers separate "looks neat in a demo" from "users actually trust it."

Hybrid search combines BM25 (lexical) and vector (semantic) scores. The standard recipe is Reciprocal Rank Fusion: take the top 50 from each, score by 1 / (60 + rank) in each list, sum, sort. It is 20 lines of code and lifts recall by 15 to 30% over pure vector search on most corpora. Skip it and your search will silently fail on product names, SKUs, error codes, and any query where exact terms matter.
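
The recipe above really is about 20 lines. A minimal sketch, assuming each retriever hands back an ordered list of document ids (the ids and function name here are illustrative):

```python
from collections import defaultdict

def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60, top_n: int = 50
) -> list[tuple[str, float]]:
    """Fuse ranked id lists: each list contributes 1 / (k + rank) per document."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based; only the top_n entries of each list count.
        for rank, doc_id in enumerate(ranking[:top_n], start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

bm25_hits = ["sku-4411", "doc-a", "doc-b"]    # lexical results, best first
vector_hits = ["doc-a", "doc-c", "sku-4411"]  # vector results, best first

for doc_id, score in reciprocal_rank_fusion([bm25_hits, vector_hits]):
    print(f"{doc_id}: {score:.5f}")
```

Documents that appear in both lists rise to the top, and because only ranks are used you never have to reconcile BM25 scores with cosine similarities.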

Implementation paths:

Postgres-only stack: pgvector for vectors, tsvector and ts_rank_cd for BM25-ish lexical scoring, RRF in application code. Cost: $0 in new infrastructure.
Typesense or Meilisearch + pgvector: Use the search engine for BM25, pgvector for vectors, RRF in app. Adds $25 to $100/month.
OpenSearch: Both in one engine, hybrid query built in. Adds $100 to $500/month but consolidates.

Re-ranking is the highest-ROI thing in the entire stack and the most-skipped. After your hybrid search returns the top 50 candidates, you pass them to a cross-encoder model that scores each (query, document) pair directly. The top 10 after re-ranking are dramatically better than the top 10 from pure vector or hybrid.

Cohere Rerank 3.5 is the default. It costs $2.00 per 1,000 searches (each search re-ranks up to 100 documents). For a SaaS doing 100k searches/month, that is $200/month. For 1M searches/month, $2,000/month.

If $2,000/month is too steep, self-host BAAI/bge-reranker-large on a single T4 GPU for about $250/month on Modal or RunPod. Quality is 5 to 10% below Cohere but still beats no re-ranker by a country mile.
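
To see where self-hosting starts to pay off, you can compare the two price points directly. This sketch uses only the figures quoted above ($2.00 per 1,000 searches hosted, about $250/month for one GPU) and assumes, optimistically, that a single GPU keeps up with your query volume and that engineering time to run it is free.

```python
COHERE_USD_PER_1K_SEARCHES = 2.00   # hosted re-ranker price quoted above
SELF_HOSTED_USD_PER_MONTH = 250.0   # single-GPU estimate quoted above

def hosted_rerank_cost(searches_per_month: int) -> float:
    """Monthly cost of the hosted re-ranker at a given search volume."""
    return searches_per_month / 1_000 * COHERE_USD_PER_1K_SEARCHES

def break_even_searches() -> int:
    """Search volume at which hosted and self-hosted cost the same."""
    return int(SELF_HOSTED_USD_PER_MONTH / COHERE_USD_PER_1K_SEARCHES * 1_000)

for volume in (100_000, 250_000, 1_000_000):
    print(f"{volume:>9,} searches/month: hosted ${hosted_rerank_cost(volume):,.0f}")
print(f"Break-even: {break_even_searches():,} searches/month")
```

On these numbers the crossover sits around 125,000 searches/month; below that, the hosted API is cheaper before you even count ops time.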

Internal benchmark across 8 SaaS retrieval projects: adding Cohere Rerank on top of hybrid search lifted Recall@5 from 72% to 91% on average. That is the difference between users abandoning search and users trusting it.

Cost breakdown by approach

This is the actual engineering bill, not the SaaS infrastructure bill. For comparison data on related builds, see our cost-to-build admin dashboard breakdown and the Next.js application end-to-end cost guide.

Approach	One-time engineering	Timeline	Pros	Cons
US full-time hire	$40,000 to $80,000 (loaded comp for 2 months)	8 to 12 weeks (incl. recruiting)	Owns it long-term, can evolve the system	Hiring cycle alone takes 6 to 10 weeks; overkill if you ship once
Dev agency (US/EU)	$60,000 to $150,000	10 to 16 weeks	Project management included, accountability	Slow kickoff, change-order tax, generalist team
Upwork freelancer	$5,000 to $25,000	4 to 10 weeks	Cheap on paper	Quality variance is enormous; eval harness usually absent
Toptal	$20,000 to $60,000	1 to 3 weeks to start	Vetted, fast staffing	$80 to $150/hour billed monthly, opaque margin
Cadence	$2,000 to $24,000 (2 to 12 weeks at mid/senior rates)	48-hour trial then ship in week 1	Every engineer is AI-native by default, vetted on Cursor / Claude / Copilot fluency. Weekly billing, replace any week, no notice.	Less suited to enterprise procurement with 30-day NET terms

A reasonable MVP build on Cadence: 1 senior engineer at $1,500/week for 4 weeks = $6,000, plus $50 to $200/month infra. Full production-grade system with hybrid search, re-ranking, eval harness, observability: 1 senior plus 1 mid for 8 weeks = $20,000.

Feature-by-feature cost breakdown

Concrete numbers for a 5M-document SaaS doing 250k searches/month:

Component	Monthly cost	Notes
Embedding generation (OpenAI 3-small, ongoing)	$5 to $30	Re-embedding ~10% of corpus per month
Query-time embeddings (250k queries)	$0.50	Negligible
Vector store (pgvector on Neon scale plan)	$69	Existing Postgres, just bigger
OR Pinecone Serverless	$150 to $400	Pay per read + storage
Hybrid search (Postgres tsvector)	$0	Included in pgvector setup
Cohere Rerank 3.5 (250k searches)	$500	$2/1k searches
LLM for query rewriting (optional, GPT-4o-mini)	$20 to $80	Improves recall on vague queries
Observability (Langfuse or self-rolled)	$0 to $50	Critical for tuning
Total (pgvector path)	~$595 to $730	Production-grade
Total (Pinecone path)	~$675 to $1,060	Production-grade, managed

A bare MVP without re-ranking on pgvector: $5 to $80/month total. Quality is 30 to 40% lower but you can launch.
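
If your numbers differ from the component table above, a small script makes it easy to re-total. This sketch just copies the table's low and high figures and sums them per vector-store path; the dictionary keys are labels chosen for illustration.

```python
# (low, high) monthly USD per component, copied from the table above.
SHARED = {
    "embedding generation": (5.0, 30.0),
    "query-time embeddings": (0.5, 0.5),
    "hybrid search (tsvector)": (0.0, 0.0),
    "Cohere Rerank 3.5": (500.0, 500.0),
    "query rewriting LLM (optional)": (20.0, 80.0),
    "observability": (0.0, 50.0),
}
VECTOR_STORE = {
    "pgvector": (69.0, 69.0),
    "pinecone": (150.0, 400.0),
}

def monthly_range(store: str) -> tuple[float, float]:
    """Sum the low and high ends across shared components plus one vector store."""
    rows = list(SHARED.values()) + [VECTOR_STORE[store]]
    return sum(low for low, _ in rows), sum(high for _, high in rows)

for store in VECTOR_STORE:
    low, high = monthly_range(store)
    print(f"{store}: ${low:,.2f} to ${high:,.2f} per month")
```

The re-ranker row dominates both totals, which is why search volume, not corpus size, is the number to forecast carefully.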

How to reduce costs without cutting corners

Start on pgvector. Migration to Pinecone in month 12 is a 2-day job. Premature optimization is the real cost.
Use OpenAI text-embedding-3-small unless you have benchmarked something better on your corpus. Cohere and Voyage are competitive but rarely worth re-embedding for a 2% recall lift.
Skip the re-ranker only if you must. Hybrid search with no re-ranker is acceptable for low-stakes search (internal wiki). For customer-facing search, re-ranking is non-negotiable.
Build the eval harness first, not last. A frozen set of 200 query/answer pairs costs 2 days to build and saves 2 weeks of bickering about "is it better now?"
Cache aggressively. Embed once, store. Cache top results for popular queries. 60% of traffic in most SaaS is the top 100 queries.
Book the right tier. Junior engineers at $500/week can wire OpenAI to pgvector and ship an MVP in a week. You do not need a $200/hour ML specialist for a CRUD-plus-embeddings system. Reserve senior tier ($1,500/week) for re-ranking, eval, and tuning.
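
On the caching tip above: the cheapest version is an in-process cache keyed on a normalized query string. A minimal sketch using Python's standard `functools.lru_cache`; `run_search_pipeline` is a hypothetical stand-in for your real embed, retrieve, and re-rank path, and a production setup would more likely use a shared cache with expiry so results refresh after re-indexing.

```python
from functools import lru_cache

def normalize(query: str) -> str:
    """Lowercase and collapse whitespace so trivial variants share a cache entry."""
    return " ".join(query.lower().split())

def run_search_pipeline(normalized_query: str) -> tuple[str, ...]:
    """Hypothetical stand-in for embed + hybrid retrieval + re-rank."""
    return (f"result-for:{normalized_query}",)

@lru_cache(maxsize=1024)
def cached_search(normalized_query: str) -> tuple[str, ...]:
    # Results are returned as a tuple so cached values cannot be mutated by callers.
    return run_search_pipeline(normalized_query)

def search(query: str) -> tuple[str, ...]:
    return cached_search(normalize(query))

search("How do I cancel my plan")
search("  how do I CANCEL my plan ")  # same normalized key, served from cache
print(cached_search.cache_info())
```

Every cache hit skips both the embedding call and the re-ranker call, which is where the per-search cost lives.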

If you are sizing the budget before committing, run the numbers on our ROI calculator against your specific corpus size and query volume.

The fastest path from idea to semantic search

A 3-step recommendation that has worked across 40+ Cadence semantic-search bookings in the last 6 months:

Week 1: Spike on pgvector with OpenAI embeddings. No hybrid, no re-ranker. Get vectors into your DB, write the query function, demo to one user. Goal is to confirm semantic search actually helps your specific corpus. Some corpora (heavily keyword-driven, like SKU lookup) do not benefit much from semantics.
Week 2 to 3: Add hybrid search and the eval harness. Build the 200-query gold set, wire BM25 plus RRF, measure Recall@5 and Recall@10. This is where most builds stall; budget accordingly.
Week 4 onward: Add Cohere Rerank and tune chunking. Re-ranking lifts Recall@5 by ~15 to 20 points immediately. Chunking experiments (size, overlap, semantic boundaries) lift another 5 to 15. Stop when your eval set plateaus.
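
The eval harness behind these steps does not need to be elaborate. Here is a minimal sketch of Recall@k over a gold set, using one common definition (the fraction of a query's known-relevant documents that appear in the top k). The gold set, document ids, and the `fake_search` function are invented for illustration; in practice `search_fn` wraps your real retrieval pipeline.

```python
from typing import Callable

def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)

def mean_recall_at_k(
    gold_set: list[tuple[str, set[str]]],
    search_fn: Callable[[str], list[str]],
    k: int,
) -> float:
    """Average Recall@k across every (query, relevant ids) pair in the gold set."""
    if not gold_set:
        return 0.0
    total = sum(recall_at_k(search_fn(query), relevant, k) for query, relevant in gold_set)
    return total / len(gold_set)

# Tiny illustrative gold set; a real one has a few hundred frozen pairs.
gold = [
    ("cancel plan", {"doc-1"}),
    ("reset password", {"doc-2", "doc-3"}),
]
canned_results = {
    "cancel plan": ["doc-1", "doc-9"],
    "reset password": ["doc-2", "doc-8", "doc-3"],
}

def fake_search(query: str) -> list[str]:
    return canned_results[query]

print(f"Recall@2: {mean_recall_at_k(gold, fake_search, k=2):.2f}")
print(f"Recall@3: {mean_recall_at_k(gold, fake_search, k=3):.2f}")
```

Run it on every change to chunking, fusion, or re-ranking, and stop tuning when the numbers stop moving.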

If you do not already have an engineer who has shipped a retrieval system, the fastest path is to book a senior engineer on Cadence for 2 to 4 weeks. Every engineer on the platform is AI-native, vetted on Cursor / Claude / Copilot fluency before they unlock bookings, and Cadence's 27-hour median time to first commit means you are shipping vectors to staging by day 2. Trial is 48 hours free. If the engineer is not the right fit, replace them at the end of the week with no notice.

For related infra cost work, our breakdowns on migrating MySQL to Postgres and adding image generation to your app walk through similar build-vs-buy decisions.

FAQ

How long does it take to add semantic search?

A working MVP on pgvector takes 3 to 7 engineering days. Production-grade (hybrid search, re-ranking, eval harness, observability) takes 4 to 8 weeks for one experienced engineer, or 2 to 4 weeks with two engineers working in parallel. Most of the time goes into chunking and evaluation, not into wiring up the vector store.

Do I need a vector database at all?

Not always. If you have under 100,000 documents and Postgres already, pgvector is fine and effectively free. Dedicated vector databases (Pinecone, Weaviate, Qdrant) become worth the cost above 1 to 5 million vectors, or when query latency matters at high QPS.

Can I skip re-ranking to save money?

Yes, but you will feel it. Re-ranking is the single biggest quality lever in retrieval. Hybrid search alone gets you to Recall@5 around 70 to 75% on most corpora. Adding Cohere Rerank pushes that to 88 to 92%. For internal tools you can skip it. For user-facing search, the $200 to $2,000/month is the best money you spend.

Should I use OpenAI, Voyage, or Cohere for embeddings?

Start with OpenAI text-embedding-3-small at $0.02/1M tokens. It is cheap, good enough for 95% of corpora, and has the widest set of SDK integrations. Benchmark Voyage voyage-3 or Cohere embed-english-v3 only if your retrieval eval scores are unacceptably low after tuning chunking and adding a re-ranker.

What is the total cost for a 10M-document SaaS?

Roughly $25,000 to $60,000 one-time engineering for a production-grade system, plus $800 to $2,500/month in infrastructure (vector store, re-ranking, embeddings, observability). The vector store will run $400 to $1,500/month at that scale; the re-ranker will dominate at high query volume.

Build it ourselves or hire?

Build it yourself if you have an engineer with prior retrieval experience and a frozen 2-month window. Hire if you do not. The eval harness work in particular is unforgiving for first-timers; the second build is always 3x faster than the first.

Bhavya Mehta

Co-Founder & CEO

5+ years in corporate strategy. IIT Roorkee. Delivers large IT projects for global accounts. Writes on engineering economics, founder strategy, and remote hiring.
