
The best AI model gateway services in 2026 are OpenRouter for fastest time-to-multi-model with pay-per-token billing, Portkey for teams that need observability and caching with a generous free tier, LiteLLM for self-hosted control with zero vendor lock-in, Helicone for the lightest-weight proxy + analytics combo, and Vercel AI Gateway if you already ship on Vercel and want fallback baked into your deploy pipeline. Pick by how much routing logic you want to own.
A model gateway is the single API in front of many providers (OpenAI, Anthropic, Google, Mistral, Groq, Together, Fireworks, your own fine-tunes) that handles routing, caching, fallback, rate-limit handling, and a cost cap. Without one, your app's openai.chat.completions.create() is hardwired to a single vendor, paying retail rates, with no fallback when a model goes down at 2am.
We've shipped production apps on all five gateways. Here's the honest take on when each one wins, when it loses, and what we'd pick today.
A gateway sits between your code and the model providers. Your app calls gateway.chat(), the gateway picks a model, retries on failure, falls back to a backup, caches the response, and reports the cost. Five jobs.
The five jobs:
If you're calling one provider with one model and shipping low volume, you don't need a gateway. If you're calling two or more providers, running production traffic, or worried about a single-vendor outage, you do.
| Gateway | Pricing | Self-host | Cache | Fallback | Best for |
|---|---|---|---|---|---|
| OpenRouter | Pay-per-token + 5% markup | No (SaaS only) | Basic | Yes (auto) | Fastest setup, 300+ models on one bill |
| Portkey | Free tier 10k req/mo, $99/mo Pro | Yes (open-source core) | Simple + semantic | Yes (configurable) | Teams that need observability + governance |
| LiteLLM | OSS free, Cloud from $99/mo | Yes (Docker, K8s) | Redis-backed | Yes (model lists) | Self-hosting with zero vendor lock-in |
| Helicone | Free 10k req/mo, $20/mo Pro | Yes (Cloud or self-host) | Yes | Yes (via proxy rules) | Lightest-weight proxy + dashboards |
| Vercel AI Gateway | Free with Vercel hosting, model retail rates | No | Yes | Yes (provider failover) | Teams already on Vercel + Next.js |
We'll dig into each below.
OpenRouter is a hosted SaaS with one API key, 300+ models, and a single monthly invoice. You hit https://openrouter.ai/api/v1/chat/completions with an OpenAI-compatible payload, set "model": "anthropic/claude-sonnet-4.5" or "model": "openai/gpt-5", and you're done. No provider accounts. No per-vendor billing.
Pricing. Pay-per-token at the provider's published rate, plus a 5% markup. There's no monthly fee. Free models (some open-source) cost nothing. You prepay credits to your OpenRouter wallet.
What it does well. Speed to value. We've moved a production app from raw OpenAI to OpenRouter in under an hour. Auto-fallback works out of the box: declare a list like ["anthropic/claude-sonnet-4.5", "openai/gpt-5"] and OpenRouter retries the second if the first fails. Their model marketplace is genuinely impressive; you can A/B test Llama 4 against Mistral Large without signing up for either vendor.
Where it loses. No self-hosting; your traffic always routes through openrouter.ai. The 5% markup adds up at scale (a $20k/mo AI bill becomes $21k). Caching is basic, no semantic match. Observability is functional but lighter than Portkey or Helicone.
Pick OpenRouter if. You want to ship multi-model in an afternoon, you're at <$10k/mo AI spend, and you don't need deep analytics or a self-hosted option. For solo founders and early-stage teams, it's the obvious pick.
Portkey is an open-core gateway with a hosted SaaS and a self-hostable open-source version. It's built for teams that take governance seriously: prompt versioning, per-user budgets, semantic caching, virtual keys, guardrails on output, full request logging.
Pricing. Free up to 10k requests/month. Pro at $99/month covers 100k requests with full features (caching, guardrails, fallbacks, prompt management). Enterprise is custom. The OSS gateway is free forever to self-host.
What it does well. The control panel is the best in the category. Per-key spend limits actually work; you can give your support agent a key with a $50/month cap and never worry. Semantic cache matches "What is your refund policy?" with "Tell me about refunds" and serves the cached answer. Prompt management lets non-engineers iterate on prompts without a deploy.
Where it loses. The hosted free tier (10k req/mo) is generous but small; production apps blow through it fast. Pro at $99/mo is fair, but the jump from free is steep if you're just over the limit. Setup takes longer than OpenRouter (15-30 minutes vs 5).
Pick Portkey if. You're a team of 3+ where multiple people touch prompts, you need spend governance, or you want semantic caching that actually understands paraphrases. We use Portkey on apps where the CFO asks "where did the AI budget go this quarter."
LiteLLM is the most popular open-source gateway. It's a Python proxy you run yourself (Docker, Kubernetes, bare metal) that exposes an OpenAI-compatible API and routes to 100+ providers underneath. The OSS version is free; the Cloud version (managed) starts at $99/month.
Pricing. OSS is free. LiteLLM Cloud is $99/month base + usage. The self-hosted proxy has no per-request cost; you pay only for the infrastructure you run it on (a single t3.small handles ~50 req/sec).
What it does well. Zero vendor lock-in. The proxy lives in your VPC, your traffic never leaves, your audit log is your audit log. Model lists with weighted routing and fallback are first-class. Redis-backed caching is well-implemented. The community ships features fast; new model support usually lands within a week of release.
Where it loses. You operate it. That means upgrades, scaling, monitoring, on-call. The config YAML can get unwieldy at 50+ models. The Cloud version is solid but younger than the OSS project; some Pro features lag the self-hosted version.
Pick LiteLLM if. You have an infra team that already runs services, you have compliance reasons to keep AI traffic in your VPC (HIPAA, FedRAMP, EU data residency), or you want a free gateway forever. For comparison, see our LaunchDarkly vs Statsig vs Vercel feature flags breakdown which has the same OSS-vs-managed trade-off.
Helicone started as an LLM observability tool and grew into a gateway. The proxy mode is dead simple: change your OpenAI base URL to https://oai.helicone.ai/v1, add a Helicone API key header, and every request is logged with cost, latency, tokens, and prompt content.
Pricing. Free for 10k requests/month. Pro at $20/month for 100k requests. Team and Enterprise scale up. Self-hosting is supported via the open-source repo.
What it does well. The dashboards are excellent. Cost per user, cost per session, prompt diffs, and a request playground that lets you replay any logged call against any model. The $20/mo Pro tier is the cheapest paid option in the category. Caching, retries, and fallbacks work via header config (no SDK swap needed).
Where it loses. The gateway features (routing, fallback) are less mature than OpenRouter or Portkey. You're picking Helicone primarily for the observability layer; the gateway is a bonus. Multi-provider routing requires more setup than OpenRouter's single-call magic.
Pick Helicone if. You want to see exactly what your AI is doing at a fraction of the cost of Datadog or LangSmith, and you're fine with lighter routing logic. We've shipped startup apps on Helicone Free for months without paying a dollar.
Vercel AI Gateway launched in early 2025 and ships free to any project on Vercel's hosting platform. It hooks into the Vercel AI SDK (ai package) and routes to OpenAI, Anthropic, Google, xAI, Groq, and others through a single config.
Pricing. Free for Vercel customers (Hobby, Pro, Enterprise). You pay model retail rates with no Vercel markup; Vercel makes its money on hosting. Non-Vercel customers can use it too but with usage limits on the free tier.
What it does well. Zero-friction setup if you're already on Vercel. Auto-failover between providers is a one-line config. Caching at the edge is fast. Streams work flawlessly with React Server Components and the AI SDK. Cost reporting shows up in your Vercel dashboard alongside bandwidth.
Where it loses. You're tied to the Vercel AI SDK and Vercel's deployment model. If you ship on AWS, GCP, or Render, it's a poor fit. Observability is thinner than Portkey or Helicone. Provider support, while growing, lags OpenRouter's 300+ models.
Pick Vercel AI Gateway if. You're a Next.js shop on Vercel, you already use the AI SDK, and you want gateway features at zero added cost. For everyone else, it's not the right pick.
A separate category worth naming: per-provider routing aggregators like Replicate, Together, and Fireworks. These give you one API for open-source models only (Llama, Mistral, FLUX, Stable Diffusion) at competitive rates, but they're not true multi-vendor gateways. If you only need open-source inference, see our OpenRouter vs Together vs Replicate comparison for the right pick. A true gateway like the five above sits in front of both the closed providers (OpenAI, Anthropic, Google) and the open aggregators.
Be honest: not every app needs one.
The trigger to adopt a gateway is one of: a production outage you couldn't route around, a cost spike you couldn't explain, or a second model added to the stack.
Audit your current AI stack. Count providers, count models, count how many places in your code call them directly. Three checks:
If you need an engineer to wire a gateway into an existing codebase, every engineer on Cadence is AI-native by default and has shipped against OpenAI, Anthropic, and at least one open-source provider. A Mid engineer at $1,000/week can swap your direct provider calls for a gateway, add fallback, and ship cost dashboards inside a single week. Our 12,800-engineer pool means you can usually start within 48 hours on a free trial.
For broader stack decisions, our take on the best monitoring tools for startups in 2026 covers the observability layer that pairs naturally with a model gateway. And if your app handles user content, our best file upload services for SaaS roundup covers the upload pipeline that often feeds vision models.
Try Cadence for the integration work. If you've picked a gateway and just need someone to wire it in, book a Mid engineer for $1,000/week. 48-hour free trial, weekly billing, cancel any time. Most teams have fallback and caching live by Friday.
OpenRouter. It's pay-per-token with no monthly fee, takes under an hour to integrate, and gives you 300+ models on a single bill. The 5% markup is irrelevant at solo-founder volume and you save days of provider account setup.
Yes, if you're a team of 3+ with real governance needs (spend caps, prompt versioning, audit logs). No, if you're a solo founder; the free tier is enough until you cross 10k requests/month, and OpenRouter is simpler at that scale.
Yes. LiteLLM is the most mature OSS option; you run it as a Docker container and it proxies to any provider. Portkey and Helicone also offer self-hostable open-source cores. The trade-off is ops burden: you operate it, scale it, and monitor it.
Vercel AI Gateway is free if you host on Vercel, supports fewer models, and ties you to the Vercel AI SDK. OpenRouter costs 5% extra, supports 300+ models, works on any platform, and has no SDK lock-in. Pick Vercel if you're a Next.js shop on Vercel; pick OpenRouter otherwise.
Yes, but usually 20-80 milliseconds on top of the provider call. Caching cuts that to near-zero for repeated prompts. Self-hosted gateways like LiteLLM in the same VPC as your app add closer to 5-15ms. For most apps, the gateway latency is invisible against a 1-3 second model response.
Vercel AI Gateway if you're on Vercel (free). Helicone Free (10k req/mo) if you want observability. OpenRouter pay-per-token if you just want to ship. All three get you running in under an hour with no monthly commitment.
Senior automation engineer at withRemote. Writes on CI/CD, test pyramids, and removing toil from engineering pipelines.