
Datadog is the best observability platform money can buy for a SaaS, and it will also surprise you on the bill. If you run between 5 and 500 hosts, you should know exactly which line items inflate, when to stay, and when to leave for Grafana/Loki/Tempo or SigNoz before the next renewal.
This is a Datadog-only deep review for technical founders and engineering leads. For a narrower head-to-head, see Sentry vs Datadog. Here we look at the product as a whole: real 2026 prices, bills at three company sizes, the four cost shocks, and the trigger conditions for migrating off.
If you have 5 to 100 hosts and no dedicated SRE, Datadog is worth paying for. The developer experience is genuinely a category above New Relic, Grafana Cloud, and SigNoz, and the time it saves engineers will outpace the bill until you cross specific thresholds.
You should plan to leave when any one of three things happens: total spend crosses $20k a month and is growing 30%+ per quarter without product growth, custom metrics exceed 40% of the bill, or you have hired a real SRE who can run open-source observability properly. Coinbase famously ran up a roughly $65M Datadog bill, disclosed in early 2022, before pulling that lever. You want a written plan well before that.
Datadog is a unified observability SaaS that bundles APM, infrastructure metrics, log management, real user monitoring, synthetics, database monitoring, security signals, CI visibility, and about a dozen other products on a single agent and one UI. The pitch is the cross-product link: click a slow trace, jump to the host metrics, jump to the log lines from that host in the same time window, all in two clicks.
That correlation is the moat. Grafana with Mimir, Loki, and Tempo can do most of these jobs, but you wire them together yourself. New Relic is the closest analog and trails on UX. Honeycomb is sharper for tracing but narrower. SigNoz is a credible OSS contender if your team speaks OpenTelemetry fluently. Datadog is the everything store of observability.
Here is the actual SKU list as of mid-2026 on annual billing. Month-to-month is roughly 20% more on every line.
| Product | Price | Notes |
|---|---|---|
| Infrastructure Pro | $15/host/mo | 100 custom metrics included per host |
| Infrastructure Enterprise | $23/host/mo | Adds anomaly detection, SAML, more retention |
| APM | $31/host/mo | $35 APM Pro, $40 APM Enterprise |
| Log ingestion | $0.10/GB | You pay this even if you never index |
| Log indexing (3-day) | $1.06 / million events | Real spend lives here |
| Log indexing (15-day) | $1.70 / million events | |
| Log indexing (30-day) | $2.50 / million events | |
| Custom metrics | $5 / 100 metrics / mo | Distribution metrics 5x multiplier |
| RUM (Investigate) | $1.50 / 1k sessions | $0.15 Measure tier, $3 Investigate Plus |
| Synthetics API | $5 / 10k test runs | |
| Synthetics Browser | $12 / 1k test runs | |
| Database Monitoring | $70 / database host / mo | |
A few things stand out. Infra at $15/host is genuinely cheap. APM at $31/host is roughly 2x infra, which surprises new buyers. Logs look cheap because everyone reads the $0.10/GB number and ignores indexing. Custom metrics, the line everyone underestimates, often becomes the largest single SKU on your invoice.
The single most useful exercise before signing a Datadog contract is modeling your bill at the next two stages of growth. Here is a realistic build for three SaaS sizes.
| Scale | Hosts | APM hosts | Logs / mo | Custom metrics | RUM sessions | Estimated monthly bill |
|---|---|---|---|---|---|---|
| Early SaaS | 5 | 5 | 20 GB | 100 | 10k | $300 to $500 |
| Series A SaaS | 50 | 50 | 500 GB | 5,000 | 500k | $9,000 to $18,000 |
| Scaleup | 500 | 500 | 10 TB | 100,000 | 10M | $200,000 to $600,000 |
The early SaaS bill is fine. You will spend more on Notion seats. The Series A bill attracts the CFO's attention, which is reasonable given the value. The scaleup bill is where engineering leaders dedicate people to "Datadog cost engineering" as an actual job, and where the math on building your own stack starts to work. The scaleup range is wide because log volume and custom-metric counts vary 5x between two 500-host companies depending on architecture.
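To make the table concrete, here is a minimal floor-estimate sketch built from the list prices above. The log event counts are my assumptions, and the model deliberately omits synthetics, database monitoring, and negotiated discounts, so real bills land higher:

```python
# Hedged sketch: a floor estimate from the list prices in the table above.
# Omits synthetics, DBM, and discounts -- real invoices come in higher.
PRICES = {
    "infra_host": 15.00,     # Infrastructure Pro, $/host/mo
    "apm_host": 31.00,       # APM, $/host/mo
    "log_ingest_gb": 0.10,   # $/GB ingested
    "log_index_30d": 2.50,   # $/million events, 30-day retention
    "custom_per_100": 5.00,  # $/100 custom metrics/mo
    "rum_per_1k": 1.50,      # RUM Investigate, $/1k sessions
}

def monthly_floor(hosts, apm_hosts, log_gb, log_events_m,
                  custom_metrics, rum_sessions_k):
    # Infrastructure Pro includes 100 custom metrics per host.
    billable_metrics = max(0, custom_metrics - hosts * 100)
    return (
        hosts * PRICES["infra_host"]
        + apm_hosts * PRICES["apm_host"]
        + log_gb * PRICES["log_ingest_gb"]
        + log_events_m * PRICES["log_index_30d"]
        + billable_metrics / 100 * PRICES["custom_per_100"]
        + rum_sessions_k * PRICES["rum_per_1k"]
    )

# Early SaaS row from the table (20M log events/mo is an assumption):
early = monthly_floor(hosts=5, apm_hosts=5, log_gb=20, log_events_m=20,
                      custom_metrics=100, rum_sessions_k=10)
```

Running the same function with your own numbers for the next two funding stages is the whole exercise; the gap between the floor and your actual invoice is where the four cost shocks below live.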
Datadog bills do not climb linearly. They compound. Four mechanisms drive almost every "how did our bill double" story.
Per-host pricing is computed against the peak weekly host count in the month. Scale from 50 to 200 hosts for a Black Friday weekend and you pay APM on 200 hosts for the entire month. A team we spoke with saw their APM line jump from $1,550 to $6,200 because of a five-day load test. The rational response is to shape your infrastructure around the billing model rather than around what your architecture actually needs, which is a bad incentive.
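The mechanic is simple enough to sketch (it is described here as the peak weekly host count; confirm your contract's exact high-water-mark definition before relying on it):

```python
def apm_monthly_charge(weekly_host_counts, rate=31.00):
    """APM is billed on the month's high-water mark, not the average:
    the peak weekly host count sets the rate for the whole month."""
    return max(weekly_host_counts) * rate

steady = apm_monthly_charge([50, 50, 50, 50])      # 50 * $31 = $1,550
spiked = apm_monthly_charge([50, 200, 50, 50])     # one load-test week reprices
                                                   # the month: 200 * $31 = $6,200
```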
A custom metric is a unique combination of metric name and tag values. Tag a request count by customer_id, region, and feature_flag and you have multiplied your custom-metric count by every distinct combination. At $5 per 100 metrics per month this gets out of hand quickly.
The trap that catches OpenTelemetry-first teams: every OTel metric Datadog receives is classified as custom by default, because it bypasses native integrations. You build a clean OTel pipeline because it is the right architecture, then your custom-metric line balloons. Several public reviews put custom metrics at up to 52% of the total bill at scale. Distribution and histogram metrics carry a 5x multiplier on top of that.
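The combinatorics are easy to underestimate. A sketch with hypothetical tag counts (1,000 customers, 10 regions, and 20 active flags are illustrative assumptions, not Datadog numbers):

```python
def series_count(metric_names, distinct_tag_values):
    """One billable custom metric per unique metric-name + tag-value combination."""
    combos = 1
    for n in distinct_tag_values.values():
        combos *= n
    return len(metric_names) * combos

def series_cost(series, rate_per_100=5.00, distribution=False):
    # Distribution/histogram metrics carry a 5x multiplier.
    return series / 100 * rate_per_100 * (5 if distribution else 1)

# One innocent-looking counter, three tags (hypothetical cardinalities):
n = series_count(["http.requests"],
                 {"customer_id": 1000, "region": 10, "feature_flag": 20})
# 200,000 series: $10,000/mo as a counter, $50,000/mo as a distribution.
```

Dropping `customer_id` from the tag set divides the series count by 1,000, which is why cardinality audits pay for themselves.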
Everyone reads "$0.10 per GB" and assumes logs are cheap. The actual spend is in indexing, priced per million events and per retention window. 30-day retention costs $2.50 per million events, so 500M log lines a month costs $1,250 to index plus ingest. The fix most teams stumble into is dual-shipping: send everything to S3 cheaply via Vector or Cribl, index only the high-value 5-10% in Datadog. This works, but now you operate a separate log pipeline you did not budget for.
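The arithmetic behind that paragraph, as a sketch; the 10% index fraction is the assumption worth tuning, and the S3 storage and pipeline cost of dual-shipping is not modeled:

```python
def log_monthly_cost(gb, events_m, index_fraction=1.0,
                     ingest_rate=0.10, index_rate=2.50):
    """Everything pays $0.10/GB to ingest; only the indexed slice also
    pays the per-million-events fee (30-day retention rate shown)."""
    return gb * ingest_rate + events_m * index_fraction * index_rate

index_everything = log_monthly_cost(500, 500)                   # $1,300
dual_ship = log_monthly_cost(500, 500, index_fraction=0.10)     # $175, rest in S3
```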
Enable the AWS CloudWatch integration with defaults and Datadog pulls thousands of metrics into your custom bucket, most of which nobody reads. Same story with Kubernetes: kube-state-metrics emits long-tail cardinality nobody queries. Both inflate your bill silently until you audit usage attribution.
In Q1 2022, Datadog disclosed a single customer that owed roughly $65 million in fees on a multi-year contract. The Pragmatic Engineer later confirmed the customer was Coinbase, and the bill came from 4x year-over-year user growth with no cost discipline. Engineers were told to ship; observability spend was downstream of revenue.
When the crypto market turned, Coinbase started a serious migration to a Grafana plus Prometheus plus ClickHouse stack. They double-wrote everything for months. Datadog renegotiated aggressively, Coinbase stayed, and an engineer later said the right call was to stay because matching the developer experience in-house would have taken "tens of engineering years."
The lesson is not "Datadog is too expensive." It is to have a written migration plan before you need one. Pre-build dashboards in Grafana, run Prometheus in shadow mode, maintain the skill on the team. When your bill triples and the renewal call comes, you have negotiating power instead of panic.
The reason teams keep paying is that the product is genuinely better than the alternatives at five things.
Cross-product correlation. Click a slow trace, see the host's CPU spike in the same minute, see the deploy event from CI, see the log lines from that container, all without copy-pasting timestamps. Nothing else does this as cleanly.
Watchdog. The auto-anomaly detection catches things humans miss. Not every alert is useful, but enough are that disabling Watchdog tends to produce regret.
Service maps. Datadog draws the dependency graph you forgot to draw, automatically, from APM traces. New engineers onboard faster because they can see what calls what.
Integrations. 700+ pre-built integrations, most of which actually work. The first time you enable the Postgres or the Stripe integration and get useful dashboards in 90 seconds, you understand what you are paying for.
Log Patterns. Clusters thousands of similar log lines into one row with a templated message. Sounds small. Saves real hours during incidents.
The honest weaknesses, all of which compound the cost story:
Cost predictability is bad. Even with usage attribution and budgets, you cannot reliably forecast next quarter's bill, because user growth and feature growth both swing usage non-linearly.
OTel-first orgs pay a premium. If you committed to OpenTelemetry for portability, you pay for that portability twice on Datadog: once in the custom-metric bucket, once in the loss of native integration discounts.
High-cardinality custom metrics get expensive fast. Anything you want to slice by user, tenant, or feature flag is a billing risk.
Slow log queries on huge indexes. A query across a 30-day index of 500M events takes long enough that engineers stop using it during incidents and grep S3 instead.
Lock-in. Datadog's monitor definitions, dashboard JSON, and query language are not portable. A migration is not just data, it's rewriting hundreds of dashboards.
Specific thresholds worth writing into a doc your team revisits each quarter:

- Total spend crosses $20k a month and grows 30%+ per quarter without matching product growth.
- Any single SKU, custom metrics and logs being the usual suspects, exceeds 40% of the bill.
- You have hired a dedicated SRE who can run open-source observability properly.
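The thresholds from earlier in the piece (runaway spend, a lopsided SKU, an SRE hire) are mechanical enough to encode. A sketch of the quarterly check; the names and signature are mine, not anyone's API:

```python
def migration_triggered(monthly_spend, qoq_growth, top_sku_share, has_dedicated_sre):
    """Any single condition means: start executing the written migration plan."""
    runaway_spend = monthly_spend > 20_000 and qoq_growth > 0.30
    lopsided_sku = top_sku_share > 0.40  # custom metrics and logs are the usual culprits
    return runaway_spend or lopsided_sku or has_dedicated_sre
```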
Datadog remains the right choice in 2026 for the profile described above: 5 to 100 hosts, no dedicated SRE, and a team whose saved engineer hours are worth more than the bill.
The two archetypes where Datadog is the wrong choice: data-heavy ML platforms with massive log volume (the index bill alone will kill you), and multi-cloud OpenTelemetry-first orgs (you are paying for portability you do not get the benefit of).
Three concrete steps regardless of where you land on this:

- Model your bill at the next two stages of growth before you sign or renew.
- Audit custom-metric cardinality and integration defaults quarterly, and kill the tags and CloudWatch metrics nobody queries.
- Write the migration plan, pre-built Grafana dashboards and all, before you need it.
If your team does not have someone who can spend a focused week on this, that is what a booking is for. Every engineer on Cadence is AI-native by default (Cursor, Claude Code, and Copilot fluency vetted before they unlock the platform), and a Senior at $1,500/week typically delivers an observability cost audit, tagging cleanup, and a Vector-based log filter in 1-2 weeks. The pool is around 12,800 engineers; median time to first commit is 27 hours. If you want a Senior to audit your stack, the 48-hour trial is on us.
If your tooling stress is actually a hiring stress, our breakdown of the best AI coding tools for senior engineers is a better starting point than another vendor evaluation.
Try Cadence's free observability audit. Book a Senior engineer for a 48-hour trial. They will pull your last three Datadog invoices, identify the top three cost levers, and ship the first config change before the trial ends. Weekly billing after that, replace the engineer any week, no notice period. Start the audit.
Yes, if you have 5 to 100 hosts and no dedicated SRE; the developer experience saves more engineer hours than the bill costs, and the cross-product correlation is genuinely better than the alternatives. No, once a single SKU exceeds 40% of your bill, or your total spend grows past $20k a month without matching product growth.
Three compounding factors. Per-host pricing is computed on your peak weekly host count, so any traffic spike inflates the whole month. Custom metrics are billed at $5 per 100 per month with OpenTelemetry metrics counted as custom by default. And logs are dual-charged: $0.10 per GB to ingest plus a per-million-events index fee that varies by retention.
Infrastructure Pro at $15 per host per month on annual billing. There is a free tier limited to 5 hosts with 1-day retention, which is fine for hobby projects but not for production SaaS. APM at $31 per host is the next paid tier most teams add.
Grafana Cloud, or self-hosted Grafana plus Loki plus Tempo, for cost-conscious teams that have someone to run the infra. SigNoz or Last9 if you are OpenTelemetry-first. New Relic for a similar product with simpler per-user pricing. Honeycomb if your problem is tracing, not metrics.
When custom metrics or logs each exceed 40% of your bill, when total spend doubles in two quarters without matching product growth, or when you have hired a dedicated SRE who can run open-source observability properly. Have a written migration plan before any of these trigger, not after.