May 7, 2026 · 11 min read · Cadence Editorial

Datadog review for SaaS observability

Photo by [Brett Sayles](https://www.pexels.com/@brett-sayles) on [Pexels](https://www.pexels.com/photo/black-hardwares-on-data-server-room-4597280/)


Datadog is the best observability platform money can buy for a SaaS, and it will absolutely surprise you on the bill. If you run between 5 and 500 hosts, you should know exactly which line items inflate, when to stay, and when to leave for Grafana/Loki/Tempo or SigNoz before the next renewal.

This is a Datadog-only deep review for technical founders and engineering leads. For a narrower head-to-head, see Sentry vs Datadog. Here we look at the product as a whole: real 2026 prices, bills at three company sizes, the four cost shocks, and the trigger conditions for migrating off.

The verdict in 120 words

If you have 5 to 100 hosts and no dedicated SRE, Datadog is worth paying for. The developer experience is genuinely a category above New Relic, Grafana Cloud, and SigNoz, and the time it saves engineers will outpace the bill until you cross specific thresholds.

You should plan to leave when any one of three things happens: total spend crosses $20k a month and is growing 30%+ per quarter without product growth, custom metrics exceed 40% of the bill, or you have hired a real SRE who can run open-source observability properly. Coinbase famously got to a $65M annual Datadog bill in 2021 before pulling that lever. You want a written plan well before that.

What Datadog actually is in 2026

Datadog is a unified observability SaaS that bundles APM, infrastructure metrics, log management, real user monitoring, synthetics, database monitoring, security signals, CI visibility, and about a dozen other products on a single agent and one UI. The pitch is the cross-product link: click a slow trace, jump to the host metrics, jump to the log lines from that host in the same time window, all in two clicks.

That correlation is the moat. Grafana with Mimir, Loki, and Tempo can do most of these jobs, but you wire them together yourself. New Relic is the closest analog and trails on UX. Honeycomb is sharper for tracing but narrower. SigNoz is a credible OSS contender if your team speaks OpenTelemetry fluently. Datadog is the everything store of observability.

2026 pricing, line by line

Here is the actual SKU list as of mid-2026 on annual billing. Month-to-month is roughly 20% more on every line.

| Product | Price | Notes |
| --- | --- | --- |
| Infrastructure Pro | $15/host/mo | 100 custom metrics included per host |
| Infrastructure Enterprise | $23/host/mo | Adds anomaly detection, SAML, more retention |
| APM | $31/host/mo | $35 APM Pro, $40 APM Enterprise |
| Log ingestion | $0.10/GB | You pay this even if you never index |
| Log indexing (3-day) | $1.06 / million events | Real spend lives here |
| Log indexing (15-day) | $1.70 / million events | |
| Log indexing (30-day) | $2.50 / million events | |
| Custom metrics | $5 / 100 metrics / mo | Distribution metrics 5x multiplier |
| RUM (Investigate) | $1.50 / 1k sessions | $0.15 Measure tier, $3 Investigate Plus |
| Synthetics API | $5 / 10k test runs | |
| Synthetics Browser | $12 / 1k test runs | |
| Database Monitoring | $70 / database host / mo | |

A few things stand out. Infra at $15/host is genuinely cheap. APM at $31/host is roughly 2x infra, which surprises new buyers. Logs look cheap because everyone reads the $0.10/GB number and ignores indexing. Custom metrics are the line everyone underestimates, and they often end up the largest single SKU on your invoice.

Cost at 5, 50, and 500 hosts

The single most useful exercise before signing a Datadog contract is modeling your bill at the next two stages of growth. Here is a realistic build for three SaaS sizes.

| Scale | Hosts | APM hosts | Logs / mo | Custom metrics | RUM sessions | Estimated monthly bill |
| --- | --- | --- | --- | --- | --- | --- |
| Early SaaS | 5 | 5 | 20 GB | 100 | 10k | $300 to $500 |
| Series A SaaS | 50 | 50 | 500 GB | 5,000 | 500k | $9,000 to $18,000 |
| Scaleup | 500 | 500 | 10 TB | 100,000 | 10M | $200,000 to $600,000 |

The early SaaS bill is fine. You will spend more on Notion seats. The Series A bill attracts the CFO's attention, which is reasonable given the value. The scaleup bill is where engineering leaders dedicate people to "Datadog cost engineering" as an actual job, and where the math on building your own stack starts to work. The scaleup range is wide because log volume and custom-metric counts vary 5x between two 500-host companies depending on architecture.
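You can rough out your own row with the list prices from the table above. A minimal sketch in Python; the function name is mine, and the simplifications (15-day indexing only, no synthetics or DBM, no volume discounts, a flat 100-custom-metrics-per-host allotment) are assumptions, not Datadog's actual meter:

```python
# 2026 list prices from the pricing table above (annual billing)
PRICES = {
    "infra_host": 15.00,          # Infrastructure Pro
    "apm_host": 31.00,
    "log_ingest_gb": 0.10,
    "log_index_15d_per_m": 1.70,  # 15-day retention tier
    "custom_metric_per_100": 5.00,
    "rum_per_1k_sessions": 1.50,  # Investigate tier
}

def estimate_bill(hosts, apm_hosts, log_gb, indexed_m_events,
                  custom_metrics, rum_sessions):
    """Back-of-envelope monthly bill at list prices: no discounts,
    no synthetics/DBM, 15-day log indexing only."""
    # Infra Pro bundles 100 custom metrics per host
    billable_metrics = max(0, custom_metrics - hosts * 100)
    return (
        hosts * PRICES["infra_host"]
        + apm_hosts * PRICES["apm_host"]
        + log_gb * PRICES["log_ingest_gb"]
        + indexed_m_events * PRICES["log_index_15d_per_m"]
        + billable_metrics / 100 * PRICES["custom_metric_per_100"]
        + rum_sessions / 1000 * PRICES["rum_per_1k_sessions"]
    )

# the early-SaaS row: 5 hosts, 20 GB logs, 100 custom metrics, 10k sessions
early = estimate_bill(5, 5, 20, 0, 100, 10_000)  # ~$247 before indexing
```

The gap between this floor and the ranges in the table is indexing volume, overage, and extra SKUs, which is exactly the point: the inputs you cannot predict are the ones that grow.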

Why bills compound: the four cost shocks

Datadog bills do not climb linearly. They compound. Four mechanisms drive almost every "how did our bill double" story.

High-water-mark billing on hosts

Per-host pricing is computed against the peak weekly host count in the month. Scale from 50 to 200 hosts for a Black Friday weekend and you pay APM on 200 hosts for the entire month. A team we spoke with saw their APM line jump from $1,550 to $6,200 because of a five-day load test. The reasonable response is to optimize for the billing model rather than for your architecture, which is a bad incentive.
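The mechanic is easy to model. A sketch, with the article's own numbers; the function name is mine, and the real meter has more nuance (separate high-water marks per product) than a flat monthly peak:

```python
APM_PER_HOST = 31.00  # APM list price, annual billing

def apm_monthly_cost(weekly_peak_hosts):
    """Per-host SKUs bill on the highest weekly host count in the
    month, so one short spike prices every week of it."""
    return max(weekly_peak_hosts) * APM_PER_HOST

steady = apm_monthly_cost([50, 50, 50, 50])     # 1550.0
spiked = apm_monthly_cost([50, 200, 50, 50])    # 6200.0 after one load-test week
```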

Custom metrics and the OpenTelemetry trap

A custom metric is a unique combination of metric name and tag values. Tag a request count by customer_id, region, and feature_flag and you have multiplied your custom-metric count by every distinct combination. At $5 per 100 metrics per month this gets out of hand quickly.

The trap that catches OpenTelemetry-first teams: every OTel metric Datadog receives is classified as custom by default, because it bypasses native integrations. You build a clean OTel pipeline because it is the right architecture, then your custom-metric line balloons. Several public reviews put custom metrics at up to 52% of total bill at scale. Distribution and histogram metrics carry a 5x multiplier on top of that.
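The arithmetic behind the explosion is worth seeing once. A sketch (function name and the illustrative tag counts are mine; the $5/100 price and 5x distribution multiplier are from the pricing table):

```python
from math import prod

CUSTOM_METRIC_PER_100 = 5.00  # $/month, from the pricing table

def custom_metric_bill(n_metric_names, tag_cardinalities, multiplier=1):
    """One custom metric = one unique name + tag-value combination,
    so the series count is the product of tag cardinalities.
    Distributions and histograms get multiplier=5."""
    series = n_metric_names * prod(tag_cardinalities) * multiplier
    return series, series / 100 * CUSTOM_METRIC_PER_100

# one request counter tagged by 1,000 customer_ids x 3 regions x 20 flags
series, monthly = custom_metric_bill(1, [1000, 3, 20])
# 60,000 series -> $3,000/month from a single well-intentioned counter
```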

Log indexing, not log ingestion

Everyone reads "$0.10 per GB" and assumes logs are cheap. The actual spend is in indexing, priced per million events and per retention window. 30-day retention costs $2.50 per million events, so 500M log lines a month costs $1,250 to index plus ingest. The fix most teams stumble into is dual-shipping: send everything to S3 cheaply via Vector or Cribl, index only the high-value 5-10% in Datadog. This works, but now you operate a separate log pipeline you did not budget for.
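The ingest-versus-index split can be sketched the same way; the volumes below are illustrative and the function is mine, but the per-tier prices match the table:

```python
INGEST_PER_GB = 0.10
INDEX_PER_M_EVENTS = {"3d": 1.06, "15d": 1.70, "30d": 2.50}

def log_bill(gb_ingested, m_events, retention="30d", indexed_fraction=1.0):
    """Ingest is charged on every byte; indexing only on what stays
    searchable. Dual-shipping via Vector/Cribl drops indexed_fraction
    to roughly 0.05-0.10."""
    ingest = gb_ingested * INGEST_PER_GB
    index = m_events * indexed_fraction * INDEX_PER_M_EVENTS[retention]
    return ingest, index

_, full = log_bill(500, 500, "30d")        # index all 500M lines: $1,250
_, hot = log_bill(500, 500, "30d", 0.10)   # index the hot 10%: $125
```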

CloudWatch and Kubernetes accidental cardinality

Enable the AWS CloudWatch integration with defaults and Datadog pulls thousands of metrics into your custom bucket, most of which nobody reads. Same story with Kubernetes: kube-state-metrics emits long-tail cardinality nobody queries. Both inflate your bill silently until you audit usage attribution.

The Coinbase $65M lesson for normal SaaS

In Q1 2022, Datadog disclosed a single customer that owed roughly $65 million in fees on a multi-year contract. The Pragmatic Engineer later confirmed the customer was Coinbase, and the bill came from 4x year-over-year user growth with no cost discipline. Engineers were told to ship; observability spend was downstream of revenue.

When the crypto market turned, Coinbase started a serious migration to a Grafana plus Prometheus plus Clickhouse stack. They double-wrote everything for months. Datadog renegotiated aggressively, Coinbase stayed, and an engineer later said the right call was to stay because matching the developer experience in-house would have taken "tens of engineering years."

The lesson is not "Datadog is too expensive." It is to have a written migration plan before you need one. Pre-build dashboards in Grafana, run Prometheus in shadow mode, maintain the skill on the team. When your bill triples and the renewal call comes, you have negotiating power instead of panic.

Where Datadog wins

The reason teams keep paying is that the product is genuinely better than the alternatives at five things.

Cross-product correlation. Click a slow trace, see the host's CPU spike in the same minute, see the deploy event from CI, see the log lines from that container, all without copy-pasting timestamps. Nothing else does this as cleanly.

Watchdog. The auto-anomaly detection catches things humans missed. Not every alert is useful, but enough are that disabling Watchdog tends to produce regret.

Service maps. Datadog draws the dependency graph you forgot to draw, automatically, from APM traces. New engineers onboard faster because they can see what calls what.

Integrations. 700+ pre-built integrations, most of which actually work. The first time you enable the Postgres or the Stripe integration and get useful dashboards in 90 seconds, you understand what you are paying for.

Log Patterns. Clusters thousands of similar log lines into one row with a templated message. Sounds small. Saves real hours during incidents.

Where Datadog loses

The honest weaknesses, all of which compound the cost story:

Cost predictability is bad. Even with usage attribution and budgets, you cannot reliably forecast next quarter's bill, because user growth and feature growth both swing usage non-linearly.

OTel-first orgs pay a premium. If you committed to OpenTelemetry for portability, you pay for that portability twice on Datadog: once in the custom-metric bucket, once in the loss of native integration discounts.

High-cardinality custom metrics get expensive fast. Anything you want to slice by user, tenant, or feature flag is a billing risk.

Slow log queries on huge indexes. A 30-day index across 500M events takes long enough that engineers stop using it during incidents and grep S3 instead.

Lock-in. Datadog's monitor definitions, dashboard JSON, and query language are not portable. A migration is not just data, it's rewriting hundreds of dashboards.

When to leave: trigger conditions

Specific thresholds worth writing into a doc your team revisits each quarter:

  • Total spend > $20k/mo and growing 30%+ per quarter without product growth. Start evaluating Grafana Cloud or self-hosted Grafana plus Loki plus Tempo. Budget 1-2 senior engineers for 6-12 weeks for a 50-host SaaS migration.
  • Custom metrics > 40% of bill. SigNoz or Last9 are OTel-native and price custom metrics differently. Both will quote you.
  • Logs > 50% of bill. Move logs to S3 with Quickwit, Parseable, or ClickHouse. Index only the hot 5-10% in Datadog (or move off entirely).
  • Single product (only logs, only APM). A specialist almost always beats the bundled price. BetterStack for logs. Honeycomb for tracing.
  • You hired a real SRE. The economics of self-hosted Grafana change the moment one full-time person can own the stack.
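These thresholds are simple enough to encode, which makes the quarterly review mechanical instead of a debate. A sketch: the thresholds are the ones listed above; the function shape and inputs are my assumptions:

```python
def migration_triggers(monthly_spend, prev_quarter_spend, product_grew,
                       custom_metric_share, log_share, has_sre):
    """The trigger doc as code. Feed it real invoice numbers each
    quarter; any non-empty result means start the evaluation."""
    qoq_growth = (monthly_spend - prev_quarter_spend) / prev_quarter_spend
    fired = []
    if monthly_spend > 20_000 and qoq_growth >= 0.30 and not product_grew:
        fired.append("evaluate Grafana Cloud / self-hosted Grafana+Loki+Tempo")
    if custom_metric_share > 0.40:
        fired.append("quote SigNoz or Last9")
    if log_share > 0.50:
        fired.append("move cold logs to S3 (Quickwit/ClickHouse)")
    if has_sre:
        fired.append("re-run the self-hosted economics")
    return fired

# $25k/mo, up from $18k last quarter, flat product, metrics at 45% of bill
fired = migration_triggers(25_000, 18_000, False, 0.45, 0.20, False)
```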

Who should buy Datadog

The five archetypes where Datadog is the right choice in 2026:

  1. A 5 to 100 host SaaS where developer velocity matters more than $500-2k/mo of savings.
  2. Teams without a dedicated SRE or platform engineer.
  3. Compliance-driven SaaS that needs SOC2 evidence yesterday and does not have time to integrate four open-source tools.
  4. Teams that already speak Datadog from a previous job and would otherwise lose months retraining.
  5. Anyone shipping fast enough that 6-12 weeks of migration engineering is more expensive than two more years of paying.

The two archetypes where Datadog is the wrong choice: data-heavy ML platforms with massive log volume (the index bill alone will kill you), and multi-cloud OpenTelemetry-first orgs (you are paying for portability you do not get the benefit of).

What to do this week

Three concrete steps regardless of where you land on this:

  1. Pull your last three Datadog invoices. Categorize spend by SKU. If one line item is more than 30% of the bill, that is your renegotiation lever and your migration target.
  2. Open the Usage Attribution page in Datadog and identify the top 10 services or teams driving spend. Often a single misconfigured agent is the culprit.
  3. Set a monthly bill ceiling and a renegotiation trigger price. Decide today, in writing, what number causes you to call your AE.

If your team does not have someone to do this in a focused week, that is a booking. Every engineer on Cadence is AI-native by default (Cursor, Claude Code, and Copilot fluency vetted before they unlock the platform), and a Senior at $1,500/week typically delivers an observability cost audit, tagging cleanup, and a Vector-based log filter in 1-2 weeks. The pool is around 12,800 engineers; median time to first commit is 27 hours. If you want a Senior to audit your stack, the 48-hour trial is on us.

If your tooling stress is actually a hiring stress, our breakdown of the best AI coding tools for senior engineers is a better starting point than another vendor evaluation.

Try Cadence's free observability audit. Book a Senior engineer for a 48-hour trial. They will pull your last three Datadog invoices, identify the top three cost levers, and ship the first config change before the trial ends. Weekly billing after that, replace the engineer any week, no notice period. Start the audit.

FAQ

Is Datadog worth the money for a SaaS?

Yes, if you have 5 to 100 hosts and no dedicated SRE: the developer experience saves more engineer hours than the bill costs, and the cross-product correlation is genuinely better than the alternatives. No, once a single SKU exceeds 40% of your bill or your total spend grows past $20k a month without matching product growth.

Why are Datadog bills so high?

Three compounding factors. Per-host pricing is computed on your peak weekly host count, so any traffic spike inflates the whole month. Custom metrics are billed at $5 per 100 per month with OpenTelemetry metrics counted as custom by default. And logs are dual-charged: $0.10 per GB to ingest plus a per-million-events index fee that varies by retention.

What is the cheapest Datadog plan?

Infrastructure Pro at $15 per host per month on annual billing. There is a free tier limited to 5 hosts with 1-day retention, which is fine for hobby projects but not for production SaaS. APM at $31 per host is the next paid SKU most teams add.

What are the best alternatives to Datadog?

Grafana Cloud, or self-hosted Grafana plus Loki plus Tempo, for cost-conscious teams that have someone to run the infra. SigNoz or Last9 if you are OpenTelemetry-first. New Relic for a similar product with simpler per-user pricing. Honeycomb if your problem is tracing, not metrics.

When should I leave Datadog?

When custom metrics exceed 40% of your bill or logs exceed 50%, when total spend doubles in two quarters without matching product growth, or when you have hired a dedicated SRE who can run open-source observability properly. Have a written migration plan before any of these trigger, not after.
