May 14, 2026 · 10 min read · Cadence Editorial

Cost to add image generation to your app

Photo by [Markus Winkler](https://www.pexels.com/@markus-winkler-1430818) on [Pexels](https://www.pexels.com/photo/scrabble-tiles-on-a-wooden-table-with-the-word-rock-19867470/)

Adding image generation to your app in 2026 typically costs $1,000 to $10,000 in one-time engineering, plus $0.003 to $0.20 per image at runtime. A basic Replicate or fal wire-up takes 1 to 2 weeks. A production pipeline with moderation, storage, CDN, and retry logic takes 3 to 5 weeks. Vendor choice (DALL-E, Imagen, FLUX, Stability) drives the per-image bill more than anything else.

The rest of this post breaks down where that money actually goes, so you can size the build for your use case before you spend a dollar.

What you actually pay for (it is not just the API)

Founders ask "what does the API cost?" and budget for that. Then the bill arrives and it is three to ten times bigger than the model spend. Here is what is actually on the invoice:

  • One-time engineering build. Wiring the API, the upload UI, the queue, the moderation pipeline, the error handling, and the analytics. Single biggest line item in months 1-2.
  • Per-image vendor cost. $0.003 (FLUX Schnell) to $0.20 (Midjourney top tier). The roughly 67x spread matters once you cross 1,000 images per day.
  • Storage. Every generated image you keep lives somewhere and pays rent every month.
  • CDN egress. Reads of those images. Almost always larger than storage itself.
  • Moderation and safety review. NSFW classification, prompt filtering, plus a human review queue for edge cases.
  • Failed generations. Roughly 20 to 30 percent of generations get rejected for quality, prompt-safety blocks, or text-rendering bugs. You pay for them anyway.

If you only model the API line, your forecast is off by a factor of three or more.

Engineering integration cost: 1 to 2 weeks basic, 3 to 5 weeks production

The engineering bill depends on what you are building. There are two real shapes.

Basic build (1 to 2 weeks)

A single vendor SDK, a "generate" button, an S3 or R2 upload, a result preview, and a download. Use Replicate or fal for the model so you do not run a GPU. Use the vendor's default safety filters. No human review. Good enough for an internal tool, a prototype, or a side feature inside a B2B product.

A Mid engineer ($1,000/week on Cadence) ships this in 1 to 2 weeks. Total build cost: $1,000 to $2,000.

Production build (3 to 5 weeks)

Vendor failover (FLUX as the cheap default, fall back to DALL-E if FLUX errors), a per-user generation queue (so one user does not starve everyone else), prompt sanitization, an NSFW classifier (AWS Rekognition, Hive, or Sightengine), a copyright-friendly prompt filter, retries with exponential backoff, hot-path storage on R2, a CDN, an admin dashboard for moderation, an audit log, and observability so you know when a vendor is down.

A Senior engineer ($1,500/week) plus a part-time Mid handles this in 3 to 5 weeks. Total build cost: $4,500 to $10,000.

This is the same architecture pattern as a Claude API integration: the model is commodity, the pipeline around it is the actual product.
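The failover-plus-retry piece of that pipeline can be sketched in a few lines. This is a minimal illustration, not any vendor's SDK: the vendor callables, the `GenerationError` type, and the retry defaults are all hypothetical stand-ins for whatever client you actually wire in.

```python
import random
import time
from typing import Callable, Optional, Sequence, Tuple


class GenerationError(Exception):
    """Hypothetical error type raised when a vendor call fails."""


def generate_with_failover(
    prompt: str,
    vendors: Sequence[Tuple[str, Callable[[str], bytes]]],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[str, bytes]:
    """Try vendors in order; retry each with exponential backoff plus jitter."""
    last_error: Optional[Exception] = None
    for name, call in vendors:
        for attempt in range(max_attempts):
            try:
                return name, call(prompt)
            except GenerationError as exc:
                last_error = exc
                if attempt < max_attempts - 1:
                    # Delay doubles per attempt: base, 2*base, 4*base, ...
                    delay = base_delay * (2 ** attempt)
                    # Jitter spreads out retries from concurrent requests.
                    sleep(delay + random.uniform(0, delay / 2))
        # Retries exhausted for this vendor; fall through to the next one.
    raise GenerationError(f"all vendors failed: {last_error}")
```

Ordering the `vendors` list with the cheap default first and the expensive fallback second gives the FLUX-then-DALL-E behavior described above. Injecting `sleep` keeps the function testable without real waits.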

Per-image vendor cost in 2026

The image-gen field has consolidated into three pricing tiers. We are quoting list price per image in May 2026; check vendor sites since they re-price every quarter.

| Vendor | Model | Price per image | Best for |
|---|---|---|---|
| OpenAI | DALL-E 3 / gpt-image-1 | $0.04 to $0.12 | Photo + text-in-image, broad commercial use |
| OpenAI | Sora image features | $0.04 to $0.20 | High-fidelity photoreal + scene consistency |
| Google | Imagen 3 | $0.02 to $0.04 | General photo, cheapest premium-tier |
| Google | Imagen 4 | $0.04 to $0.06 | Strongest text rendering on a budget |
| Google | Imagen 4 Fast | $0.02 | High-volume "good enough" path |
| Midjourney | API | $0.04 to $0.20 | Illustration, brand, stylized |
| Stability AI | SDXL / SD 3.5 | $0.005 to $0.02 | Cheap product shots, thumbnails |
| Replicate | hosted (any) | $0.001 to $0.05 | Mix-and-match, pay per second of GPU |
| FLUX (via Replicate / Together / fal) | Schnell to Pro | $0.003 to $0.05 | Best quality-to-price in 2026 |

A few practical notes:

  • DALL-E 3 / gpt-image-1 is the safe default for a B2C app where users will post results to social. Strongest text-in-image. One of the most expensive options in the table.
  • Imagen 4 Fast is the sleeper pick. $0.02 per image, ships through Vertex, image quality matches DALL-E 3 standard for most use cases.
  • FLUX Pro at $0.05 matches DALL-E 3 quality for product photography and beats it on speed. FLUX Schnell at $0.003 is the cheapest "looks fine" model on the market.
  • Stability SDXL at half a cent is fine for thumbnails or low-bar use cases. It is not fine for hero images or anything that ships to a paying user without a human review.
  • Midjourney is API-restricted and the commercial terms are tighter than OpenAI or Google. Read the license before you wire it into a paid product.

Hidden infrastructure costs: storage, CDN, moderation

This is where most "we are way over budget" conversations start.

Storage and CDN

Every image you keep costs storage every month. Every read costs egress. The egress bill is the killer.

| Stack | Storage ($/GB/mo) | Egress ($/GB) | Notes |
|---|---|---|---|
| AWS S3 + CloudFront | $0.023 | $0.085 (first TB) | Default for AWS shops; egress dominates |
| Cloudflare R2 | $0.015 | $0 (free) | Best fit for read-heavy workloads |
| Backblaze B2 + Cloudflare | $0.006 | $0 via Bandwidth Alliance | Cheapest cold-ish storage path |

A real number: a media app with 200 TB stored and active read traffic was paying ~$47,000/month on S3 + CloudFront. Storage was $4,600 of that. Egress was $42,400. Migrating to R2 cut the bill by roughly 80 percent.

If your images are read more than a couple of times each (any social, marketing, or content product), default to R2.
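To see how storage and egress trade off for your own volumes, a rough calculator like the one below may help. It is a sketch under stated assumptions: it uses the list prices from the table above, applies the first-tier egress rate to all traffic (real bills are tiered and include request fees), and treats 1 TB as 1,000 GB. The stack names and function name are illustrative.

```python
GB_PER_TB = 1000  # assumption: decimal terabytes, for round numbers

# (storage $/GB/month, egress $/GB) taken from the table above
STACKS = {
    "s3_cloudfront": (0.023, 0.085),
    "r2": (0.015, 0.0),
    "b2_cloudflare": (0.006, 0.0),
}


def monthly_storage_bill(stack: str, stored_tb: float, egress_tb: float) -> dict:
    """Estimate one month of storage plus egress at flat list prices."""
    storage_rate, egress_rate = STACKS[stack]
    storage = stored_tb * GB_PER_TB * storage_rate
    egress = egress_tb * GB_PER_TB * egress_rate
    return {
        "storage": round(storage, 2),
        "egress": round(egress, 2),
        "total": round(storage + egress, 2),
    }


if __name__ == "__main__":
    # 200 TB stored; the 100 TB egress figure is a made-up example input.
    for stack in STACKS:
        print(stack, monthly_storage_bill(stack, stored_tb=200, egress_tb=100))
```

At 200 TB stored, the S3 storage line comes out to the $4,600 quoted above; the egress line depends entirely on how much you serve, which is why read-heavy products feel it most.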

Moderation

Even if your users only generate images for themselves, you still need moderation. NSFW prompts will happen. Branded-IP prompts will happen. You do not want to find out via a TechCrunch headline.

| Tool | Price | Notes |
|---|---|---|
| AWS Rekognition (Image Moderation) | ~$1.00 per 1,000 images (first 1M) | Native if you are on AWS |
| Hive Moderation | ~$0.50 to $2.00 per 1,000 | Strongest NSFW + violence + hate accuracy |
| Sightengine | ~$0.40 to $1.50 per 1,000 | Good price, decent accuracy |
| OpenAI Moderation API | Free for prompt text | Use it on the prompt before you generate |

Run moderation on the prompt before you call the model (cheap, free in OpenAI's case), then on the result image after generation (catches what the prompt-filter missed). Budget $1 to $2 per 1,000 images for the image step.
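The two-step order matters because a blocked prompt costs nothing to generate. A minimal sketch of that flow follows; `prompt_is_safe`, `generate`, and `image_is_safe` are hypothetical callables you would back with your chosen prompt filter, model vendor, and image classifier, not real library functions.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ModerationResult:
    image: Optional[bytes]      # None when the request was blocked
    blocked_at: Optional[str]   # "prompt", "image", or None if it passed


def moderated_generate(
    prompt: str,
    prompt_is_safe: Callable[[str], bool],
    generate: Callable[[str], bytes],
    image_is_safe: Callable[[bytes], bool],
) -> ModerationResult:
    """Check the prompt first, generate only if it passes, then check the image."""
    if not prompt_is_safe(prompt):
        # Blocked before the model is called, so no generation cost is incurred.
        return ModerationResult(image=None, blocked_at="prompt")
    image = generate(prompt)
    if not image_is_safe(image):
        # The generation was paid for, but the image never reaches the user.
        return ModerationResult(image=None, blocked_at="image")
    return ModerationResult(image=image, blocked_at=None)
```

Recording `blocked_at` gives the admin moderation queue and the audit log something concrete to store for each rejected request.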

Failed generations

Plan for 20 to 30 percent rejection rate. Either the model returns garbage, the safety filter trips, or the user reroll-spams. You pay for every API call. A naive cost forecast that assumes a 100% accept rate underestimates the bill by 25 to 40 percent.
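One way to fold the rejection rate into a forecast is to price each accepted image rather than each API call. The sketch below assumes every call is billed and that rejections are independent, so the effective price is the list price divided by the accept rate; the function names are illustrative.

```python
def cost_per_accepted_image(price_per_call: float, rejection_rate: float) -> float:
    """Effective cost of one image the user keeps, given billed rejections."""
    if not 0 <= rejection_rate < 1:
        raise ValueError("rejection_rate must be in [0, 1)")
    return price_per_call / (1 - rejection_rate)


def overrun_vs_naive(rejection_rate: float) -> float:
    """Fraction by which the real bill exceeds a 100%-accept forecast."""
    return 1 / (1 - rejection_rate) - 1


if __name__ == "__main__":
    for rate in (0.20, 0.30):
        print(
            f"rejection {rate:.0%}: "
            f"${cost_per_accepted_image(0.02, rate):.4f} per kept image, "
            f"{overrun_vs_naive(rate):.0%} over the naive forecast"
        )
```

Under this simple model a 20 percent rejection rate means a 25 percent overrun and a 30 percent rate means about 43 percent, in line with the rough range quoted above.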

Per-feature math: 100, 1,000, and 10,000 images per day

Here is the actual monthly bill at three real volumes, using a mid-tier model (Imagen 4 Fast at $0.02) and a cheap stack (FLUX Schnell at $0.003).

| Daily volume | Imagen 4 Fast ($0.02) | FLUX Schnell ($0.003) | Storage + CDN (R2) | Moderation | Total (Imagen / FLUX) |
|---|---|---|---|---|---|
| 100/day (~3k/mo) | $60 | $9 | <$5 | $3 | $68 / $17 |
| 1,000/day (~30k/mo) | $600 | $90 | $20 | $30 | $650 / $140 |
| 10,000/day (~300k/mo) | $6,000 | $900 | $150 | $300 | $6,450 / $1,350 |

A few takeaways:

  • At 100/day, vendor choice does not matter. The whole feature is under $100/month. Pick the highest-quality model.
  • At 1,000/day, vendor choice is a $500/month decision. Worth A/B-testing FLUX Pro vs DALL-E 3 on your specific use case.
  • At 10,000/day, vendor choice is a $5,000/month decision. This is when you build vendor failover and route 80% of traffic to FLUX Schnell, 20% to DALL-E for the prompts that need it.
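The table's arithmetic, and the split-routing idea in the last bullet, can be reproduced with a small calculator. This is a sketch under assumptions: a 30-day month, moderation at $1 per 1,000 images, infrastructure passed in as a flat monthly figure, and DALL-E priced at the low end of its quoted range ($0.04). The names are illustrative.

```python
DAYS_PER_MONTH = 30  # assumption matching the "~30k/mo" rounding in the table

PRICE_PER_IMAGE = {
    "imagen_4_fast": 0.02,
    "flux_schnell": 0.003,
    "dalle_3": 0.04,  # assumption: low end of the $0.04 to $0.12 range
}


def monthly_bill(
    images_per_day: int,
    mix: dict,
    infra_per_month: float,
    moderation_per_1000: float = 1.0,
) -> float:
    """Vendor spend for a traffic mix, plus flat infra and per-image moderation."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("traffic shares in mix must sum to 1")
    images = images_per_day * DAYS_PER_MONTH
    vendor = sum(images * share * PRICE_PER_IMAGE[model] for model, share in mix.items())
    moderation = images / 1000 * moderation_per_1000
    return round(vendor + infra_per_month + moderation, 2)


if __name__ == "__main__":
    print(monthly_bill(1_000, {"imagen_4_fast": 1.0}, infra_per_month=20))
    print(monthly_bill(1_000, {"flux_schnell": 1.0}, infra_per_month=20))
    # 80/20 routing at 10,000 images per day
    print(monthly_bill(10_000, {"flux_schnell": 0.8, "dalle_3": 0.2}, infra_per_month=150))
```

The first two calls match the $650 and $140 totals in the 1,000/day row. The third shows why routing is a pricing decision: even at the cheapest DALL-E rate, the 20 percent slice costs more than the 80 percent sent to FLUX Schnell.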

This is the same shape as the math in adding voice AI to your app and adding transcription: commodity at low volume, real engineering decision at scale.

Cost breakdown by approach (build, hire, book)

Same feature, five different ways to ship it. Build cost only (excludes per-image runtime).

| Approach | Cost | Timeline | Pros | Cons |
|---|---|---|---|---|
| US full-time hire | $8,000 to $15,000 (build only) | 4 to 6 weeks | Full ownership, in-house knowledge | $160k+ FTE for a one-time feature; 6-12 week hiring loop |
| Dev agency (US/EU) | $15,000 to $40,000 | 4 to 8 weeks | Predictable scope, project-managed | Markup, slow ramp, generic team |
| Freelancer (Upwork) | $1,000 to $5,000 | 2 to 6 weeks (variable) | Cheap, fast to start | Quality lottery, weak on moderation and infra |
| Toptal | $6,000 to $12,000 | 1 to 3 weeks to start, 3 to 5 weeks to ship | Pre-vetted, English-fluent | Monthly minimums, slower onboarding |
| Cadence | $500 to $2,000/week | 48-hour trial, then 1 to 4 weeks to ship | Every engineer AI-native by default, weekly billing, replace any week | Less suited to enterprise procurement |

A few honest notes:

  • If you already have an in-house Senior who knows your codebase, just have them build it. The integration is not exotic and a week of a Senior who already understands your auth and storage will beat any external option.
  • Toptal wins if you need a US-time-zone match with strong English and you do not mind a monthly minimum. The vetting is real.
  • Cadence wins when you want a build-only engagement. Every engineer on Cadence is AI-native, vetted on Cursor, Claude Code, and Copilot fluency through a voice interview before they unlock bookings. Founders book a Mid for $1,000/week, get a 48-hour free trial, and replace the engineer any week. We currently match against a 12,800-engineer pool, with a 27-hour median time to first commit.

If you are deciding between options, our build-vs-buy framework is the same one we use internally.

How to keep the bill down without shipping a worse product

Five things that move the needle:

  1. Pick the cheapest model that meets your quality bar. Run a 50-image bake-off across FLUX Schnell, Imagen 4 Fast, and DALL-E 3. Most B2B use cases are fine on the $0.003 tier; only ship the expensive model where the cheap one visibly fails.
  2. Cache aggressively. Hash the prompt; if you have generated it before, return the cached image. On product apps with templated prompts, caching cuts the bill by 20 to 40 percent.
  3. Moderate the prompt before you generate. OpenAI Moderation API is free and catches 80% of policy violations before you spend a cent on generation.
  4. Use R2, not S3 + CloudFront. Free egress is the single biggest infra-cost win in 2026 if you have read traffic.
  5. Bill heavy users. If 5% of users generate 80% of images, your pricing is wrong. Cap free-tier generations and meter the rest.
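The caching step (item 2) can be as small as a hash-keyed lookup in front of the generate call. The sketch below is an in-memory illustration only: the `generate` callable and its `(prompt, model, params)` signature are hypothetical, and a real deployment would likely store keys in a database or key-value store and image bytes in object storage.

```python
import hashlib
import json
from typing import Callable, Dict, Optional


def cache_key(prompt: str, model: str, params: dict) -> str:
    """Stable key: whitespace-normalized prompt plus model and sorted params."""
    normalized = " ".join(prompt.split())  # collapse runs of whitespace
    payload = json.dumps(
        {"prompt": normalized, "model": model, "params": params}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class CachedGenerator:
    """Wraps a generate callable and skips the vendor call on repeat prompts."""

    def __init__(self, generate: Callable[[str, str, dict], bytes]) -> None:
        self._generate = generate
        self._store: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def get(self, prompt: str, model: str, params: Optional[dict] = None) -> bytes:
        params = params or {}
        key = cache_key(prompt, model, params)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        image = self._generate(prompt, model, params)  # the only billed path
        self._store[key] = image
        return image
```

Including the model and parameters in the key avoids serving an image generated at a different size or by a different model. Case is left untouched on purpose, since text-in-image prompts can be case-sensitive.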

If you want a sanity-check on which line items to cut first, run the numbers on Cadence before you commit to a build budget.

The fastest path from idea to shipped feature

  1. Pick a hosted vendor. Replicate or fal for FLUX, OpenAI for DALL-E, Google Vertex for Imagen. Do not stand up your own GPU.
  2. Ship the basic build behind a feature flag. 1 to 2 weeks. Get real prompts from real users before you spend on production architecture.
  3. Decide based on real usage. If volume stays under 1,000/day, you are done. If it grows past that, book a Senior to harden the pipeline (moderation, failover, R2, observability) for 3 to 5 weeks.

If you do not already have a free engineer for that 3-to-5-week build, the fastest path is to skip the hiring loop. Cadence shortlists vetted engineers in 2 minutes with a 48-hour free trial. Pick Mid for the basic build, Senior for the production hardening.

Try Cadence free for 48 hours. Book a Mid engineer for the basic image-gen wire-up, or a Senior for a production pipeline with moderation, storage, and CDN. Weekly billing, no notice period, replace any week. See what it costs.

FAQ

How long does it take to add image generation to an app?

Plan 1 to 2 weeks for a basic single-vendor wire-up (one model, one storage path, default safety filters) and 3 to 5 weeks for a production pipeline with moderation, vendor failover, R2 storage, a CDN, retries, and an admin moderation queue.

What is the cheapest image generation API in 2026?

FLUX Schnell via Replicate, fal, or Together at around $0.003 per image, or Stability SDXL at $0.005 to $0.02. Quality is good for thumbnails and product shots but trails DALL-E and Imagen on text-in-image and complex compositions.

Do I need content moderation if my users only generate their own images?

Yes. Even closed user groups produce policy-violating prompts often enough that you need an NSFW classifier (Rekognition, Hive, Sightengine) plus a human review path. Run moderation on the prompt first (free with the OpenAI Moderation API), then on the result image.

Should I store generated images in S3 or Cloudflare R2?

R2 if your users will load each image more than a few times. S3 + CloudFront egress lists at about $0.085 per GB on the first tier, while R2 has zero egress fees. On a read-heavy product, storage is the small line; egress is the bill.

Build, buy, or book: how do I decide?

Buy the model (no one builds their own image model in 2026; it is commodity infra). Build the pipeline if image generation is core to your product. Book on-demand engineering for the 3-to-5-week build phase if you do not already have a free Senior in-house.
