
Adding image generation to your app in 2026 typically costs $1,000 to $10,000 in one-time engineering, plus $0.003 to $0.20 per image at runtime. A basic Replicate or fal wire-up takes 1 to 2 weeks. A production pipeline with moderation, storage, CDN, and retry logic takes 3 to 5 weeks. Vendor choice (DALL-E, Imagen, FLUX, Stability) drives the per-image bill more than anything else.
The rest of this post breaks down where that money actually goes, so you can size the build for your use case before you spend a dollar.
Founders ask "what does the API cost?" and budget for that. Then the bill arrives and it is three to ten times bigger than the model spend. Here is what is actually on the invoice:
If you only model the API line, your forecast is wrong by at least a factor of three.
The engineering bill depends on what you are building. There are two real shapes.
The first is a basic wire-up: a single vendor SDK, a "generate" button, an S3 or R2 upload, a result preview, and a download. Use Replicate or fal for the model so you do not run a GPU. Use the vendor's default safety filters. No human review. Good enough for an internal tool, a prototype, or a side feature inside a B2B product.
A Mid engineer ($1,000/week on Cadence) ships this in 1 to 2 weeks. Total build cost: $1,000 to $2,000.
The second is a production pipeline: vendor failover (FLUX as the cheap default, fall back to DALL-E if FLUX errors), a per-user generation queue (so one user does not starve everyone else), prompt sanitization, an NSFW classifier (AWS Rekognition or Hive or Sightengine), a copyright-friendly prompt filter, retries with exponential backoff, hot-path storage on R2, a CDN, an admin dashboard for moderation, an audit log, and observability so you know when a vendor is down.
A Senior engineer ($1,500/week) plus a part-time Mid handles this in 3 to 5 weeks. Total build cost: $4,500 to $10,000.
This is the same architecture pattern as a Claude API integration: the model is commodity, the pipeline around it is the actual product.
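To make the failover and retry pieces concrete, here is a minimal sketch of "cheap default first, fall back on error" with exponential backoff. The vendor adapters are hypothetical callables that you would write around each vendor's SDK. The function name, attempt count, and delays are assumptions for illustration, not a recommended configuration.

```python
import random
import time
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical vendor adapter: takes a prompt, returns image bytes, raises on failure.
Generator = Callable[[str], bytes]


def generate_with_failover(
    prompt: str,
    vendors: Sequence[Tuple[str, Generator]],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[str, bytes]:
    """Try vendors in order; retry each with exponential backoff before moving on."""
    last_error: Optional[Exception] = None
    for name, generate in vendors:
        for attempt in range(max_attempts):
            try:
                return name, generate(prompt)
            except Exception as exc:  # real code should catch vendor-specific errors
                last_error = exc
                if attempt < max_attempts - 1:
                    # Delay doubles on each retry (0.5s, 1s, ...) plus a little jitter.
                    sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))
    raise RuntimeError("all vendors failed") from last_error
```

Pass the vendors in priority order, for example `[("flux", flux_adapter), ("dalle", dalle_adapter)]`. Every failed attempt may still be billed by the vendor, so keep `max_attempts` small.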
The image-gen field has consolidated into three pricing tiers. We are quoting list price per image in May 2026; check vendor sites since they re-price every quarter.
| Vendor | Model | Price per image | Best for |
|---|---|---|---|
| OpenAI | DALL-E 3 / gpt-image-1 | $0.04 to $0.12 | Photo + text-in-image, broad commercial use |
| OpenAI | Sora image features | $0.04 to $0.20 | High-fidelity photoreal + scene consistency |
| Google | Imagen 3 | $0.02 to $0.04 | General photo, cheapest premium-tier |
| Google | Imagen 4 | $0.04 to $0.06 | Strongest text rendering on a budget |
| Google | Imagen 4 Fast | $0.02 | High-volume "good enough" path |
| Midjourney | API | $0.04 to $0.20 | Illustration, brand, stylized |
| Stability AI | SDXL / SD 3.5 | $0.005 to $0.02 | Cheap product shots, thumbnails |
| Replicate | hosted (any) | $0.001 to $0.05 | Mix-and-match, pay per second of GPU |
| FLUX (via Replicate / Together / fal) | Schnell to Pro | $0.003 to $0.05 | Best quality-to-price in 2026 |
A few practical notes:
This is where most "we are way over budget" conversations start.
Every image you keep costs storage every month. Every read costs egress. The egress bill is the killer.
| Stack | Storage ($/GB/mo) | Egress ($/GB) | Notes |
|---|---|---|---|
| AWS S3 + CloudFront | $0.023 | $0.085 (first TB) | Default for AWS shops; egress dominates |
| Cloudflare R2 | $0.015 | $0 (free) | Best fit for read-heavy workloads |
| Backblaze B2 + Cloudflare | $0.006 | $0 via Bandwidth Alliance | Cheapest cold-ish storage path |
A real number: a media app with 200 TB stored and active read traffic was paying ~$47,000/month on S3 + CloudFront. Storage was $4,600 of that. Egress was $42,400. Migrating to R2 cut the bill by roughly 80 percent.
If your images are read more than a couple of times each (any social, marketing, or content product), default to R2.
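To compare stacks for your own traffic, a rough calculator like the one below is enough. It is a sketch under simplifying assumptions: flat list prices from the table above (the first-TB egress rate applied to all traffic), no per-request fees, and 1 TB counted as 1,000 GB. The class and function names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stack:
    name: str
    storage_per_gb: float  # $/GB/month, list price from the table above
    egress_per_gb: float   # $/GB, list price from the table above


S3_CLOUDFRONT = Stack("AWS S3 + CloudFront", 0.023, 0.085)
R2 = Stack("Cloudflare R2", 0.015, 0.0)
B2_CLOUDFLARE = Stack("Backblaze B2 + Cloudflare", 0.006, 0.0)


def monthly_cost(stack: Stack, stored_gb: float, egress_gb: float) -> float:
    """Storage plus egress for one month, ignoring request fees and tiered pricing."""
    return round(stack.storage_per_gb * stored_gb + stack.egress_per_gb * egress_gb, 2)
```

With 200 TB stored, the storage term on S3 comes out to $4,600, which matches the example above. The egress term depends entirely on how many GB you serve, so plug in your own read volume before deciding.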
Even if your users only generate images for themselves, you still need moderation. NSFW prompts will happen. Branded-IP prompts will happen. You do not want to find out via a TechCrunch headline.
| Tool | Price | Notes |
|---|---|---|
| AWS Rekognition (Image Moderation) | ~$1.00 per 1,000 images (first 1M) | Native if you are on AWS |
| Hive Moderation | ~$0.50 to $2.00 per 1,000 | Strongest NSFW + violence + hate accuracy |
| Sightengine | ~$0.40 to $1.50 per 1,000 | Good price, decent accuracy |
| OpenAI Moderation API | Free for prompt text | Use it on the prompt before you generate |
Run moderation on the prompt before you call the model (cheap, free in OpenAI's case), then on the result image after generation (catches what the prompt-filter missed). Budget $1 to $2 per 1,000 images for the image step.
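The two-stage order can be sketched as a small wrapper. The three callables are hypothetical adapters you would write around your prompt filter, your model vendor, and your image classifier; none of them is a real SDK call.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Result:
    image: Optional[bytes]
    rejected_at: Optional[str]  # "prompt", "image", or None if accepted
    billed: bool                # True if the generation API was called


def moderated_generate(
    prompt: str,
    prompt_is_safe: Callable[[str], bool],   # hypothetical prompt-text check
    generate: Callable[[str], bytes],        # hypothetical model adapter
    image_is_safe: Callable[[bytes], bool],  # hypothetical image classifier
) -> Result:
    # Stage 1: check the prompt first, so a bad prompt never reaches the paid model.
    if not prompt_is_safe(prompt):
        return Result(image=None, rejected_at="prompt", billed=False)
    image = generate(prompt)
    # Stage 2: check the output, which catches what the prompt filter missed.
    if not image_is_safe(image):
        return Result(image=None, rejected_at="image", billed=True)
    return Result(image=image, rejected_at=None, billed=True)
```

The `billed` flag is there because an image rejected at stage 2 still cost you a generation call, which matters for the forecast.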
Plan for a 20 to 30 percent rejection rate. Either the model returns garbage, the safety filter trips, or the user spams the reroll button. You pay for every API call, accepted or not. A naive cost forecast that assumes a 100% accept rate underestimates the bill by 25 to 40 percent.
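The arithmetic behind that gap is short. Assuming every rejected call is billed and the user rerolls until one image is accepted, the cost per kept image is the list price divided by the accept rate. The function names below are illustrative.

```python
def cost_per_accepted_image(price_per_call: float, rejection_rate: float) -> float:
    """Effective cost of one kept image when rejected calls are still billed."""
    if not 0 <= rejection_rate < 1:
        raise ValueError("rejection_rate must be in [0, 1)")
    return price_per_call / (1 - rejection_rate)


def forecast_uplift(rejection_rate: float) -> float:
    """How much higher the real bill is than a forecast assuming 100% acceptance."""
    if not 0 <= rejection_rate < 1:
        raise ValueError("rejection_rate must be in [0, 1)")
    return 1 / (1 - rejection_rate) - 1
```

At a 20 percent rejection rate, a $0.02 image effectively costs $0.025 and the bill runs 25 percent over the naive forecast. At 30 percent the overrun is about 43 percent, in line with the rough range above.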
Here is the actual monthly bill at three real volumes, using a mid-tier model (Imagen 4 Fast at $0.02) and a cheap stack (FLUX Schnell at $0.003).
| Daily volume | Imagen 4 Fast ($0.02) | FLUX Schnell ($0.003) | Storage + CDN (R2) | Moderation | Total (Imagen / FLUX) |
|---|---|---|---|---|---|
| 100/day (~3k/mo) | $60 | $9 | <$5 | $3 | $68 / $17 |
| 1,000/day (~30k/mo) | $600 | $90 | $20 | $30 | $650 / $140 |
| 10,000/day (~300k/mo) | $6,000 | $900 | $150 | $300 | $6,450 / $1,350 |
A few takeaways:
This is the same shape as the math in adding voice AI to your app and adding transcription: commodity at low volume, real engineering decision at scale.
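The totals in the volume table above can be reproduced with a few lines, which is handy when you want to swap in your own price or volume. This is a sketch under the table's own assumptions: moderation at $1 per 1,000 images, a 100 percent accept rate, and storage + CDN passed in as a flat monthly figure (the "<$5" row is treated as $5). The function name is illustrative.

```python
def monthly_bill(
    images_per_month: int,
    price_per_image: float,
    storage_cdn: float,
    moderation_per_1k: float = 1.0,
) -> float:
    """Model spend + storage/CDN + image moderation for one month, in dollars."""
    model = images_per_month * price_per_image
    moderation = images_per_month / 1000 * moderation_per_1k
    return round(model + storage_cdn + moderation, 2)
```

For example, `monthly_bill(30_000, 0.02, 20)` corresponds to the 1,000/day Imagen 4 Fast row, and `monthly_bill(30_000, 0.003, 20)` to the FLUX Schnell figure in the same row.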
Same feature, five different ways to ship it. Build cost only (excludes per-image runtime).
| Approach | Cost | Timeline | Pros | Cons |
|---|---|---|---|---|
| US full-time hire | $8,000 to $15,000 (build only) | 4 to 6 weeks | Full ownership, in-house knowledge | $160k+ FTE for a one-time feature; 6-12 week hiring loop |
| Dev agency (US/EU) | $15,000 to $40,000 | 4 to 8 weeks | Predictable scope, project-managed | Markup, slow ramp, generic team |
| Freelancer (Upwork) | $1,000 to $5,000 | 2 to 6 weeks (variable) | Cheap, fast to start | Quality lottery, weak on moderation and infra |
| Toptal | $6,000 to $12,000 | 1 to 3 weeks to start, 3 to 5 weeks to ship | Pre-vetted, English-fluent | Monthly minimums, slower onboarding |
| Cadence | $500 to $2,000/week | 48-hour trial, then 1 to 4 weeks to ship | Every engineer AI-native by default, weekly billing, replace any week | Less suited to enterprise procurement |
A few honest notes:
If you are deciding between options, our build-vs-buy framework is the same one we use internally.
Five things that move the needle:
If you want a sanity-check on which line items to cut first, run the numbers on Cadence before you commit to a build budget.
If you do not already have a free engineer for that 3-to-5-week build, the fastest path is to skip the hiring loop. Cadence shortlists vetted engineers in 2 minutes with a 48-hour free trial. Pick Mid for the basic build, Senior for the production hardening.
Try Cadence free for 48 hours. Book a Mid engineer for the basic image-gen wire-up, or a Senior for a production pipeline with moderation, storage, and CDN. Weekly billing, no notice period, replace any week. See what it costs.
Plan 1 to 2 weeks for a basic single-vendor wire-up (one model, one storage path, default safety filters) and 3 to 5 weeks for a production pipeline with moderation, vendor failover, R2 storage, a CDN, retries, and an admin moderation queue.
FLUX Schnell via Replicate, fal, or Together at around $0.003 per image, or Stability SDXL at $0.005 to $0.02. Quality is good for thumbnails and product shots but trails DALL-E and Imagen on text-in-image and complex compositions.
Yes. Even closed user groups produce policy-violating prompts often enough that you need an NSFW classifier (Rekognition, Hive, Sightengine) plus a human review path. Run moderation on the prompt first (free with the OpenAI Moderation API), then on the result image.
R2 if your users will load each image more than a few times. S3 + CloudFront charges about $0.085 per GB of egress at list price and R2 charges nothing, so the gap grows with every read. Storage cost itself is a rounding error; egress is the bill.
Buy the model (no one builds their own image model in 2026; it is commodity infra). Build the pipeline if image generation is core to your product. Book on-demand engineering for the 3-to-5-week build phase if you do not already have a free Senior in-house.