
To handle long-running tasks on Vercel in 2026, enable Fluid Compute first. That covers any job under 13 minutes on Pro or Enterprise. For longer work, switch to Vercel Workflows (no time limit) or push the job onto an external worker (Inngest, Trigger.dev, a Railway worker, or a self-hosted queue). Pick the cheapest, simplest option that fits the longest job you actually run.
The rest of this post is the decision tree, the real cost numbers, the failure modes nobody warns you about, and the code you ship on Monday.
Three things changed in the last two years.
First, Vercel rolled out Fluid Compute and quietly raised the function ceiling. With Fluid Compute enabled, Hobby plans get 300 seconds (5 minutes) and Pro and Enterprise get up to 800 seconds (about 13 minutes). That alone killed off most of the "background job" startup advice from 2023 and 2024.
Second, Vercel shipped Workflows. Code can pause, resume, and keep state for minutes, hours, or months. It is the official answer for anything that exceeds the function ceiling, and it removes the most common reason teams used to bolt on Inngest or Temporal.
Third, AI features changed the workload shape. Long Claude or GPT generations, multi-step agent runs, video transcoding for AI summaries, and overnight embedding rebuilds all push past the old serverless limits. A SaaS that did zero background work in 2023 likely has several jobs longer than 30 seconds today.
Most teams discover this problem in the worst possible way. A user clicks "Generate report," the route hits a 60-second timeout, the user retries, the timeout fires again, and a Stripe webhook handler nearby starts failing because the same function pool is exhausted.
The instinct is to bump maxDuration to the ceiling and move on. That works for a week. It breaks the moment you process anything user-uploaded (PDFs, video, datasets), call any third-party API with variable latency, or chain more than two LLM calls together.
Bumping the timeout also makes the failure invisible. A 12-minute function that dies at minute 11 looks identical to a network blip in your logs. You only notice when support tickets pile up.
The fix is not a bigger ceiling. It is matching the pattern to the job duration.
Here is the rule. Look at your single longest job, in seconds, and pick the row that covers it with room to spare.
| Longest job | Pattern | Real cost | Tooling |
|---|---|---|---|
| Under 1s | Regular function | Included in plan | Nothing extra |
| 1s to 60s | Fluid Compute, default | Pro plan ($20/user/mo) | Toggle in dashboard |
| 60s to 13 min | Fluid Compute + raised maxDuration | Pro plan + GB-hour billing | export const maxDuration = 800 |
| 13 min to several hours | Vercel Workflows | Workflow execution pricing | import { workflow } from '@vercel/workflows' |
| Hours to months, or stateful | Workflows OR external worker | Workflows or $5-50/mo VM | Workflows, Inngest, Trigger.dev, Railway worker |
| Recurring cron under 13 min | Vercel Cron | Included on Pro | vercel.json schedule |
The mistake is starting at the bottom. Teams reach for Inngest or a Railway worker on day one because they read a 2023 blog post. Most of them never run a job longer than 4 minutes and would have been fine with Fluid Compute alone.
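If it helps to make the rule mechanical, here is the table as a small TypeScript function. It is a sketch, not a Vercel rule: it assumes the Pro/Enterprise ceiling (Hobby tops out at 300 seconds), collapses the two Workflows rows into one answer, ignores the cron row, and uses a 1.5x headroom factor as an assumed stand-in for "room to spare".

```typescript
// The decision table as a function. The 1.5x headroom factor is an assumption
// standing in for "room to spare"; pick one that matches your job's variance.
type Pattern =
  | "regular function"
  | "Fluid Compute, default"
  | "Fluid Compute + raised maxDuration"
  | "Vercel Workflows or external worker";

const DEFAULT_LIMIT_SECONDS = 60; // the "1s to 60s" row
const PRO_CEILING_SECONDS = 800; // Pro/Enterprise ceiling with Fluid Compute

function pickPattern(longestJobSeconds: number, headroom = 1.5): Pattern {
  const budget = longestJobSeconds * headroom; // longest job plus safety margin
  if (budget < 1) return "regular function";
  if (budget <= DEFAULT_LIMIT_SECONDS) return "Fluid Compute, default";
  if (budget <= PRO_CEILING_SECONDS) return "Fluid Compute + raised maxDuration";
  return "Vercel Workflows or external worker";
}
```

For example, pickPattern(50) lands on the raised-maxDuration row: 50 seconds with headroom no longer fits comfortably under 60.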
Open your Vercel logs for the last 30 days. Sort by function duration descending. Look at the 99th percentile, not the average. If your p99 is 18 seconds, you do not have a long-running-task problem. You have a slow-database-query problem. Fix that first.
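The gap between the two numbers is easy to see with a small example. The sketch below assumes you have already exported per-invocation durations in milliseconds from your logs (how you export them is up to you) and uses the nearest-rank definition of a percentile, which is one of several common definitions.

```typescript
// Nearest-rank percentile: sort ascending, then take the value at
// rank ceil(p * n / 100), counting ranks from 1.
function percentile(durationsMs: number[], p: number): number {
  if (durationsMs.length === 0) throw new Error("no durations");
  const sorted = [...durationsMs].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

function mean(durationsMs: number[]): number {
  return durationsMs.reduce((sum, d) => sum + d, 0) / durationsMs.length;
}

// Illustrative data: 98 fast calls at 200 ms and 2 slow ones at 90 s.
const sample = [...new Array<number>(98).fill(200), 90_000, 90_000];
console.log(mean(sample)); // 1996 (about 2 s, looks healthy)
console.log(percentile(sample, 99)); // 90000 (the 90 s jobs you actually have)
```

The average says "2 seconds, fine"; the p99 says two users a hundred waited a minute and a half.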
If your p99 is genuinely above 60 seconds, write down the three slowest functions and what they do. That list drives every decision below.
In the Vercel dashboard, go to your project, Settings, Functions, and toggle Fluid Compute on. It is free to enable. Billing changes to GB-hour metering, which usually costs less for spiky workloads and slightly more for constant load. Read the pricing page before you ship if your function fleet is large.
Fluid Compute also gives you waitUntil (exported by the @vercel/functions package) and Next.js after() (App Router). Both let you return a response to the user immediately while a background task keeps running for the rest of the function lifetime. This is the single biggest win for "log the event, send the email, then respond fast" patterns.
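```typescript
import { after } from 'next/server';

// saveOrder, sendReceiptEmail, and pingAnalytics are app-specific helpers.
export async function POST(req: Request) {
  const body = await req.json();
  const result = await saveOrder(body); // critical path: the user waits for this

  // Runs after the response is sent, within the same function's lifetime.
  after(async () => {
    await sendReceiptEmail(result.id);
    await pingAnalytics(result.id);
  });

  return Response.json({ ok: true, id: result.id });
}
```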
The user gets the response as soon as saveOrder resolves, and the two follow-ups keep running afterward, up to the function's maxDuration. Caveat: after() is not a queue. If the function instance crashes, the work is lost. For anything financially material, log the intent to a database first, then run the work, then mark it done.
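Here is a minimal sketch of that intent-first pattern, assuming a hypothetical job table with a status column. The in-memory class exists only so the sketch runs (in production it would be a Postgres table), and recordIntent and runIntent are illustrative names, not a library API.

```typescript
// Hypothetical job table. In production this is a Postgres table with a
// status column; an in-memory Map stands in here so the sketch runs.
type JobStatus = "pending" | "done" | "failed";

interface JobRow {
  id: string;
  kind: string;
  status: JobStatus;
  error?: string;
}

class InMemoryJobTable {
  private rows = new Map<string, JobRow>();
  async insert(row: JobRow): Promise<void> { this.rows.set(row.id, { ...row }); }
  async get(id: string): Promise<JobRow | undefined> { return this.rows.get(id); }
  async update(id: string, status: JobStatus, error?: string): Promise<void> {
    const row = this.rows.get(id);
    if (row) { row.status = status; row.error = error; }
  }
  // Rows a sweeper (cron or Workflow) should retry: the instance died before finishing them.
  async pending(): Promise<JobRow[]> {
    return [...this.rows.values()].filter((r) => r.status === "pending");
  }
}

// Step 1, awaited in the request path before responding: record the intent.
async function recordIntent(table: InMemoryJobTable, id: string, kind: string): Promise<void> {
  await table.insert({ id, kind, status: "pending" });
}

// Steps 2 and 3, run inside after(): do the work, then mark the row done or failed.
async function runIntent(table: InMemoryJobTable, id: string, work: () => Promise<void>): Promise<void> {
  try {
    await work();
    await table.update(id, "done");
  } catch (err) {
    await table.update(id, "failed", String(err));
  }
}
```

In the route handler above, that would mean awaiting recordIntent(table, result.id, 'receipt-email') before the return and wrapping the send as after(() => runIntent(table, result.id, () => sendReceiptEmail(result.id))). If the instance dies mid-send, the row stays pending and a sweeper can pick it up; since that can mean a second attempt, the work itself should be idempotent.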
Raise the ceiling on a per-route basis, not as a global default. The dumbest mistake is setting maxDuration: 800 on every API route. Now a buggy loop in your /api/health route can rack up GB-hours for 13 minutes before the platform kills it.
```typescript
// app/api/generate-report/route.ts
export const maxDuration = 300;
```
Set a separate, much lower default in vercel.json for everything else. Sound engineering practices like this are part of a broader habit of managing technical debt deliberately rather than letting platform defaults pile up.
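To put a rough number on the cost of getting this wrong: under GB-hour billing, a runaway invocation costs roughly its memory size times its duration. Assuming a 2 GB function (an assumption; check your project's memory setting) and ignoring Fluid Compute's sharing of one instance across concurrent invocations:

$$
\text{GB-hours} = \text{memory}_{\text{GB}} \times \frac{t_{\text{seconds}}}{3600},
\qquad
2 \times \frac{800}{3600} \approx 0.44,
\qquad
2 \times \frac{60}{3600} \approx 0.033
$$

A retry storm of 1,000 runaway calls therefore burns about 444 GB-hours at an 800-second ceiling and about 33 GB-hours at a 60-second default, roughly a 13x difference for the same bug.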
This is the part of the answer that did not exist in 2024. Workflows let your code suspend at a step, persist state, and resume later. There is no upper duration limit. You write normal TypeScript with a few primitives.
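```typescript
// API shape as sketched in this post; confirm the package name and primitives
// against the current Vercel Workflows docs before copying.
import { workflow, step } from '@vercel/workflows';

export const generateAndShipReport = workflow(async (ctx, input: { userId: string }) => {
  const rows = await step('fetch-rows', () => loadRows(input.userId));
  const enriched = await step('enrich', () => callLLM(rows));

  // Suspends the workflow; no compute runs while it waits.
  await step.sleep('wait-for-approval', '24h');

  const approval = await step('check-approval', () => getApproval(input.userId));
  if (!approval) return; // not approved: end the workflow without sending

  await step('email', () => sendReport(input.userId, enriched));
});
```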
The step.sleep call is the giveaway. The function is not running for 24 hours: the workflow is suspended, the compute is released, and it wakes back up to run the next step. You are billed for actual execution time, not wall-clock time.
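One common way durable-execution engines make resumption work is to record each step's result as it completes. The sketch below is a toy model of that general idea with hypothetical names, not Vercel's implementation or API: a journal stores each named step's result, and re-running the function against the same journal returns saved results instead of executing those steps again.

```typescript
// Toy model of durable execution (not Vercel's implementation): each named step's
// result goes into a journal; a replay returns saved results instead of re-running.
type Journal = Map<string, unknown>;

class ReplayContext {
  readonly executed: string[] = []; // steps that actually ran in this replay
  private readonly journal: Journal;

  constructor(journal: Journal) {
    this.journal = journal;
  }

  async step<T>(name: string, fn: () => Promise<T>): Promise<T> {
    if (this.journal.has(name)) return this.journal.get(name) as T; // already done
    const result = await fn();
    this.journal.set(name, result); // a real engine persists this durably
    this.executed.push(name);
    return result;
  }
}

// Hypothetical workflow body. Code between steps must be deterministic so a
// replay takes the same path and looks up the same step names.
async function reportFlow(ctx: ReplayContext, emailFails: boolean): Promise<string> {
  const rows = await ctx.step("fetch-rows", async () => [3, 4, 5]);
  const total = await ctx.step("sum", async () => rows.reduce((a, b) => a + b, 0));
  return ctx.step("email", async () => {
    if (emailFails) throw new Error("SMTP timeout");
    return `sent total=${total}`;
  });
}
```

If the first run fails at the email step, a replay against the same journal skips fetch-rows and sum and only runs email. It also shows why step bodies should be idempotent: a step that crashes after its side effect but before its result is saved will run again.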
Use Workflows when your job has natural pauses (waiting for a webhook, a human approval, a scheduled time, an upstream batch). They are overkill for a job that runs straight through in 8 minutes; Fluid Compute is cheaper there.
There is still a class of work that belongs off Vercel: anything that needs a long-lived connection (a websocket fanout server, a persistent agent loop), anything that pegs CPU for hours (video transcode, ML training), and anything that needs more than 4 GB of memory.
For these, run a small worker on Railway, Fly.io, Render, or a $20/month VPS. Have Vercel enqueue work onto Upstash QStash, AWS SQS, or Redis, and let the worker pick it up. Trigger.dev and Inngest are the managed flavors of this same pattern if you do not want to operate the worker yourself.
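The worker side of that setup can be sketched as a drain loop against a hypothetical JobQueue interface. Real QStash, SQS, and Redis clients each have their own APIs and delivery semantics (QStash, for instance, pushes messages to an HTTP endpoint rather than being polled), so treat this as the shape of the retry logic, not a drop-in client. The three-attempt limit and the dead-letter list are assumptions.

```typescript
// Hypothetical queue interface; QStash, SQS, and Redis clients each expose
// their own API and delivery semantics, which you would adapt to this shape.
interface Message<T> {
  id: string;
  body: T;
  attempts: number; // failed deliveries so far
}

interface JobQueue<T> {
  receive(): Promise<Message<T> | undefined>;
  retry(msg: Message<T>): Promise<void>;
  deadLetter(msg: Message<T>, error: string): Promise<void>;
}

// In-memory stand-in so the sketch runs; it is not durable.
class InMemoryQueue<T> implements JobQueue<T> {
  readonly pending: Message<T>[] = [];
  readonly dead: { msg: Message<T>; error: string }[] = [];
  enqueue(id: string, body: T): void { this.pending.push({ id, body, attempts: 0 }); }
  async receive(): Promise<Message<T> | undefined> { return this.pending.shift(); }
  async retry(msg: Message<T>): Promise<void> { this.pending.push({ ...msg, attempts: msg.attempts + 1 }); }
  async deadLetter(msg: Message<T>, error: string): Promise<void> { this.dead.push({ msg, error }); }
}

const MAX_ATTEMPTS = 3; // assumption: tune per job type

// Drain the queue once. A long-running worker would loop on this with a pause
// between empty polls.
async function drain<T>(queue: JobQueue<T>, handler: (body: T) => Promise<void>): Promise<number> {
  let succeeded = 0;
  while (true) {
    const msg = await queue.receive();
    if (!msg) break;
    try {
      await handler(msg.body);
      succeeded++;
    } catch (err) {
      // Re-enqueue until the attempt budget is spent, then park it for a human.
      if (msg.attempts + 1 >= MAX_ATTEMPTS) await queue.deadLetter(msg, String(err));
      else await queue.retry(msg);
    }
  }
  return succeeded;
}
```

The dead-letter list is the part teams skip and regret: without it, a poison message either retries forever or vanishes.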
Honest comparison: Inngest and Trigger.dev are excellent. They have nicer observability than Workflows did on launch and richer step primitives. If you already use them, you do not need to migrate. The case for Workflows is that you avoid a second vendor, a second billing relationship, and one more set of env vars.
Long-running jobs fail silently more than any other code in your stack. Before you put any of this in production, send durations and outcomes to a place a human will see.
The minimum: log every job start and end with a correlation ID. Send failure events to Sentry. Track p50 and p99 duration in a dashboard. If you use Workflows, the Vercel dashboard shows step-level traces for free.
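A minimal sketch of that logging, assuming a hypothetical tracked() wrapper around each job: it emits one structured event at start and one at end with the same correlation ID, the outcome, and the duration, which is enough to build p50 and p99 charts from. The event names and the ID format are assumptions; in Node you could use crypto.randomUUID() for the ID instead.

```typescript
// Structured start/end logging with a correlation ID. The event names and ID
// format are assumptions; swap the default log function for your own logger.
type LogEvent = Record<string, unknown>;
type LogFn = (event: LogEvent) => void;

function newCorrelationId(): string {
  // Random enough for log correlation, not for security.
  return `job_${Date.now().toString(36)}_${Math.random().toString(36).slice(2, 10)}`;
}

async function tracked<T>(
  jobName: string,
  fn: (correlationId: string) => Promise<T>,
  log: LogFn = (event) => console.log(JSON.stringify(event)),
): Promise<T> {
  const correlationId = newCorrelationId();
  const startedAt = Date.now();
  log({ event: "job.start", jobName, correlationId });
  try {
    const result = await fn(correlationId);
    log({ event: "job.end", jobName, correlationId, outcome: "ok", durationMs: Date.now() - startedAt });
    return result;
  } catch (err) {
    // This is also where a Sentry capture call would go.
    log({ event: "job.end", jobName, correlationId, outcome: "error", durationMs: Date.now() - startedAt, error: String(err) });
    throw err;
  }
}
```

The job receives its correlation ID so it can pass it along to downstream calls, which makes a single failed run traceable across services.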
For more rigorous coverage, run integration tests in CI against the real queue and the real database. Mocked tests will pass forever while production jobs silently die. Pair that with the right E2E testing setup for a SaaS so user-visible flows that trigger background work are covered end to end.
A few patterns look right on paper and bite in production.
Fire-and-forget without persistence. Calling fetch('/api/work') without awaiting it works in development. In production, the parent function returns, the platform tears down the instance, and your "background" fetch is often killed mid-flight. Always persist the job intent before you trigger work.
Cron jobs that overlap. A 12-minute job scheduled every 10 minutes starts its next copy while the previous one is still running. Use a database lock (Postgres pg_try_advisory_lock is one line) or a Redis SET NX with a TTL.
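Here is a sketch of the Redis flavor, assuming a hypothetical LockClient interface that mirrors the two operations the pattern needs: SET key token NX with an expiry to acquire, and delete-only-if-the-token-still-matches to release. In real Redis that release is usually a small Lua script, because a plain DEL could remove a lock that another run acquired after yours expired. The in-memory class exists only so the sketch runs; it protects nothing across instances.

```typescript
// Hypothetical client mirroring the two Redis operations this pattern needs.
interface LockClient {
  // Like SET key token NX PX ttlMs: true only if the key was absent.
  setIfAbsent(key: string, token: string, ttlMs: number): Promise<boolean>;
  // Delete the key only if it still holds our token (a Lua script in real Redis).
  deleteIfMatches(key: string, token: string): Promise<void>;
}

// In-memory stand-in so the sketch runs. It guards nothing across instances.
class InMemoryLockClient implements LockClient {
  private readonly locks = new Map<string, { token: string; expiresAt: number }>();
  async setIfAbsent(key: string, token: string, ttlMs: number): Promise<boolean> {
    const now = Date.now();
    const held = this.locks.get(key);
    if (held && held.expiresAt > now) return false;
    this.locks.set(key, { token, expiresAt: now + ttlMs });
    return true;
  }
  async deleteIfMatches(key: string, token: string): Promise<void> {
    if (this.locks.get(key)?.token === token) this.locks.delete(key);
  }
}

type LockResult<T> = { ran: false } | { ran: true; value: T };

// Skip this cron tick if another copy holds the lock. Set ttlMs above the job's
// worst-case duration, or the lock can expire while the first copy is still running.
async function withLock<T>(
  client: LockClient,
  key: string,
  ttlMs: number,
  job: () => Promise<T>,
): Promise<LockResult<T>> {
  const token = `${Date.now()}-${Math.random().toString(36).slice(2)}`;
  if (!(await client.setIfAbsent(key, token, ttlMs))) return { ran: false };
  try {
    return { ran: true, value: await job() };
  } finally {
    await client.deleteIfMatches(key, token);
  }
}
```

The Postgres route behaves similarly, with one catch: a session-level advisory lock belongs to the connection that took it, so with a pooled connection you need to lock and unlock on the same one.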
waitUntil for billing-critical work. It is fire-and-forget by another name. If the email must send, write to a queue or an outbox table.
Bumping maxDuration globally. Per-route is the right granularity. Many small functions sharing one long ceiling is a cost grenade.
No idempotency keys. Long jobs get retried. Without an idempotency key on writes, you can end up charging customers twice or sending three Slack notifications. Stripe-style request keys on every write are non-negotiable, and the same discipline is what handling Stripe webhooks correctly requires.
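A minimal sketch of idempotent writes, assuming a hypothetical store of completed operations keyed by a caller-supplied idempotency key (for example, one built from the job ID and step name). In production the store would be a table with a unique constraint on the key; the in-memory check-then-set below does not guard against two concurrent retries. There is also a gap if the process dies between the write and saving its result, which is why it pays to pass the same key downstream when the API supports it (Stripe, for example, accepts an Idempotency-Key header).

```typescript
// Hypothetical store of completed operations keyed by idempotency key. In
// production: a table with a UNIQUE constraint on the key, so concurrent
// retries cannot both claim it (this in-memory check-then-set does not guard that).
interface IdempotencyStore<T> {
  get(key: string): Promise<T | undefined>;
  set(key: string, value: T): Promise<void>;
}

class InMemoryIdempotencyStore<T> implements IdempotencyStore<T> {
  private readonly done = new Map<string, T>();
  async get(key: string): Promise<T | undefined> { return this.done.get(key); }
  async set(key: string, value: T): Promise<void> { this.done.set(key, value); }
}

// Skip op when this key already has a saved result; a retry gets the original result back.
async function idempotent<T>(store: IdempotencyStore<T>, key: string, op: () => Promise<T>): Promise<T> {
  const prior = await store.get(key);
  if (prior !== undefined) return prior;
  const result = await op();
  await store.set(key, result);
  return result;
}
```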
If you are two founders pre-revenue and your longest function runs in 4 seconds, do nothing. Toggle Fluid Compute on, set a sane default maxDuration of 60 in vercel.json, and ship the product. Premature background-job infrastructure has killed more startups than 30-second timeouts have.
You only need the rest of this post once you have a job that genuinely runs longer than 60 seconds and a user who is waiting.
Pick one of three paths based on the longest job you actually run.
If your p99 is under 13 minutes, enable Fluid Compute, set maxDuration per route, and stop. You are done for the next year.
If you have one job over 13 minutes or one job with natural pauses, prototype it in Vercel Workflows this week. You can stand up a working flow in a few hours.
If you have multiple long-running jobs, an in-house ML pipeline, or stateful agents, you probably want a dedicated worker. A Cadence senior engineer ($1,500/week) can scope the queue topology, write the worker, set up observability, and hand back a working system inside a single weekly engagement. Every engineer on Cadence is AI-native by default, vetted on Cursor, Claude Code, and Copilot fluency before they unlock bookings, so the boilerplate (queue clients, retry wrappers, Sentry hooks) goes faster than it would with a generalist contractor.
If you would rather pressure-test the architecture before you build anything, run your stack through Ship or Skip for a candid grade on what to keep, refactor, or rip out.
maxDuration. Add vercel.json with a 60s default. Raise the ceiling per route only where measured data demands it.
after() or waitUntil for post-response work. Move email sends, analytics pings, and audit logs out of the critical path.
Booking a senior engineer for one week is faster than reading three more Inngest tutorials. Cadence places a vetted, AI-native engineer on your stack inside 48 hours, with a free trial and weekly billing. Audit your background-job stack before you commit to a redesign.
With Fluid Compute enabled, Hobby plans run up to 300 seconds (5 minutes). Pro and Enterprise run up to 800 seconds (about 13 minutes). For anything longer, use Vercel Workflows, which has no upper duration limit.
Fluid Compute keeps instances warm longer, runs multiple invocations on the same instance to cut cold starts, and bills in GB-hours rather than per-invocation. It also unlocks waitUntil and Next.js after() for post-response background work. Toggle it on in your project's Function settings.
Workflows is the right default when you already deploy on Vercel and want one vendor. Inngest and Trigger.dev have more mature step primitives and dashboards if you need richer fan-out, replay, or cross-project orchestration. None is strictly better. Match the tool to the team's existing surface area.
You can on Pro with Fluid Compute, but you should not. A 5-minute query usually means a missing index, an N+1, or a job that belongs in the database itself as a materialized view or pg_cron task. Fix the query before you architect around it.
No. Persist the job intent to a row in Postgres, run it inside after() or a Workflow, and mark the row done. You graduate to a real queue (Upstash QStash, SQS, BullMQ) when you have multiple job types, fan-out, or a worker that needs to scale independently of your web app.
Vercel Workflows. Compute is only billed during active step execution, not the wall clock, so a 30-minute job with a long sleep inside might cost cents. The runner-up is a $5/month Railway worker plus Upstash QStash, which works fine and gives you a process you can SSH into.