
OpenAI function calling lets the model decide which of your functions to invoke and what arguments to pass, returning structured JSON that your code executes. To use it correctly in 2026: define tools with JSON schema, set strict: true, validate arguments before you run anything, support parallel calls, cap your agent loop at a fixed number of turns, and return errors back to the model as data instead of raising exceptions.
Most public guides still teach the 2023 API. Things have shifted: tools replaced functions, strict mode is the default for serious code, the Responses API exists, and parallel calls now work alongside strict. This post is the playbook a senior engineer writes after shipping three production agents on GPT-5.5: paired Python and TypeScript code, real cost math, and a decision rule for when not to use function calling at all.
Function calling does not execute anything. The model reads your prompt, decides a tool would help, and returns JSON describing which function to invoke and with what arguments. Your code parses that JSON, runs the function, and feeds the result back. The model then uses the result to write a response or request another tool.
If all you want is typed JSON output (a parsed invoice, a structured product list, a normalized address), skip function calling. Use Structured Outputs via response_format. It is cheaper, has no loop, and avoids the tool-result handshake.
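For contrast, here is a minimal Structured Outputs sketch using the Chat Completions `json_schema` response format: one request, no loop, no tool-result handshake. The schema and field names (`parsed_invoice`, `vendor`, etc.) are illustrative, not from any real integration.

```python
# JSON schema for the structured result we want back -- illustrative fields only.
invoice_schema = {
    "type": "object",
    "properties": {
        "vendor": {"type": "string"},
        "total_cents": {"type": "integer"},
        "due_date": {"type": "string", "description": "ISO 8601 date, e.g. '2026-03-01'"},
    },
    "required": ["vendor", "total_cents", "due_date"],
    "additionalProperties": False,
}

response_format = {
    "type": "json_schema",
    "json_schema": {"name": "parsed_invoice", "schema": invoice_schema, "strict": True},
}

def parse_invoice(raw_email: str) -> str:
    # One request, no agent loop: the model must reply with JSON matching the schema.
    from openai import OpenAI
    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-5.5",
        messages=[{"role": "user", "content": f"Extract the invoice from: {raw_email}"}],
        response_format=response_format,
    )
    return resp.choices[0].message.content  # a schema-valid JSON string
```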
Reach for function calling when the model needs to take actions with side effects: querying a database, hitting an external API, writing a file, sending a Slack message. Anything where the next model turn depends on a real-world result.
A tool definition has five fields that matter: type, name, description, parameters, and strict. Get any of them wrong and the model will misroute, fabricate arguments, or refuse to call the tool at all.
```json
{
  "type": "function",
  "function": {
    "name": "get_invoice",
    "description": "Fetch a customer invoice by ID. Returns line items, total, and payment status.",
    "parameters": {
      "type": "object",
      "properties": {
        "invoice_id": {
          "type": "string",
          "description": "The invoice ID, e.g. 'inv_01HZAB12CD34'"
        }
      },
      "required": ["invoice_id"],
      "additionalProperties": false
    },
    "strict": true
  }
}
```
Three rules earn their weight every project:
- strict: true. Without it, the model can hallucinate arguments, skip required fields, or return invalid JSON. With it, OpenAI compiles your schema server-side and constrains decoding to match. The first request takes 1 to 2 extra seconds while the schema compiles; every later request hits the cache.
- additionalProperties: false. Strict mode requires it. It also stops the model from inventing helpful extra fields you never asked for.
- required. Strict mode wants every property listed in the required array. For genuinely optional fields, use type: ["string", "null"] and let the model pass null.
- Descriptions are the single biggest lever on accuracy. "location" will get you "the user's house" half the time. "City and 2-letter US state, e.g. Seattle, WA" will get you "Seattle, WA" reliably. Adding a concrete example bumps argument accuracy by roughly 30%.
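The optional-field rule looks like this in a strict-mode schema. A sketch with illustrative field names; the pattern (nullable type, still listed in required) is the part that matters:

```python
# Strict mode wants every property listed in "required", so an optional
# field is modeled as nullable rather than omitted from "required".
parameters = {
    "type": "object",
    "properties": {
        "invoice_id": {
            "type": "string",
            "description": "The invoice ID, e.g. 'inv_01HZAB12CD34'",
        },
        "currency": {
            # Optional under strict mode: still required, but the model may pass null.
            "type": ["string", "null"],
            "description": "3-letter ISO currency code, e.g. 'USD', or null for the default",
        },
    },
    "required": ["invoice_id", "currency"],
    "additionalProperties": False,
}
```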
Tool definitions are sent on every request and count as input tokens. Ten well-described tools run about 1,500 tokens per turn before the user even speaks. A 5-turn agent loop is 7,500 tokens of pure tool overhead per session. Keep descriptions tight and tool count under 20.
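The overhead math is simple enough to keep as a back-of-envelope helper. The 150 tokens-per-tool default below is a rough assumption for a well-described tool, not an API constant:

```python
def tool_overhead_tokens(n_tools: int, turns: int, tokens_per_tool: int = 150) -> int:
    """Rough input-token cost of resending tool definitions on every turn."""
    return n_tools * tokens_per_tool * turns

# Ten tools resent across a 5-turn agent loop:
print(tool_overhead_tokens(10, 5))  # 7500 tokens of pure tool overhead
```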
Here is the smallest correct loop in Python using Chat Completions, with parallel-tool support, argument validation, and an error-as-data pattern.
```python
import json
from openai import OpenAI

client = OpenAI()

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_invoice",
            "description": "Fetch a customer invoice by ID.",
            "parameters": {
                "type": "object",
                "properties": {
                    "invoice_id": {"type": "string", "description": "Invoice ID, e.g. 'inv_01HZ...'"}
                },
                "required": ["invoice_id"],
                "additionalProperties": False,
            },
            "strict": True,
        },
    }
]

def get_invoice(invoice_id: str) -> dict:
    # your real DB call here
    return {"id": invoice_id, "total_cents": 12000, "status": "paid"}

TOOL_REGISTRY = {"get_invoice": get_invoice}

def run_agent(user_input: str, max_turns: int = 10):
    messages = [{"role": "user", "content": user_input}]
    for turn in range(max_turns):
        response = client.chat.completions.create(
            model="gpt-5.5",
            messages=messages,
            tools=tools,
        )
        msg = response.choices[0].message
        messages.append(msg)
        if not msg.tool_calls:
            return msg.content
        for call in msg.tool_calls:
            name = call.function.name
            try:
                args = json.loads(call.function.arguments)
                result = TOOL_REGISTRY[name](**args)
                content = json.dumps(result)
            except Exception as e:
                content = json.dumps({"error": str(e), "tool": name})
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": content,
            })
    raise RuntimeError(f"Agent exceeded {max_turns} turns without finishing.")
```
The TypeScript version follows the same shape. Use the Zod helper for compile-time types on tool inputs.
```typescript
import OpenAI from "openai";
import { z } from "zod";
import { zodFunction } from "openai/helpers/zod";

const client = new OpenAI();

const GetInvoiceArgs = z.object({
  invoice_id: z.string().describe("Invoice ID, e.g. 'inv_01HZ...'"),
});

const tools = [
  zodFunction({
    name: "get_invoice",
    description: "Fetch a customer invoice by ID.",
    parameters: GetInvoiceArgs,
  }),
];

const registry = {
  get_invoice: async ({ invoice_id }: { invoice_id: string }) =>
    ({ id: invoice_id, total_cents: 12000, status: "paid" }),
};

export async function runAgent(userInput: string, maxTurns = 10) {
  const messages: any[] = [{ role: "user", content: userInput }];
  for (let turn = 0; turn < maxTurns; turn++) {
    const res = await client.chat.completions.create({ model: "gpt-5.5", messages, tools });
    const msg = res.choices[0].message;
    messages.push(msg);
    if (!msg.tool_calls?.length) return msg.content;
    const results = await Promise.all(
      msg.tool_calls.map(async (call) => {
        try {
          const args = GetInvoiceArgs.parse(JSON.parse(call.function.arguments));
          const result = await registry[call.function.name as "get_invoice"](args);
          return { id: call.id, content: JSON.stringify(result) };
        } catch (e: any) {
          return { id: call.id, content: JSON.stringify({ error: e.message }) };
        }
      }),
    );
    for (const r of results) {
      messages.push({ role: "tool", tool_call_id: r.id, content: r.content });
    }
  }
  throw new Error(`Agent exceeded ${maxTurns} turns.`);
}
```
Four production guarantees are baked in: strict-mode tools, JSON-parsing wrapped in try/catch, argument validation via Zod, and a hard turn cap. None are optional.
Recent models (gpt-4.1 onward, all gpt-5.x) can return multiple tool_calls in one assistant message. Ask "What is the weather and time in Tokyo, Paris, and SF?" and the model returns six tool calls at once. Run them concurrently.
In the Python version, swap the sequential loop for asyncio.gather. In TypeScript, Promise.all already does the work. A 4-call sequence that takes 8 seconds serially drops to about 2 seconds in parallel. That is the whole user-experience gap between "snappy" and "broken".
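The asyncio.gather swap might look like this. A sketch with stubbed async tools standing in for real network I/O; the registry and tool names are illustrative:

```python
import asyncio
import json

async def get_weather(city: str) -> dict:
    await asyncio.sleep(0.1)  # stands in for a real HTTP call
    return {"city": city, "temp_c": 18}

async def get_time(city: str) -> dict:
    await asyncio.sleep(0.1)
    return {"city": city, "time": "14:00"}

ASYNC_REGISTRY = {"get_weather": get_weather, "get_time": get_time}

async def run_tool_calls(tool_calls: list[dict]) -> list[dict]:
    # One task per tool call; total latency ~= the slowest call, not the sum.
    async def one(call: dict) -> dict:
        fn = ASYNC_REGISTRY[call["name"]]
        result = await fn(**json.loads(call["arguments"]))
        return {"role": "tool", "tool_call_id": call["id"], "content": json.dumps(result)}
    # gather preserves input order, so results line up with the calls.
    return await asyncio.gather(*(one(c) for c in tool_calls))

calls = [
    {"id": "c1", "name": "get_weather", "arguments": '{"city": "Tokyo"}'},
    {"id": "c2", "name": "get_time", "arguments": '{"city": "Paris"}'},
]
results = asyncio.run(run_tool_calls(calls))
```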
Two failure modes to watch:
- When tool_b needs tool_a's result, the model usually serializes across turns, but not always. Set parallel_tool_calls: false for those requests.
- A loop with no cap is a billing event waiting to happen. Cap at 8 to 12 turns for production agents. Log every turn with tool names and durations; you will need that data the first time something loops badly. The habit of logging the full message array to a side store so you can replay deterministically is part of what we call the verification habit, covered in what we mean by 'AI-native engineer'.
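A minimal version of that per-turn log: one JSON line per turn to a side file. The file name and record shape are my own, not a prescribed format:

```python
import json
import time
from pathlib import Path

LOG_PATH = Path("agent_turns.jsonl")  # hypothetical side store

def log_turn(turn: int, tool_calls: list, messages: list, started_at: float) -> None:
    """Append one JSON line per agent turn so a bad loop can be replayed later."""
    record = {
        "turn": turn,
        "tools": [c["name"] for c in tool_calls],
        "duration_s": round(time.monotonic() - started_at, 3),
        "messages": messages,  # the full array is what makes replay deterministic
    }
    with LOG_PATH.open("a") as f:
        f.write(json.dumps(record) + "\n")

t0 = time.monotonic()
log_turn(1, [{"name": "get_invoice"}], [{"role": "user", "content": "hi"}], t0)
```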
The single biggest mistake in production agents is letting tool errors propagate as exceptions. The model has no idea your database timed out. It sees its tool call vanish and retries the same broken call forever.
The fix is one line of discipline: wrap every tool execution in try/except and return the error as a JSON message.
```python
try:
    result = TOOL_REGISTRY[name](**args)
    content = json.dumps(result)
except Exception as e:
    content = json.dumps({"error": type(e).__name__, "message": str(e)[:500]})
```
Now the model sees {"error": "ConnectionTimeout", "message": "..."} and can retry, ask the user to clarify, or give up gracefully. This pattern eliminates roughly 80% of "agent stuck in a loop" tickets.
Three more habits worth adopting:
- Retry transient failures with exponential backoff: 2^attempt seconds of delay, capped around 30s. The official SDKs do this if you set max_retries.

The cost story the docs gloss over: three numbers to memorize.
| What | Cost impact |
|---|---|
| Each tool definition | 100 to 200 input tokens per request, every turn |
| Strict-mode schema compile | One-time 1 to 2 second latency on first use, then cached |
| Tool result message | Whatever you put in content, counted as input next turn |
Two tactical wins. First, enable OpenAI's prompt caching when your tool list is stable; first request pays full price, later requests get a 50% discount on cached input tokens. Second, do not pass 20 tools every request. Either split into role-specific sub-agents (4 to 6 tools each) or use tool-search on gpt-5.4 and above, which lets the model query a tool index instead of receiving the full list upfront.
Function calling typically adds 15 to 30% to per-request token cost vs a vanilla chat completion. If you are sizing an AI budget, the cost to integrate OpenAI API into your app post breaks the math down end to end.
Three surfaces, three different times to use them.
| Approach | When to use | Pros | Cons |
|---|---|---|---|
| Chat Completions + tools | Simple agents, you control state | Stateless, easy to debug, works on every OpenAI model | You manage the message array manually |
| Responses API + tools | Multi-turn agents on gpt-5 and above | Reasoning items preserved, server-managed state | Newer surface, fewer third-party libs |
| Structured Outputs only | Typed JSON, no side effects | Simpler, cheaper, no loop | Cannot trigger actions |
| MCP server | Same tools across Claude, Cursor, ChatGPT | One implementation, many clients | Extra infra to host the server |
The Responses API is OpenAI's bet on stateful agents. It accepts an input array, returns an output array, and preserves the reasoning items that gpt-5 and above generate while thinking. If you are using a reasoning model and do not pass those items back next turn, you lose context and quality drops noticeably. Responses API does this for you; Chat Completions does not.
Use Chat Completions for a stateless, debuggable surface that works on every model from gpt-4o to gpt-5.5. Use Responses API for real multi-step agents on a reasoning model where you want the platform to handle state.
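One Responses API turn, sketched. I'm assuming the function_call / function_call_output item types, and note that Responses tool definitions use a flat shape ({"type": "function", "name": ..., "parameters": ..., "strict": true}) rather than the nested Chat Completions shape:

```python
import json

TOOL_REGISTRY = {}  # filled in elsewhere, as in the Chat Completions example

def tool_result_item(call_id: str, result: dict) -> dict:
    # Responses API expects tool results as function_call_output items,
    # matched to the originating call by call_id (not role:"tool" messages).
    return {
        "type": "function_call_output",
        "call_id": call_id,
        "output": json.dumps(result),
    }

def run_turn(client, input_items: list, tools: list):
    # One turn: reasoning items come back in resp.output and should be passed
    # forward on the next turn alongside the tool results, or quality drops.
    resp = client.responses.create(model="gpt-5.5", input=input_items, tools=tools)
    followup = list(input_items) + list(resp.output)
    for item in resp.output:
        if item.type == "function_call":
            result = TOOL_REGISTRY[item.name](**json.loads(item.arguments))
            followup.append(tool_result_item(item.call_id, result))
    return resp, followup
```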
MCP (Model Context Protocol) sits a layer above. It is a standard way to expose tools as a server any client can talk to: Claude Desktop, Cursor, ChatGPT, your own app. If you want the same get_invoice tool from three different LLM clients, write it once as an MCP server instead of three separate function-call definitions.
Anthropic's tool use mirrors OpenAI almost field-for-field. Anthropic returns tool_use blocks instead of tool_calls, you reply with tool_result blocks instead of role:tool messages, and the field is input instead of arguments. The mental model is identical, which is why MCP bridges them cleanly.
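The field-for-field mapping, sketched as raw message shapes. Values are illustrative; the structural differences (JSON-string arguments vs. parsed input dict, role:"tool" vs. tool_result block) are the point:

```python
import json

# OpenAI: the assistant message carries tool_calls; you answer with role:"tool".
openai_tool_call = {
    "id": "call_abc",
    "type": "function",
    "function": {"name": "get_invoice", "arguments": '{"invoice_id": "inv_01HZ..."}'},
}
openai_tool_result = {
    "role": "tool",
    "tool_call_id": "call_abc",
    "content": '{"status": "paid"}',
}

# Anthropic: assistant content carries tool_use blocks; you answer with tool_result blocks.
anthropic_tool_use = {
    "type": "tool_use",
    "id": "toolu_abc",
    "name": "get_invoice",
    "input": {"invoice_id": "inv_01HZ..."},  # already a parsed dict, not a JSON string
}
anthropic_tool_result = {
    "role": "user",
    "content": [
        {"type": "tool_result", "tool_use_id": "toolu_abc", "content": '{"status": "paid"}'}
    ],
}
```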
If you are a founder or eng lead reviewing function-calling code that someone else wrote, here are the signals that separate a working agent from a ticking time bomb.
Green flags:
- strict: true on every tool, additionalProperties: false, every param required
- Parallel tool calls run through asyncio.gather or Promise.all, not a serial for-loop

Red flags are the inverses: tool errors raised as exceptions instead of returned as data, no cap on the agent loop, and schemas without strict mode.
This is exactly the kind of code review every Cadence engineer is vetted on. The platform's voice interview specifically scores Cursor / Claude / Copilot fluency, prompt-as-spec discipline, and verification habits, and tool-calling competence shows up across all three. Every engineer on Cadence is AI-native by default; there is no opt-in tier, because there is no version of shipping a 2026 backend without these reflexes. We wrote about how the interview works in our voice-interview hiring deep-dive.
Three concrete steps if you have an existing OpenAI integration:
- Add strict: true, additionalProperties: false, and the full required array to every tool definition, then test that nothing broke. Most code paths will not need changes; the model already obeys your schema, this just makes it guaranteed.

If you are starting from scratch and you want a senior engineer to architect this for you, the fastest path is to skip the recruiter loop entirely and book a senior on Cadence for a week. Senior tier is $1,500 for the week, with a 48-hour free trial, and every senior on the platform has shipped production agents on top of either OpenAI or Anthropic. You can also pull a Build, Buy, or Book recommendation from our /tools/decide tool if you want a sanity check before committing engineering time.
Function calling is one of those APIs where the difference between toy demos and shipped product is about 200 lines of error handling and 5 hours of careful schema design. If you want a Cadence senior to do that work with you, book a 48-hour trial and have a working agent in your repo by Friday.
Use Structured Outputs when you only need typed JSON back from the model with no side effects, like parsing an email into structured fields. Use function calling when the model needs to trigger real actions (database writes, API calls, file I/O) and incorporate the results into its next response. Function calling is a strict superset, but it costs more tokens and adds a loop, so do not reach for it unnecessarily.
Yes. OpenAI shipped support for strict-mode parallel calls in mid-2025, and every model from gpt-5 onward supports the combination natively. Each tool call in the parallel batch independently adheres to its own schema. Fine-tuned models still carry a small caveat: strict guarantees may be relaxed when calls run in parallel, so check the docs for your specific deployment.
Hard-cap the loop at 8 to 12 turns, log every iteration with tool names and timing, and break when the model returns a message with no tool_calls. Always assume the model can return zero, one, or many tool calls per turn, and write the loop accordingly. Add a per-tool timeout (5 to 30 seconds depending on the tool) so a single hung tool cannot freeze the whole agent.
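The per-tool timeout fits naturally around each call with asyncio.wait_for. A sketch; the 10-second default and the stub tool are illustrative:

```python
import asyncio
import json

async def call_with_timeout(fn, args: dict, timeout_s: float = 10.0) -> str:
    """Run one tool call; convert a hang into an error the model can read."""
    try:
        result = await asyncio.wait_for(fn(**args), timeout=timeout_s)
        return json.dumps(result)
    except asyncio.TimeoutError:
        # Error-as-data: the model sees the timeout instead of the agent freezing.
        return json.dumps({"error": "Timeout", "message": f"tool exceeded {timeout_s}s"})

async def slow_tool(**kwargs) -> dict:
    await asyncio.sleep(5)  # stands in for a hung downstream service
    return {"never": "reached"}

content = asyncio.run(call_with_timeout(slow_tool, {}, timeout_s=0.1))
print(content)
```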
Use Chat Completions if you need broad model support and stateless control, especially if you are still on gpt-4o or earlier. Use the Responses API on gpt-5 and above when you want reasoning items preserved across turns and server-managed state. The Responses API is the better default for any new agent you build today; Chat Completions remains the right pick for simple one-shot tool calls on older models.
Conceptually identical. You define a tool with a JSON schema, the model returns a tool_use block (Anthropic) or a tool_calls array (OpenAI), and you respond with a tool_result block (Anthropic) or a role: "tool" message (OpenAI). Field names differ (input vs arguments), but the agent-loop pattern is the same. MCP standardizes both so you can write one tool implementation and call it from either provider.