When you talk about a capable system, it is natural to describe what it does in verbs. It books flights. It sends emails. It updates spreadsheets. In Chapter 1, we saw how context makes it feel like you are talking to one continuous mind. In Chapter 2, we saw how memory makes it feel like that mind accumulates your history. Now a new illusion appears: this mind does not just remember; it acts.
The puzzle is that nothing in what we have built so far can actually do anything: the model is a pure function that takes text as input and returns text as output, and memory lives in storage that only changes when your code writes to it. Between one response and the next, nothing moves unless some process you wrote moves it. Yet the systems you build will be judged almost entirely by what they cause: what gets emailed, saved, paid, deployed.
At some point, text must cross the boundary from a description of an action to an action that actually happens. Where is that boundary? When a model outputs “send an email to alice@example.com,” who, precisely, is responsible for the SMTP handshake that follows? When a model writes perfect code that never runs, does it have any more agency than a static file on disk?
The stakes are both conceptual and practical:
- Conceptually, if you misplace where agency lives, you will misattribute responsibility. You will blame or trust the wrong component.
- Practically, if you treat agency as a mysterious property of the model, you will either build unsafe systems (by giving the model too much credit) or timid ones (by refusing to let anything real happen).
This chapter works through four questions:
- How can a text generator cause effects in the world?
- What is “tool calling” really?
- Where does the “agent” begin and end?
- How do you control what actions are possible?
3.1 Text Generation and Real-World Effects
[DEMO: A chat box that accepts “Send ‘hello’ to Alice” and shows, side by side, (1) the model’s raw text output, and (2) a log of what the server actually did. Toggling a switch labeled “execute actions” turns real email sending on and off without changing the model’s output at all.]
When you ask a model “send an email to alice@example.com,” what do you actually get back? You see a fluent reply: “I’ve sent the email.” You might see a nicely formatted email body. With function calling enabled, you might see structured JSON that looks like an API request.
Text alone never causes irreversible effects; only your code does. The point where an email leaves your infrastructure is the specific line that calls the email API; remove that line and the model no longer sends email.
The model has no agency. Agency is causality: the ability to cause effects in the world. The model produces text (data); your code interprets that text as instructions and executes them. Text by itself is inert data; interpreting and running it turns data into effects.
The model’s output arrives as response.text; your code then decides whether to parse it and call sendEmail.
With the execution path in place, your code parses the response and calls sendEmail using your credentials and infrastructure. With that path removed, the intent is still there in the output, but no email is sent because there is no code left to call the email function.
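The “execute actions” switch can be sketched in a few lines of TypeScript. This is only an illustration under stated assumptions: fakeModel, handle, outbox, and the SEND_EMAIL text format are hypothetical stand-ins, and sendEmail here appends to an in-memory array instead of talking to a real mail service.

```typescript
// Hypothetical stand-ins: fakeModel and sendEmail are illustrative, not a real SDK or mail client.
interface ModelResponse {
  text: string;
}

interface SentEmail {
  to: string;
  body: string;
}

// Stands in for a model call: text in, text out, no side effects.
function fakeModel(_prompt: string): ModelResponse {
  return { text: 'SEND_EMAIL to=alice@example.com body="hello"' };
}

// The only place an effect happens. It records to an array instead of using SMTP.
const outbox: SentEmail[] = [];
function sendEmail(to: string, body: string): void {
  outbox.push({ to, body });
}

// Execution layer: decides whether the text is an instruction and whether to act on it.
function handle(response: ModelResponse, executeActions: boolean): string {
  const match = /^SEND_EMAIL to=(\S+) body="([^"]*)"$/.exec(response.text);
  if (match === null) return "no action recognized";
  const to = match[1];
  const body = match[2];
  if (to === undefined || body === undefined) return "no action recognized";
  if (!executeActions) return "action recognized, not executed";
  sendEmail(to, body);
  return "email sent";
}

const response = fakeModel("Send 'hello' to Alice");
const dryRun = handle(response, false); // same model output, outbox stays empty
const live = handle(response, true);    // same model output, outbox gains one entry
```

The model output is identical in both calls. Only the flag inside your own code changes whether anything happens.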
You can treat the separation between model intent and code execution as a design boundary in your system.
Several implications follow from this separation:
First, an “agent demo” is always at least two things: the model producing an instruction and the system that executes it. The model’s reasoning chooses an action; the HTTP client, database calls, and email API execute that action against external systems. Only the execution layer actually interacts with external services and data.
Second, you control the boundary. You decide which patterns of text count as “instructions,” how they are parsed, and what they are allowed to trigger. You can insist on rigid formats, or tolerate fuzzy ones. You can inject approvals, rate limits, and simulations in between. All of that lives in the execution layer.
Third, the most dangerous mistake in agent design is implicit execution. If arbitrary action-like text can trigger real effects, you have removed the boundary between model output and system execution.
The rest of the chapter is about making that boundary explicit, structured, and predictable.
3.2 Tool Calling Is Structured Output with Routing
[DEMO: Two panels. Left: a raw chat completion where the model returns “I will send an email…” in free text. Right: the same query with tool/function calling enabled, where the model returns JSON { "tool": "sendEmail", "arguments": { ... } }. A third panel shows the TypeScript router that receives the JSON and invokes a mock sendEmail function.]
Intent and effect are distinct. The next pattern to consider is “tool calling,” “function calling,” or whatever your SDK calls it.
Suddenly the model’s output changes shape. Instead of chatty prose, you see structured JSON naming tools and passing arguments. It feels like the model has learned a new trick: it now “uses” tools.
Mechanistically, nothing about the model has changed: it still only emits text. The difference is that you now treat some of that text as structured JSON that your code parses and routes to specific functions, instead of free-form prose.
Tool calling is structured output with routing. The model generates JSON matching a schema; your code parses it and dispatches to functions. Tools constrain the action space (safer, more predictable); code execution expands it.
Compared with free-form text, tool calling adds three things:
- Format. The model is asked to produce JSON with a known shape, not arbitrary prose you need to regex.
- Vocabulary. There is a closed set of tool names, each with a clear description.
- Router. There is exactly one place where model intent is translated into function calls.
That structure gives you:
- Predictable surface area: the model cannot invent a new tool name and have it magically exist.
- Validatable input: arguments must pass schema checks before any real action happens.
- Simple security stories: if there is no “deleteAllUsers” tool, the model cannot delete all users.
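The format, vocabulary, and router described in this section can be sketched as follows. This is a minimal sketch, not any particular SDK’s API: the tools registry, the route function, the RouteResult shape, and the hand-written validate check (standing in for a real schema validator) are all assumptions made for illustration, and sendEmail is a mock.

```typescript
// Hypothetical registry and router; names and shapes are assumptions for illustration.
type Args = Record<string, unknown>;

interface Tool {
  description: string;
  validate(args: Args): boolean; // schema check, runs before any action
  run(args: Args): string;
}

const sendEmailTool: Tool = {
  description: "Send an email to a single recipient.",
  validate(args) {
    const to = args["to"];
    return typeof to === "string" && to.indexOf("@") > 0 && typeof args["body"] === "string";
  },
  run(args) {
    return "mock email to " + String(args["to"]);
  },
};

// Vocabulary: the closed set of tools. Anything not listed here cannot be called.
const tools: Record<string, Tool> = { sendEmail: sendEmailTool };

type RouteResult = { ok: true; output: string } | { ok: false; error: string };

// Router: the single place where model text becomes a function call.
function route(modelText: string): RouteResult {
  let parsed: unknown;
  try {
    parsed = JSON.parse(modelText);
  } catch {
    return { ok: false, error: "not valid JSON" };
  }
  if (typeof parsed !== "object" || parsed === null) {
    return { ok: false, error: "not a JSON object" };
  }
  const call = parsed as { tool?: unknown; arguments?: unknown };
  const name = call.tool;
  const tool =
    typeof name === "string" && Object.prototype.hasOwnProperty.call(tools, name)
      ? tools[name]
      : undefined;
  if (tool === undefined) return { ok: false, error: "unknown tool" };
  const args = call.arguments;
  if (typeof args !== "object" || args === null || !tool.validate(args as Args)) {
    return { ok: false, error: "invalid arguments" };
  }
  return { ok: true, output: tool.run(args as Args) };
}
```

A call such as route('{"tool":"deleteAllUsers","arguments":{}}') is rejected as an unknown tool, because the registry is the only source of callable names.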
3.3 Agency as a System Property
[DEMO: A diagram view that shows three boxes (Model, Execution Layer, World) connected in a loop. Toggling checkboxes can disable (a) the model, (b) the execution layer, or (c) the feedback path. The UI highlights which combinations still qualify as an “agent” in the sense of causing effects.]
Once you have tools and routing, you need to decide where the agent boundary lies. The model may choose which tool to call, your code may override or veto calls, and humans may approve certain actions; together, these pieces form the agent system.
Consider a thermostat that measures temperature and turns a heater on or off: is it an agent? It certainly causes effects based on sensed state. What about a cron job that sends an email report once a day? There is no “intelligence,” but there is a clear causal loop. The boundary matters because it determines how you think about responsibility and control.
Agency is a system property, not a model property. The agent is the entire loop: model deciding, code executing, results feeding back. The model is accountable for decisions; your code is accountable for execution; you are accountable for the design that combines them.
In each pass through the loop:
- The model makes a decision: which tool to call with which arguments, and how to explain the result.
- The execution layer carries out the action, subject to policies.
- The memory captures the fact that the action occurred and what happened.
Remove any one piece and you get a different kind of system:
- No model: a fixed script that always calls the same tool with the same arguments. Still causal, but not adaptive.
- No execution: a chat bot that promises to do things but never does. Conversational, but not agentic.
- No feedback: a system that acts but never learns from its actions. Powerful, but opaque and brittle.
When an action goes wrong, you can ask three separate questions:
- Did the model choose a bad action, given its instructions and context?
- Did the execution layer authorize and run something it should have blocked?
- Did you define tools or policies that made the bad outcome possible?
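To make the loop concrete, here is a toy sketch in which each part of the system is a separate function, so that each of the questions above points at a different piece of code. Everything here is hypothetical: decide is a scripted stand-in for a model call, the only tool is an arithmetic toy named double, and maxSteps is an assumed safety bound.

```typescript
// All names are hypothetical; decide is a scripted stand-in, not a real model call.
type Decision =
  | { kind: "call"; tool: string; input: number }
  | { kind: "done"; summary: string };

interface MemoryEntry {
  tool: string;
  input: number;
  result: number;
}

// Model: reads memory (the feedback path) and chooses the next step.
function decide(memory: readonly MemoryEntry[]): Decision {
  const last = memory[memory.length - 1];
  if (last === undefined) return { kind: "call", tool: "double", input: 3 };
  if (last.result < 20) return { kind: "call", tool: "double", input: last.result };
  return { kind: "done", summary: "reached " + String(last.result) };
}

// Execution layer: the only code that performs actions, over a closed tool set.
function execute(tool: string, input: number): number {
  if (tool !== "double") throw new Error("tool not allowed: " + tool);
  return input * 2;
}

// The agent is the whole loop, bounded by maxSteps so it cannot run forever.
function runAgent(maxSteps: number): { summary: string; memory: MemoryEntry[] } {
  const memory: MemoryEntry[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const decision = decide(memory);
    if (decision.kind === "done") return { summary: decision.summary, memory };
    const result = execute(decision.tool, decision.input);
    // Memory records that the action happened and what came back.
    memory.push({ tool: decision.tool, input: decision.input, result });
  }
  return { summary: "step limit reached", memory };
}
```

In this sketch, a bad choice is a bug in decide, an unauthorized action is a bug in execute, and a dangerous capability is a design decision about which tools execute accepts at all.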
3.4 Controlling What Actions Are Possible
[DEMO: A “tool inspector” UI that shows a list of tools, each labeled as Read, Write, or External. Toggling a tool between categories updates a simulated policy engine: some actions now require confirmation, others are auto-approved or blocked. A panel shows example model tool calls and whether they would be allowed.]
The agent is the loop, and the execution layer is where causality lives. The key question becomes less “what can the model do?” and more “what do you allow to happen?” If actions have real consequences (emails that cannot be unsent, payments that cannot be quietly reversed), who is responsible for preventing mistakes? If a model proposes something obviously harmful, should that be caught in the prompt, in the tool schema, or somewhere else?
You also face subtler questions. Not every action is equally risky. Reading a calendar is different from deleting it. Posting a draft to an internal channel is different from tweeting to the world. Not every agent needs to loop; a one-shot “send this report once per day” agent has different failure modes than an autonomous crawler. To design sane systems, you need a vocabulary for classifying actions and matching scrutiny to stakes.
You should align the amount of review and control you apply to an action with how much damage it could cause. Read operations can usually run automatically. Write operations should be logged and may require soft confirmation. Irreversible external operations should require explicit human approval or be blocked entirely.
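One way to express this is to give every tool an effect field and route every proposed call through a single policy function. The sketch below is an assumption-laden illustration, not a reference design: the ToolSpec shape, the tool names readCalendar and postDraft, the Verdict values, and the authorize function are all hypothetical.

```typescript
// Hypothetical shapes: the effect field, Verdict values, and authorize are illustrative assumptions.
type Effect = "read" | "write" | "irreversible";

interface ToolSpec {
  name: string;
  effect: Effect; // a data field that policy code can inspect
}

// The capability set: a tool that is not listed here cannot run.
const toolSpecs: ToolSpec[] = [
  { name: "readCalendar", effect: "read" },
  { name: "postDraft", effect: "write" },
  { name: "sendEmail", effect: "irreversible" },
];

type Verdict = "allow" | "allow_and_log" | "needs_approval" | "block";

interface PolicyContext {
  humanApproved: boolean;
}

function findSpec(name: string): ToolSpec | undefined {
  for (const spec of toolSpecs) {
    if (spec.name === name) return spec;
  }
  return undefined;
}

// Single enforcement point between a proposed tool call and its execution.
function authorize(toolName: string, ctx: PolicyContext): Verdict {
  const spec = findSpec(toolName);
  if (spec === undefined) return "block"; // not in the capability set
  if (spec.effect === "read") return "allow";
  if (spec.effect === "write") return "allow_and_log";
  // Irreversible: run only after explicit human approval, and always log.
  return ctx.humanApproved ? "allow_and_log" : "needs_approval";
}
```

With this layout, authorize("sendPayment", ...) returns "block" regardless of what the model wrote, because no such tool was ever registered.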
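Classifying tools by effect and enforcing policy in one place has several consequences. First, the capability set is explicit: if there is no sendPayment tool, payments are impossible no matter how eloquently the model asks. A model cannot hallucinate its way past your capability set.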
Second, the classification of each action is visible and audited. “This tool is irreversible” is not a buried comment; it is a data field that policy code can inspect. Changing a tool’s effect from write to irreversible is a one-line, reviewable edit that tightens constraints without touching prompts.
Third, the policy is centralized. Instead of sprinkling “are you sure?” checks across code paths, you have a single enforcement function that sits between model intent and execution. This is where you implement approvals, rate limiting, role-based access, and environment differences (e.g., “in staging, external tools are simulated”).
Finally, you can define agents of different shapes against the same tool set:
- A single-action agent that runs once with effect: 'write' tools but never loops.
- A background agent that runs on a schedule, but whose tools are all read or reversible.
- An interactive agent that can propose irreversible actions but must wait for UI approval.
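These three shapes can be written down as data rather than as separate code paths. The following is a rough sketch under assumptions: the AgentProfile fields, the maxSteps values, and permissionFor are hypothetical, and the background profile is simplified to read-only tools rather than “read or reversible.”

```typescript
// Hypothetical profile shape; field names and step limits are assumptions for illustration.
type Effect = "read" | "write" | "irreversible";

interface AgentProfile {
  name: string;
  allowedEffects: readonly Effect[];
  needsApproval: readonly Effect[]; // effects that wait for a human in the UI
  maxSteps: number;                 // 1 means the agent never loops
}

const singleAction: AgentProfile = {
  name: "single-action",
  allowedEffects: ["read", "write"],
  needsApproval: [],
  maxSteps: 1,
};

const background: AgentProfile = {
  name: "background",
  allowedEffects: ["read"], // simplification of "read or reversible"
  needsApproval: [],
  maxSteps: 50,
};

const interactive: AgentProfile = {
  name: "interactive",
  allowedEffects: ["read", "write", "irreversible"],
  needsApproval: ["irreversible"],
  maxSteps: 50,
};

type Permission = "denied" | "allowed" | "awaiting_approval";

// Decides what a given agent shape may do with a tool of a given effect class.
function permissionFor(profile: AgentProfile, effect: Effect): Permission {
  if (profile.allowedEffects.indexOf(effect) === -1) return "denied";
  return profile.needsApproval.indexOf(effect) === -1 ? "allowed" : "awaiting_approval";
}
```

The tool set stays the same across all three profiles; only the profile passed to permissionFor changes.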
3.5 Code Execution as the Same Pattern
[DEMO: A split view where the model’s output is shown as JavaScript source code. A “run in sandbox” button executes it and shows effects (e.g., transformed data). Another toggle shows the same pattern expressed as a JSON tool call instead of raw code.]
We have been talking about named tools, but many systems take a more open approach: instead of constraining the model to a menu of functions, they let it write code directly. Subjectively, this feels like the model has crossed some threshold. It is not just choosing from a list; it is authoring arbitrary programs.
The same underlying issue still applies: your system decides whether and how to run the generated code. If code is never executed, it causes no effects; if it runs in a sandbox without network or filesystem access, it has less capability than a tool that can send email. Mechanistically, nothing fundamental has changed.
Code execution is the same pattern as tool calling. The model outputs text that happens to be code; your runtime decides whether to execute it, under what constraints, and with which capabilities exposed.
To the model, generatedCode is just another string. The model does not know whether you will run it, where, or with what permissions. Your sandbox decides:
- Whether compilation is allowed at all.
- Which globals and APIs are visible.
- How much CPU time and memory the code can use.
- Whether outbound network or disk access exists.
Letting the model write code means it can:
- Orchestrate multiple API calls inside its own code.
- Implement conditional logic and loops without extra tool-round-trips.
- Perform complex data transformations purely in the sandbox.
That openness has costs:
- Bugs and mis-specifications become runtime errors deep in generated code.
- Security mistakes in the sandbox configuration can expose far more capability than intended.
- Observability becomes harder: instead of a clean log of tool invocations, you have arbitrary code doing arbitrary things.
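The first two sandbox decisions (whether to run the code at all, and which names it can see) can be illustrated with a small sketch. An important caveat: this uses the JavaScript Function constructor, which is not a security boundary, since the generated code can still reach real globals. A real sandbox needs process-level or VM-level isolation, and this sketch does not enforce CPU, memory, network, or disk limits. SandboxPolicy, runGenerated, and RunResult are hypothetical names.

```typescript
// Illustrative only: new Function is NOT a security boundary and enforces no resource limits.
interface SandboxPolicy {
  allowExecution: boolean;
  // The names handed to the generated code as parameters.
  capabilities: Record<string, unknown>;
}

type RunResult =
  | { status: "skipped" }
  | { status: "ok"; value: unknown }
  | { status: "error"; message: string };

// The runtime, not the model, decides whether the string runs and what it is given.
function runGenerated(generatedCode: string, policy: SandboxPolicy): RunResult {
  if (!policy.allowExecution) return { status: "skipped" };
  const names = Object.keys(policy.capabilities);
  const values = names.map((name) => policy.capabilities[name]);
  try {
    // Compile the string into a function whose parameters are the exposed capabilities.
    const fn = new Function(...names, '"use strict";\n' + generatedCode);
    return { status: "ok", value: fn(...values) };
  } catch (err) {
    return { status: "error", message: err instanceof Error ? err.message : String(err) };
  }
}

const generatedCode = "return add(2, 3);";
const add = (a: number, b: number): number => a + b;

const notRun = runGenerated(generatedCode, { allowExecution: false, capabilities: { add } });
const ran = runGenerated(generatedCode, { allowExecution: true, capabilities: { add } });
const missing = runGenerated(generatedCode, { allowExecution: true, capabilities: {} });
```

The same string produces three different outcomes depending on the policy: it is skipped, it runs with add available, or it fails because add was never exposed.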
3.6 Observability for Causal Systems
When your systems only generate text, mistakes are cheap. A wrong answer can be corrected with another question. A misleading explanation can be clarified. Once your systems cause effects, mistakes persist. An email sent to the wrong address cannot be retrieved from someone’s inbox. A file deleted from disk may not be recoverable. A payment initiated against the wrong account becomes a customer-support ticket.
At that point, you need more than a philosophical distinction between intent and effect. You need records: what the model proposed, what the execution layer actually did, under what authorization, and with what outcome. The minimal version is a structured action log that sits exactly at the boundary between model output and execution. Each entry records:
- The original user request.
- The model’s proposed action.
- The policy decision that allowed or blocked it.
- The actual effects your tools produced.
When the log shows a tool being proposed or executed badly, you have several levers:
- Clarify the tool’s description so the model stops proposing it in inappropriate contexts.
- Tighten or loosen policy thresholds.
- Split a dangerous tool into safer sub-tools with narrower scopes.
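A minimal sketch of such an action log, written at the boundary between proposal and execution, might look like the following. The ActionLogEntry fields mirror the four items an entry records in this section; the field names, the executeWithLog wrapper, and the in-memory actionLog array are assumptions for illustration, and a real system would likely add timestamps and durable storage.

```typescript
// Hypothetical log shape; field names are assumptions that mirror the four items in the text.
interface ProposedAction {
  tool: string;
  arguments: Record<string, unknown>;
}

interface ActionLogEntry {
  userRequest: string;                   // the original user request
  proposedAction: ProposedAction;        // what the model proposed
  policyDecision: "allowed" | "blocked"; // what the policy decided
  effects: string[];                     // what the tool actually did; empty when blocked
}

// In-memory log for illustration only.
const actionLog: ActionLogEntry[] = [];

// Wraps execution so that every proposal is logged, whether or not it runs.
function executeWithLog(
  userRequest: string,
  proposed: ProposedAction,
  isAllowed: (action: ProposedAction) => boolean,
  runTool: (action: ProposedAction) => string[]
): ActionLogEntry {
  const allowed = isAllowed(proposed);
  const entry: ActionLogEntry = {
    userRequest,
    proposedAction: proposed,
    policyDecision: allowed ? "allowed" : "blocked",
    effects: allowed ? runTool(proposed) : [],
  };
  actionLog.push(entry);
  return entry;
}
```

Because blocked proposals are logged too, the log shows not only what happened but also what the model tried to make happen.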
Bridge to Chapter 4
This chapter has stayed focused on a single claim: a model that only produces text has no agency by itself. Agency appears when you connect that text to code that can act, under rules you define, against systems you control. We traced that connection through several layers:
- Intent vs. effect. The model proposes; your execution layer decides and acts.
- Tool calling. Structured outputs and a router give you a clean boundary between model and world.
- System boundaries. The “agent” is the loop: model, execution, memory, and feedback.
- Effect boundaries. You match scrutiny to stakes by classifying tools and enforcing policy.
- Code execution. Even when the model writes code, your sandbox grants or denies real power.
- Observability. Logs at the intent–effect boundary let you understand and improve causal behavior.