You now have all the pieces to make a single agent look intelligent. You can build rich context, retrieve long-term memory, invoke tools, and structure multi-step reasoning across calls. Within those boundaries, the agent behaves consistently on a single problem and conversation, with a single place where decisions are made.
Problems appear when you try to extend that coherence beyond the capacity of a single context window.
You do not notice the limit when the agent answers a question or reviews a single file. You notice it when you try to make one agent handle everything: research and writing and code review and calendar triage and sales outreach. You keep adding: more tools, longer prompts, more history. The system gets strictly more capable on paper and noticeably less reliable in practice.
Underneath, the same model and API remain. What changed is how many different roles you are trying to fit into one context: a single attention mechanism asked to juggle research protocols, code style guides, email etiquette, and project management rules, all while paging through a growing stack of prior interactions.
At some point, you introduce multi‑agent structure. You split responsibilities, add specialist components, and let them talk to each other. The system feels qualitatively different: distinct personas, apparent collaboration, specialization. It is tempting to think you have gone from one mind to many.
Mechanically, you still call a single model sequentially; what changed is how you structure its prompts and context. What has multiplied is not minds but contexts: different system prompts, different tool sets, different slices of state, and new code that moves information between them.
How you think about this structure shapes how you build it:
- If you imagine multiple persistent agents negotiating with each other, you will look for emergent social behavior where there is only context routing.
- If you think multi‑agent systems are a fundamentally new paradigm, you will miss that they reuse software patterns you already know—classification, dispatch, message passing—around an LLM.
- If you underestimate coordination, you will split too early and drown in overhead. If you ignore it, you will cram everything into one context and drown in interference.
A multi‑agent system is multiple contexts plus code that routes, delegates, and synchronizes work between them. Coordination is the architecture that orchestrates many isolated calls into a single system.
Multi-agent structure is an engineering implementation concern, not a user-facing concept. Users do not see “Researcher Agent” and “Writer Agent” collaborating. They see one assistant that researches and writes. The internal structure—how many contexts, how they hand off, whether they run in parallel—is invisible. It is a performance and reliability optimization, like choosing between a monolith and microservices.
Why split into multiple agents at all? Two reasons dominate:
- Context economy. Each agent gets a focused context: a specialized system prompt, a curated tool set, a smaller slice of history. One mega-context with everything leads to interference—instructions ignored, tools misused, phases bleeding into each other. Splitting buys you back focus.
- Parallelization. Like the actor model in distributed systems, independent agents can work simultaneously on different subtasks. Research five competitors in parallel, then synthesize. The total latency is the longest single task, not the sum. When subtasks are independent, parallelism is free performance.
These are the same reasons you split any system into components: isolation and concurrency. Multi-agent structure is not a new paradigm. It is the old patterns—microservices, actor model, pipeline parallelism—applied to LLM calls.
This chapter follows that architecture through four themes:
- Multi‑agent systems in practice.
- Routing work to the right components.
- State sharing between components.
- Evaluating coordination overhead.
By the end, multi‑agent systems will look less like a mysterious new category and more like what they are: multiple system prompts with routing, handoffs, and shared artifacts.
5.1 Multi‑Agent Systems in Practice
[DEMO: A panel shows two “agents” chatting: a Researcher and a Writer, each with a name, avatar, and message stream. On the side, a trace view reveals what actually happens: call with Researcher prompt → output appended to shared log → call with Writer prompt → output appended, and so on. Toggling a switch hides and shows the underlying call sequence, making it clear that “agents” are just different prompts applied over the same shared transcript.]
From the outside, a multi‑agent system looks like a small organization. You see named entities—“Researcher,” “Writer,” “Planner,” “Critic”—sending messages back and forth. One proposes a plan, another critiques it, a third revises. It feels like collaboration between distinct minds, each with its own knowledge and preferences.
This leads to several concrete questions about how such a system is implemented.
If there is only one underlying model, where do these distinct agents live? In practice, you do not create multiple minds; you define separate prompts and contexts. Each call is stateless, so an agent’s “role” is reconstructed from its system prompt and the current shared transcript. When two agents talk, they exchange text via your routing code rather than holding an independent conversation.
If the model is general‑purpose, you could keep a single generalist agent and avoid multiple personas. However, splitting roles lets you tighten prompts, tooling, and context for each task. Making a Researcher and a Writer exchange results through explicit handoffs produces clearer boundaries and more reliable behavior than asking one agent to handle both research and writing in a single context.
The answers are mechanical: they come down to how prompts, contexts, and call patterns are wired together.
Multi‑agent systems are multiple system prompts with routing logic. Each “agent” is reconstructed on every call from its system prompt and current context. What looks like agents conversing is: call with prompt A → append output to shared context → call with prompt B → append output, and so on. The value is specialization—focused prompts, tools, and context—not independent entities.
You can see this clearly if you implement the simplest possible “two‑agent” exchange.
type Role = 'researcher' | 'writer';

const SYSTEM_PROMPTS: Record<Role, string> = {
  researcher: `
    You are a research specialist. Your job is to gather factual information
    and present concise findings with sources. Do not write narrative prose;
    only return bullet-point findings.`,
  writer: `
    You are a writing specialist. Your job is to take research findings and
    turn them into clear, well-structured prose for a non-expert reader.
    Do not perform new research; rely only on the provided findings.`
};

interface Message {
  role: 'user' | 'assistant';
  author: Role;
  content: string;
}
async function callAgent(role: Role, sharedLog: Message[], userInput: string): Promise<Message> {
  const system = SYSTEM_PROMPTS[role];

  // Flatten the shared log into a single transcript string for this role
  const reply = await llm.complete({
    system,
    user: renderSharedLog(sharedLog, role, userInput)
  });

  const message: Message = {
    role: 'assistant',
    author: role,
    content: reply.text
  };
  return message;
}
Here there is no persistent “Researcher object” holding its own memory. There is only:
- A fixed system prompt describing the role.
- A shared log of prior messages.
- A function callAgent(role, sharedLog, userInput) that reconstructs the agent on demand.
Each time you call the “researcher,” you feed the same system prompt and the latest transcript into the model. Each time you call the “writer,” you do the same with a different system prompt. The illusion of stable personas arises from the stability of those prompts plus the shared log you maintain.
The “conversation” between agents is just a pattern of calls:
- Append the user’s initial task to the shared log.
- Call the researcher with its system prompt and the log; append its reply.
- Call the writer with its system prompt and the updated log; append its reply.
- Optionally loop, alternating roles based on some rule.
Nothing about this requires multiple models or long‑lived entities. What you have is a routing pattern over a single model.
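A minimal sketch of that loop, reusing callAgent from above. The stopping rule here—one research pass, then one writing pass—is an illustrative choice, not a requirement:
// One research pass followed by one writing pass over the same shared log.
async function researchThenWrite(task: string): Promise<Message[]> {
  const sharedLog: Message[] = [];

  // The researcher sees the task and the (empty) log; its reply is appended.
  const findings = await callAgent('researcher', sharedLog, task);
  sharedLog.push(findings);

  // The writer sees the same task plus the researcher's findings in the log.
  const draft = await callAgent('writer', sharedLog, task);
  sharedLog.push(draft);

  return sharedLog;
}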
The following points describe the resulting behavior:
- Agents do not own memory. The shared log (and any external storage) does.
- Agents do not run autonomously. Your code decides when to invoke which system prompt.
- Agents do not negotiate in some hidden channel. They only influence each other through text you explicitly pass along.
Specialization buys you several concrete advantages:
- Focused instructions. A researcher prompt can devote all of its tokens to research norms and tools. A writer prompt can devote all of its tokens to tone, structure, and audience. They do not have to share.
- Focused tools. You can expose search and retrieval tools only to the researcher, and editing tools only to the writer. Each call considers a small tool set instead of every tool you have ever defined.
- Focused history. You can decide what parts of the shared log each role sees. The writer does not need to see every search query; the researcher does not need to see every draft revision. One way to implement that projection is sketched right after this list.
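A minimal sketch of that projection, doubling as the renderSharedLog helper assumed in the earlier callAgent example; the filtering rules are illustrative:
// Each role sees a filtered view of the shared log, rendered as plain text.
function renderSharedLog(sharedLog: Message[], role: Role, userInput: string): string {
  const visible = sharedLog.filter(message => {
    if (role === 'researcher') {
      // The researcher does not need to page through every draft revision.
      return message.author !== 'writer';
    }
    // The writer sees research findings plus its own earlier drafts.
    return true;
  });

  const transcript = visible
    .map(message => `[${message.author}] ${message.content}`)
    .join('\n');

  return `Task: ${userInput}\n\nPrior messages:\n${transcript}`;
}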
In other words, multi‑agent structure is best understood as multi‑context. You trade the simplicity of one giant context for multiple smaller ones, plus coordination code that moves information between them. The personas are just labeling on an architectural decision: split the problem by role and tailor context to each role.
The implications are practical:
- You do not get emergent organizational behavior just by adding avatars. If you want specific patterns—debate, critique, refinement—you encode them in routing and call order.
- You debug multi‑agent systems by looking at call graphs and logs, not by psychoanalyzing agents. The system behaves the way your orchestration makes it behave.
- You can often replace an “agent talking to agent” chat with a simpler pipeline: researcher call → writer call. The chat UI is optional; the underlying mechanism is function composition.
When you design coordination, think in these terms: which distinct contexts do I want, and how do I move information between them? The “agents” are just the names you give those contexts.
5.2 Routing Work to the Right Components
[DEMO: A single input box where you type tasks like “find sources on battery tech,” “draft an email to a VP,” or “optimize this SQL query.” The UI shows which component handled the request (Researcher, Writer, Coder, General) and the routing decision (keyword match vs. model‑based classification). A toggle lets you switch between naive keyword routing and LLM‑based classification, so you can see misroutes and corrections.]
Once you have multiple specialized components, a new question appears: who decides who does what?
If the user asks for “a quick script to analyze this CSV,” you need to decide whether it belongs to research, writing, or coding. Similarly, a contract risk question might go to a legal‑style analyzer, a general answerer, or a summarizer. Tasks that span multiple skills—like “research three competitors and draft a concise pitch deck”—require you to define how to break them apart into routed subtasks.
At some layer, something has to answer:
- Given this incoming request, which component should handle it first?
- If the first component only solves part of the problem, who gets the next turn?
- When specializations overlap, how do you avoid both duplication (two components doing the same work) and gaps (every component assuming someone else will handle a part)?
You can imagine a high‑level “manager agent” deciding this dynamically: reading the task, understanding each component’s strengths, and delegating like a human project manager. That picture is again more mystical than mechanical.
Mechanically, routing reduces to two operations you already know:
Routing is classification plus dispatch. You classify the input (“what kind of task is this?”) and then dispatch it to the handler for that class. Specialization trades generality for depth: each component has focused prompts, tools, and context. The cost is coordination overhead; the benefit is higher quality within each specialty.
Here is a minimal entry‑point router that uses the model itself for classification.
type Mode = 'general' | 'research' | 'writing' | 'coding';

interface Component {
  handle(input: string): Promise<string>;
}

const components: Record<Mode, Component> = {
  general: new GeneralAssistant(),
  research: new Researcher(),
  writing: new Writer(),
  coding: new Coder()
};

async function classify(input: string): Promise<Mode> {
  const { text } = await llm.complete({
    system: `
      You are a router. Classify the user's request as one of:
      - research: wants information gathered or verified
      - writing: wants content drafted or edited
      - coding: wants code written, explained, or modified
      - general: wants a direct answer or conversation
      Respond with only the category name.`,
    user: input
  });

  const label = text.trim().toLowerCase();
  if (label === 'research' || label === 'writing' || label === 'coding') {
    return label;
  }
  return 'general';
}

export async function handleUserRequest(input: string): Promise<string> {
  const mode = await classify(input);   // classification
  const component = components[mode];   // dispatch
  return component.handle(input);
}
The router is not an agent with a personality. It is just a function:
- Ask the model to classify an input into one of a fixed set of modes.
- Use the classification label to dispatch to a component.
You can replace the model with regexes or a decision tree; the pattern stays the same.
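For comparison, a purely rule-based variant of classify; the keywords are illustrative, not a recommended set:
// Keyword-based classification: same classify-then-dispatch shape, no model call.
function classifyByKeywords(input: string): Mode {
  const text = input.toLowerCase();
  if (/\b(research|sources?|find|verify)\b/.test(text)) return 'research';
  if (/\b(write|draft|edit|email|article)\b/.test(text)) return 'writing';
  if (/\b(code|script|sql|function|bug|debug)\b/.test(text)) return 'coding';
  return 'general';
}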
Why is this worth doing instead of sending everything to a single generalist?
Because specialization lets you tighten each component’s design:
- The research component can assume that its input always requires external information. It can aggressively call search tools and focus on sources, citations, and coverage.
- The writing component can assume it is turning structured input (like research findings) into prose. It can ignore tool calls entirely and optimize for clarity and tone.
- The coding component can assume it sees code or specifications and can load language‑specific tools and style guides.
Each component’s prompt gets shorter and more focused. Each component’s tool set gets smaller and more relevant. Each component’s test surface becomes narrower. In practice, this often yields large quality gains on real workloads.
The cost is routing overhead:
- Latency: you add at least one extra model call to classify before doing the work.
- Failure modes: misclassification sends a task to the wrong component, which can be worse than a mediocre answer from a generalist.
- Complexity: you now have more pieces to reason about (router plus components) instead of one.
Routing itself often needs evaluation and iteration. You might:
- Start with a simple model‑based router like the one above.
- Log input, chosen mode, and eventual success or failure.
- Train a lightweight classifier on those logs once you see enough data.
- Fall back to the model for ambiguous cases, or ask the model to choose between only two likely options.
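The logging step can be a small variant of handleUserRequest that records each decision. A sketch, where "success" is crudely approximated by whether the handler threw; in practice you would feed in evals or user feedback:
// Record each routing decision so the classifier can be evaluated and retrained later.
interface RoutingLogEntry {
  input: string;
  mode: Mode;
  ok: boolean;
  timestamp: number;
}

const routingLog: RoutingLogEntry[] = [];

async function handleAndLog(input: string): Promise<string> {
  const mode = await classify(input);
  try {
    const output = await components[mode].handle(input);
    routingLog.push({ input, mode, ok: true, timestamp: Date.now() });
    return output;
  } catch (error) {
    routingLog.push({ input, mode, ok: false, timestamp: Date.now() });
    throw error;
  }
}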
Routing also happens inside the system, not just at the entry point. A coordinator may need to decide whether an intermediate result needs more research, more writing, or human review. The mechanism is the same: classify the situation, then dispatch to the appropriate handler.
One subtlety: not all routing should be decided by an LLM. Some paths are structural:
- “If the research handoff has zero high‑confidence sources, route to human review.”
- “If the writing stage produced more than N tokens, route to compression instead of review.”
- “If the user explicitly clicked ‘Run Code’, route directly to the executor.”
For those, hard‑coded rules are clearer, faster, and safer.
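These checks are ordinary conditionals that run before any model-based classification. A sketch; the state fields and the threshold are assumptions for illustration:
// Deterministic routing rules first; only ambiguous cases reach the classifier.
interface StageState {
  highConfidenceSources: number;  // assumed: derived from the research handoff
  draftTokens: number;            // assumed: measured after the writing stage
  userClickedRunCode: boolean;    // assumed: explicit UI signal
}

async function routeNext(state: StageState, input: string): Promise<Mode | 'human_review' | 'compress' | 'execute'> {
  const MAX_DRAFT_TOKENS = 4000;  // illustrative value for "more than N tokens"

  if (state.userClickedRunCode) return 'execute';
  if (state.highConfidenceSources === 0) return 'human_review';
  if (state.draftTokens > MAX_DRAFT_TOKENS) return 'compress';

  // Everything else falls through to model-based classification.
  return classify(input);
}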
When you add new specialized components, you must also design how tasks and intermediate states reach them. That design is not mystical orchestration. It is explicit classification and dispatch, tuned to your application’s error tolerance and latency budget.
5.3 State Sharing Between Components
[DEMO: Three components—Planner, Worker, Reviewer—operate on a shared “Task Board” displayed in the UI. The Planner adds tasks, the Worker marks some as done and adds notes, the Reviewer flags issues. A timeline view shows what each component sees at each step: the exact subset of board state included in its context on each call. Toggling filters reveals what happens if you send too little (information loss) or too much (clutter) during handoffs.]
Once work reaches a component, that component runs in its own context: its system prompt, its tools, its slice of history. But tasks rarely end after a single step. Results flow to other components. Work pauses and resumes. Humans dip in and out. Somewhere, there is ongoing state that needs to be shared.
This raises concrete questions.
When the Researcher finishes and the Writer begins, you must decide what moves between them: the whole conversation, a structured summary, or selected raw documents. If you add a Reviewer later, you also choose whether it sees the same inputs as the Researcher, the Writer, or a separate projection tailored to review.
If two components work on the same artifact—a document, a plan, a task board—they need a way to avoid overwriting each other’s changes. You represent the current state of the world in a shared artifact that components read and update through controlled operations.
If components are just reconstructed contexts around a stateless model, persistent state must be stored outside the model itself.
The mechanism is not mysterious, but you have to be explicit:
Handoffs are context transfers. The receiving component needs enough context—task description, relevant history, intermediate results—to continue. Shared artifacts (documents, boards, plans) provide a coordination surface: components read current state, modify it, and others see those changes. The artifact, not the agent, is the shared data structure components read and write.
There are two basic patterns:
- Direct handoffs: serialize the information one component produced into a structure and feed it to the next component as input.
- Shared artifacts: maintain a persistent object that all components read and write, and reconstruct context from that object on each call.
A typed handoff makes the transfer explicit.
interface ResearchFinding {
  content: string;
  source: string;
  confidence: 'high' | 'medium' | 'low';
}

interface ResearchHandoff {
  topic: string;
  findings: ResearchFinding[];
  gaps: string[];
  summary: string;
}

async function researcher(topic: string): Promise<ResearchHandoff> {
  // ...call LLM + tools, then compress into this structure...
}

async function writer(handoff: ResearchHandoff): Promise<string> {
  const prompt = `
    Write a clear article on "${handoff.topic}".
    Base your writing only on the findings below.
    Highlight any gaps explicitly.

    Findings:
    ${handoff.findings.map(f => `- (${f.confidence}) ${f.content} [${f.source}]`).join('\n')}

    Gaps:
    ${handoff.gaps.map(g => `- ${g}`).join('\n')}
  `;
  const { text } = await llm.complete({ system: WRITER_SYSTEM_PROMPT, user: prompt });
  return text;
}
Here, the “handoff” is not whatever text happened to be in the researcher’s context. It is a structured object whose shape is designed for the writer’s needs. The writer sees:
- The topic.
- A curated list of findings with sources and confidence.
- Explicitly named gaps.
It does not see:
- Failed search queries.
- Raw snippets that were filtered out.
- The researcher’s own chain‑of‑thought.
You compress the researcher’s internal state into a form that is high signal for the writer. That compression is part of your coordination code.
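A sketch of that compression step: the researcher asks the model to emit the handoff as JSON and parses the result. The prompt wording and the unchecked JSON.parse are simplifications:
// Compress raw research notes into the typed handoff the writer expects.
async function compressToHandoff(topic: string, rawNotes: string): Promise<ResearchHandoff> {
  const { text } = await llm.complete({
    system: `
      Summarize the research notes as JSON with exactly this shape:
      { "topic": string,
        "findings": [{ "content": string, "source": string,
                       "confidence": "high" | "medium" | "low" }],
        "gaps": string[],
        "summary": string }
      Respond with only the JSON.`,
    user: `Topic: ${topic}\n\nNotes:\n${rawNotes}`
  });

  // In production, validate against a schema before trusting the structure.
  return JSON.parse(text) as ResearchHandoff;
}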
Direct handoffs work well for simple linear pipelines. For richer systems, you usually want an explicit shared artifact.
interface Task {
  id: string;
  title: string;
  status: 'todo' | 'in_progress' | 'done' | 'needs_review';
  notes: string[];
}

class TaskBoard {
  @field tasks: Task[] = [];

  @action()
  addTask(title: string) {
    this.tasks.push({
      id: crypto.randomUUID(),
      title,
      status: 'todo',
      notes: []
    });
  }

  @action()
  updateTask(id: string, patch: Partial<Task>) {
    const task = this.tasks.find(t => t.id === id);
    if (!task) return;
    Object.assign(task, patch);
  }
}
Each component then reconstructs its own context from this artifact.
async function plannerStep(board: TaskBoard) {
  const pending = board.tasks.filter(t => t.status === 'todo');
  const prompt = `
    You are a planning assistant. Here are the current tasks:
    ${pending.map(t => `- [${t.id}] ${t.title}`).join('\n')}

    Propose any additional tasks that are missing.
  `;
  // ...use LLM to propose new tasks, then board.addTask(...)
}

async function workerStep(board: TaskBoard) {
  const tasks = board.tasks.filter(t => t.status === 'todo');
  const prompt = `
    You are an execution assistant. For each task below, either:
    - mark it as in_progress or done, and add a brief note
    - or mark it as needs_review if it requires human input.

    Tasks:
    ${tasks.map(t => `- [${t.id}] ${t.title}`).join('\n')}
  `;
  // ...use LLM to decide, then board.updateTask(...)
}
The LLM never owns the board. It is given a projection of the board (only todo tasks, only tasks needing review, etc.), generates suggestions, and your code applies those suggestions by mutating the shared artifact.
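A sketch of that apply step for the worker: the model is asked for a JSON array of updates, and your code applies them through the board's actions. The prompt and parsing are illustrative:
// The model proposes updates; code validates and applies them to the shared board.
interface TaskUpdate {
  id: string;
  status: Task['status'];
}

async function applyWorkerStep(board: TaskBoard): Promise<void> {
  const tasks = board.tasks.filter(t => t.status === 'todo');
  const { text } = await llm.complete({
    system: `Reply with only a JSON array of objects shaped like
      { "id": string, "status": "in_progress" | "done" | "needs_review" }.`,
    user: tasks.map(t => `- [${t.id}] ${t.title}`).join('\n')
  });

  const updates: TaskUpdate[] = JSON.parse(text);
  for (const update of updates) {
    // The artifact is mutated here, by code, never directly by the model.
    board.updateTask(update.id, { status: update.status });
  }
}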
This design gives you several advantages:
- You can choose how much of the artifact each component sees. A Reviewer can see all tasks, while a Worker sees only assigned tasks.
- You can enforce invariants in code (for example, a task cannot move from todo directly to done without a note).
- You can log and diff changes over time, making debugging and audits tractable.
The same pattern applies to more complex artifacts: documents with sections, codebases with files, plans with dependency graphs. The artifact is the central representation. Components are pure functions from “artifact slice + task description” to “proposed changes.”
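In type terms, that contract looks roughly like this; the names are illustrative:
// A coordination step: read a projection of the artifact, return proposed changes.
type ComponentStep<Slice, Change> = (
  slice: Slice,
  taskDescription: string
) => Promise<Change[]>;

// For example, the worker above fits ComponentStep<Task[], TaskUpdate>.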
When coordination fails, it is often because handoffs and artifacts are not carefully designed:
- Information loss: a researcher’s nuanced distinctions vanish in a one‑sentence summary; the writer’s work is constrained by under‑specified input.
- Information overload: a reviewer gets flooded with raw logs and cannot see the few fields that matter.
- State inconsistency: two components operate on different versions of the artifact, and their updates conflict.
Thinking in terms of artifacts and typed handoffs forces you to answer: who needs what, in what form, at what time? Once you design those flows, implementing them is straightforward.
5.4 Evaluating Coordination Overhead
[DEMO: A toggle lets you run the same composite task (“research three competitors and draft a summary email”) in two modes. Mode A: a single, large prompt to one generalist agent with all tools enabled. Mode B: a coordinated pipeline (Router → Researcher → Writer → Reviewer) with typed handoffs. The UI shows latency, token usage, and qualitative output side‑by‑side, plus a log of failures when you push the system to larger tasks.]
Splitting a system into multiple components and wiring up routing and handoffs is work. Every new component is a new place for things to go wrong. Every boundary introduces the possibility of information loss or duplication. Latency increases with each hop. Debugging now spans call graphs and artifacts instead of a single prompt.
This raises two concerns: whether coordination is worth the effort at all, and under what conditions splitting components makes sense.
Common patterns—planner, worker, and reviewer roles, or swarms of small agents—are often recommended as defaults, but neither "always split" nor "never split" is a useful rule. Any blanket rule ignores a hard constraint you saw in earlier chapters: the context window and the attention mechanism are finite.
As you increase task complexity, three failure modes appear in single‑component systems:
- Instructions are ignored or applied inconsistently.
- Tools are chosen incorrectly or not used when they should be.
- Multi‑step processes get muddled; earlier phases leak into later ones.
You can patch these with better prompts up to a point, but they stem from a structural problem: one context is being asked to hold too much. Coordination is how you break that unit apart. It is not free; it is a trade.
Coordination is overhead. A single component is sufficient when context fits, tools do not conflict, and prompting does not need to split. You add components when specialization benefits exceed coordination costs. Human involvement is just another “component”: the system pauses, surfaces state, and waits for external input like any other dependency.
You can see the tradeoff in a simple orchestrator.
class Orchestrator {
  constructor(
    private router: (input: string) => Promise<Mode>,
    private research: (topic: string) => Promise<ResearchHandoff>,
    private write: (handoff: ResearchHandoff) => Promise<string>,
    private review: (draft: string) => Promise<{ approved: boolean; notes: string }>
  ) {}

  async handle(task: string): Promise<string> {
    const mode = await this.router(task);

    if (mode === 'general') {
      // Single-call path: no coordination needed
      const { text } = await llm.complete({
        system: GENERAL_SYSTEM_PROMPT,
        user: task
      });
      return text;
    }

    // Coordinated path: research -> write -> review
    const researchHandoff = await this.research(task);
    const draft = await this.write(researchHandoff);
    const verdict = await this.review(draft);

    if (!verdict.approved) {
      // Simple feedback loop: revise once
      const revisedPrompt = `
        Here is feedback on your draft:
        ${verdict.notes}

        Revise the draft accordingly.

        Draft:
        ${draft}
      `;
      const { text: revised } = await llm.complete({
        system: WRITER_SYSTEM_PROMPT,
        user: revisedPrompt
      });
      return revised;
    }

    return draft;
  }
}
This orchestrator explicitly chooses between “just call the generalist” and “run the full pipeline.” That choice encodes your judgment about when coordination is worth it.
As you design systems, several signals tell you that it is time to split:
- Context collisions: you keep adding instructions to the prompt and see them being ignored in specific regimes (“it forgets the citation format whenever code examples appear”).
- Tool confusion: with many tools available, the model starts calling the wrong one or hallucinating tool usage entirely.
- Phase interference: earlier reasoning bleeds into later steps, or the model skips necessary stages (“it goes straight to writing without doing any research when the input is longer than N tokens”).
When you see these, you can often fix them by:
- Creating a specialist role with a narrower prompt and smaller tool set.
- Moving part of the process into a separate call that produces a structured artifact.
- Introducing a minimal coordinator that sequences those parts and enforces handoffs.
There are also clear cases where coordination is not just useful but required:
- Long‑running tasks: anything that must pause for hours while waiting for data, human approval, or external events.
- Human‑in‑the‑loop decisions: actions that require explicit approval or judgment.
- Parallel work: tasks that naturally split into independent subproblems.
Parallelization deserves special attention because it is pure performance gain. When subtasks are independent, you can run them concurrently—like the actor model in distributed systems.
// Result shape for a single parallel research call
interface ResearchResult {
  topic: string;
  findings: string;
}

async function parallelResearch(topics: string[]): Promise<ResearchResult[]> {
  // Each topic gets its own agent call, running in parallel
  const results = await Promise.all(
    topics.map(topic =>
      llm.complete({
        system: RESEARCHER_PROMPT,
        user: `Research: ${topic}`
      })
    )
  );
  return results.map((r, i) => ({ topic: topics[i], findings: r.text }));
}

async function researchAndSynthesize(topics: string[]) {
  // Parallel phase: N topics researched simultaneously
  const findings = await parallelResearch(topics);

  // Sequential phase: synthesize all findings into one report
  const synthesis = await llm.complete({
    system: SYNTHESIZER_PROMPT,
    user: `Synthesize these findings:\n${JSON.stringify(findings, null, 2)}`
  });
  return synthesis.text;
}
With five topics, sequential research takes 5× the latency of one. Parallel research takes 1×. The synthesis step still needs all results, so it waits—but the fan-out phase is free concurrency. This is the actor model applied to LLM calls: independent workers, joined by a coordinator.
A human approval gate is a good example of coordination treating a human as a component.
class ControlledAgent {
  @field pendingAction: {
    type: string;
    payload: any;
    context: string;
  } | null = null;

  async proposeAction(type: string, payload: any, context: string) {
    // Called from LLM-driven logic when an action is needed
    this.pendingAction = { type, payload, context };
    // System stops here and waits for human input via UI
  }

  @action()
  async approve() {
    if (!this.pendingAction) return;
    await this.execute(this.pendingAction.type, this.pendingAction.payload);
    this.pendingAction = null;
  }

  @action()
  async reject(reason: string) {
    if (!this.pendingAction) return;
    this.log('Action rejected', { ...this.pendingAction, reason });
    this.pendingAction = null;
  }

  private async execute(type: string, payload: any) {
    // Implementation of side effects: send email, update CRM, etc.
  }

  private log(message: string, data: unknown) {
    // Placeholder for your logging or audit mechanism
    console.log(message, data);
  }
}
From the system’s perspective, the human is just another async dependency:
- Set pendingAction to describe what you want to do and why.
- Wait until someone clicks approve or reject.
- Resume execution.
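The "wait until" step can be expressed as a promise that the approve and reject actions settle. A minimal sketch, kept separate from the ControlledAgent class above:
// Expose the pause as a promise so coordination code can simply await a decision.
class ApprovalGate {
  private resolveDecision: ((approved: boolean) => void) | null = null;

  // Coordination code awaits this; it settles when a human acts in the UI.
  waitForDecision(): Promise<boolean> {
    return new Promise(resolve => {
      this.resolveDecision = resolve;
    });
  }

  approve(): void {
    this.resolveDecision?.(true);
    this.resolveDecision = null;
  }

  reject(): void {
    this.resolveDecision?.(false);
    this.resolveDecision = null;
  }
}

// Usage in a pipeline, after surfacing the pending action to the UI:
//   const approved = await gate.waitForDecision();
//   if (!approved) return; // log and stop; otherwise execute the action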
The overhead is obvious: more state to maintain, more UI, more latency. The benefit is equally obvious when stakes are high.
A conservative approach is:
- Start with a single capable component.
- Instrument it: log failures, context sizes, tool choices.
- When you see specific overload patterns, split along those fault lines and introduce the minimal coordination needed.
- Treat every new component as a cost that must pay for itself in clarity, quality, or safety.
Coordination only makes sense when it addresses concrete limits of a single context.
5.5 Implementing Coordination Patterns in Idyllic
Idyllic gives you natural hooks for coordination. Systems are components. Fields hold shared state and artifacts. Actions define the delegation and routing surface. The LLM remains a function you call; Idyllic handles the plumbing for persistence and real‑time updates.
You can express these patterns with a coordinator system that delegates to specialist systems:
class Coordinator extends IdyllicSystem {
  @field status: 'idle' | 'routing' | 'researching' | 'writing' | 'reviewing' = 'idle';
  @field currentTask: string = '';
  @field finalOutput: string = '';

  private researcher = new Researcher();
  private writer = new Writer();
  private reviewer = new Reviewer();

  @action()
  async handle(task: string): Promise<void> {
    this.currentTask = task;
    this.status = 'routing';
    this.sync();

    const mode = await classify(task);

    if (mode === 'general') {
      const { text } = await llm.complete({
        system: GENERAL_SYSTEM_PROMPT,
        user: task
      });
      this.finalOutput = text;
      this.status = 'idle';
      return;
    }

    // Coordinated path
    this.status = 'researching';
    this.sync();
    const researchHandoff = await this.researcher.research(task);

    this.status = 'writing';
    this.sync();
    const draft = await this.writer.write(researchHandoff);

    this.status = 'reviewing';
    this.sync();
    const verdict = await this.reviewer.review(draft);

    if (!verdict.approved) {
      const revised = await this.writer.revise(draft, verdict);
      this.finalOutput = revised;
    } else {
      this.finalOutput = draft;
    }

    this.status = 'idle';
  }
}
Here:
- Coordinator is the hub, maintaining high‑level task state (status, currentTask, finalOutput).
- Researcher, Writer, and Reviewer are spokes with narrow responsibilities and prompts.
- Handoffs are typed: research returns a structured object, write takes that object, review takes a string and returns a verdict.
Because status and currentTask are fields, connected clients can see the coordination process: reviewing vs. writing vs. researching. This is not just nice UI; it also gives you observability into where time is spent and where failures occur.
Routing at the front door is just another system:
class FrontDoor extends IdyllicSystem {
  @field lastMode: Mode = 'general';

  private coordinator = new Coordinator();

  @action()
  async ask(input: string): Promise<void> {
    const mode = await classify(input);
    this.lastMode = mode;
    this.sync();
    await this.coordinator.handle(input);
  }
}
Again, the router is not an agent. It is a thin method that calls the model for classification and updates a field for visibility.
Shared artifacts are just fields that multiple systems read and write. For example, a shared “plan” that various components coordinate around fits naturally as a field on a system that everyone references. Each component reconstructs its prompt from that field, and changes broadcast automatically to all connected clients.
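A sketch of such a shared plan, using the same @field and @action conventions as the systems above; the exact shape of the plan is an assumption:
// A shared artifact: several components read this plan and update it via actions.
class SharedPlan extends IdyllicSystem {
  @field steps: { id: string; description: string; done: boolean }[] = [];

  @action()
  addStep(description: string) {
    this.steps.push({ id: crypto.randomUUID(), description, done: false });
  }

  @action()
  markDone(id: string) {
    const step = this.steps.find(s => s.id === id);
    if (step) step.done = true;
  }
}

// Each component reconstructs its prompt from a projection of the plan, e.g.:
//   const open = plan.steps.filter(s => !s.done).map(s => `- ${s.description}`).join('\n');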
Human approval is the same pattern you saw above, expressed as state plus actions:
- A field that holds the pending request.
- An action the UI calls when a human approves or rejects.
- Coordination code that pauses until one of those actions fires.
The underlying principle is consistent: components are code; coordination is state plus function calls. Idyllic gives you convenient primitives for both; the architectural decisions remain yours.
Key Takeaways
- Multi‑agent systems are not multiple minds; they are multiple system prompts plus routing applied to the same underlying model. Each “agent” is reconstructed on every call.
- Routing is classification plus dispatch. It directs tasks to specialized components, trading extra complexity and latency for better performance within each specialization.
- Components do not share minds; they share state. Handoffs transfer compressed, structured context between calls, and shared artifacts provide a surface that multiple components can read and modify.
- Coordination is overhead you pay when a single context can no longer carry instructions, tools, and history reliably. You add components when specialization benefits outweigh coordination costs.
- Humans fit into the same patterns as any other component: the system surfaces state, pauses, and resumes based on their input.
Bridge to Chapter 6
This chapter focused on how components interact: routing work to specialists, delegating steps, transferring context through handoffs, and coordinating around shared state. The intelligence you perceive in multi‑agent systems comes from this architecture, not from models spontaneously forming organizations.
What are these components coordinating around? In most useful systems, the answer is an artifact: a document being written, a codebase being modified, a plan being refined, a board of tasks being managed. These artifacts are not just blobs of text. They carry structure and semantics that shape what each component does.
Chapter 6 turns to artifacts: the shared objects that multiple components create, inspect, and transform, and how designing those artifacts well simplifies the rest of your coordination.