You now have all the pieces to make a single agent look intelligent. You can build rich context, retrieve long-term memory, invoke tools, and structure multi-step reasoning across calls. Within those boundaries, the agent behaves consistently on a single problem and conversation, with a single place where decisions are made.

Problems appear when you try to extend that coherence beyond the capacity of a single context window. You do not notice the limit when the agent answers a question or reviews a single file. You notice it when you try to make one agent handle everything: research and writing and code review and calendar triage and sales outreach. You keep adding more tools, longer prompts, more history. The system gets strictly more capable on paper and noticeably less reliable in practice.

Underneath, the same model and API remain. What changed is how many different roles you are trying to fit into one context. You have a single attention mechanism asked to juggle research protocols, code style guides, email etiquette, and project management rules, all while paging through a growing stack of prior interactions.

At some point, you introduce multi‑agent structure. You split responsibilities, add specialist components, and let them talk to each other. The system feels qualitatively different: distinct personas, apparent collaboration, specialization. It is tempting to think you have gone from one mind to many. Mechanically, you are still calling a single model; what changed is how you structure its prompts and context. What has multiplied is not minds but contexts: different system prompts, different tool sets, different slices of state, and new code that moves information between them.

These design choices affect how the system behaves:
- If you imagine multiple persistent agents negotiating with each other, you will look for emergent social behavior where there is only context routing.
- If you think multi‑agent systems are a fundamentally new paradigm, you will miss that they reuse software patterns you already know—classification, dispatch, message passing—around an LLM.
- If you underestimate coordination, you will split too early and drown in overhead. If you ignore it, you will cram everything into one context and drown in interference.
- Context economy. Each agent gets a focused context: a specialized system prompt, a curated tool set, a smaller slice of history. One mega-context with everything leads to interference—instructions ignored, tools misused, phases bleeding into each other. Splitting buys you back focus.
- Parallelization. Like the actor model in distributed systems, independent agents can work simultaneously on different subtasks. Research five competitors in parallel, then synthesize. The total latency is the longest single task, not the sum. When subtasks are independent, parallelism is free performance.

This chapter covers:
- Multi‑agent systems in practice.
- Routing work to the right components.
- State sharing between components.
- Evaluating coordination overhead.

5.1 Multi‑Agent Systems in Practice
[DEMO: A panel shows two “agents” chatting: a Researcher and a Writer, each with a name, avatar, and message stream. On the side, a trace view reveals what actually happens: call with Researcher prompt → output appended to shared log → call with Writer prompt → output appended, and so on. Toggling a switch hides and shows the underlying call sequence, making it clear that “agents” are just different prompts applied over the same shared transcript.]

From the outside, a multi‑agent system looks like a small organization. You see named entities (“Researcher,” “Writer,” “Planner,” “Critic”) sending messages back and forth. One proposes a plan, another critiques it, a third revises. It feels like collaboration between distinct minds, each with its own knowledge and preferences.

That impression raises concrete questions about implementation. If there is only one underlying model, where do these distinct agents live? You do not create multiple minds; you define separate prompts and contexts. Each call is stateless, so an agent’s “role” is reconstructed from its system prompt and the current shared transcript. When two agents talk, they exchange text via your routing code rather than holding an independent conversation.

If the model is general‑purpose, why not keep a single generalist agent and skip the personas? Because splitting roles lets you tighten prompts, tooling, and context for each task. Making a Researcher and a Writer exchange results through explicit handoffs produces clearer boundaries and more reliable behavior than asking one agent to handle both research and writing in a single context. The answers to these questions are mechanical: they describe how prompts, contexts, and call patterns are implemented.

Multi‑agent systems are multiple system prompts with routing logic. Each “agent” is reconstructed on every call from its system prompt and current context.
What looks like agents conversing is: call with prompt A → append output to shared context → call with prompt B → append output, and so on. The value is specialization—focused prompts, tools, and context—not independent entities.
- A fixed system prompt describing the role.
- A shared log of prior messages.
- A function `callAgent(role, sharedLog, userInput)` that reconstructs the agent on demand.
- Append the user’s initial task to the shared log.
- Call the researcher with its system prompt and the log; append its reply.
- Call the writer with its system prompt and the updated log; append its reply.
- Optionally loop, alternating roles based on some rule.
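
The steps above can be sketched in a few lines. This is a minimal illustration, not a definitive implementation: the `ModelFn` type, the `stubModel` stand-in, the prompt texts, and `runExchange` are all assumptions introduced here, and the model is a plain synchronous function so the sketch runs without a network (a real client would be asynchronous).

```typescript
// Hypothetical names throughout; no real SDK is being shown.
type RoleName = "researcher" | "writer";

interface LogEntry {
  author: "user" | RoleName;
  text: string;
}

// Stand-in for an LLM call. A real client would be async and networked.
type ModelFn = (systemPrompt: string, transcript: string) => string;

const SYSTEM_PROMPTS: Record<RoleName, string> = {
  researcher: "You are a researcher. Find facts and cite sources.",
  writer: "You are a writer. Turn the findings in the log into prose.",
};

// Stub model so the sketch runs offline: it reports which prompt it was
// given and how many transcript lines it could see.
const stubModel: ModelFn = (systemPrompt, transcript) =>
  `[${systemPrompt.split(".")[0]}] saw ${transcript.split("\n").length} line(s)`;

// The "agent" is rebuilt on every call from its role prompt plus the log.
function callAgent(
  role: RoleName,
  sharedLog: LogEntry[],
  userInput?: string,
  model: ModelFn = stubModel,
): LogEntry {
  const entries: LogEntry[] = userInput
    ? [...sharedLog, { author: "user", text: userInput }]
    : sharedLog;
  const transcript = entries.map((e) => `${e.author}: ${e.text}`).join("\n");
  return { author: role, text: model(SYSTEM_PROMPTS[role], transcript) };
}

// Orchestration: this loop, not the agents, decides who speaks next.
function runExchange(task: string, rounds = 1, model: ModelFn = stubModel): LogEntry[] {
  const log: LogEntry[] = [{ author: "user", text: task }];
  for (let i = 0; i < rounds; i++) {
    log.push(callAgent("researcher", log, undefined, model));
    log.push(callAgent("writer", log, undefined, model));
  }
  return log;
}
```

Nothing persists between calls except `log`. Swapping the order of the two `push` lines, or adding a third role, changes the "conversation" without touching any agent.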
- Agents do not own memory. The shared log (and any external storage) does.
- Agents do not run autonomously. Your code decides when to invoke which system prompt.
- Agents do not negotiate in some hidden channel. They only influence each other through text you explicitly pass along.
- Focused instructions. A researcher prompt can devote all of its tokens to research norms and tools. A writer prompt can devote all of its tokens to tone, structure, and audience. They do not have to share.
- Focused tools. You can expose search and retrieval tools only to the researcher, and editing tools only to the writer. Each call considers a small tool set instead of every tool you have ever defined.
- Focused history. You can decide what parts of the shared log each role sees. The writer does not need to see every search query; the researcher does not need to see every draft revision.
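
One way to make those three kinds of focus concrete is a per-role spec that your code consults before every call. The sketch below is illustrative only: the entry kinds, tool names, and `buildContext` helper are hypothetical, chosen to mirror the researcher and writer example.

```typescript
// Hypothetical entry kinds and tool names, chosen only for illustration.
type EntryKind = "task" | "search_query" | "finding" | "draft";
type SpecRole = "researcher" | "writer";

interface TaggedEntry {
  kind: EntryKind;
  text: string;
}

interface RoleSpec {
  systemPrompt: string; // focused instructions
  tools: string[]; // tool names offered on this role's calls only
  sees: EntryKind[]; // kinds of log entries that enter this role's context
}

const ROLE_SPECS: Record<SpecRole, RoleSpec> = {
  researcher: {
    systemPrompt: "Research the task. Cite every source.",
    tools: ["web_search", "fetch_page"],
    sees: ["task", "search_query", "finding"],
  },
  writer: {
    systemPrompt: "Write for the stated audience. Be concise.",
    tools: ["edit_document"],
    sees: ["task", "finding", "draft"],
  },
};

// Builds the exact context one call will receive: prompt, tools, filtered log.
function buildContext(role: SpecRole, log: TaggedEntry[]) {
  const spec = ROLE_SPECS[role];
  return {
    systemPrompt: spec.systemPrompt,
    tools: spec.tools,
    history: log.filter((entry) => spec.sees.indexOf(entry.kind) !== -1),
  };
}
```

With this shape, the writer never receives search queries and the researcher never receives draft revisions, because the filter runs in your code before the model is called.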
- You do not get emergent organizational behavior just by adding avatars. If you want specific patterns—debate, critique, refinement—you encode them in routing and call order.
- You debug multi‑agent systems by looking at call graphs and logs, not by psychoanalyzing agents. The system behaves the way your orchestration makes it behave.
- You can often replace an “agent talking to agent” chat with a simpler pipeline: researcher call → writer call. The chat UI is optional; the underlying mechanism is function composition.
5.2 Routing Work to the Right Components
[DEMO: A single input box where you type tasks like “find sources on battery tech,” “draft an email to a VP,” or “optimize this SQL query.” The UI shows which component handled the request (Researcher, Writer, Coder, General) and the routing decision (keyword match vs. model‑based classification). A toggle lets you switch between naive keyword routing and LLM‑based classification, so you can see misroutes and corrections.]

Once you have multiple specialized components, a new question appears: who decides who does what? If the user asks for “a quick script to analyze this CSV,” you need to decide whether it belongs to research, writing, or coding. Similarly, a contract risk question might go to a legal‑style analyzer, a general answerer, or a summarizer. Tasks that span multiple skills, like “research three competitors and draft a concise pitch deck,” require you to define how to break them apart into routed subtasks.

At some layer, something has to answer:
- Given this incoming request, which component should handle it first?
- If the first component only solves part of the problem, who gets the next turn?
- When specializations overlap, how do you avoid both duplication (two components doing the same work) and gaps (every component assuming someone else will handle a part)?
Routing is classification plus dispatch. You classify the input (“what kind of task is this?”) and then dispatch it to the handler for that class. Specialization trades generality for depth: each component has focused prompts, tools, and context. The cost is coordination overhead; the benefit is higher quality within each specialty.
- Ask the model to classify an input into one of a fixed set of modes.
- Use the classification label to dispatch to a component.
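
Those two steps, classify then dispatch, might look like the following. This is a sketch under assumptions: `AskModel` is a stand-in for whatever LLM call you use, the four mode names are illustrative, and the handlers just tag their input so the routing is visible.

```typescript
const TASK_MODES = ["research", "writing", "coding", "general"] as const;
type TaskMode = (typeof TASK_MODES)[number];

// Stand-in for an LLM call that returns raw text. Hypothetical signature.
type AskModel = (prompt: string) => string;

// Step 1: classify the input into one of a fixed set of modes.
function classifyTask(input: string, ask: AskModel): TaskMode {
  const prompt =
    `Classify the task into exactly one of: ${TASK_MODES.join(", ")}.\n` +
    `Answer with the label only.\nTask: ${input}`;
  const label = ask(prompt).trim().toLowerCase();
  // Model output is untrusted text, so anything off-list falls back to "general".
  const match = TASK_MODES.find((m) => m === label);
  return match ?? "general";
}

// Placeholder components; real ones would make their own specialized calls.
const HANDLERS: Record<TaskMode, (input: string) => string> = {
  research: (t) => `[research] ${t}`,
  writing: (t) => `[writing] ${t}`,
  coding: (t) => `[coding] ${t}`,
  general: (t) => `[general] ${t}`,
};

// Step 2: dispatch on the label.
function routeTask(input: string, ask: AskModel): { mode: TaskMode; output: string } {
  const mode = classifyTask(input, ask);
  return { mode, output: HANDLERS[mode](input) };
}
```

Validating the label against the fixed set matters more than the prompt wording: a router that trusts free-form model output will eventually dispatch to a handler that does not exist.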
- The research component can assume that its input always requires external information. It can aggressively call search tools and focus on sources, citations, and coverage.
- The writing component can assume it is turning structured input (like research findings) into prose. It can ignore tool calls entirely and optimize for clarity and tone.
- The coding component can assume it sees code or specifications and can load language‑specific tools and style guides.
- Latency: you add at least one extra model call to classify before doing the work.
- Failure modes: misclassification sends a task to the wrong component, which can be worse than a mediocre answer from a generalist.
- Complexity: you now have more pieces to reason about (router plus components) instead of one.
- Start with a simple model‑based router: one classification call followed by a dispatch.
- Log input, chosen mode, and eventual success or failure.
- Train a lightweight classifier on those logs once you see enough data.
- Fall back to the model for ambiguous cases, or ask the model to choose between only two likely options.
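
The last step in that progression, falling back to the model only for ambiguous cases, can be sketched as a margin check. Everything here is an assumption for illustration: the scores are taken to come from some lightweight classifier trained on your routing logs, `askModel` is a stand-in for an LLM call, and the `0.2` margin is arbitrary.

```typescript
type RouteLabel = "research" | "writing" | "coding";

interface ScoredLabel {
  label: RouteLabel;
  score: number; // probability-like score from the lightweight classifier
}

interface RouteDecision {
  label: RouteLabel;
  via: "classifier" | "model";
}

// askModel is a stand-in for an LLM call that picks between two options.
function decideRoute(
  scores: ScoredLabel[],
  askModel: (options: [RouteLabel, RouteLabel]) => RouteLabel,
  minMargin = 0.2, // arbitrary threshold; tune it against your routing logs
): RouteDecision {
  const ranked = scores.slice().sort((a, b) => b.score - a.score);
  const top = ranked[0];
  const second = ranked[1];
  if (!top) throw new Error("classifier returned no scores");
  // Clear winner: trust the cheap classifier and skip the model call.
  if (!second || top.score - second.score >= minMargin) {
    return { label: top.label, via: "classifier" };
  }
  // Ambiguous: ask the model, but only about the two leading candidates.
  const choice = askModel([top.label, second.label]);
  // If the model answers with anything else, keep the classifier's top pick.
  return { label: choice === second.label ? second.label : top.label, via: "model" };
}
```

Recording `via` alongside the input and the eventual outcome gives you the data to judge whether the threshold is too cautious or too eager.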
- “If the research handoff has zero high‑confidence sources, route to human review.”
- “If the writing stage produced more than N tokens, route to compression instead of review.”
- “If the user explicitly clicked ‘Run Code’, route directly to the executor.”
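
Rules like these need no model call at all; they are ordinary branching on state. A minimal sketch, assuming a hypothetical `PipelineState` shape and an arbitrary value for the `N` in the token rule:

```typescript
// Hypothetical snapshot of where a request stands in the pipeline.
interface PipelineState {
  userClickedRunCode: boolean;
  highConfidenceSources: number | null; // null until research has run
  draftTokens: number | null; // null until writing has run
}

type NextStep =
  | "executor"
  | "researcher"
  | "human_review"
  | "writer"
  | "compression"
  | "review";

const MAX_DRAFT_TOKENS = 1200; // the "N" in the rule; the value is arbitrary

// Deterministic routing: explicit user intent first, then stage-by-stage checks.
function nextStep(s: PipelineState): NextStep {
  if (s.userClickedRunCode) return "executor";
  if (s.highConfidenceSources === null) return "researcher";
  if (s.highConfidenceSources === 0) return "human_review";
  if (s.draftTokens === null) return "writer";
  return s.draftTokens > MAX_DRAFT_TOKENS ? "compression" : "review";
}
```

Because the function is pure, every routing decision can be reproduced from a logged state, which is much harder to do with a model-based router.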
5.3 State Sharing Between Components
[DEMO: Three components (Planner, Worker, Reviewer) operate on a shared “Task Board” displayed in the UI. The Planner adds tasks, the Worker marks some as done and adds notes, the Reviewer flags issues. A timeline view shows what each component sees at each step: the exact subset of board state included in its context on each call. Toggling filters reveals what happens if you send too little (information loss) or too much (clutter) during handoffs.]

Once work reaches a component, that component runs in its own context: its system prompt, its tools, its slice of history. But tasks rarely end after a single step. Results flow to other components. Work pauses and resumes. Humans dip in and out. Somewhere, there is ongoing state that needs to be shared.

This raises concrete questions. When the Researcher finishes and the Writer begins, you must decide what moves between them: the whole conversation, a structured summary, or selected raw documents. If you add a Reviewer later, you also choose whether it sees the same inputs as the Researcher, the Writer, or a separate projection tailored to review.

If two components work on the same artifact (a document, a plan, a task board), they need a way to avoid overwriting each other’s changes. You represent the current state of the world in a shared artifact that components read and update through controlled operations. And because components are just reconstructed contexts around a stateless model, persistent state must be stored outside the model itself. The mechanism is not mysterious, but you have to be explicit about it.

Handoffs are context transfers. The receiving component needs enough context (task description, relevant history, intermediate results) to continue. Shared artifacts (documents, boards, plans) provide a coordination surface: components read current state, modify it, and others see those changes. The artifact, not the agent, is the shared data structure components read and write.
- Direct handoffs: serialize the information one component produced into a structure and feed it to the next component as input.
- Shared artifacts: maintain a persistent object that all components read and write, and reconstruct context from that object on each call.
- The topic.
- A curated list of findings with sources and confidence.
- Explicitly named gaps.
- Failed search queries.
- Raw snippets that were filtered out.
- The researcher’s own chain‑of‑thought.
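
A typed handoff makes that include/exclude decision explicit in code. The sketch below assumes a hypothetical `ResearchScratch` shape for everything the researcher accumulated, and a narrower `ResearchHandoff` for what the writer receives; the field names are illustrative.

```typescript
interface Finding {
  claim: string;
  source: string;
  confidence: "high" | "medium" | "low";
}

// What crosses the boundary: topic, curated findings, named gaps.
interface ResearchHandoff {
  topic: string;
  findings: Finding[];
  gaps: string[];
}

// Everything the researcher accumulated while working (hypothetical shape).
interface ResearchScratch extends ResearchHandoff {
  failedQueries: string[];
  discardedSnippets: string[];
  reasoningNotes: string;
}

// Copies only what the writer needs; the rest stays behind.
function toHandoff(scratch: ResearchScratch): ResearchHandoff {
  return {
    topic: scratch.topic,
    findings: scratch.findings.map((f) => ({ ...f })),
    gaps: scratch.gaps.slice(),
  };
}

// Renders the handoff as the text block that enters the writer's context.
function renderForWriter(h: ResearchHandoff): string {
  const findings = h.findings.map(
    (f) => `- ${f.claim} (${f.source}, ${f.confidence} confidence)`,
  );
  const gaps = h.gaps.map((g) => `- ${g}`);
  return [`Topic: ${h.topic}`, "Findings:", ...findings, "Known gaps:", ...gaps].join("\n");
}
```

The explicit copy in `toHandoff` is deliberate. Passing the scratch object through unchanged would type-check, since it has all the required fields, but it would also carry the failed queries and notes into the next context.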
- You can choose how much of the artifact each component sees. A Reviewer can see all tasks, while a Worker sees only assigned tasks.
- You can enforce invariants in code (for example, a task cannot move from `todo` directly to `done` without a note).
- You can log and diff changes over time, making debugging and audits tractable.
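
A shared artifact with those three properties can be sketched as a small class. This is an in-memory illustration only: `TaskBoard`, its method names, and the `in_progress` status are assumptions, and a real system would persist the board somewhere durable.

```typescript
type TaskStatus = "todo" | "in_progress" | "done";

interface BoardTask {
  id: string;
  title: string;
  status: TaskStatus;
  assignee: string;
  notes: string[];
}

interface BoardChange {
  taskId: string;
  from: TaskStatus;
  to: TaskStatus;
  by: string;
}

class TaskBoard {
  private tasks = new Map<string, BoardTask>();
  readonly changes: BoardChange[] = []; // append-only log for audits and diffs

  add(task: BoardTask): void {
    this.tasks.set(task.id, { ...task, notes: task.notes.slice() });
  }

  // All writes go through here, so the invariant cannot be bypassed.
  move(taskId: string, to: TaskStatus, by: string, note?: string): void {
    const task = this.tasks.get(taskId);
    if (!task) throw new Error(`unknown task ${taskId}`);
    if (task.status === "todo" && to === "done" && !note) {
      throw new Error("todo -> done requires a note");
    }
    if (note) task.notes.push(note);
    this.changes.push({ taskId, from: task.status, to, by });
    task.status = to;
  }

  // Per-role projection: reviewers see everything, workers only their own tasks.
  viewFor(role: "reviewer" | "worker", who: string): BoardTask[] {
    const all = Array.from(this.tasks.values());
    const visible = role === "reviewer" ? all : all.filter((t) => t.assignee === who);
    return visible.map((t) => ({ ...t, notes: t.notes.slice() }));
  }
}
```

Components never hold the board; they receive the output of `viewFor` in their context and request changes through `move`.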
- Information loss: a researcher’s nuanced distinctions vanish in a one‑sentence summary; the writer’s work is constrained by under‑specified input.
- Information overload: a reviewer gets flooded with raw logs and cannot see the few fields that matter.
- State inconsistency: two components operate on different versions of the artifact, and their updates conflict.
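
For the third failure, state inconsistency, one common guard is a version check on write (optimistic concurrency). That is a technique brought in here for illustration rather than something the text prescribes, and the `VersionedDoc` shape is hypothetical.

```typescript
interface VersionedDoc {
  version: number;
  body: string;
}

type WriteResult =
  | { ok: true; doc: VersionedDoc }
  | { ok: false; reason: "stale"; current: VersionedDoc };

// A write succeeds only if the writer read the version that is still current.
function applyUpdate(current: VersionedDoc, basedOn: number, newBody: string): WriteResult {
  if (basedOn !== current.version) {
    return { ok: false, reason: "stale", current };
  }
  return { ok: true, doc: { version: current.version + 1, body: newBody } };
}
```

When a write comes back stale, your coordination code decides what happens next: re-run the component against `current`, merge, or escalate to a human. The model is never asked to notice the conflict on its own.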
5.4 Evaluating Coordination Overhead
[DEMO: A toggle lets you run the same composite task (“research three competitors and draft a summary email”) in two modes. Mode A: a single, large prompt to one generalist agent with all tools enabled. Mode B: a coordinated pipeline (Router → Researcher → Writer → Reviewer) with typed handoffs. The UI shows latency, token usage, and qualitative output side‑by‑side, plus a log of failures when you push the system to larger tasks.]

Splitting a system into multiple components and wiring up routing and handoffs is work. Every new component is a new place for things to go wrong. Every boundary introduces the possibility of information loss or duplication. Latency increases with each hop. Debugging now spans call graphs and artifacts instead of a single prompt.

This raises two concerns: whether coordination is worth the effort at all, and under what conditions splitting components makes sense. Common patterns include breaking tasks into planner, worker, and reviewer roles or using many small agents, but these should not be applied indiscriminately. The opposite rule, never split, fails too, because it ignores a hard constraint you saw in earlier chapters: the context window and the attention mechanism are finite. As you increase task complexity, three failure modes appear in single‑component systems:
- Instructions are ignored or applied inconsistently.
- Tools are chosen incorrectly or not used when they should be.
- Multi‑step processes get muddled; earlier phases leak into later ones.
Coordination is overhead. A single component is sufficient when context fits, tools do not conflict, and prompting does not need to split. You add components when specialization benefits exceed coordination costs. Human involvement is just another “component”: the system pauses, surfaces state, and waits for external input like any other dependency.
- Context collisions: you keep adding instructions to the prompt and see them being ignored in specific regimes (“it forgets the citation format whenever code examples appear”).
- Tool confusion: with many tools available, the model starts calling the wrong one or hallucinating tool usage entirely.
- Phase interference: earlier reasoning bleeds into later steps, or the model skips necessary stages (“it goes straight to writing without doing any research when the input is longer than N tokens”).
- Creating a specialist role with a narrower prompt and smaller tool set.
- Moving part of the process into a separate call that produces a structured artifact.
- Introducing a minimal coordinator that sequences those parts and enforces handoffs.
- Long‑running tasks: anything that must pause for hours while waiting for data, human approval, or external events.
- Human‑in‑the‑loop decisions: actions that require explicit approval or judgment.
- Parallel work: tasks that naturally split into independent subproblems.
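
For the parallel case, the coordination code is a fan-out followed by a synthesis step. The sketch below uses the standard `Promise.all`; `researchInParallel`, `fakeResearch`, and the delays are hypothetical stand-ins for real model calls.

```typescript
type ResearchOne = (subject: string) => Promise<string>;

// Fan out one call per subject, wait for all of them, then synthesize.
async function researchInParallel(
  subjects: string[],
  researchOne: ResearchOne,
  synthesize: (notes: string[]) => string,
): Promise<string> {
  // Every call starts before any is awaited, so the wait is roughly the
  // slowest single call rather than the sum of all calls.
  const notes = await Promise.all(subjects.map((s) => researchOne(s)));
  return synthesize(notes);
}

// Fake researcher: later subjects finish sooner, to show that Promise.all
// returns results in input order regardless of completion order.
const fakeDelays: Record<string, number> = { A: 30, B: 20, C: 10 };
const fakeResearch: ResearchOne = (subject) =>
  new Promise<string>((resolve) =>
    setTimeout(() => resolve(`notes on ${subject}`), fakeDelays[subject] ?? 0),
  );
```

Note that `Promise.all` rejects as soon as any one call rejects. If partial results are acceptable, `Promise.allSettled` is the standard alternative, at the cost of handling per-subject failures in `synthesize`.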
- Set `pendingAction` to describe what you want to do and why.
- Wait until someone clicks approve or reject.
- Resume execution.
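
Those three steps can be sketched as a small gate object. This is a framework-free illustration, not any library's API: `ApprovalGate`, its method names, and the status values are assumptions, and the "resume" step is modeled as a stored callback.

```typescript
interface PendingAction {
  description: string; // what the system wants to do
  reason: string; // why it wants to do it
}

type GateStatus = "running" | "awaiting_approval" | "rejected";

class ApprovalGate {
  status: GateStatus = "running";
  pendingAction: PendingAction | null = null;
  private resume: (() => void) | null = null;

  // Step 1: record the proposed action and stop. Nothing executes yet.
  request(action: PendingAction, resume: () => void): void {
    this.pendingAction = action;
    this.resume = resume;
    this.status = "awaiting_approval";
  }

  // Step 2a: the UI calls this when a human clicks approve.
  approve(): void {
    const resume = this.resume;
    if (this.status !== "awaiting_approval" || !resume) return;
    this.pendingAction = null;
    this.resume = null;
    this.status = "running";
    resume(); // Step 3: execution continues
  }

  // Step 2b: the UI calls this when a human clicks reject.
  reject(): void {
    if (this.status !== "awaiting_approval") return;
    this.pendingAction = null;
    this.resume = null;
    this.status = "rejected";
  }
}
```

The in-memory callback only works while the process stays alive. For waits that last hours, you would persist `pendingAction` and enough state to rebuild the continuation when the approval arrives.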
- Start with a single capable component.
- Instrument it: log failures, context sizes, tool choices.
- When you see specific overload patterns, split along those fault lines and introduce the minimal coordination needed.
- Treat every new component as a cost that must pay for itself in clarity, quality, or safety.
5.5 Implementing Coordination Patterns in Idyllic
Idyllic gives you natural hooks for coordination. Systems are components. Fields hold shared state and artifacts. Actions define the delegation and routing surface. The LLM remains a function you call; Idyllic handles the plumbing for persistence and real‑time updates. You can express these patterns with a coordinator system that delegates to specialist systems. In that arrangement:
- `Coordinator` is the hub, maintaining high‑level task state (`status`, `currentTask`, `finalOutput`).
- `Researcher`, `Writer`, and `Reviewer` are spokes with narrow responsibilities and prompts.
- Handoffs are typed: `research` returns a structured object, `write` takes that object, `review` takes a string and returns a verdict.

Because `status` and `currentTask` are fields, connected clients can see the coordination process: reviewing vs. writing vs. researching. This is not just nice UI; it also gives you observability into where time is spent and where failures occur.
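
To make the hub-and-spoke shape concrete without guessing at Idyllic's API, here is the same structure in plain TypeScript. Nothing below is Idyllic code: `CoordinatorSketch`, the `Specialists` interface, and the `onStatus` callback (standing in for field updates that clients can observe) are all assumptions for illustration, and the specialists are synchronous for brevity.

```typescript
interface ResearchResult {
  topic: string;
  findings: string[];
}

interface ReviewVerdict {
  approved: boolean;
  comments: string;
}

// The typed handoffs: research -> structured object -> write -> string -> review.
interface Specialists {
  research(task: string): ResearchResult;
  write(research: ResearchResult): string;
  review(draft: string): ReviewVerdict;
}

type CoordStatus = "idle" | "researching" | "writing" | "reviewing" | "done";

class CoordinatorSketch {
  status: CoordStatus = "idle";
  currentTask: string | null = null;
  finalOutput: string | null = null;
  private specialists: Specialists;
  private onStatus: (s: CoordStatus) => void;

  constructor(specialists: Specialists, onStatus: (s: CoordStatus) => void = () => {}) {
    this.specialists = specialists;
    this.onStatus = onStatus;
  }

  // Every status change is observable, which is what gives you a timeline.
  private setStatus(next: CoordStatus): void {
    this.status = next;
    this.onStatus(next);
  }

  run(task: string): ReviewVerdict {
    this.currentTask = task;
    this.setStatus("researching");
    const research = this.specialists.research(task);
    this.setStatus("writing");
    const draft = this.specialists.write(research);
    this.setStatus("reviewing");
    const verdict = this.specialists.review(draft);
    if (verdict.approved) this.finalOutput = draft;
    this.setStatus("done");
    return verdict;
  }
}
```

The spokes never call each other. Only the hub sequences them, so the order of work and the shape of each handoff are visible in one method.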
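Routing at the front door is just another system: it classifies the incoming request and delegates to the matching specialist.

Human approval follows the same shape and needs three pieces:
- A field that holds the pending request.
- An action the UI calls when a human approves or rejects.
- Coordination code that pauses until one of those actions fires.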
Key Takeaways
- Multi‑agent systems are not multiple minds; they are multiple system prompts plus routing applied to the same underlying model. Each “agent” is reconstructed on every call.
- Routing is classification plus dispatch. It directs tasks to specialized components, trading extra complexity and latency for better performance within each specialization.
- Components do not share minds; they share state. Handoffs transfer compressed, structured context between calls, and shared artifacts provide a surface that multiple components can read and modify.
- Coordination is overhead you pay when a single context can no longer carry instructions, tools, and history reliably. You add components when specialization benefits outweigh coordination costs.
- Humans fit into the same patterns as any other component: the system surfaces state, pauses, and resumes based on their input.