Most of the systems you have built only execute work when a user or another service calls them.
A request arrives, a function runs, a response goes back, and the system disappears into idleness again. From the CPU’s point of view, your service runs briefly to handle each request and is idle between calls.
Agentic systems tempt you to break this pattern. You start imagining something that keeps working after the user closes the tab, that notices new information and reacts, that makes progress on a goal overnight. The system no longer runs only on demand; it keeps working in the background without direct user initiation.
That shift carries a risk: it is easy to slip into psychological language, imagining that the system wants to do things, has its own goals, or decides to act in ways you did not intend. That framing makes autonomy sound like a property of the model’s mind instead of a property of your architecture.
Autonomy is a property of your architecture, not of the model.
An “autonomous” system is one whose control flow no longer starts at a user click or an incoming HTTP request. Something else wakes it up. Something else decides when it should run. Autonomy is not mystical; it is another design choice: which component is allowed to start work, under what conditions, and with what limits.
This matters because once you move the trigger away from a human, a new set of problems appears. How do you distinguish a truly autonomous system from a complicated reactive one? If it runs while you sleep, how do you know what it did? How do you stop it from running away with your resources or your data? And when, inevitably, something breaks, what state is left behind for you to debug?
Autonomy is not about sentience; it is about control of when and how your system starts and stops work. Who owns the “main loop” of your system? What wakes it up? What reins it in?
The rest of this chapter follows that thread. We will treat autonomy as a problem of trigger sources, boundaries, and failure modes, and we will keep the model in its proper place: as one more component inside a system you are responsible for orchestrating.
7.1 Autonomy vs. Reactivity in System Design
[DEMO: Two systems side by side. On the left, a classic API handler that only runs when you click a “Send Request” button. On the right, a scheduled task that wakes up every 10 seconds and updates a counter even if you never touch the UI. A toggle lets you add LLM-powered decision-making to either side so you can see that “intelligence” does not change who owns the trigger.]
Most software you have written behaves like a reflex. Something else pokes it. It responds. Then it waits again.
A web handler does nothing until an HTTP request arrives. A message consumer does nothing until a message appears in the queue. Even a multi-agent system from the previous chapters is usually gated by a user: the outermost call is “respond to this prompt,” and everything inside is downstream of that single invocation.
Autonomy does not depend on how “smart” the code is; it depends on where control flow begins. A cron job that runs at 3am without a human is autonomous in timing, while a button wired to an LLM remains reactive because a user still triggers it.
The intuition that autonomy is about “wanting things” is not helpful here. We need a mechanical definition.
Autonomy is about control flow origin. Reactive systems run only when external events call into them. Autonomous systems have internal triggers—schedules, stored goals, or event subscriptions—that start work without a user request. A cron job is autonomous in timing but not in decision-making; an intelligent autonomous system combines internal triggers with model-driven choices about what to do when triggered.
Start from the simplest case: a pure reactive handler.
// Reactive: runs only when called from the outside
@action()
async generateSummary(input: string) {
  // This code does not run unless some client calls it
  return await llm.complete([
    { role: 'system', content: 'Summarize concisely.' },
    { role: 'user', content: input }
  ]);
}
Nothing here ever happens by itself. The control flow always starts in another process: a browser, another service, a CLI. Your code is a worker, not a boss.
Now change only one thing: who decides when this method gets invoked.
// Autonomous: the runtime calls this method on a schedule
@schedule('every 10 minutes')
async scheduledSummarySweep() {
  // This method runs because the scheduler decided it was time
  const items = await this.fetchItemsNeedingSummaries();
  for (const item of items) {
    const summary = await llm.complete([
      { role: 'system', content: 'Summarize concisely.' },
      { role: 'user', content: item.content }
    ]);
    await this.storeSummary(item.id, summary);
  }
}
The logic inside scheduledSummarySweep is not special. It could have been the body of an API handler. What changed is the origin of control flow. There is no button that says “run the sweep now.” The runtime’s scheduler decides when to call it.
From an architectural point of view, autonomy means shifting triggers from external callers to internal schedules or event sources.
This leads to a more precise way to classify systems:
- Purely reactive: all entry points are externally triggered (HTTP, RPC, user actions).
- Purely autonomous: all entry points are internally triggered (schedules, internal events).
- Hybrids: some paths are reactive, some are autonomous.
Most useful agentic systems are hybrids. They respond to users when asked and also do work in the background, or on timers, or when internal state crosses thresholds.
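To make the hybrid case concrete, here is a minimal sketch in the style of the examples above: one class exposing both a reactive and an autonomous entry point. The fetchStaleItems and refreshItem helpers are illustrative, not part of any real API.

// Reactive path: a user or service asks for a refresh right now
@action()
async refreshNow(itemId: string) {
  return await this.refreshItem(itemId);
}

// Autonomous path: the scheduler sweeps for stale items on its own
@schedule('every hour')
async refreshStaleItems() {
  const stale = await this.fetchStaleItems();
  for (const item of stale) {
    await this.refreshItem(item.id);
  }
}

The inner logic can be identical on both paths; only the trigger differs.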
The model does not change this classification. A cron job that runs a hard-coded SQL query is autonomous. A human pushing a button that calls an LLM is not. What matters is where the “main loop” lives.
A helpful mental pattern is to draw a box around your system and look at its edges:
- If every path into the box originates in another box, you have a reactive system.
- If some paths originate on a clock, or on stored goals, or on internal event conditions, you have an autonomous system.
Once you view autonomy as a question of trigger sources, cron jobs stop being a different category of thing. A scheduled agent, a watchful monitor, a nightly planner: they are all variations on the same mechanism. The novel ingredient in agentic systems is not how they wake up, but what they can decide during the time they are awake.
7.2 Maintaining Oversight Without Constant Supervision
[DEMO: A small autonomous queue processor that runs every few seconds. The UI never sends it commands, but you can flip to an “Activity Log” tab to see what it has done, a “Status” panel showing current state, and an “Alerts” area that lights up when something goes wrong. A separate toggle pretends you “went to sleep” by hiding the log for 30 seconds; when you return, you can reconstruct everything the system did.]
Once you let your system wake itself up, you lose direct visibility into what happened while you were away.
When a user presses a button and watches a spinner, you get oversight for free. If the response looks wrong, they complain. If the endpoint is down, they refresh and file a ticket. The supervision is continuous because the human is in the loop.
Autonomous behavior cuts that feedback loop. The timer fires at 3am, work happens in the dark, and by the time you open your laptop the effects are already baked into your database, your email outbox, your logs, your bill. Instead of relying on users to notice failures, you need mechanisms that record what ran, whether it succeeded, and how it is behaving now.
You cannot supervise it continuously. You have to instrument it instead.
Oversight in autonomous systems comes from observability, not live supervision. Logging tells you what happened. Status fields expose what is happening now. Alerts notify you when your attention is actually needed. You design explicit observation and intervention points so the system can run unattended most of the time and still surface the right information when you return.
At the smallest scale, oversight is just structured logging.
type LogEntry = {
  timestamp: number;
  level: 'info' | 'warning' | 'error';
  message: string;
  details?: Record<string, unknown>;
};

@field log: LogEntry[] = [];

async logActivity(
  level: LogEntry['level'],
  message: string,
  details?: Record<string, unknown>
) {
  this.log.push({ timestamp: Date.now(), level, message, details });
  // Keep the log bounded so it doesn't grow without limit
  if (this.log.length > 500) {
    this.log = this.log.slice(-250);
  }
}
Every time your autonomous method does something non-trivial, you call logActivity. That gives you a narrative you can replay later:
@schedule('every 5 minutes')
async processQueue() {
  await this.logActivity('info', 'Scheduled run started');
  const tasks = await this.fetchPendingTasks();
  if (tasks.length === 0) {
    await this.logActivity('info', 'No tasks to process');
    return;
  }
  for (const task of tasks) {
    try {
      await this.processTask(task);
      await this.logActivity('info', 'Task processed', { taskId: task.id });
    } catch (err: any) {
      await this.logActivity('error', 'Task processing failed', {
        taskId: task.id,
        error: String(err?.message ?? err)
      });
    }
  }
  await this.logActivity('info', 'Scheduled run completed', {
    processed: tasks.length
  });
}
Logging gives you history, but it does not tell you whether the system is healthy right now. For that, you make current state first-class.
@field status: {
  state: 'idle' | 'running';
  lastRunStartedAt: number | null;
  lastRunCompletedAt: number | null;
  lastRunSucceeded: boolean | null;
  consecutiveFailures: number;
  queueDepth: number;
} = {
  state: 'idle',
  lastRunStartedAt: null,
  lastRunCompletedAt: null,
  lastRunSucceeded: null,
  consecutiveFailures: 0,
  queueDepth: 0
};
You update this status inside your autonomous methods:
@schedule('every 5 minutes')
async processQueue() {
  if (!this.enabled) return; // kill switch, introduced later in this chapter
  this.status.state = 'running';
  this.status.lastRunStartedAt = Date.now();
  this.status.queueDepth = await this.countPendingTasks();
  try {
    await this.doBoundedQueueWork();
    this.status.lastRunSucceeded = true;
    this.status.consecutiveFailures = 0;
  } catch (err) {
    this.status.lastRunSucceeded = false;
    this.status.consecutiveFailures += 1;
    await this.logActivity('error', 'Queue processing run failed', {
      error: String((err as any)?.message ?? err)
    });
  } finally {
    this.status.state = 'idle';
    this.status.lastRunCompletedAt = Date.now();
  }
}
Any dashboard, CLI, or inspector connected to this system can now ask, without triggering work:
- When did it last run?
- Is it currently running, or stuck?
- Has it been failing repeatedly?
- Is the backlog growing?
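A read-only accessor makes those questions answerable without side effects. A minimal sketch, assuming the same @action decorator and the log and status fields defined above:

// Reactive, but side-effect free: reports state without doing work
@action()
async getStatus() {
  return {
    ...this.status,
    recentLog: this.log.slice(-20) // last 20 log entries
  };
}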
The last piece of oversight is alerting. You do not want to poll status manually; you want the system to tell you when something crosses a threshold that warrants attention.
async checkHealth(): Promise<void> {
  const issues: string[] = [];

  // Too many consecutive failures
  if (this.status.consecutiveFailures >= 3) {
    issues.push(
      `${this.status.consecutiveFailures} consecutive failures in queue processing`
    );
  }

  // No runs for too long
  const lastRun = this.status.lastRunStartedAt;
  if (lastRun) {
    const minutesSinceLastRun = (Date.now() - lastRun) / 60000;
    if (minutesSinceLastRun > 60) {
      issues.push(
        `No queue run started in the last ${Math.round(minutesSinceLastRun)} minutes`
      );
    }
  }

  if (issues.length === 0) return;

  await this.sendAlert({
    subject: 'Autonomous queue health issues',
    body: issues.join('\n')
  });
}
You can run checkHealth reactively (a monitoring service calls it) or autonomously (another schedule inside the same system). Either way, it flips the oversight burden: instead of you watching every execution, the system synthesizes its own health summary and only shouts when something looks wrong.
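Here is a sketch of the autonomous variant, plus one possible sendAlert implementation. The schedule interval, webhook URL, and payload shape are assumptions, not a prescribed API:

// Run the health check on its own schedule
@schedule('every 15 minutes')
async scheduledHealthCheck() {
  await this.checkHealth();
}

// One possible sendAlert: POST to a webhook (placeholder URL)
async sendAlert(alert: { subject: string; body: string }) {
  await fetch('https://example.com/alerts', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(alert)
  });
  await this.logActivity('info', 'Alert sent', { subject: alert.subject });
}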
Three simple mechanisms—logs, status, alerts—change the feel of autonomy. The system may run while you sleep, but you wake up to:
- A scrollable history of what it did.
- A snapshot of how it is doing now.
- A clear signal if something went off the rails.
You are not abdicating control; you are moving from direct supervision to instrumented supervision. That is what makes autonomous behavior operationally tolerable.
7.3 Preventing Runaway Behavior
[DEMO: An autonomous agent that tries to “improve” a text document with an LLM. Without any limits, it keeps re-opening the document every few seconds and making micro-edits forever. A second version has explicit time, iteration, cost, and scope limits. The UI lets you toggle limits on and off to see how quickly the unconstrained version spirals in API calls and junk edits, while the constrained one stops and records why.]
If you give a system permission to start work on its own, you also give it permission to make the same mistake again and again.
An API handler that misbehaves is bounded by user patience. If the response looks obviously wrong, the user stops clicking. If it times out, clients back off. The reflex only fires when someone presses the nerve.
Autonomous paths have no such natural brake. A mis-specified goal can persist in storage and be re-pursued every half hour. A model that never quite decides a task is finished can keep asking for “one more refinement.” A loop that calls an external API based on model output can rack up thousands of dollars in charges before anyone notices.
Goals are serialized task descriptions stored in your database or queue—for example, records that contain an objective, parameters, and status fields that the autonomous loop reads and updates across runs. That makes them powerful and dangerous: you have introduced objectives that persist across executions, but without a human at the steering wheel every time they are acted upon.
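As a sketch, such a record might look like this; the field names are illustrative, not a required schema:

type Goal = {
  id: string;
  objective: string;              // e.g. "keep the weekly report draft current"
  params: Record<string, unknown>;
  status: 'active' | 'paused' | 'done';
  lastPursuedAt: number | null;   // updated by the autonomous loop on each run
};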
The remedy is architectural: explicit limits on execution time, iterations, resource consumption, and allowed actions.
Runaway behavior is prevented by explicit boundaries. You design limits on time (how long one execution can run), iterations (how many steps it can take), cost (how many resources it can consume), and scope (what kinds of actions it is allowed to perform). A kill switch gives you a global off button. Goals themselves are just persisted data; the safe behavior comes from the constraints around how and when they are pursued.
Start with time. No autonomous execution path should be allowed to run indefinitely.
async doBoundedWork() {
  const deadline = Date.now() + 4 * 60 * 1000; // 4 minutes from now
  while (this.hasMoreWork()) {
    if (Date.now() > deadline) {
      await this.logActivity('warning', 'Time limit reached, stopping early');
      break;
    }
    await this.processNextItem();
  }
}
This is deliberately dull. There is no model here. But when you wrap model calls inside this pattern, it becomes protection against hanging requests, long-running reasoning chains, and accidental infinite loops.
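One way to apply the same idea to an individual model call is to race it against a timer. A sketch, assuming llm.complete returns a Promise<string> and Message is the type used elsewhere in this chapter:

async callModelWithTimeout(
  messages: Message[],
  timeoutMs = 30_000
): Promise<string> {
  const timeout = new Promise<never>((_, reject) =>
    setTimeout(
      () => reject(new Error(`Model call timed out after ${timeoutMs}ms`)),
      timeoutMs
    )
  );
  // Whichever settles first wins; a timeout surfaces as an ordinary error
  return Promise.race([llm.complete(messages), timeout]);
}

Note that this bounds how long you wait, not the request itself; true cancellation needs support from the model client.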
Iterations are the next boundary: limit how many times you are willing to try.
async processWithRetries(task: Task) {
  const maxAttempts = 5;
  let attempts = 0;
  while (!task.complete) {
    if (attempts >= maxAttempts) {
      await this.logActivity('error', 'Max attempts exceeded', {
        taskId: task.id,
        attempts
      });
      await this.markTaskStuck(task);
      break;
    }
    attempts += 1;
    try {
      await this.attemptTaskWithModel(task);
    } catch (err) {
      await this.logActivity('warning', 'Task attempt failed', {
        taskId: task.id,
        attempt: attempts,
        error: String((err as any)?.message ?? err)
      });
    }
  }
}
If the model never emits an output that satisfies your completion condition, this loop will stop anyway. That protects you from the obvious failure mode: “the agent got obsessed with this task and never moved on.”
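What counts as “complete” is your decision, not the model’s. A sketch of one completion condition, assuming the model is prompted to return JSON with a done flag; the output shape, the result field, and the buildTaskMessages helper are all assumptions:

async attemptTaskWithModel(task: Task) {
  const output = await this.callModel(this.buildTaskMessages(task));
  // JSON.parse throws on malformed output, which counts as a failed attempt
  const parsed = JSON.parse(output) as { done?: boolean; result?: string };
  if (parsed.done === true) {
    task.result = parsed.result;
    task.complete = true;
  }
}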
Cost boundaries deal with the other dimension: API usage, money, or any scarce resource.
@field budget = {
  dailyTokens: 100_000,
  usedTokens: 0,
  resetAt: Date.now() + 24 * 60 * 60 * 1000
};

async callModel(messages: Message[]): Promise<string> {
  // Reset daily budget if needed
  if (Date.now() > this.budget.resetAt) {
    this.budget.usedTokens = 0;
    this.budget.resetAt = Date.now() + 24 * 60 * 60 * 1000;
  }
  const estimated = this.estimateTokens(messages);
  if (this.budget.usedTokens + estimated > this.budget.dailyTokens) {
    await this.logActivity('warning', 'Daily token budget exceeded', {
      estimated,
      used: this.budget.usedTokens,
      limit: this.budget.dailyTokens
    });
    throw new Error('Daily token budget exceeded');
  }
  const response = await llm.complete(messages);
  const actual = this.countTokens(messages, response);
  this.budget.usedTokens += actual;
  return response;
}
Every autonomous code path that can call the model goes through callModel. That gives you a single choke point where you enforce per-day or per-goal limits without trusting the model to “use resources wisely.”
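The estimateTokens and countTokens helpers can be as crude as you like, as long as they are conservative. A sketch using a characters-per-token heuristic; the ratio is a rough assumption, not a real tokenizer:

// English text averages roughly 4 characters per token.
// Divide by 3 instead so the estimate errs on the high side.
estimateTokens(messages: Message[]): number {
  const chars = messages.reduce((sum, m) => sum + m.content.length, 0);
  return Math.ceil(chars / 3);
}

countTokens(messages: Message[], response: string): number {
  return this.estimateTokens(messages) + Math.ceil(response.length / 3);
}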
Scope boundaries are about what the system may do, not how much.
const AUTONOMOUS_PERMISSIONS = {
  readData: true,
  writeDrafts: true,
  sendExternalEmail: false,
  deleteRecords: false,
  performPurchases: false
} as const;

type ActionType = keyof typeof AUTONOMOUS_PERMISSIONS;

async executeAction(action: { type: ActionType; payload: any }) {
  const permitted = AUTONOMOUS_PERMISSIONS[action.type];
  if (!permitted) {
    await this.logActivity('warning', 'Blocked forbidden autonomous action', {
      type: action.type
    });
    return { blocked: true, reason: 'Action not permitted in autonomous mode' };
  }
  // Safe to execute
  return await this.performAction(action);
}
Even if your model decides that sending a thousand emails is a good idea, this gate can refuse. You shift from “hope the model will be conservative” to “explicitly enumerate the verbs it is allowed to use without a human.”
Finally, you want a way to turn the whole thing off.
@field enabled = true;

@schedule('every 5 minutes')
async autonomousSweep() {
  if (!this.enabled) {
    await this.logActivity('info', 'Autonomous sweep skipped (disabled)');
    return;
  }
  await this.doBoundedWork();
}

@action()
async disableAutonomy(reason: string) {
  this.enabled = false;
  await this.logActivity('warning', 'Autonomy disabled', { reason });
}

@action()
async enableAutonomy() {
  this.enabled = true;
  await this.logActivity('info', 'Autonomy enabled');
}
The kill switch is anticlimactic—a boolean check at the start of your scheduled methods—but operationally critical. When something surprising happens in production, you do not want to hunt through three layers of orchestration to find the right place to stop the loop.
Taken together, these constraints form a fence:
- Time limits bound how long a single run may last.
- Iteration limits bound how many steps a run can take.
- Cost limits bound resource consumption over time.
- Scope limits bound what actions are even possible.
- A kill switch lets you disable everything with a single write.
The model remains just a function. The autonomy—the ability to keep coming back to its goals without you calling it—is created by your triggers and your storage. The safety of that autonomy is created by your boundaries.
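Composed, the fence can be a single scheduled method. This is a sketch only: fetchActiveGoals, buildGoalMessages, and parseAction are hypothetical helpers, while enabled, callModel, and executeAction are the kill switch, budget gate, and permission gate defined above.

@schedule('every 30 minutes')
async pursueGoals() {
  if (!this.enabled) return;                   // kill switch
  const deadline = Date.now() + 4 * 60 * 1000; // time limit
  for (const goal of await this.fetchActiveGoals()) {
    if (Date.now() > deadline) break;
    const output = await this.callModel(this.buildGoalMessages(goal)); // cost limit inside
    await this.executeAction(this.parseAction(output));                // scope limit inside
  }
}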
7.4 Failure Modes in Autonomous Systems
[DEMO: An autonomous worker that processes items from a queue every few seconds. A “Crash Now” button simulates a runtime failure in the middle of processing. The first version loses track of which items were in progress and double-processes some on the next run. The second version uses simple checkpointing and locking, so after a crash it resumes cleanly, skips already-done work, and never overlaps runs.]
Autonomous systems fail in ways that reactive systems rarely do.
If an HTTP handler throws an exception, the user sees an error and retries. If your service is down, clients back off or display a “we’re having issues” banner. The failure is bounded by the request-response cycle.
A scheduled method that crashes has no immediate witness. If a timer fires every five minutes and the work now takes ten, two executions can overlap. If your system is down during a scheduled window, the timer might fire into the void. When it comes back, it might not realize it missed work. An error in a single item can poison every run if you do not isolate it.
Because autonomous code runs without supervision, you should expect crashes, missed timers, overlapping runs, and partial writes to occur over time and design explicit recovery paths for each.
Design for failure explicitly. Autonomous methods should be restartable, interruptible, and non-overlapping. You checkpoint state so crashes leave you in a recoverable position. You use simple locks to prevent concurrent runs from stepping on each other. When reactive and autonomous paths coexist, you decide which one yields to the other instead of letting them race.
Checkpointing is the smallest unit of resilience. Instead of treating a task as “all or nothing,” you record progress as you go.
type TaskStatus = 'pending' | 'in-progress' | 'complete' | 'failed';

type Task = {
  id: string;
  status: TaskStatus;
  checkpoint?: any;
  error?: string;
};

// A method, so this.saveTask and this.doWork resolve correctly
async processTask(task: Task) {
  // Mark as in-progress before doing work
  task.status = 'in-progress';
  task.checkpoint = { startedAt: Date.now() };
  await this.saveTask(task);
  try {
    // Do the actual work
    await this.doWork(task);
    task.status = 'complete';
    task.checkpoint = { completedAt: Date.now() };
  } catch (err: any) {
    task.status = 'failed';
    task.error = String(err?.message ?? err);
  }
  await this.saveTask(task);
}
If the process dies halfway through doWork, the task is left in a known “in-progress” state with a checkpoint. On the next run, you can decide whether to retry, mark it failed, or resume from checkpoint. The important part is that you do not silently lose or duplicate work.
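Recovery can then be a sweep of its own. A sketch, assuming a hypothetical fetchTasksByStatus helper and a staleness window after which an in-progress task is presumed abandoned:

async recoverAbandonedTasks() {
  const cutoff = Date.now() - 30 * 60 * 1000; // 30 minutes: assumed staleness window
  const inProgress = await this.fetchTasksByStatus('in-progress');
  for (const task of inProgress) {
    const startedAt = task.checkpoint?.startedAt ?? 0;
    if (startedAt < cutoff) {
      // Presume the run that owned this task crashed; retry from the start.
      // Safe only if doWork is idempotent for a given task.
      task.status = 'pending';
      await this.saveTask(task);
      await this.logActivity('warning', 'Recovered abandoned task', { taskId: task.id });
    }
  }
}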
To handle overlapping executions, you need a simple lock. Schedules do not wait for each other.
@field isRunning = false;

@schedule('every 5 minutes')
async work() {
  if (this.isRunning) {
    await this.logActivity('warning', 'Skipped run: previous execution still running');
    return;
  }
  this.isRunning = true;
  try {
    await this.doBoundedWork();
  } finally {
    this.isRunning = false;
  }
}
This pattern ensures that if a run takes longer than expected, the next scheduled trigger gracefully skips instead of starting a second copy that competes for the same resources. In distributed settings you need a shared coordination mechanism—such as a database row used as a lease with expiry timestamps or a dedicated lock service—to ensure only one worker holds the lock at a time and that abandoned locks expire safely. The goal is the same: autonomous triggers should not imply uncontrolled concurrency.
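A sketch of the database-row approach, assuming a SQL store, a hypothetical this.db.execute that returns the number of rows affected, and a this.workerId identifying this process; the conditional UPDATE is the atomic step:

async tryAcquireLease(name: string, ttlMs: number): Promise<boolean> {
  const now = Date.now();
  const affected = await this.db.execute(
    `UPDATE leases
        SET owner = ?, expires_at = ?
      WHERE name = ? AND (owner IS NULL OR expires_at < ?)`,
    [this.workerId, now + ttlMs, name, now]
  );
  // We hold the lease only if our UPDATE claimed the row
  return affected === 1;
}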
What about missed triggers? Suppose your system is redeploying or your platform is down when a timer would have fired. Autonomous systems need to catch up.
One common pattern is to make the scheduled method idempotent with respect to time windows:
@field lastProcessedAt: number | null = null;

@schedule('every 15 minutes')
async processNewEvents() {
  const now = Date.now();
  const start = this.lastProcessedAt ?? (now - 60 * 60 * 1000); // First run: look back 1 hour
  const end = now;
  const events = await this.fetchEventsBetween(start, end);
  for (const event of events) {
    await this.handleEvent(event);
  }
  this.lastProcessedAt = end;
}
If you miss three scheduled runs, the next one covers a larger start..end window. You are no longer counting timers; you are counting data ranges. The autonomy is expressed in terms of “have we processed everything up to time T?” rather than “did this particular cron tick execute?”
Autonomous and reactive paths can also collide. For example, a queue item might be processable either by a background sweep or by a direct user request. You need to decide how they interact.
One design is to let reactive work proceed without waiting for autonomous work:
@field isRunningAutonomously = false;

// Background processing
@schedule('every 5 minutes')
async backgroundProcessQueue() {
  if (this.isRunningAutonomously) return;
  this.isRunningAutonomously = true;
  try {
    await this.doBoundedQueueWork();
  } finally {
    this.isRunningAutonomously = false;
  }
}

// User-triggered processing of the same queue
@action()
async processQueueNow() {
  // The reactive path is not gated on the background lock;
  // a user request runs immediately
  await this.doBoundedQueueWork();
}
Alternatively, you might want to temporarily disable the schedule when a user is actively working, or vice versa. The important point is that you make the interaction explicit. You do not want a user’s request and a scheduled job both trying to mutate the same artifact at the same time.
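For example, to make the background path yield while a user is active, you can track the last reactive touch. A sketch, assuming a hypothetical lastUserActivityAt field that every reactive entry point updates:

@field lastUserActivityAt = 0;

@action()
async processQueueNow() {
  this.lastUserActivityAt = Date.now();
  await this.doBoundedQueueWork();
}

@schedule('every 5 minutes')
async backgroundProcessQueue() {
  // Yield: skip the sweep if a user acted within the last 10 minutes
  if (Date.now() - this.lastUserActivityAt < 10 * 60 * 1000) {
    await this.logActivity('info', 'Background run yielded to recent user activity');
    return;
  }
  await this.doBoundedQueueWork();
}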
Autonomy increases the number of possible failure modes, but you still rely on idempotence, checkpoints, locks, and explicit precedence rules to keep behavior correct. The difference is that you must assume these failure modes will eventually occur while nobody is watching. Resilience moves from “best practice” to “baseline requirement.”
Key Takeaways
Autonomy is not a mystical property of models. It is a choice about where your system’s control flow starts and what keeps it going.
When code only ever runs in response to external calls, you have a reactive system. When some of your entry points are driven by internal schedules, stored goals, or event subscriptions, you have an autonomous system. The LLM inside that system can make more complex decisions about what to do, but it does not change the fundamental question of who decided that now was the time to act.
Because autonomous execution happens without a human’s finger on the button, you must give the system new affordances:
- Observability, so you can reconstruct what happened and see what is happening now.
- Explicit boundaries on time, iterations, cost, and scope, so “run without being asked” does not become “runaway.”
- Kill switches and locks, so you can stop behavior quickly and prevent overlapping runs.
- Checkpointing and catch-up logic, so crashes, delays, and missed windows leave you in a recoverable state instead of a corrupted one.
Designing autonomy means deciding explicitly who owns the main loop, how it is allowed to operate, and when it must yield. The model remains a component inside that loop, not the source of it.
Bridge to Chapter 8
An autonomous system that wakes up on time, stays within its boundaries, and survives failures is operational, but that does not mean it is doing the right work.
If you let an agent write drafts overnight, how do you measure whether those drafts are any good? If a background planner keeps refining a roadmap, how do you know whether the revisions are improvements or noise? Autonomy buys you throughput and responsiveness, but without evaluation you have no notion of quality beyond “it did something.”
The next chapter introduces evaluation: how to define what “good” looks like, how to build test sets and metrics, and how to use both automated checks and model-based judgments to keep your systems honest. Autonomy decides when work happens. Evaluation tells you whether that work was worth doing.