2026-04-16

Part 2: Planning — How Agents Decompose Work

agents · architecture · planning · series

*This is Part 2 of a six-part series on the layers of an autonomous coding agent. Start with the introduction, or read Part 1: Perception.*

---

The plan that looked right

Here's a failure we've seen more than once.

An agent receives a task: "Add rate limiting to the API." It produces a clear plan — create a middleware function, wire it into the router, add configuration for limits, write tests. The plan is well-structured, each step is reasonable, and the agent executes every step correctly.

The result doesn't work. The middleware is created, but it's inserted in the wrong position in the handler chain — after authentication but before the route multiplexer, which means rate limits apply to static assets and health checks. The configuration loads from environment variables, but the existing config system uses a YAML file with a hot-reload watcher, so the rate limits are invisible to the ops dashboard. The tests pass because they test the middleware in isolation, not through the actual router.

Every step was executed well. The plan was wrong.

This is the most common failure mode in autonomous coding systems, and it's the hardest to detect because the output at each stage looks correct. The agent didn't hallucinate. It didn't produce broken code. It produced working code in the wrong place, integrated in the wrong way, tested at the wrong level.

Planning is the layer that either prevents this or guarantees it.

How most agents plan

The typical agent planning pipeline looks like this: paste the task description and some context into a prompt, ask the model for a step-by-step plan, and feed each step back to the model for execution.

This works the same way that asking a new developer to "just figure it out" works. Sometimes they nail it. More often they produce something that looks right to them but wrong to anyone who knows the codebase. The problem isn't intelligence — it's orientation. They don't know where things are, how they're connected, or which patterns the codebase already uses.

Most agent planners compound this by treating planning as a single pass. The model sees the task, generates a plan, and the plan is final. There's no structural validation, no dependency analysis, no check against the actual codebase. The plan is a plausible sequence of English sentences, not a verified sequence of operations against a known state.

The resulting plans have three consistent failure modes.

Wrong decomposition. The plan breaks work into steps that don't align with the codebase's actual boundaries. "Add rate limiting" becomes "create a rate limiter file" instead of "identify the middleware chain, find the insertion point, match the existing middleware signature." The decomposition reflects how the model thinks about the problem, not how the codebase is organized.

Wrong ordering. Steps that have dependencies get sequenced incorrectly. The plan says "add the middleware, then update the config schema" — but the middleware depends on a config type that doesn't exist yet, so the intermediate state doesn't compile. Text-based planning can't see these dependencies because it doesn't have a structural view of what imports what.

Wrong scope. The plan edits what the task mentions but misses what the task implies. Adding a new API endpoint requires updating the router, but it also requires updating the OpenAPI spec, the client SDK generator, the integration test fixtures, and the deployment config. A plan that doesn't see these transitive dependencies produces work that's locally correct and globally broken.

All three failure modes trace back to the same root cause: the planner doesn't have a structural model of the codebase. It's planning against a mental model derived from text, not against a queryable representation of what's actually there.
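The "wrong scope" failure in particular becomes mechanical to avoid once that structural model exists: reverse the dependency graph and walk it breadth-first to enumerate the blast radius. A toy sketch — the `blastRadius` helper, the graph shape, and the file names are all invented for illustration:

```go
package main

import "fmt"

// blastRadius walks a reverse dependency graph (file -> files that
// depend on it) to collect everything transitively affected by a
// change. The graph here is a toy; a real planner would derive it
// from the import graph the perception layer maintains.
func blastRadius(changed string, dependents map[string][]string) []string {
	seen := map[string]bool{changed: true}
	queue := []string{changed}
	var affected []string
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for _, d := range dependents[cur] {
			if !seen[d] {
				seen[d] = true
				affected = append(affected, d)
				queue = append(queue, d)
			}
		}
	}
	return affected
}

func main() {
	// Invented edges mirroring the endpoint example above.
	dependents := map[string][]string{
		"router.go":    {"openapi.yaml"},
		"openapi.yaml": {"client_sdk.gen.go", "fixtures_test.go"},
	}
	fmt.Println(blastRadius("router.go", dependents))
}
```

A plan generated without this walk stops at `router.go`; a plan generated with it includes the spec, the generated SDK, and the fixtures.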

What entity-aware planning looks like

The fix isn't better prompts. It's better inputs.

When the planning layer sits on top of a structural perception layer — the kind we described in Part 1 — the planner's job changes fundamentally. Instead of generating a plausible sequence of actions from a task description, it generates a verified sequence of operations against known entities.

Concretely, entity-aware planning means the agent can:

  • Map the affected surface area before planning. Given "add rate limiting to the API," the planner queries the perception layer: what is the router? What middleware already exists? What's the handler chain order? What config system is in use? What test patterns exist for middleware? The answers are structural — actual function names, file paths, import graphs — not retrieved text chunks.
  • Decompose by entity, not by concept. Instead of "create a rate limiter," the plan says "create function RateLimitMiddleware in middleware.go, matching the signature func(next http.Handler) http.Handler used by existing middleware AuthMiddleware and LoggingMiddleware." The step is grounded in the codebase's actual patterns.
  • Order by dependency. The planner knows that RateLimitMiddleware references config.RateLimitConfig, which doesn't exist yet, so the config step must come first. This isn't a guess — it's derived from the import graph and type system.
  • Enumerate transitive effects. When the plan modifies a function signature, the planner can query all callers of that function and include update steps for each one. The plan's scope matches the change's actual blast radius.
  • Specify verification at each step. After each modification, the plan includes a concrete check: does the project build? Do the existing tests pass? Are there new unresolved references? Structural verification means these checks are precise, not hopeful.
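The "order by dependency" step above is, mechanically, a topological sort over the plan's prerequisite graph. A sketch in Go — the `orderSteps` helper and the step names are our own illustration of the idea, not a real planner API:

```go
package main

import "fmt"

// orderSteps topologically sorts plan steps (Kahn's algorithm) so
// that every step runs after its prerequisites. deps maps a step to
// the steps it depends on.
func orderSteps(steps []string, deps map[string][]string) ([]string, error) {
	indegree := make(map[string]int, len(steps))
	dependents := make(map[string][]string)
	for _, s := range steps {
		indegree[s] = 0
	}
	for s, prereqs := range deps {
		for _, p := range prereqs {
			indegree[s]++
			dependents[p] = append(dependents[p], s)
		}
	}
	var queue, order []string
	for _, s := range steps { // iterate the slice for deterministic output
		if indegree[s] == 0 {
			queue = append(queue, s)
		}
	}
	for len(queue) > 0 {
		s := queue[0]
		queue = queue[1:]
		order = append(order, s)
		for _, d := range dependents[s] {
			if indegree[d]--; indegree[d] == 0 {
				queue = append(queue, d)
			}
		}
	}
	if len(order) != len(steps) {
		return nil, fmt.Errorf("dependency cycle among plan steps")
	}
	return order, nil
}

func main() {
	steps := []string{"add RateLimitMiddleware", "add config.RateLimitConfig", "wire middleware into router"}
	deps := map[string][]string{
		"add RateLimitMiddleware":     {"add config.RateLimitConfig"},
		"wire middleware into router": {"add RateLimitMiddleware"},
	}
	order, _ := orderSteps(steps, deps)
	fmt.Println(order)
}
```

The cycle check matters: a dependency cycle in the plan is itself a signal that the decomposition is wrong and the planner should revisit it rather than pick an arbitrary order.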

The result is a plan that looks less like a to-do list and more like a migration script. Each step names exact files, exact functions, exact changes. There's nothing to interpret, nothing to guess, nothing to "figure out during implementation."

The anatomy of a production plan

In our systems, a plan has a specific format that emerged from watching agents fail and adjusting what we gave them.

Every plan starts with a file map — the complete list of files that will be created, modified, or deleted, with a one-line description of what changes in each. This forces the planner to commit to scope before defining steps. If a file isn't in the map, no step should touch it. If a file is in the map, at least one step must touch it. The map is a contract.

Each step has four fields:

  • What to do. One clear action, scoped to one file or one entity.
  • File path. Exact, not approximate. Not "the config file" but internal/config/types.go.
  • Content. The actual code or markup. Not "add appropriate rate limiting logic" but the function body, the import, the type definition. This is the hardest discipline and the one that matters most. A step with a placeholder is a step the execution agent will have to improvise, and improvisation is where plans break.
  • Why. One sentence. Not for the agent — for the reviewer. When a human looks at the plan before it executes, the "why" is what lets them catch a wrong assumption before it becomes wrong code.

The plan ends with verification steps — build, lint, visual check where applicable. These aren't aspirational. They're executable. The agent runs them and the results determine whether the plan succeeded.
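The format above can be captured as a small data structure. The type and field names here are our own sketch, not a standard; the useful part is that the file-map contract becomes checkable in code:

```go
package main

import "fmt"

// FileChange is one entry in the plan's file map: the scope contract.
type FileChange struct {
	Path    string // exact path, e.g. "internal/config/types.go"
	Action  string // "create" | "modify" | "delete"
	Summary string // one line: what changes in this file
}

// Step is a single plan step with the four fields described above.
type Step struct {
	What    string // one action, scoped to one file or entity
	Path    string // exact file path, never approximate
	Content string // the actual code, not a placeholder
	Why     string // one sentence, for the human reviewer
}

// Plan ties the file map, steps, and executable verification together.
type Plan struct {
	FileMap []FileChange
	Steps   []Step
	Verify  []string // executable checks, e.g. "go build ./..."
}

// validateScope enforces the contract: every step touches a mapped
// file, and every mapped file is touched by at least one step.
func (p Plan) validateScope() error {
	mapped := make(map[string]bool)
	for _, f := range p.FileMap {
		mapped[f.Path] = false
	}
	for _, s := range p.Steps {
		if _, ok := mapped[s.Path]; !ok {
			return fmt.Errorf("step touches unmapped file %s", s.Path)
		}
		mapped[s.Path] = true
	}
	for path, touched := range mapped {
		if !touched {
			return fmt.Errorf("mapped file %s has no step", path)
		}
	}
	return nil
}

func main() {
	p := Plan{
		FileMap: []FileChange{{Path: "internal/config/types.go", Action: "modify", Summary: "add RateLimitConfig"}},
		Steps: []Step{{
			What: "add RateLimitConfig struct", Path: "internal/config/types.go",
			Content: "type RateLimitConfig struct { ... }", Why: "middleware needs typed limits",
		}},
		Verify: []string{"go build ./..."},
	}
	fmt.Println(p.validateScope())
}
```

Running `validateScope` before execution turns "the map is a contract" from a convention into a gate.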

This structure isn't arbitrary. It's the result of running hundreds of planning cycles and noticing which plan failures trace back to ambiguity in the plan itself. Vague steps produce confident but wrong output. Concrete steps produce output you can diff against the plan and verify.

When plans break

No plan survives contact with the codebase fully intact. The question is what happens when a step fails.

The naive approach is to let the execution agent improvise — the step said to modify function X, but function X doesn't exist, so the agent creates it. This is almost always wrong. If the plan assumed function X existed and it doesn't, the plan's model of the codebase was inaccurate. Improvising forward means building on a false assumption. The right response is to stop and replan.

In practice, we've found three categories of plan-reality gaps:

Stale assumptions. The plan was generated against a structural snapshot that's now outdated. Another process modified a file, a dependency was updated, a test was added. The fix is straightforward: re-query the perception layer, diff against the plan's assumptions, and update the affected steps.

Incomplete perception. The plan assumed a function signature based on the parsed call graph, but the actual function uses a variadic parameter the parser didn't fully resolve. The fix is to extend the perception query for the specific entity and regenerate the step.

Wrong decomposition. The task turned out to be more complex than the initial plan anticipated. What looked like one modification is actually three, spread across a dependency chain the planner didn't trace far enough. The fix is to replan the remaining steps — not the whole plan, just the portion downstream of the failed step.

The key insight is that replanning is not a failure. It's a feature. A system that replans when reality doesn't match expectations is more reliable than a system that pushes through. The cost of replanning is measured in tokens and latency. The cost of not replanning is measured in broken code and wasted review cycles.

The planning layer needs a feedback loop: execute a step, verify the result, and if verification fails, pause execution and replan from the current state. This turns planning from a one-shot prediction into an iterative refinement process — closer to how experienced developers actually work.
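That feedback loop can be sketched in a few lines. The `step` and `result` types and the callback shapes here are hypothetical — a real system carries the structured plan step and verifier output — but the control flow is the point: verify after each step, replan only the downstream portion, and bound the retries:

```go
package main

import "fmt"

// step and result are hypothetical stand-ins for a structured plan
// step and a verifier's output.
type step struct{ name string }
type result struct {
	ok     bool
	detail string
}

// run executes steps in order, verifying after each one. On a failed
// verification it stops, asks replan for a fresh tail from the
// current state, and continues — bounded by maxReplans so a bad
// planner cannot loop forever.
func run(steps []step,
	execute func(step) error,
	verify func(step) result,
	replan func(failed step, remaining []step) []step,
	maxReplans int) error {

	for i, s := range steps {
		if err := execute(s); err != nil {
			return err
		}
		if r := verify(s); !r.ok {
			if maxReplans == 0 {
				return fmt.Errorf("step %q failed verification: %s", s.name, r.detail)
			}
			// Replan only the portion downstream of the failed step.
			return run(replan(s, steps[i+1:]), execute, verify, replan, maxReplans-1)
		}
	}
	return nil
}

func main() {
	var executed []string
	execute := func(s step) error { executed = append(executed, s.name); return nil }
	verify := func(s step) result {
		// Simulate a stale assumption: the original first step fails.
		return result{ok: s.name != "modify missing function", detail: "entity not found"}
	}
	replan := func(failed step, remaining []step) []step {
		// Replace the failed step; keep the untouched tail.
		return append([]step{{name: "create function first"}}, remaining...)
	}
	err := run([]step{{name: "modify missing function"}, {name: "wire into router"}},
		execute, verify, replan, 3)
	fmt.Println(err, executed)
}
```

Note that the failed step still executes before verification catches it — which is why verification has to run after every step, not once at the end.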

Where planning fails

Structural planning doesn't eliminate all planning failures. It eliminates the structural ones and exposes the semantic ones.

Ambiguous requirements. "Add rate limiting" doesn't specify per-user or per-endpoint, doesn't define the limits, doesn't say what happens when a limit is exceeded. No amount of structural analysis resolves a requirement that was never specified. The planner can detect ambiguity — "I found three possible insertion points and can't determine which is correct without knowing X" — but it can't resolve it.

Cross-system effects. The plan covers the Go service but not the Nginx config that sits in front of it, the Terraform that defines the infrastructure, or the Grafana dashboard that monitors the API. Structural perception is bounded by the codebase the agent can see. Effects that cross system boundaries are invisible to the planner.

Architectural judgment. Should the rate limiter use a token bucket or a sliding window? Should it be per-handler or global? Should it be in the application or in front of it? These are design decisions that require context beyond code structure — performance requirements, traffic patterns, operational constraints. A planner can present the options but can't evaluate them without domain knowledge.

Plan complexity scaling. Plans for large changes — "refactor the authentication system" — tend to exceed what a single planning pass can handle well. The decomposition becomes too deep, the dependency graph too wide, and the planner starts producing steps that are either too vague or too numerous. Hierarchical planning — breaking large tasks into sub-tasks, each with its own plan — helps, but introduces coordination overhead.

These aren't defects in the approach. They're the natural boundary between what structural analysis can do and what judgment has to do. The goal of the planning layer isn't to replace judgment — it's to handle everything that doesn't require it, so judgment can focus on what does.

What it enables for the layers above

Planning is the bridge between seeing and doing. The quality of the plan determines the quality of everything that follows.

Execution with verification. When each step specifies exactly what should change, the execution layer can verify its own output against the plan. Did the function get created with the right signature? Does the import resolve? Do the types align? This is the structural verification we'll cover in Part 3 — and it depends entirely on having a plan specific enough to verify against.

Governance with review. When the plan is a concrete set of operations — create this file, modify this function, delete this import — the governance layer can evaluate it before execution begins. A rule like "no plan may delete an exported symbol with external references" is enforceable against a structural plan. It's not enforceable against "refactor the auth module." We'll cover this in Part 5.

Memory with intent. When the system remembers not just what it did but what it planned to do, it can learn from the gap. Plans that required replanning, steps that failed verification, assumptions that proved wrong — these are the training signal for better planning. We'll cover this in Part 4.

Accountability. A structural plan is auditable. When something goes wrong, you can trace the failure back to a specific step, a specific assumption, a specific gap between what the planner thought the codebase looked like and what it actually looked like. This traceability is what makes autonomous systems trustable, not just capable.

Next: Execution

In Part 3 we look at the execution layer — how an agent turns a plan into code that's not just plausible but correct, why structural verification catches errors that testing alone can't, and what happens when an agent generates code against a structural model instead of a text prediction.

---

Reading this because you're trying to build?


For custom architecture and consulting, work with ResonanceWorks — Talk to Consulting. For a ready-made install, start with Torque Engineering.

---

Get the rest in your inbox

The series covers six layers — perception, planning, execution, memory, governance, and trust. Subscribe to get new parts as they ship, plus occasional technical notes on what we're learning from running these systems in production.

Need custom help designing your stack?

ResonanceWorks works with founders and small teams on architecture, governance, and private AI system design. We take a small number of engagements at a time and work closely with founders and technical leads. Talk to Consulting.

Want a ready-made local-first system instead?

Torque Engineering installs a proven capture, vault, and routing system for independent operators. Explore Torque Engineering.