The Six Layers of an Autonomous Coding Agent
agents · architecture · series
Where we've landed
Most of what gets written about AI agents right now falls into two categories. There's the demo — a developer posts a video of an agent writing a React component and calls it the future. And there's the skepticism — a researcher explains why large language models hallucinate and calls the whole enterprise a mirage.
Neither captures what it's actually like to build and run these systems in production. The demos skip past everything hard. The skepticism doesn't engage with the work that's actually solving those hard problems.
We've been building autonomous coding and research agents for months now. They run continuously, governed by declarative rules, backed by local infrastructure. They do real work — shipping code, writing documentation, conducting research, flagging decisions that need human review. They also fail in interesting ways, and those failures have been the most instructive part of the project.
Somewhere in the middle of all this, a pattern emerged. Every serious agent system we've built, studied, or tried to fix has the same six-layer architecture underneath. Sometimes the layers are explicit and well-designed. More often they're implicit — which is usually when things go wrong.
We're going to write them up, one at a time, over the next several weeks.
The layers
The names aren't novel. The layering isn't a formal specification. But the pattern has held up consistently enough across different agent systems that we think it's worth documenting.
Perception — how the agent understands the code, data, and environment it's working in. Not just what it reads, but how it structures what it reads. The difference between an agent that sees a file as 400 lines of text and one that sees it as a call graph, a dependency set, and a list of exported symbols is the difference between an agent that guesses and one that reasons.
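That difference can be made concrete with a minimal sketch. The type below is a hypothetical structured view of a file (the names `FileView` and `Callees` are illustrative, not any particular tool's API): instead of 400 lines of text, the agent holds exports, dependencies, and a flat call graph it can query directly.

```go
package main

import "fmt"

// FileView is a hypothetical structured representation of one source file:
// the kind of artifact a perception layer produces instead of raw text.
type FileView struct {
	Path    string
	Exports []string            // exported symbols
	Imports []string            // dependency set
	Calls   map[string][]string // caller -> callees (a flat call graph)
}

// Callees answers "what does this function call?" without re-reading the file.
func (f FileView) Callees(fn string) []string {
	return f.Calls[fn]
}

func main() {
	view := FileView{
		Path:    "store/cache.go",
		Exports: []string{"Get", "Put"},
		Imports: []string{"sync", "time"},
		Calls:   map[string][]string{"Get": {"evictExpired"}, "Put": {"evictExpired"}},
	}
	// Both public entry points share one eviction path; a text-only agent
	// would have to rediscover that by reading.
	fmt.Println(view.Callees("Get"))
}
```

An agent reasoning over this structure can answer "what breaks if `evictExpired` changes?" by lookup rather than by guessing from surrounding text.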
Planning — how complex work gets broken into steps the agent can actually execute. Where decomposition succeeds, where it breaks down, and what happens when the plan meets reality. The most common failure mode in autonomous systems isn't bad execution — it's good execution of a plan that was wrong.
Execution — how the agent produces output that's not just plausible but correct. Generating code is easy. Generating code that compiles, that doesn't break existing tests, and that actually solves the problem — that's where the real engineering lives. Structural verification turns out to matter more than most practitioners assume.
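The cheapest structural check makes the point: before spending anything on builds or tests, reject generated code that doesn't even parse. This sketch uses Go's standard `go/parser`; a real verification chain would layer type-checking and test runs behind it.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
)

// parses is the first gate in a structural verification chain: a syntax
// check on generated source, before any build or test is attempted.
func parses(src string) bool {
	fset := token.NewFileSet()
	_, err := parser.ParseFile(fset, "gen.go", src, 0)
	return err == nil
}

func main() {
	good := "package m\nfunc Add(a, b int) int { return a + b }\n"
	bad := "package m\nfunc Add(a, b int int { return a + b }\n"
	fmt.Println(parses(good), parses(bad))
}
```

Plausible-looking output fails this gate surprisingly often, which is exactly why the gate is worth having before the expensive checks.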
Memory — how the agent retains knowledge across sessions. Most agents today start every conversation from scratch. That's not a constraint of the technology; it's a constraint of how the systems are built. Persistent semantic memory, done right, changes what agents can do.
Governance — how the system enforces rules about what the agent is allowed to do. Cost caps, approval gates, routing policies, escalation thresholds. This is the layer that keeps autonomous systems from drifting, over-spending, or making decisions that should have gone to a human.
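A governance layer is, at its core, a single enforcement point every action passes through. This is a sketch under assumed names (`Policy`, `Decide`); a production system would evaluate declarative rules loaded from configuration, but the shape of the decision is the same.

```go
package main

import "fmt"

// Policy is a hypothetical declarative rule set: a cost cap plus a list of
// action kinds that always require human approval.
type Policy struct {
	MaxCostUSD      float64
	RequireApproval map[string]bool
}

// Decision is the three-way outcome of a governance check.
type Decision string

const (
	Allow    Decision = "allow"
	Escalate Decision = "escalate"
	Deny     Decision = "deny"
)

// Decide is the enforcement point: every proposed agent action is checked
// against the policy before anything runs.
func (p Policy) Decide(action string, estCostUSD float64) Decision {
	if estCostUSD > p.MaxCostUSD {
		return Deny // over the cap: never silently over-spend
	}
	if p.RequireApproval[action] {
		return Escalate // approval gate: route to a human
	}
	return Allow
}

func main() {
	p := Policy{MaxCostUSD: 5.00, RequireApproval: map[string]bool{"deploy": true}}
	fmt.Println(p.Decide("refactor", 0.40))
	fmt.Println(p.Decide("deploy", 0.40))
	fmt.Println(p.Decide("refactor", 9.99))
}
```

The value of making this declarative is that the rules can change without redeploying the agent, and every decision is auditable against a policy that exists as data.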
Trust — how the agent knows when to proceed versus when to escalate. Confidence calibration, self-assessment, the economics of agent accuracy. This is the least developed layer in most systems, and probably the one that matters most for broad deployment.
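The economics can be stated in one line: escalate when the expected cost of an agent mistake exceeds the cost of a human review. The sketch below is illustrative; the numbers are made up, and real systems need calibrated confidence, which is exactly the hard part.

```go
package main

import "fmt"

// shouldEscalate weighs the expected cost of proceeding wrongly against
// the fixed cost of a human review. Confidence is assumed calibrated,
// which in practice is the central difficulty of the trust layer.
func shouldEscalate(confidence, errorCostUSD, reviewCostUSD float64) bool {
	expectedLoss := (1 - confidence) * errorCostUSD
	return expectedLoss > reviewCostUSD
}

func main() {
	// Same confidence, different stakes: the decision flips on error cost.
	fmt.Println(shouldEscalate(0.95, 100, 10)) // low stakes: proceed
	fmt.Println(shouldEscalate(0.95, 500, 10)) // high stakes: escalate
}
```

Note what this implies: a 95%-confident agent should sometimes escalate anyway. Confidence alone is not a decision; confidence times consequence is.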
Why write this
Two reasons.
First, we've accumulated enough research and operational experience to say something substantive about each of these layers. Not speculative, not promotional — actual observations about what works, what doesn't, and why. Writing it down forces the thinking to sharpen.
Second, the field is moving fast enough that documentation lags behind practice. A lot of the best thinking about agent architecture lives in commit messages, in internal Slack channels, in conversations between practitioners. Some of it deserves to be more public.
Each piece in the series will stand alone. Read together, they sketch a full architecture. You can skip to the layer that's giving you trouble, or read from the top and watch the pattern build.
The shape of the series
- Part 1: Perception — structural code understanding and why it changes agent capability
- Part 2: Planning — task decomposition, where it fails, and how to design around the failures
- Part 3: Execution — code generation with structural verification
- Part 4: Memory — persistent semantic context and cross-session learning
- Part 5: Governance — declarative rules for autonomous behavior
- Part 6: Trust — confidence calibration and the economics of agent accuracy
Part 1 is up now. The rest land roughly every week or two.
The stack underneath
Each layer has become a distinct category of tooling over the past year. Structural code intelligence. Local-first vector search. Portable model inference. Declarative governance engines. Each category has emerging reference implementations, and a handful of them come from the same author — a pure-Go ecosystem we'll reference throughout the series.
We're not writing about the tools themselves. We're writing about the architecture. But the choice of implementation in each category matters, because it shapes what's possible at the layer above. When we describe the perception layer, we'll describe how structural code intelligence is delivered in practice, and point to the implementation we run in production. Same for memory, governance, and the rest.
The infrastructure is downstream of the thesis. Name the layers, specify what each one needs, and the tooling picks itself.
A note on sources
Our system ran 348 research briefs on related topics before we started writing this series. That research isn't a substitute for engineering judgment — it's a foundation for it. When a claim needs support, it's supported by something we actually read or built, not something we recalled approximately. When a claim is a strong opinion, we'll mark it as one.
We trained a domain-specific embedding model in 27 minutes, and it has been serving continuously since. The tools got good enough that writing this series feels possible.
---
Get the rest in your inbox
Each part lands roughly every week or two. Subscribe and we'll send new installments the day they ship, plus occasional notes on what we're learning from running these systems in production. No promotional filler.
Need custom help designing your stack?
ResonanceWorks works with founders and small teams on architecture, governance, and private AI system design. We take a small number of engagements at a time and work closely with each team's technical leads. Talk to Consulting.
Want a ready-made local-first system instead?
Torque Engineering installs a proven capture, vault, and routing system for independent operators. Explore Torque Engineering.