Building an Autonomous Agent Stack That Governs Itself
build-log · ai-systems · arbiter · local-first
The premise
What if your AI agent could plan, write code, review its own work, create a pull request, and merge it — all while staying under budget and following rules you set?
That's what we built. Not as a thought experiment. As a running system.
What's running
Ten processes managed by PM2, each with a specific role:
- LiteLLM routes between local Ollama models (free) and cloud models (paid)
- Brain serves the content pipeline — research, deep research, content production
- Arbiter governs every decision: cost limits, approval gates, model routing, feature flags
- Poller watches a task queue and triggers the orchestrator
- Nanochat runs continuous autonomous research on local models, posting findings to Discord
- Scheduler seeds daily research topics and posts cost summaries
- Dashboard visualizes everything at localhost:8008
- Discord bot provides the human interface: /task, /capture, /status, /codemap
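As a minimal sketch of how a poller like the one above hands work to an orchestrator (the queue shape and the `dispatch` callback are our own illustration, not the actual implementation):

```python
from queue import Queue, Empty

def poll_once(tasks: Queue, dispatch) -> bool:
    """Pull one task off the queue and hand it to the orchestrator, if any."""
    try:
        task = tasks.get_nowait()
    except Empty:
        return False  # nothing queued this tick
    dispatch(task)
    return True

# Usage: in the real system this loop runs under PM2 against shared state;
# here a plain list stands in for the orchestrator.
tasks = Queue()
tasks.put({"id": 1, "goal": "add a settings page"})
handled: list = []
poll_once(tasks, handled.append)
```

In a long-running process this would sleep between ticks; the important part is that the poller only moves tasks, while all policy decisions stay with the governance layer.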
The governance layer
The most interesting part isn't the agent; it's the governance. We use Arbiter, a purpose-built governance language, to enforce rules in roughly 200 nanoseconds per evaluation:
- Cost circuit breaker: halt all execution if daily spend exceeds $20
- Approval gates: destructive operations (file deletes, force push) require human approval via Discord
- Model routing: critical code generation goes to Claude Opus, routine work to Sonnet, research to local Qwen
- Feature flags: autonomous task spawning, auto-merge, nanochat — all toggle-able without redeployment
Every decision the agent makes passes through Arbiter first. The agent can't overspend, can't merge without approval (unless we flip the flag), and can't research off-topic.
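To make those semantics concrete, here is a Python sketch of the circuit breaker and approval gate. Arbiter rules are written in its own language; the function name, the `Decision` type, and the flag names here are our illustration, with only the $20 budget taken from the rules above.

```python
from dataclasses import dataclass

DAILY_BUDGET_USD = 20.0  # halt all execution past this spend (from the rules above)
APPROVAL_REQUIRED = {"file_delete", "force_push", "merge"}  # destructive ops

@dataclass
class Decision:
    allowed: bool
    reason: str

def evaluate(action: str, daily_spend: float, approved: bool, flags: dict) -> Decision:
    """Every agent action passes through a check like this before running."""
    # Circuit breaker: nothing runs once the daily budget is exhausted.
    if daily_spend >= DAILY_BUDGET_USD:
        return Decision(False, "budget_exhausted")
    # Feature flag: auto-merge bypasses the merge approval gate when flipped.
    if action == "merge" and flags.get("auto_merge"):
        return Decision(True, "auto_merge_flag")
    # Approval gate: destructive operations need a human sign-off via Discord.
    if action in APPROVAL_REQUIRED and not approved:
        return Decision(False, "needs_approval")
    return Decision(True, "ok")
```

The ordering matters: the budget check comes first, so even an approved destructive operation is refused once spend crosses the line.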
The agentic loop
The latest addition is a multi-turn execution loop inspired by claw-code. Instead of a rigid planner→coder→reviewer pipeline, the agent now:
1. Analyzes the codebase structure via codemap (powered by gotreesitter — 206 language grammars, pure Go)
2. Plans with full knowledge of every function, component, and import in the project
3. Writes code using tools it can invoke: read_file, write_file, npm_build, codemap
4. Verifies its work — if the build fails, it reads the error and fixes it
5. Only claims completion after a successful build
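The write→build→fix cycle above can be sketched as a small loop. The `plan` object, `Patch` type, and tool dictionary are hypothetical stand-ins for the real tool-invocation layer; the fake harness below exists only to show the loop self-correcting.

```python
from dataclasses import dataclass

@dataclass
class Patch:
    path: str
    content: str

def agentic_loop(plan, tools, max_turns=5):
    """Write, build, read the error, fix; claim completion only on a green build."""
    for turn in range(1, max_turns + 1):
        patch = plan.next_patch()
        tools["write_file"](patch.path, patch.content)
        ok, log = tools["npm_build"]()   # verify instead of trusting the model
        if ok:
            return {"done": True, "turns": turn}
        plan.observe_error(log)          # the failure feeds the next attempt
    return {"done": False, "turns": max_turns}

# Fake harness: a "plan" that writes a buggy patch first, then fixes it
# after seeing the build error.
class FakePlan:
    def __init__(self):
        self.seen_errors = []
    def next_patch(self):
        return Patch("src/app.ts", "fixed" if self.seen_errors else "buggy")
    def observe_error(self, log):
        self.seen_errors.append(log)

files = {}
def npm_build():
    return ("buggy" not in files.values(), "TS2304: name not found")

result = agentic_loop(FakePlan(), {"write_file": files.__setitem__,
                                   "npm_build": npm_build})
```

Note that the loop never inspects the model's claims; only the build result decides whether the task is done.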
The research loop
Nanochat runs continuously on local models (zero cost). It:
- Pulls topics from a backlog
- Searches the web via Firecrawl (handles JavaScript-heavy sites)
- Synthesizes findings using Ollama
- Scores confidence — low-confidence topics escalate to cloud models automatically
- Spawns follow-up topics from each research cycle
- Posts summaries to Discord so we see what it's learning
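The escalation and follow-up steps can be sketched in a few lines; the confidence threshold, the tier names, and the `open_questions` field are our own illustration of the mechanism, not the system's actual schema.

```python
CONFIDENCE_FLOOR = 0.6  # assumed threshold; the real value lives in the governance config

def next_tier(confidence: float) -> str:
    """High-confidence findings stay on the free local model; low ones escalate."""
    return "local" if confidence >= CONFIDENCE_FLOOR else "cloud"

def spawn_followups(finding: dict) -> list[str]:
    """Each research cycle seeds the backlog with the open questions it surfaced."""
    return [q for q in finding.get("open_questions", []) if q]
```

The same two primitives explain the loop's behavior over time: cheap local passes by default, paid cloud passes only when the local synthesis scores poorly, and a backlog that refills itself from its own output.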
After two days, it had researched 230+ topics autonomously, staying focused on studio-relevant domains thanks to topic guardrails.
What we learned
Governance is not optional. Without Arbiter, the agent burned $15 in one night on tasks that didn't need cloud models. With governance rules, it routes intelligently and halts when it should.
Structural awareness changes everything. Before codemap, the agent generated duplicate components and broken imports. With it, the agent sees the full project structure before writing a single line.
The loop matters more than the model. A mediocre model in a self-correcting loop outperforms a brilliant model in a single-shot pipeline. The ability to try, fail, read the error, and fix is what makes autonomous code generation viable.
Local-first is practical. Most of our research runs on ollama/qwen2.5:14b — free, private, fast enough. Cloud models are reserved for critical work. The system decides, not us.
What's next
We're now offering this as a service through Torque Engineering — installing private, governed AI operating systems for solopreneurs and small teams. The same architecture, configured for your goals.
If you're interested in how any of this works, reach out. We publish everything we learn.