What Continuous Research Looks Like Now
research · infrastructure · local-first · mll
The problem with research at small scale
Most small studios don't do sustained research. They do bursts — a few hours before a pitch, a weekend deep-dive into a new technology, the occasional bookmark that never gets revisited. The knowledge stays in people's heads or scattered across tabs and notes.
We wanted something different. A system that researches continuously, synthesizes what it finds, remembers what it's already covered, and keeps going — without someone sitting at a keyboard.
We've been running that system for a few weeks now. It pulls from the open web, synthesizes through a local language model, stores its output in a vector database, and governs its own behavior through a rules engine. It runs on a single machine. Beyond the initial hardware, the ongoing cost of the local components is effectively zero.
What changed recently is the quality of the semantic layer underneath it — and that's worth talking about.
What the system does
Every two minutes, the system checks a backlog of research topics. If there's a topic waiting, it searches the web for sources, extracts the content, feeds it through a local 14B parameter model for synthesis, and produces a structured research brief. The brief gets scored for confidence, stored, and indexed.
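In sketch form, one cycle looks something like this. Everything here is a stand-in: the Brief type and the injected search, synth, and store functions are hypothetical names, and the real pipeline does content extraction and scheduling that this skips.

```go
package main

import "fmt"

// Brief is a synthesized research brief with a confidence score.
type Brief struct {
	Topic      string
	Summary    string
	Confidence float64
}

// runCycle pops one topic from the backlog, gathers sources,
// synthesizes a scored brief, and stores it. The search, synth, and
// store steps are injected so the loop itself stays testable offline.
func runCycle(backlog *[]string,
	search func(topic string) []string,
	synth func(topic string, sources []string) Brief,
	store func(Brief)) bool {

	if len(*backlog) == 0 {
		return false // nothing waiting; the scheduler tries again later
	}
	topic := (*backlog)[0]
	*backlog = (*backlog)[1:]

	sources := search(topic)
	brief := synth(topic, sources)
	store(brief)
	return true
}

func main() {
	backlog := []string{"sovereign AI infrastructure"}
	var library []Brief

	ran := runCycle(&backlog,
		func(topic string) []string { return []string{"source A", "source B"} },
		func(topic string, sources []string) Brief {
			return Brief{Topic: topic, Summary: fmt.Sprintf("synthesized from %d sources", len(sources)), Confidence: 0.8}
		},
		func(b Brief) { library = append(library, b) })

	fmt.Println(ran, len(library), library[0].Topic)
}
```

In the real system a ticker fires this every two minutes; the loop body itself doesn't need to know about the schedule.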
When the backlog empties, the system generates its own next questions — using what it's already learned to find gaps, deeper angles, and adjacent areas worth investigating. It does this continuously, governed by daily caps on cycles, follow-ups, and API spend.
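The daily caps reduce to a simple guard that runs before each cycle or follow-up. A sketch of that guard; the field names and limits are illustrative, not the actual rules engine:

```go
package main

import "fmt"

// DailyCaps bounds what the system may do in one day.
type DailyCaps struct {
	MaxCycles    int
	MaxFollowUps int
	MaxSpendUSD  float64
}

// Usage tracks what has been consumed so far today.
type Usage struct {
	Cycles    int
	FollowUps int
	SpendUSD  float64
}

// allowCycle reports whether one more research cycle fits under the caps.
func allowCycle(c DailyCaps, u Usage, estCostUSD float64) bool {
	return u.Cycles < c.MaxCycles && u.SpendUSD+estCostUSD <= c.MaxSpendUSD
}

// allowFollowUp reports whether the system may generate another
// follow-up question today.
func allowFollowUp(c DailyCaps, u Usage) bool {
	return u.FollowUps < c.MaxFollowUps
}

func main() {
	caps := DailyCaps{MaxCycles: 300, MaxFollowUps: 50, MaxSpendUSD: 2.00}
	u := Usage{Cycles: 299, FollowUps: 50, SpendUSD: 1.99}

	fmt.Println(allowCycle(caps, u, 0.02)) // false: would blow the spend cap
	fmt.Println(allowFollowUp(caps, u))    // false: follow-up cap already hit
}
```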
The output is a growing library of research briefs that we can search by meaning, not just keywords.
The semantic layer that changed everything
For the first few weeks, deduplication was string matching. "Sovereign AI infrastructure" and "local-first AI for independent studios" were treated as completely different topics. The system would research the same concept from slightly different angles over and over, burning cycles on territory it had already covered.
We replaced the string-matching index with a vector database backed by a custom embedding model. The model was trained on our own research corpus — the briefs the system had already produced. It learned our vocabulary, our domain, our conceptual neighborhood.
The difference was immediate. A query for "local-first AI for small teams" now returns "sovereign AI infrastructure for independent studios" at 0.98 similarity. Zero overlapping words, near-perfect semantic match. Topics that would have slipped past string matching get caught and skipped.
The embedding model is small — trained on 45,000 words, 5,979 tokens in its vocabulary, 256 dimensions. It fits in memory alongside everything else. But it knows our domain better than any general-purpose model would, because it was trained on exactly the material it needs to understand.
What makes this possible
The embedding model, the vector database, and the quantization layer underneath them all come from the same author — odvcencio. The entire stack is pure Go. No Python runtime, no C dependencies, no external services.
This matters more than it might seem. When your embedding model is a single compiled artifact that loads with one function call, and your vector database is a single binary that stores 100,000 vectors in 14 megabytes, you stop thinking of semantic search as a service you connect to. It becomes a local primitive — like file I/O or string matching. Something you just use.
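The 14-megabyte figure is worth unpacking. At float32, 256 dimensions is a kilobyte per vector, or around 100 MB for 100,000 vectors. Back-of-envelope, and assuming the footprint is mostly vector payload rather than metadata, the stated numbers imply roughly 146 bytes per vector, i.e. under 5 bits per dimension:

```go
package main

import "fmt"

// bytesPerVector derives the per-vector storage budget implied by a
// total footprint in megabytes.
func bytesPerVector(totalMB, vectors int) int {
	return totalMB * 1024 * 1024 / vectors
}

func main() {
	const dims = 256
	raw := dims * 4                        // float32 storage: 1024 bytes/vector
	budget := bytesPerVector(14, 100_000)  // budget implied by 14 MB / 100k
	bitsPerDim := float64(budget*8) / dims // effective bits per dimension

	fmt.Println(raw, budget, bitsPerDim) // 1024 146 4.5625
}
```

That ratio is only achievable with aggressive quantization, which is exactly what the next section's quantization layer provides.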
The model artifact format — .mll — packages the model definition, weights, tokenizer, and execution plan into one file. You compile it once and it runs on CUDA, Metal, or CPU without rebuilding. We trained ours on a Mac in 27 minutes. It has been serving embeddings continuously since then, with no degradation and no maintenance.
The whole stack is purpose-built for one thing: portable, self-contained inference with no external dependencies. The vector database uses deterministic quantization for CRDT replication — two replicas converge to byte-identical state without coordination. The compiler optimizes for inference specifically, not training. It's opinionated software, and the opinions are consistent from top to bottom.
What this enables next
We're using the stack for research dedup and semantic search today. But the architecture points toward things we haven't built yet:
Semantic memory for agents. Our coding agents currently start each session with no memory of previous work. A local vector database with a domain-specific embedding model could give them persistent semantic memory — the ability to recall not just what files they changed, but what problems they solved and what approaches worked.
Federated knowledge across devices. The CRDT replication means you could run instances on multiple machines that sync their knowledge bases automatically. A research brief produced on one machine becomes searchable on another, with no central server. Distributed knowledge accumulation.
Domain-specific retrieval that actually works. General-purpose embeddings treat every domain the same. A model trained on your specific corpus understands the relationships between concepts in your field. "Arbiter governance" and "declarative policy enforcement" land close together because they should — not because a general model happened to see them in similar contexts during pre-training.
Inference at the edge without compromise. The .mll format compiles to WASM. A scoring model or embedding model could run in a browser, on a phone, on a Raspberry Pi. The same artifact, the same behavior, no server required. This isn't theoretical — the compilation targets are already in the toolchain.
What we've learned running it
The system runs three hundred research cycles a day, governed by rules that cap spend, enforce focus, and keep it from chasing its own tail. The governance layer turned out to be as important as the research loop itself — without it, the system reliably drifts into increasingly abstract territory, generating follow-up questions about follow-up questions until the output is meaningless.
The semantic dedup catches roughly a third of what the old string-matching system would have let through. That's a third fewer wasted cycles, a third less redundant output, a third more coverage of genuinely new ground.
The research quality isn't academic-grade. It's practitioner-grade — the kind of synthesis you'd get from a smart colleague who spent an hour reading about a topic and gave you the summary. Multiply that by 300 cycles a day, filtered through semantic dedup and focus enforcement, and you get a knowledge base that grows in a direction you chose, at a pace no individual could sustain.
The infrastructure underneath all of this runs locally, fits in memory, and hasn't needed maintenance since we deployed it. For a system that runs 300 cycles a day, that's the only thing that matters.