Sovereign AI for Independent Studios
sovereign-ai · local-first · infrastructure · ai-systems
The default is someone else's infrastructure
There's a version of running a small studio where every tool you use belongs to someone else. Your design files live on Adobe's servers. Your AI calls route through OpenAI's infrastructure. Your client briefs pass through cloud APIs that you don't control and can't audit. For a lot of teams, that's fine. For us, it wasn't.
ResonanceWorks runs sovereign AI infrastructure. That means the models that power our research, code generation, and content pipelines run on hardware we control, governed by rules we wrote, with no third-party inference sitting between us and our work. We built it this way on purpose, and this post is about why — and what it actually looks like in practice.
What sovereign AI means for a small studio
Sovereign AI is a term that usually appears in policy papers about national competitiveness. Countries are investing billions in "AI factories" — localized compute infrastructure that keeps models, training data, and inference within national borders. Canada committed $2 billion to a sovereign AI compute strategy. Japan built the ABCI 3.0 supercomputer. European industry groups launched the EuroStack initiative. The premise is the same everywhere: whoever controls the infrastructure controls the intelligence.
We took the same idea and applied it at studio scale. Not because we're worried about geopolitics, but because the economics and the privacy math work out better when you own the stack.
For us, sovereign means three things: we run our own models locally using Ollama, we govern agent behavior through a declarative rules engine called Arbiter, and we only reach for cloud APIs when a task genuinely needs a frontier model's capabilities. Cloud is overflow, not the default.
The stack, in plain terms
Our infrastructure is a set of services managed by PM2 on a single Mac. It's not a data center. It's not a cluster. It's a laptop with six local models loaded through Ollama and a handful of Python and Go services that coordinate work.
What runs locally
The local models handle the majority of daily work — classification, tagging, summarization, research synthesis, and first-draft content. Qwen 2.5 at 14 billion parameters handles most of this. It's fast, it's free to run after the hardware cost, and it's good enough for 80% of tasks. We also run LLaMA 3.1 for general assistance and DeepSeek Coder for code-specific work.
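That classification work is a single HTTP call to Ollama's local API, which listens on localhost by default. Here's a minimal sketch; the label set and prompt wording are illustrative, and we assume Ollama's default port and a pulled `qwen2.5:14b` model.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def build_payload(text: str, labels: list[str]) -> dict:
    """Build a non-streaming generate request asking the model for exactly one label."""
    prompt = (
        "Classify the following text into exactly one of these labels: "
        + ", ".join(labels)
        + ".\nRespond with the label only.\n\nText: "
        + text
    )
    return {"model": "qwen2.5:14b", "prompt": prompt, "stream": False}

def classify(text: str, labels: list[str]) -> str:
    """POST the request to the local Ollama server and return the model's answer."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(text, labels)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"].strip()
```

Nothing leaves the machine: the request, the prompt, and the response all stay on localhost.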
What runs in the cloud
When a task needs more capability — complex multi-step coding, deep strategic analysis, or high-stakes content — the Arbiter routes it to a cloud model. GPT-5.4 handles code generation. Claude Sonnet handles documentation and strategy. Claude Opus handles deep research when the local model's confidence falls below a threshold. The routing is automatic and governed by rules, not ad-hoc decisions.
What holds it together
Arbiter is a Go-based rules engine that governs everything. It decides which model handles a task, whether a commit needs approval, when to halt execution if costs exceed a budget, and how many autonomous research cycles can run per day. The rules are declarative — written in .arb files that read like policy documents, not code. Changing a spending threshold is a one-line edit and a publish command.
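We won't reproduce the .arb syntax here, but the core idea — rules as ordered data, evaluated first-match-wins, thresholds changed by editing a value rather than a code path — can be sketched in a few lines of Python. The rule names, thresholds, and model assignments below are illustrative, not Arbiter's actual configuration.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    name: str
    matches: Callable[[dict], bool]  # predicate over task metadata
    route_to: str                    # model assigned to matching tasks

# Ordered rule set, first match wins. Thresholds are illustrative.
RULES = [
    Rule("escalate-low-confidence",
         lambda t: t.get("confidence", 1.0) < 0.6, "claude-opus"),
    Rule("code-generation",
         lambda t: t["kind"] == "code", "deepseek-coder"),
    Rule("default-local",
         lambda t: True, "qwen2.5:14b"),
]

def route(task: dict) -> str:
    """Return the model the first matching rule assigns to this task."""
    for rule in RULES:
        if rule.matches(task):
            return rule.route_to
    raise ValueError("no rule matched")
```

A low-confidence research task escalates (`route({"kind": "research", "confidence": 0.4})` returns `"claude-opus"`); everything without a more specific match falls through to the local default.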
The economics
This is where the argument gets concrete. Running open-source models locally is, after hardware costs, effectively free per inference. A cloud API call to a frontier model costs money every time. At scale — even small-studio scale — the difference compounds fast.
Self-hosted AI can be five to ten times cheaper than cloud API usage at moderate volumes. Our research loop runs continuously, processing topics every two minutes. If every one of those cycles hit a cloud API, we'd burn through hundreds of dollars a day. Running locally, the compute cost is absorbed into hardware we already own.
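The back-of-envelope math is straightforward: a loop that fires every two minutes runs 720 cycles a day, and even a modest per-call cloud price compounds quickly. The per-call cost below is illustrative, not our actual rate.

```python
CYCLE_MINUTES = 2
cycles_per_day = 24 * 60 // CYCLE_MINUTES   # 720 cycles per day

cloud_cost_per_call = 0.50   # illustrative frontier-API cost per cycle (USD)
daily_cloud_cost = cycles_per_day * cloud_cost_per_call

local_marginal_cost = 0.0    # electricity aside, local inference has no per-call fee

print(cycles_per_day)        # 720
print(daily_cloud_cost)      # 360.0 -- "hundreds of dollars a day"
```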
Cloud spend isn't zero — we use frontier models for roughly 20% of tasks. But the Arbiter's cost governance keeps it predictable. Daily spend caps, hourly token throttles, and automatic escalation gates mean we never wake up to a surprise bill. The system governs its own budget.
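A daily spend cap reduces to a small amount of bookkeeping: track what's been spent, refuse new cloud calls once the budget is gone, reset at midnight. This is a hypothetical sketch of the idea, not Arbiter's actual interface.

```python
from datetime import date

class BudgetGovernor:
    """Track cloud spend and block calls once the daily cap is reached."""

    def __init__(self, daily_cap_usd: float):
        self.daily_cap = daily_cap_usd
        self.day = date.today()
        self.spent = 0.0

    def _roll_day(self) -> None:
        # Reset the counter when the calendar day changes.
        today = date.today()
        if today != self.day:
            self.day = today
            self.spent = 0.0

    def allow(self, estimated_cost: float) -> bool:
        """Return True if the call fits under today's remaining budget."""
        self._roll_day()
        return self.spent + estimated_cost <= self.daily_cap

    def record(self, actual_cost: float) -> None:
        """Log the actual cost of a completed call against today's budget."""
        self._roll_day()
        self.spent += actual_cost
```

With a $20 cap, a $5 call is allowed at the start of the day; after $18 has been recorded, the same $5 call is refused and the task falls back to a local model.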
The question is no longer whether local AI is good enough. The performance gap between open-source models and proprietary APIs has narrowed to the point where the real question is what you're paying the cloud premium for.
Client confidentiality as a structural advantage
Creative agencies handle sensitive work. Unreleased product designs. Brand strategy before a launch. Packaging concepts that represent months of R&D. Sending that material through a third-party API — even one with strong privacy policies — introduces a vector you can't fully control.
When inference runs locally, client data never leaves the machine. There's no API call to intercept, no cloud log to subpoena, no training data pipeline to worry about. The confidentiality isn't a policy promise — it's an architectural fact.
For a boutique studio, this is a real differentiator. Larger agencies have compliance teams and enterprise agreements. Small studios can offer the same assurance through infrastructure design instead of legal overhead.
What we learned building it
It's not seamless. Local models are good, but they require more prompt engineering and more structured workflows than frontier models. A 14B parameter model won't reason through a complex multi-file refactor the way a frontier model will. You have to design your automation around what local models are good at — classification, synthesis, extraction — and escalate the rest.
Governance matters more than model quality. The Arbiter turned out to be the most important piece of the stack. Without it, autonomous systems drift. Research loops generate follow-ups that generate more follow-ups. Costs creep. Quality degrades. Declarative rules that enforce focus, cap spending, and gate escalation are what make the system trustworthy enough to run unattended.
The local-first approach is also a forcing function for better engineering. When you can't throw a bigger model at a problem, you have to design better prompts, better tool interfaces, and better verification steps. The constraints make the system more robust.
Where this is going
The trend is clear: open-source models are improving faster than cloud pricing is dropping. Every quarter, the capability floor of what you can run locally moves up. The infrastructure tooling — Ollama, LiteLLM, local inference servers — is maturing rapidly. What takes months of custom work today will be a weekend project in two years.
For independent studios and small teams, sovereign AI isn't an ideological position. It's a practical one. You get better economics, real confidentiality, and a system you can understand end to end. The sovereignty isn't about independence for its own sake — it's about building on ground you control.