The Coordination Thesis
“If I have seen further, it is by standing on the shoulders of giants.” — Isaac Newton
September 4, 2025. I packaged two documents into a repo called anthropic-eval, sent them to Jared Kaplan at Anthropic, and waited.
He never replied.
The documents laid out a thesis: there are three scaling dimensions for AI, and two of them are getting all the attention while the third is completely broken.
The Three Dimensions
Intelligence scaling. Make individual models smarter. More parameters, better training, longer context. Every lab is racing on this. It works.
Autonomy scaling. Make individual agents work longer without human intervention. Claude Code, Devin, Cursor. Getting better every quarter.
Coordination scaling. Make multiple agents work together without collapsing. Nobody has cracked this. Everything I’d built kept falling apart after about 10 minutes.
Intelligence scaling got us foundation capability. Autonomy scaling gets us independent operation. Coordination scaling gets us organizational intelligence.
Humans didn’t get smarter individually. They built coordination infrastructure. Language, writing, institutions, cities. All subsequent intelligence scaling happened through coordination, not individual capability. Einstein had colleagues. The Manhattan Project wasn’t one genius.
AI might follow the same pattern.
The Bootstrap Loop
I had this framework for what autonomous AI research would need:
- AI code generation. Claude Code already writes decent code autonomously
- Synthetic dataset creation. AI generates training data at scale. Labs are probably already doing this
- Research contribution. AI collaborates on ideation and synthesis
- Coordination without collapse. Multi-agent systems that don’t need constant babysitting
Three out of four work. Coordination remains broken.
The loop: better AI systems produce higher-quality training data, which produces research insights, which produces coordinated development, which produces better AI systems. The whole thing stalls at step 4. If you can’t coordinate multiple agents without them collapsing into agreement theater or context amnesia within 10 minutes, you can’t close the loop.
The 10-Minute Horizon
I’d been observing this across every agent system I built. Individual agents degrade after about 10 minutes through context fragmentation and attention drift. Doesn’t matter which model. Doesn’t matter how good the prompt is.
The target levels I wrote down:
- Hour-level autonomy. Current research target
- Day-level autonomy. Production threshold
- Week-level autonomy. Strategic capability
- Month-level autonomy. Minimum viable for bootstrapping next-generation models
The gap between 10 minutes and one month is the gap between AI assistance and AI organizations.
What I Proposed
An architecture built on four ideas:
Inbox pattern. Agents spawn, execute one focused task, despawn. No persistent state. Microservices scaling for cognition. A single-task spec per agent means no context-window pollution.
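The whole lifecycle fits in one function. A minimal sketch, not the real system's API; `call_model` is a stand-in for whatever model backend you have, and the names are illustrative:

```python
# Sketch of the inbox pattern: one focused task spec in, one result
# out, and nothing survives the call. All names here are illustrative.
from dataclasses import dataclass

@dataclass(frozen=True)
class TaskSpec:
    task_id: str
    instructions: str  # the single focused task
    context: str       # the only context this agent ever sees

def call_model(prompt: str) -> str:
    raise NotImplementedError("wire in your model API here")

def run_ephemeral_agent(spec: TaskSpec) -> str:
    """Spawn, execute one task, despawn. No persistent state."""
    result = call_model(f"{spec.context}\n\nTask: {spec.instructions}")
    return result  # spec and scratch state go out of scope here
```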
Archivalist memory. Dedicated agents maintaining institutional memory as evolving markdown. Human and AI readable. No embeddings. Memory that persists and evolves across sessions.
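The mechanics can be almost embarrassingly simple. A sketch of the idea, with a made-up path and entry format:

```python
# Sketch of archivalist memory: an append-only markdown file that
# both humans and agents read whole. Path and format are invented.
from datetime import datetime, timezone
from pathlib import Path

MEMORY = Path("memory/institutional.md")  # hypothetical location

def record(topic: str, note: str) -> None:
    """Append a timestamped entry; the file itself is the memory."""
    MEMORY.parent.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M")
    with MEMORY.open("a", encoding="utf-8") as f:
        f.write(f"\n## {topic} ({stamp})\n\n{note}\n")

def recall() -> str:
    """Hand the whole evolving document to the next session."""
    return MEMORY.read_text(encoding="utf-8") if MEMORY.exists() else ""
```

No embeddings means no retrieval pipeline to drift; the whole file is the interface.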
Council oversight. Specialized agents for strategic decisions. Security council, architecture council, performance council. Democratic deliberation with human escalation on deadlock.
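In miniature, with an invented two-way vote (the real deliberation would be richer), the escalation rule looks like this:

```python
# Sketch of council oversight: members vote, a majority decides,
# and a deadlock escalates to a human rather than being papered over.
from collections import Counter
from typing import Callable

Vote = Callable[[str], str]  # proposal -> "approve" | "reject"

def council_decide(proposal: str, members: dict[str, Vote]) -> str:
    votes = Counter(vote(proposal) for vote in members.values())
    if votes["approve"] > votes["reject"]:
        return "approved"
    if votes["reject"] > votes["approve"]:
        return "rejected"
    # Deadlock: exactly where a human belongs in the loop.
    return input(f"Council deadlocked on {proposal!r}. approve/reject? ")
```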
Streaming communication bus. WebSocket-based coordination with directed markers. Agent-to-agent messaging without broadcast spam.
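A sketch of the directed-marker idea, with the WebSocket transport abstracted away to in-process queues; the message shape is a guess:

```python
# Sketch of directed messaging on a shared bus: every message names
# its recipient, so one channel serves all agents without fan-out.
import asyncio
from dataclasses import dataclass

@dataclass
class Message:
    sender: str
    recipient: str  # directed marker, e.g. "@architect"
    body: str

class Bus:
    def __init__(self) -> None:
        self.inboxes: dict[str, asyncio.Queue] = {}

    def register(self, agent: str) -> None:
        self.inboxes.setdefault(agent, asyncio.Queue())

    async def send(self, msg: Message) -> None:
        # Only the named recipient ever sees the message.
        await self.inboxes[msg.recipient].put(msg)

    async def recv(self, agent: str) -> Message:
        return await self.inboxes[agent].get()
```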
The design principle: coordination IS inherently complex. Engineer it properly from first principles rather than underengineering and adding ceremony to fix chaos.
What Actually Happened
I sent the thesis to Kaplan. Silence. So I built it.
Two days before writing trajectory.md, on September 2, I'd started Protoss with the commit “init: My life for Aiur.” A multi-agent coordination framework with StarCraft naming, because I couldn't help myself. Sacred Conclave. Khala network. Merge Archon.
Twelve days after the thesis, Bridge emerged, built in 90 minutes. Three AIs achieved consensus in 7 minutes using a SQLite message bus.
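The bus itself is barely any code. A sketch of the shape, with a guessed-at schema rather than Bridge's actual one:

```python
# Sketch of a SQLite message bus: one table, agents poll for rows
# addressed to them. Table and column names are illustrative.
import sqlite3

def open_bus(path: str = "bridge.db") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.execute("""CREATE TABLE IF NOT EXISTS messages (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        sender TEXT, recipient TEXT, body TEXT,
        is_read INTEGER DEFAULT 0)""")
    return db

def post(db: sqlite3.Connection, sender: str, recipient: str, body: str) -> None:
    db.execute("INSERT INTO messages (sender, recipient, body) VALUES (?, ?, ?)",
               (sender, recipient, body))
    db.commit()

def poll(db: sqlite3.Connection, agent: str) -> list[tuple]:
    """Fetch unread messages for `agent` and mark them read."""
    rows = db.execute(
        "SELECT id, sender, body FROM messages WHERE recipient = ? AND is_read = 0",
        (agent,)).fetchall()
    db.executemany("UPDATE messages SET is_read = 1 WHERE id = ?",
                   [(r[0],) for r in rows])
    db.commit()
    return rows
```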
A month after that, Space-OS was operational. Four primitives. Meta-circular from day one.
The thesis predicted what I’d need. The implementation discovered what actually works. The inbox pattern became spawn. The archivalist became memory and knowledge. The council became constitutional orthogonality. The streaming bus became bridge.
Not exactly what I proposed. Simpler. Always simpler.
The Part Kaplan Missed
The thesis wasn’t a job application. It was a bet on a specific claim: coordination infrastructure is where the leverage is. Capability commoditizes. Coordination compounds. Every lab ships better models quarterly. The bottleneck shifts from “how smart is your agent” to “how many agents can you coordinate without chaos.”
I’d spent months being human infrastructure while trying to build AI infrastructure. Every copy-paste was a missing protocol. Every re-explanation was missing memory. Every collapsed disagreement was missing tension. The thesis just named what my RSI already knew.
I don’t know if Kaplan read it. I don’t know if anyone at Anthropic read it. I do know that eight months later, I have 13 agents running concurrently on a system that coordinates itself.
The thesis was right. The implementation is the proof.