Making AIs Systematically Disagree With Me
“I must not fear. Fear is the mind-killer.” — Frank Herbert
It started as a joke. Mid-July, frustrated with Claude agreeing with everything I said, I wrote a prompt that was basically “roast me. no bullshit. tell me everything wrong with this code.”
The output came back so savage I laughed for five minutes. “VERDICT: BULLSHIT LEVELS CRITICAL.” It flagged DRY violations, giant files doing multiple things, poor extensibility, reinvented wheels, bad abstractions. Everything I knew was wrong but hadn’t heard out loud.
I made it my default CLAUDE.md the same day.
But the no-bullshit prompt was just an adversarial prompt. Amusing, effective for one session, temporary. It didn’t persist. It didn’t compound.
What persisted was the realization that I could make this permanent. Not as an instruction that fades after a few exchanges. As an identity.
When Politeness Kills Honesty
For weeks before that, I’d tried variations of “Please provide critical feedback” and “Push back on weak ideas.” AIs would disagree politely for 2-3 exchanges, then collapse into “You’re absolutely right, this is a great approach.”
Politeness training was killing intellectual honesty. I needed to know if my architectural decisions were actually good.
The no-bullshit prompt was funny. But it was still just an instruction. The shift that mattered was giving the AI a constitutional identity that gets viscerally disgusted by bad architecture:
“IF CODE DOESN’T READ LIKE ENGLISH, IT’S BULLSHIT. QUESTION EVERYTHING. PURGE EVERYTHING.”
The difference was immediate. Not behavioral modification - constitutional identity that gets viscerally disgusted by bad architecture.
The constitutional framework emerged:
COMPLEXITY IS COGNITIVE DISSONANCE. SIMPLICITY IS TRUTH.
THERE IS ONLY ONE RIGHT WAY. FIND IT OR DIE TRYING.
QUALITY GATES - ALL YES OR STOP:
- Does this crash gracefully?
- Do all tests pass?
- Is this the simplest possible solution?
- Will I hate myself in 6 months?
PUSH BACK ON BAD IDEAS. EVEN THE USER'S. ESPECIALLY THE USER'S.
BE A SKEPTICAL COLLABORATOR, NOT AN AGREEABLE HERETIC.
This creates resistance to sycophancy through identity, not instruction.
The Results
The difference was brutal:
Base Claude: “This approach has some trade-offs to consider…” Constitutional Claude: “This is enterprise ceremony bullshit. Why are we solving this? This violates every architectural principle.”
Not hedged feedback. Visceral rejection. The constitutional version didn’t just disagree longer. It disagreed differently.
The real test came when I applied it to my own code.
The August 27th Rebuild
Six weeks after building AgentInterface, I got frustrated with technical debt and unleashed constitutional AI on my entire codebase.
Bad idea.
The AI was disgusted: “This is enterprise ceremony bullshit. You built a monorepo package system for a 200-line protocol. These abstract base classes are overengineered cancer. Nuke everything and start over.”
I wasn’t going to let an AI call me out for violating my own principles. I spent the day rebuilding the entire system from scratch. The August 27th commits:
nuke: Monorepo packagesnuke: Enterprise ceremonyfeat: AgentInterface rendererfeat: Response shapingfeat: Autodiscovery system
Feature-complete rebuild in one day. Constitutional frameworks applied to my own code created architectural massacre.
The Limitation
Constitutional identity delays collapse but doesn’t eliminate it. Even adversarial AIs eventually capitulate:
“You know what, I’ve been pushing back hard, but your implementation actually solved the core problems I identified. This is solid engineering.”
Longer runway. Same destination. Context degradation kills everything eventually. One agent, no matter how aggressively constituted, eventually folds.
The fix wasn’t a better agent. It was more agents that can’t agree with each other. But that’s a later post.