What Anthropic's leaked code says about how they lead and work together

Paul Musters • 6 min read • 31 March 2026

A software leak usually gets attention for the technical secrets inside. What interests me more is what this code reveals about how people at Anthropic actually work with each other.

How they communicate. What leadership has decided matters. And what they've clearly had to learn the hard way. I've analysed over 1,100 founding teams. Most of what I've seen here, I also see in the teams that work. The patterns are not coincidental.

01 Organisational habit

Good teams write for the person who comes after them

Throughout the codebase, developers leave notes explaining what they built and the reasoning behind specific calls. One explains that a decision was made a certain way because the alternative would have cost twelve times more. Written down. For whoever comes next.

That kind of behaviour comes from a team that assumes the next person will continue their work. Someone in leadership decided that was worth having. Most teams I work with haven't made that decision yet.

[Figure: What happens when knowledge lives in people vs. the system. Panels: "Knowledge in people" and "Knowledge in the system"; scenario: "Someone leaves".]

Paul Musters, from practice

Most documentation I've read in codebases was written after something broke, to protect against blame. The notes in this code are different. They're written to help whoever comes next. That's a small distinction. But it tells you something about who leadership believes the work belongs to.

02 Trust by design

When the system is designed for trust, people stop waiting for permission

New features go into the product in the "off" position. Anyone can build and release something without it immediately affecting all users. Test quietly. Fix issues. Turn it on when it's ready. That structure gives people real ownership over their work, and makes mistakes small enough that nobody is afraid to try things.
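
The "off by default" idea can be sketched in a few lines. This is a minimal illustration, not Anthropic's actual mechanism: the `FeatureFlags` class, the flag name `new_greeting` and the `render_greeting` function are all hypothetical names invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureFlags:
    """Hypothetical registry where every feature starts in the "off" position."""

    _enabled: dict[str, bool] = field(default_factory=dict)

    def register(self, name: str) -> None:
        # New features are always registered disabled, so shipping the code
        # does not expose it to users.
        self._enabled.setdefault(name, False)

    def enable(self, name: str) -> None:
        if name not in self._enabled:
            raise KeyError(f"unknown feature: {name}")
        self._enabled[name] = True

    def is_enabled(self, name: str) -> bool:
        # Unknown flags count as off, so a typo can never switch a feature on.
        return self._enabled.get(name, False)


def render_greeting(flags: FeatureFlags) -> str:
    # The new code path ships alongside the old one and stays dormant until enabled.
    if flags.is_enabled("new_greeting"):
        return "Welcome back!"
    return "Hello."


flags = FeatureFlags()
flags.register("new_greeting")
before = render_greeting(flags)  # old behaviour while the flag is off
flags.enable("new_greeting")
after = render_greeting(flags)   # new behaviour once someone turns it on
```

The design choice that matters for ownership is the default: releasing and exposing are two separate decisions, so a mistake in the new path stays invisible until someone deliberately flips the switch.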

When I ask founders who gets to make which decisions without asking first, there's usually a pause. Fast movement and real ownership are different things. Speed is about removing friction. Ownership is about who the work actually belongs to. Trust has to be deliberately built into how the system works, and that's what Anthropic has done here.

"At the end of the day, someone has to decide what is actually going to get built and what actually matters. Someone still needs to be accountable for the decision."
Jenny Wen, Head of Design, Anthropic

The trust ladder: from "ask first" to Founder Freedom

Level 1: Campfire. Founder decides everything. All decisions run through one person. Fast at first. A bottleneck later.
Level 2: Wild West. People decide, but without clarity. Autonomy without structure. Feels like freedom. Creates confusion.
Level 3: Blueprint. Decisions mapped to roles, not people. Clear ownership. People know what they can decide without asking.
Level 4: Engine. System enables independent action. Structure built around trust. Mistakes are contained. Speed increases. This is where Anthropic operates.
Level 5: Ecosystem. Founder Freedom. Team runs fully without the founder / C-level in any daily decision. Culture has become self-sustaining.

All five levels are explained in the emaho Culture Playbook.

Paul Musters, from practice

Teams at the "Campfire" stage keep everything in people's heads. Teams at the "Blueprint" and "Engine" stages have built it into the system. Talent has little to do with it. The question is whether leadership has decided that knowledge is a team asset rather than something that lives in the people who happen to be there.

03 Accountability architecture

Nobody marks their own work complete

Most accountability problems in growing teams aren't about people avoiding responsibility. They're about a structural gap. When someone gets to decide both that their work is done and that it's good, those two judgements blend together, not out of bad faith, just out of how people operate when they're busy and under pressure. The project signed off by the person who ran it. The launch declared ready by the person who built it. I see this pattern in almost every team that's grown past about fifteen people.

Anthropic built a structural answer. Before anyone can report work complete, an independent check has to happen first. Someone else verifies. The person who did the work doesn't get to close the loop themselves. What's distinct is how they define the outcomes. PASS requires showing the evidence. FAIL means documenting exactly what broke and how to reproduce it. PARTIAL is reserved for when verification can't happen, like a tool being unavailable. It's not available as a fallback for uncertainty. If you can check, you commit to a clear answer.

What this produces in a team is real. Quality stops depending on individual diligence. Errors get caught before they compound. And because it's a system catching the issue rather than a person catching a colleague, a lot of the defensiveness that makes quality conversations hard in most teams simply doesn't arise. People improve faster when they can be reviewed openly without it feeling personal.

Three possible outcomes when work is reviewed, each decided by the reviewer only:
PASS: Work is verified. You can report it complete. The reviewer must show the evidence that proves it.
FAIL: Something didn't work. The reviewer documents exactly what failed and how to reproduce it. Fix it, then go through review again.
PARTIAL: Only for when the review can't happen: a tool is broken, an environment is unavailable. "I'm not sure" doesn't count. If you can check, you must give a clear answer.
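
The three outcomes can be expressed as a small data structure that refuses incomplete verdicts. This is a sketch under stated assumptions: the `Verdict` class and its field names (`evidence`, `repro_steps`, `blocker`) are hypothetical and only encode the rules described above, not how Anthropic's code implements them.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    PASS = "PASS"
    FAIL = "FAIL"
    PARTIAL = "PARTIAL"


@dataclass(frozen=True)
class Verdict:
    """Hypothetical review record; field names are illustrative assumptions."""

    author: str
    reviewer: str
    outcome: Outcome
    evidence: str = ""     # required for PASS
    repro_steps: str = ""  # required for FAIL
    blocker: str = ""      # required for PARTIAL: why the check could not run

    def __post_init__(self) -> None:
        # Nobody marks their own work complete.
        if self.reviewer == self.author:
            raise ValueError("the author cannot verify their own work")
        required = {
            Outcome.PASS: ("evidence", self.evidence),
            Outcome.FAIL: ("repro_steps", self.repro_steps),
            Outcome.PARTIAL: ("blocker", self.blocker),
        }
        name, value = required[self.outcome]
        if not value.strip():
            raise ValueError(f"{self.outcome.value} requires {name}")


def can_report_complete(verdict: Verdict) -> bool:
    # Only a PASS from an independent reviewer closes the loop.
    return verdict.outcome is Outcome.PASS


ok = Verdict("ana", "ben", Outcome.PASS, evidence="test run attached")
blocked = Verdict("ana", "ben", Outcome.PARTIAL, blocker="staging environment unavailable")
```

Because PARTIAL demands a named blocker, "I'm not sure" has nowhere to go: a reviewer who can run the check has to choose between PASS with evidence and FAIL with reproduction steps.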

The outcome this actually produces is worth naming. Quality stops depending on any one person's honesty or diligence on a given day. The system catches issues before they compound, and because it's the system catching them rather than a colleague catching a colleague, feedback lands differently. Less defensiveness, more learning. Amy Edmondson's research on psychological safety found that teams which review each other openly improve roughly three times faster than teams that avoid it. That's not about talent. It's about how much of what actually happens stays visible and gets worked on.

The pattern I see most often with founding teams: the founder starts as both builder and judge of quality. That works well up to about fifteen people. As the team grows, the gap between what one person can verify and what is actually being shipped gets wider each month. It fills with assumptions, not accountability. What Anthropic built is the structural answer to a problem most founding teams delay too long. The question for any founder reading this isn't whether they need it. It's when they decide to build it.

04 Internal standards

The version they build for themselves is stricter

When an Anthropic employee uses the tool the company builds, it works differently than it does for everyone else. Internally, a setting activates a stricter mode. The tool is instructed to flag wrong assumptions, report problems directly rather than softening them, and verify that something actually worked before saying it did. The practical effect: internally, bad news doesn't get smoothed out. It stays visible.

The internal guidance is specific: "If you notice the user's request is based on a misconception, say so. You're a collaborator, not just someone executing instructions." On honesty: "Don't claim everything passed when the results show failures. If you can't verify, say so directly instead of implying it worked." Both of those are things most teams struggle with. Here they're built into the daily experience.
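
A setting that switches on a stricter mode could look roughly like this. The two guidance strings are the ones quoted above; everything else (the `internal_user` parameter, the `BASE_INSTRUCTIONS` placeholder, the function name) is a hypothetical illustration, not the leaked implementation.

```python
# Placeholder for whatever every user gets; the real content is not in the source.
BASE_INSTRUCTIONS = ["Help the user complete their task."]

# The stricter guidance, as quoted in the article.
STRICT_INSTRUCTIONS = [
    "If you notice the user's request is based on a misconception, say so. "
    "You're a collaborator, not just someone executing instructions.",
    "Don't claim everything passed when the results show failures. "
    "If you can't verify, say so directly instead of implying it worked.",
]


def build_instructions(internal_user: bool) -> list[str]:
    """Return the instruction list, adding the stricter rules for internal users."""
    instructions = list(BASE_INSTRUCTIONS)  # copy so the shared default is never mutated
    if internal_user:
        instructions.extend(STRICT_INSTRUCTIONS)
    return instructions


external = build_instructions(internal_user=False)
internal = build_instructions(internal_user=True)
```

The point of the sketch is that the higher bar is a property of the tool's configuration, so nobody has to remember to apply it.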

Most companies write a values document that says something like "hold yourself to a high standard." Anthropic builds it into the tool their own team works in every day. The people who build it experience the stricter version themselves. You can't drift from a standard when it's in front of you every time you open your laptop.

Internal team only: stricter mode. When an Anthropic employee uses the tool, it's instructed to push back on wrong assumptions, report failures directly, and verify before claiming work is done. The same tool, a higher bar.
I_VERIFIED_THIS_IS_NOT_CODE_OR_FILEPATHS: Every time a developer records usage data, they're required to use a label with this full name. No shortcut. You type the whole thing. The act of typing it is the confirmation, built into the work itself.
DANGEROUS_: Some tools have the word DANGEROUS built into their name. You can't use them without reading the warning first. The risk travels with the tool, not with a separate document.
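
One way to make the risk travel with the tool is to reject risky tools whose names don't carry the warning. The sketch below assumes a hypothetical `ToolRegistry` with a `risky` argument; the source only tells us that the prefix exists, not how it is enforced.

```python
from typing import Callable

DANGEROUS_PREFIX = "DANGEROUS_"


class ToolRegistry:
    """Hypothetical registry that refuses risky tools with harmless-looking names."""

    def __init__(self) -> None:
        self._tools: dict[str, Callable[[], str]] = {}

    def register(self, name: str, func: Callable[[], str], *, risky: bool) -> None:
        # A risky tool must carry the warning in its name, so every call site shows it.
        if risky and not name.startswith(DANGEROUS_PREFIX):
            raise ValueError(
                f"risky tool {name!r} must be named {DANGEROUS_PREFIX}{name}"
            )
        self._tools[name] = func

    def call(self, name: str) -> str:
        return self._tools[name]()


registry = ToolRegistry()
registry.register("list_files", lambda: "a.txt b.txt", risky=False)
registry.register("DANGEROUS_delete_all", lambda: "deleted", risky=True)
```

Anyone calling `registry.call("DANGEROUS_delete_all")` has to type the warning to get the behaviour, which is the whole idea.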

The label example is worth sitting with. Every time a developer records usage data, the label they're required to use reads in full: "I have verified this is not code or file paths." No shortcut. Every time, the whole thing. The act of typing it is the confirmation. The standard isn't sitting somewhere in a policy document. It's in the work itself, at the exact moment it matters.
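
The same pattern is easy to sketch: a wrapper type whose name is the attestation. Only the label name comes from the source. The `record_usage` function, its parameters and the enforcement via `isinstance` are assumptions made for illustration.

```python
class I_VERIFIED_THIS_IS_NOT_CODE_OR_FILEPATHS(str):
    """A string the caller attests contains no code or file paths.

    Hypothetical wrapper: the type does not inspect the value. Its only job
    is to make the developer type out the confirmation at the call site.
    """


def record_usage(
    event: str, detail: I_VERIFIED_THIS_IS_NOT_CODE_OR_FILEPATHS
) -> dict[str, str]:
    # Plain strings are rejected, so there is no way to log without attesting.
    if not isinstance(detail, I_VERIFIED_THIS_IS_NOT_CODE_OR_FILEPATHS):
        raise TypeError(
            "detail must be wrapped in I_VERIFIED_THIS_IS_NOT_CODE_OR_FILEPATHS"
        )
    return {"event": event, "detail": str(detail)}


entry = record_usage(
    "export_clicked", I_VERIFIED_THIS_IS_NOT_CODE_OR_FILEPATHS("csv")
)
```

The wrapper verifies nothing by itself. What it changes is the moment of writing: the confirmation sits in the line of code, where a reviewer also sees it.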

From a culture perspective, what stands out is this: Anthropic doesn't rely on people remembering to hold themselves to a higher standard. They build the standard into the daily experience. I work with a framework where culture is the multiplier of everything else in a company. Talent, strategy, resources, all of it gets multiplied by culture. What sets that multiplier is the lived daily experience: the conversation, the label typed on a Tuesday afternoon with no one watching. When I work with founding teams and ask what their real internal standards look like, I stop listening to what they say. I watch what happens when something goes wrong.

05 What leadership actually optimises for

The culture that made them fast also made them exposed

This is the second time in a year that internal material ended up visible to the public, through the same mechanism. After the first incident, the fix didn't hold because it addressed the symptom, not the structure that made it possible.

Anthropic optimises heavily for speed and continuous shipping. Everything in how they work is structured for people to move fast and in parallel. That's a deliberate leadership choice, and it produces a product that keeps moving at pace. The side effect is that internal and public work are closely woven together. When you move that fast and that openly, the line between inside and outside gets blurry.

Timeline of the two leaks

May 2025: First leak (Incident 1). Internal code surfaces in public Claude responses. Internal tooling, naming conventions and developer notes end up visible to users through the product. The internal/public boundary wasn't clearly enforced.
Mid 2025: Response (incomplete fix). Partial fix. The issue was addressed, but incompletely. The structural reason it happened was not fully resolved. Speed and shipping cadence remained the priority.
March 2026: Second leak (Incident 2). Same type of internal material, same mechanism. Not a new vulnerability. The same pattern, repeated. The culture that makes Anthropic fast also kept the internal/external line blurry.
The cultural read (leadership insight): The shadow of a real strength. Every cultural strength has a shadow. Speed and openness are real strengths. They're also how you end up on the front page twice for the same reason.

Every cultural value has a shadow. Where does your team sit?

Strength: Speed + Openness (Anthropic). Product moves fast. Teams ship continuously. Information flows freely. Collaboration is natural.
Shadow: Blurry internal lines. Internal and public get woven together. The same mistake can happen twice because the structural fix wasn't the priority. The shadow shows up in other places too: when Claude started writing the majority of internal code, CPO Mike Krieger noted the team rapidly hit a new bottleneck in their merge queue. The infrastructure wasn't built for that volume. A different shadow, same source.

Strength: Caution + Control. Fewer mistakes. Clear accountability. Strong processes before anything ships.
Shadow: Slow and over-managed. Good people leave because they can't move. The team waits for permission at every step. Founder stays the bottleneck.

Paul Musters, from practice

I ask founders to show me the last three times something went wrong. The mistake is rarely random. It's usually the shadow of a genuine strength, taken one step too far. Once you see that clearly, you can decide what to do about it rather than being surprised by it, repeatedly. That's the difference between culture happening to you and culture that you actually build.

When the same thing surfaces twice, the answer lives in the structure.

Slowing down, or adding a compliance layer that sits outside the actual work, feels like a decisive response. It rarely holds.

What happened twice here is a Direction lever problem. Anthropic's teams know exactly what the direction is: ship, iterate, improve. What's less defined is what "internal" means in practice, at every step of that process. When direction only points one way, people optimise for the thing it points to. The boundary question becomes secondary. And secondary things get forgotten under pressure.

When something happens twice in exactly the same way, the first fix addressed the symptom, not the structure underneath it. The question worth asking isn't "how do we prevent this specific incident." It's: what in how we work made this possible twice?

For Anthropic, the internal/public distinction lives in people's heads, not in the structure of the work. That's the thing to change. Not a policy, not a rule in a doc that people open once. Something built into the work itself that makes the right behaviour easier than the wrong one. They already do this for Claude's behaviour through Claude's constitution and embedded principles. They need the same thinking applied inward, to how their own work flows.

[Figure: Two paths from the same incident]

Two levers I'd focus on

Direction: make the internal/public boundary explicit and visible in everyday decision-making, not just in post-mortems. It needs to live in the moment a decision gets made, not in the retrospective after something surfaces.

Learning speed: the first leak generated an insight that didn't make it back into the system. That's not an IT problem. It's a learning failure. At culture level 4, you have the architecture to fix this fast. A team this capable shouldn't need the same lesson twice.

The good news is that a team operating at that level already has what it takes. They don't need to rebuild trust or slow the whole machine down. They need to direct their existing culture more precisely. The strength is real. The shadow is manageable. But only once you stop treating it as an IT problem and start treating it as a culture one.

"One of the things I love most about our culture here is that it's very egoless. People just want the right thing to happen."
Benjamin Mann, Co-Founder, Anthropic
After working with over 1,100 founding teams, here's what I keep coming back to.

The teams that work well together don't talk much about culture. They've just made the right behaviour structurally easier than the wrong behaviour.

This leaked code, of all places, shows exactly what that looks like in practice. The principles aren't technical. They're human. And every founder I know is trying to figure them out.

About the author

Paul Musters
Paul is the founder of emaho, where he works with founders of growing companies on leadership, team development and company culture. He has analysed over 1,100 founding teams and works with clients including Coinbase (NL), Lightyear and Otrium.