Amar Gautam

What the Claude Code leak actually reveals about building AI agents

Anthropic accidentally open-sourced 512,000 lines of Claude Code internals. Forget the drama. The code is a masterclass in agentic architecture, and every company building AI agents should be studying it.

On March 31st, Anthropic published Claude Code v2.1.88 to npm with a 59.8MB source map file that should never have been included. Source maps are debugging artifacts that let you reconstruct the original TypeScript from a minified bundle. A missing .npmignore rule, a build step nobody double-checked, and suddenly 512,000 lines of production code across 1,906 files were public. Within hours, a security researcher flagged it on X. Within a day, the mirrored repo had 84,000 GitHub stars. A clean-room rewrite called claw-code hit 50,000 stars in under two hours and crossed 100,000 within a day, making it the fastest-growing repository in GitHub history.

Anthropic called it "a release packaging issue caused by human error." That's almost certainly what it was. Anyone who's shipped npm packages has felt the cold sweat of wondering what actually ended up in the tarball. Their build pipeline generated source maps that weren't excluded from the published package, and the CI pipeline didn't catch it.

But I don't want to write about the incident. I want to write about what the code reveals. Because what Anthropic accidentally open-sourced is essentially a graduate-level textbook on building production AI agents. Dozens of published reports have already summarized the headline discoveries: KAIROS, anti-distillation, undercover mode. Those are interesting, but they're surface-level. The real value lives in the architectural decisions that nobody is talking about. The decisions buried in how tools are described, how permissions cascade, how memory is typed, how the harness steers the model through mid-conversation injections.

I spent the last few days going through the actual code, not the summaries.

It's a harness, not a model. And the harness is 99% of the work.

What surprised people first was what the code actually is. Claude Code isn't a model. It's the "agentic harness" that wraps Claude and gives it the ability to do things: read files, run shell commands, search codebases, manage multi-step workflows, remember context across sessions.

The high-level architecture looks roughly like this:

CLAUDE CODE (Agentic Harness)
==============================================

Layer 1: Input Composition
  - System Prompt Composer
  - Tool Registry (~40 tools)
  - Context Manager
          |
          v
Layer 2: Query Engine (46,000 lines)
  - LLM API calls, streaming, caching, routing
          |
          v
Layer 3: Execution
  - Permission Engine
  - Agentic Loop (call model -> run tools -> repeat)
  - Memory System (MEMORY.md + topic files)
          |
          v
Layer 4: Sub-Agent Spawner
  - Fork model (copy parent context)
  - Teammate model (mailbox communication)
  - Worktree model (isolated git branches)

==============================================
          |                       ^
          v                       |
    Claude API              User Terminal
    (the brain)             (the eyes)

This is probably the most important architectural insight in the entire leak. The model is table stakes. The harness is the product. I'd want every product manager to internalize that.

512,000 lines of TypeScript aren't about making Claude smarter. They're about making it useful. Tool orchestration, permission management, context compression, error recovery, multi-agent coordination. These are the things that determine whether an AI agent actually works in practice or just demos well.

If most of your engineering effort is going into prompt tuning, you're probably building a demo, not a product.
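The "call model -> run tools -> repeat" line in the diagram is the core of the harness. Here is a minimal sketch of that loop, with a scripted function standing in for the Claude API. Every name in it (`runAgentLoop`, `ModelTurn`, the turn limit) is illustrative and not taken from the leaked source.

```typescript
// Hypothetical types for illustration; none of these names come from the leak.
type ToolCall = { tool: string; input: string };
type ModelTurn =
  | { kind: "tool_calls"; calls: ToolCall[] }
  | { kind: "final"; text: string };
type ChatMessage = { role: "user" | "assistant" | "tool"; content: string };
type Model = (transcript: ChatMessage[]) => ModelTurn;
type ToolImpl = (input: string) => string;

function runAgentLoop(
  model: Model,
  tools: Record<string, ToolImpl | undefined>,
  userPrompt: string,
  maxTurns = 10,
): string {
  const transcript: ChatMessage[] = [{ role: "user", content: userPrompt }];

  for (let turn = 0; turn < maxTurns; turn++) {
    const reply = model(transcript);
    if (reply.kind === "final") return reply.text;

    for (const call of reply.calls) {
      const impl = tools[call.tool];
      // An unknown tool becomes an error message the model can react to,
      // rather than an exception that kills the session.
      const output = impl ? impl(call.input) : `error: unknown tool ${call.tool}`;
      transcript.push({ role: "tool", content: `${call.tool}: ${output}` });
    }
  }
  // Hard ceiling so a confused model cannot loop forever.
  return "stopped: turn limit reached";
}

// Scripted stand-in for the model: request one file read, then answer.
const scriptedModel: Model = (transcript) =>
  transcript.some((m) => m.role === "tool")
    ? { kind: "final", text: "done after reading file" }
    : { kind: "tool_calls", calls: [{ tool: "Read", input: "auth.ts" }] };

const loopResult = runAgentLoop(
  scriptedModel,
  { Read: (path) => `contents of ${path}` },
  "Fix the login bug",
);
console.log(loopResult);
```

The loop itself is a dozen lines. Everything else in the diagram is what has to surround those lines before the loop is safe to run on someone's machine.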

Tool descriptions are behavioral contracts, not documentation

Every published report mentions that Claude Code has ~40 discrete tools. What none of them explain is how those tool descriptions actually function in the architecture.

Look at the Bash tool's description. It doesn't just say "runs a shell command." It contains paragraphs of behavioral constraints embedded directly in the tool definition:

Simplified from the leaked tool definitions:

Tool: "Bash"

Description:
  "Executes a bash command and returns output.

   IMPORTANT: Avoid using this tool to run grep, cat,
   head, tail, sed, awk, or echo commands. Instead,
   use the appropriate dedicated tool:

   - File search: Use Glob (NOT find or ls)
   - Content search: Use Grep (NOT grep or rg)
   - Read files: Use Read (NOT cat/head/tail)
   - Edit files: Use Edit (NOT sed/awk)
   - Write files: Use Write (NOT echo)"

Parameters:
  command:           string    (required)
  description:       string    (model must explain why)
  timeout:           number    (max 600000ms)
  run_in_background: boolean

Notice what's happening. The tool description isn't documentation for the user. It's a steering mechanism for the model. It explicitly tells the model not to use Bash for tasks that other tools handle better. The Grep tool goes even further and explicitly states "ALWAYS use Grep for search tasks. NEVER invoke grep or rg as a Bash command." The Edit tool warns that it will fail if old_string is not unique in the file.

These aren't just helpful hints. They're load-bearing architectural elements. Without them, the model would constantly reach for Bash when it should use Grep, try to edit files with sed, pass ambiguous strings to search functions. The behavioral constraints baked into tool descriptions are doing as much work as the tool implementations themselves.

Every tool follows the same pattern:

TOOL DEFINITION (pattern every tool follows)
---------------------------------------------
name          the tool's identity
description   behavioral contract:
                - what it does
                - when NOT to use it
                - what will cause failures
                - which tools to prefer instead
parameters    input schema (JSON Schema)
permission    required trust level
execution     actual implementation logic

If you're building agents, your tool descriptions are part of your control plane, not your documentation. Invest in them like you'd invest in API contracts. If your tool descriptions are one-sentence summaries, your agent will make bad tool selections constantly, and you'll blame the model for what's actually a harness problem.
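One way to hold yourself to that standard is to make the contract parts explicit in code and lint for the missing ones. The sketch below assumes a hypothetical `ContractParts` shape that mirrors the pattern above. The example wording is invented, apart from the Edit tool's `old_string` uniqueness warning, which comes from the leak.

```typescript
// Hypothetical shape: mirrors the four contract parts listed above.
interface ContractParts {
  does: string;
  avoidWhen: string[];
  failsWhen: string[];
  preferInstead: string[];
}

// Renders the parts into the single description string the model reads.
function renderDescription(p: ContractParts): string {
  const section = (title: string, items: string[]): string =>
    items.length > 0 ? `\n\n${title}:\n${items.map((i) => `- ${i}`).join("\n")}` : "";
  return (
    p.does +
    section("Do NOT use this tool when", p.avoidWhen) +
    section("This tool will fail when", p.failsWhen) +
    section("Prefer instead", p.preferInstead)
  );
}

// A cheap CI lint: which contract sections did the author leave empty?
function missingSections(p: ContractParts): string[] {
  const missing: string[] = [];
  if (p.avoidWhen.length === 0) missing.push("avoidWhen");
  if (p.failsWhen.length === 0) missing.push("failsWhen");
  if (p.preferInstead.length === 0) missing.push("preferInstead");
  return missing;
}

const editTool: ContractParts = {
  does: "Replaces old_string with new_string in a file.",
  avoidWhen: ["creating a new file from scratch"],
  failsWhen: ["old_string is not unique in the file"],
  preferInstead: ["Write, for whole-file creation"],
};
const oneLiner: ContractParts = {
  does: "Edits a file.",
  avoidWhen: [],
  failsWhen: [],
  preferInstead: [],
};

console.log(renderDescription(editTool));
console.log(missingSections(oneLiner)); // flags all three empty sections
```

A one-sentence description fails the lint on all three counts, which is the point.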

Mid-conversation context injection

Other agent frameworks do mid-conversation context injection. LangChain has middleware hooks, and OpenAI's Assistants API injects tool results mid-thread. But Claude Code's implementation is unusually systematic, and it reveals a philosophy worth studying.

The harness injects contextual information into the conversation using what the code calls "system-reminder" tags. These are messages inserted into the conversation flow that the model sees but the user doesn't interact with:

How system-reminders appear in the conversation stream:

  [system]     "You are Claude Code..."
  [user]       "Fix the login bug"
  [assistant]  "Let me look at..."

  // Harness injects this transparently ↓

  [injected]   SYSTEM-REMINDER:
                 gitStatus: branch 'fix-auth', 3 modified
                 Available deferred tools: WebSearch, WebFetch
                 Task reminder: 2 tasks in progress

  [assistant]  "Based on the current branch..."

Think about what this enables. The harness doesn't have to front-load everything into the system prompt. Instead, it injects relevant context at the right moment. If the user mentions git, in comes the current branch and status. Tool schemas arrive only when the model first reaches for them. And if the model hasn't checked its task list in a while, a nudge shows up in the stream.

TRADITIONAL APPROACH:
  System prompt contains everything, always:
    - All tool schemas (~40 tools)
    - All current state (git, files)
    - All memory records
    - All behavioral rules
  Token cost: HUGE, every single call

CLAUDE CODE'S APPROACH:
  System prompt contains base rules only:
    - Core behavioral rules
    - Tool names (schemas deferred)

  + system-reminders injected as needed:
    - git status     (on git-related task)
    - tool schemas   (on first use)
    - memory         (on relevant trigger)
    - task state     (periodic nudge)
  Token cost: proportional to relevance

This is more like an event-driven architecture where the harness publishes context updates and the model subscribes to them implicitly through the conversation stream.

The engineering benefit is elegant: instead of allocating a fixed portion of your context window to "current state," you inject state dynamically when it's relevant. And the product benefit explains why Claude Code feels responsive and aware without being expensive. The harness is selectively feeding the model context that makes it seem omniscient, when in reality it's seeing carefully curated snapshots at precisely the right moments.
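A rough sketch of the injection pattern, assuming each piece of state is a "reminder source" that decides for itself when it is relevant. The `ReminderSource` interface, the trigger regex, and the exact tag formatting are assumptions of mine. Only the "system-reminder" name comes from the source.

```typescript
type Role = "system" | "user" | "assistant" | "injected";
interface StreamMessage { role: Role; content: string }

// Hypothetical interface: a source knows when to fire and what to say.
interface ReminderSource {
  label: string;
  shouldFire: (stream: StreamMessage[]) => boolean;
  render: () => string;
}

function lastUserText(stream: StreamMessage[]): string {
  for (let i = stream.length - 1; i >= 0; i--) {
    if (stream[i].role === "user") return stream[i].content;
  }
  return "";
}

// Appends one injected message containing every reminder that is relevant now.
function injectReminders(stream: StreamMessage[], sources: ReminderSource[]): StreamMessage[] {
  const lines = sources
    .filter((s) => s.shouldFire(stream))
    .map((s) => `${s.label}: ${s.render()}`);
  if (lines.length === 0) return stream; // nothing relevant, no tokens spent
  const reminder: StreamMessage = {
    role: "injected",
    content: `<system-reminder>\n${lines.join("\n")}\n</system-reminder>`,
  };
  return [...stream, reminder];
}

const gitStatusSource: ReminderSource = {
  label: "gitStatus",
  shouldFire: (stream) => /\b(git|branch|commit)\b/i.test(lastUserText(stream)),
  render: () => "branch 'fix-auth', 3 modified", // a real harness would query git here
};

const withGit = injectReminders(
  [{ role: "user", content: "Which branch am I on?" }],
  [gitStatusSource],
);
const withoutGit = injectReminders(
  [{ role: "user", content: "Explain this regex" }],
  [gitStatusSource],
);
console.log(withGit.length, withoutGit.length);
```

The second call returns the stream unchanged, so a conversation that never touches git never pays for git state.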

Deferred tool loading: lazy initialization for the context window

Related to the injection pattern, Claude Code doesn't load all of its roughly 40 tool schemas at once. Tools are registered in a deferred state where only their names are visible:

What the model initially sees:

  "The following deferred tools are available via ToolSearch:
   WebFetch, WebSearch, TaskCreate, TaskUpdate,
   NotebookEdit, CronCreate, SendMessage..."

When the model needs WebSearch, it calls:

  ToolSearch(query: "WebSearch", max_results: 1)

Only THEN does the full schema load into context:

  Tool: "WebSearch"
  Description: "Allows Claude to search the web..."
  Parameters:
    query:           string  (required)
    allowed_domains: array
    blocked_domains: array

This is context window budget management at its most disciplined. Each tool schema can be hundreds of tokens (the Bash tool description alone is over 1,000). Forty tool schemas loaded permanently would cost thousands of tokens in every single API call, and most of those tools won't be used in most conversations.

If you're building an agent with more than a handful of tools, you should be doing this. The alternative is like importing every npm package in your project into every file. It works until it doesn't, and the failure mode is silent degradation as your context window fills up with tool definitions the model isn't using.
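A minimal sketch of deferred loading, assuming an in-memory registry. The class name, the substring matching, and the `WebFetch` placeholder schema are mine. Only the `ToolSearch` call shape and the `WebSearch` parameters are taken from the example above.

```typescript
interface ToolSchema {
  name: string;
  description: string;
  parameters: Record<string, string>;
}

// Hypothetical registry: names are always visible, schemas load on demand.
class DeferredToolRegistry {
  private readonly schemas: Map<string, ToolSchema>;
  private readonly loaded = new Set<string>();

  constructor(all: ToolSchema[]) {
    this.schemas = new Map(all.map((s): [string, ToolSchema] => [s.name, s]));
  }

  // What the system prompt carries: tool names only.
  announcement(): string {
    const deferred = Array.from(this.schemas.keys()).filter((n) => !this.loaded.has(n));
    return `The following deferred tools are available via ToolSearch: ${deferred.join(", ")}`;
  }

  // Called when the model invokes ToolSearch; matching schemas enter context.
  toolSearch(query: string, maxResults = 1): ToolSchema[] {
    const q = query.toLowerCase();
    const hits = Array.from(this.schemas.values())
      .filter((s) => s.name.toLowerCase().includes(q))
      .slice(0, maxResults);
    hits.forEach((s) => this.loaded.add(s.name));
    return hits;
  }
}

const makeRegistry = (): DeferredToolRegistry =>
  new DeferredToolRegistry([
    {
      name: "WebSearch",
      description: "Allows Claude to search the web...",
      parameters: { query: "string", allowed_domains: "array", blocked_domains: "array" },
    },
    // Placeholder entry: this description and parameter are invented.
    { name: "WebFetch", description: "(placeholder)", parameters: { url: "string" } },
  ]);

const registry = makeRegistry();
console.log(registry.announcement());
console.log(registry.toolSearch("WebSearch", 1).map((s) => s.name));
console.log(registry.announcement());
```

The second announcement no longer lists `WebSearch`, because its full schema is now in context and the name-only pointer is redundant.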

Memory as a typed system with verification obligations

Published reports all mention MEMORY.md. What they miss is that Claude Code's memory system isn't a notepad. It's a typed, structured system with explicit schema requirements and, crucially, verification obligations. There's also a hard constraint most analyses skip: total memory is capped at 25KB, and the index file is capped at 200 lines. These limits force ruthless prioritization.

The memory system defines four distinct types:

Memory types from the leaked source:

  MemoryType = "user" | "feedback" | "project" | "reference"

  USER — who the person is, preferences, expertise
    Save when: learning about user's role or knowledge
    Use when:  tailoring responses to their level

  FEEDBACK — corrections AND confirmations
    Save when: user corrects approach OR confirms it worked
    Use when:  guiding future behavior
    Key:       record success, not just failure

  PROJECT — ongoing work, decisions, deadlines
    Save when: learning who/what/why/when
    Use when:  understanding context behind requests
    Rule:      convert relative dates to absolute

  REFERENCE — pointers to external systems
    Save when: learning about tools, dashboards, URLs
    Use when:  user references external resources

Each memory file uses a structured frontmatter format:

---
name: feedback_testing
description: Integration tests must hit real DB, not mocks
type: feedback
---

Integration tests must hit a real database, not mocks.

Why: Prior incident where mock/prod divergence masked
a broken migration. Tests passed, prod broke.

How to apply: When writing tests for database-touching
code, always use a test database, never mock the DB layer.

The feedback type is particularly clever. The system explicitly instructs the model to record both failures AND successes: "if you only save corrections, you will avoid past mistakes but drift away from approaches the user has already validated, and may grow overly cautious." Most memory systems capture negative feedback (don't do X) but not positive feedback (yes, keep doing Y). The result is an agent that becomes progressively more timid.

But the part I think is genuinely novel is the verification obligation. The model is explicitly told:

"Memory records can become stale. Before answering the user or building assumptions based solely on information in memory records, verify that the memory is still correct. 'The memory says X exists' is not the same as 'X exists now.'"

The memory system doesn't trust itself. It treats its own records as hypotheses that need to be confirmed against reality before acting. This is a fundamentally different trust model from "if you saved it, it's true," and it's the correct one for any agent operating on a codebase that changes.

MEMORY ARCHITECTURE (3 layers)
---------------------------------------------

Layer 1: MEMORY.md (always loaded in context)
  - user_profile.md     "role, preferences"
  - feedback_testing.md  "use real DB, not mocks"
  - project_launch.md   "deadline 4/15"
  ~150 chars per entry, max 200 lines, 25KB total cap
            |
            | on-demand lookup
            v
Layer 2: Topic files (loaded when referenced)
  Full detail with frontmatter schema
  "Why" + "How to apply" structure
            |
            | verification step
            v
Layer 3: Reality check (grep, git log, read)
  "Does this function still exist?"
  "Is this file still at this path?"
  Memory claims != current truth
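The typed record and the verification step can be sketched together. The parser below handles only the flat `key: value` frontmatter shown earlier, and `recall` is a hypothetical helper of mine that refuses to return a memory without a live check. Neither is the leaked implementation.

```typescript
type MemoryType = "user" | "feedback" | "project" | "reference";
const MEMORY_TYPES: MemoryType[] = ["user", "feedback", "project", "reference"];

interface MemoryRecord {
  name: string;
  description: string;
  type: MemoryType;
  body: string;
}

// Parses a memory file; rejects records with missing fields or unknown types.
function parseMemoryFile(text: string): MemoryRecord {
  const match = /^---\n([\s\S]*?)\n---\n?([\s\S]*)$/.exec(text);
  if (match === null) throw new Error("missing frontmatter");
  const fields = new Map<string, string>();
  for (const line of match[1].split("\n")) {
    const colon = line.indexOf(":");
    if (colon > 0) fields.set(line.slice(0, colon).trim(), line.slice(colon + 1).trim());
  }
  const type = MEMORY_TYPES.find((t) => t === fields.get("type"));
  const name = fields.get("name");
  const description = fields.get("description");
  if (type === undefined || name === undefined || description === undefined) {
    throw new Error("frontmatter needs name, description, and a known type");
  }
  return { name, description, type, body: match[2].trim() };
}

// Verification obligation: a memory is a hypothesis until a live check confirms it.
type Recall =
  | { status: "confirmed"; record: MemoryRecord }
  | { status: "stale"; record: MemoryRecord };

function recall(record: MemoryRecord, stillTrue: () => boolean): Recall {
  return stillTrue() ? { status: "confirmed", record } : { status: "stale", record };
}

const parsedMemory = parseMemoryFile(
  "---\nname: feedback_testing\n" +
    "description: Integration tests must hit real DB, not mocks\n" +
    "type: feedback\n---\n\n" +
    "Integration tests must hit a real database, not mocks.",
);
// Stand-in for grep / git log / read against the real working tree.
const filesOnDisk = new Set(["tests/db.test.ts"]);
const checked = recall(parsedMemory, () => filesOnDisk.has("tests/legacy.test.ts"));
console.log(parsedMemory.type, checked.status);
```

Making the check a required argument means there is no code path that reads a memory without naming how it was verified.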

Permission design as reliability engineering

Claude Code's permission system deserves its own essay. The code defines six permission modes, and users can configure them per tool:

Permission modes from the leaked source:

  PermissionMode =
    | "default"            Ask for most things
    | "acceptEdits"        Auto-approve file edits only
    | "dontAsk"            Auto-DENY anything not pre-approved
    | "bypassPermissions"  Skip most checks (not .git writes)
    | "plan"               Read-only mode, no edits allowed
    | "auto"               Separate AI classifier evaluates risk

  Per-tool configuration example:
    Read  --> auto-approve
    Bash  --> ask every time
    Edit  --> accept edits mode
    Push  --> require plan approval

A subtle but important detail: dontAsk doesn't mean "approve everything without asking." It means "deny everything that isn't explicitly pre-approved." It's designed for headless and CI environments where you want a fixed, explicit tool surface. Getting this wrong in your own agent (interpreting "don't ask" as "don't restrict") would be a serious security flaw.

Similarly, auto mode doesn't mean the model decides for itself. A separate background AI classifier evaluates each tool call and blocks dangerous ones. The main model never gets to approve its own risky actions.
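To make those semantics concrete, here is a hedged sketch of a decision function over the six modes. The mode names come from the source. The `ToolRequest` shape, the three-way `Decision`, and the exact branch logic are my reading of the one-line summaries above, not the leaked implementation.

```typescript
type PermissionMode =
  | "default"
  | "acceptEdits"
  | "dontAsk"
  | "bypassPermissions"
  | "plan"
  | "auto";
type Decision = "allow" | "ask" | "deny";

// Hypothetical request shape for illustration.
interface ToolRequest {
  tool: string;
  kind: "read" | "edit" | "other";
  writesGitDir: boolean;
}

interface PolicyContext {
  preApproved: Set<string>;
  // Stand-in for the separate classifier; the main model never calls this itself.
  classify: (req: ToolRequest) => "safe" | "dangerous";
}

function decide(mode: PermissionMode, req: ToolRequest, ctx: PolicyContext): Decision {
  switch (mode) {
    case "default":
      return req.kind === "read" ? "allow" : "ask";
    case "acceptEdits":
      return req.kind === "other" ? "ask" : "allow";
    case "dontAsk":
      // "Don't ask" means deny, not approve: nothing outside the allowlist runs.
      return ctx.preApproved.has(req.tool) ? "allow" : "deny";
    case "bypassPermissions":
      return req.writesGitDir ? "ask" : "allow";
    case "plan":
      return req.kind === "read" ? "allow" : "deny";
    case "auto":
      return ctx.classify(req) === "safe" ? "allow" : "deny";
  }
}

const policy: PolicyContext = {
  preApproved: new Set(["Read"]),
  classify: (req) => (req.tool === "Bash" ? "dangerous" : "safe"),
};
const bashCall: ToolRequest = { tool: "Bash", kind: "other", writesGitDir: false };

console.log(decide("dontAsk", bashCall, policy)); // "deny": not pre-approved
console.log(decide("auto", bashCall, policy)); // "deny": classifier flagged it
```

Because the switch is exhaustive over the union, adding a seventh mode is a compile error until someone decides what it does.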

What makes this interesting isn't the permission modes themselves, though. It's the way the system prompt encodes risk awareness directly into the model's reasoning. An entire section titled "Executing actions with care" asks the model to "consider the reversibility and blast radius of actions":

RISK ASSESSMENT FRAMEWORK (from system prompt)
---------------------------------------------

LOW RISK (proceed freely):
  - Reading files
  - Running tests
  - Searching code
  - Local, reversible edits

HIGH RISK (confirm with user):
  - Deleting files or branches
  - Force-pushing to remote
  - Commenting on PRs/issues
  - Sending messages (Slack, email)
  - Modifying CI/CD pipelines
  - Uploading to third-party services

DOMAIN-SPECIFIC (git safety protocol):
  - NEVER amend after hook failure (destroys prev commit)
  - NEVER skip hooks (--no-verify)
  - NEVER git add -A (may stage .env files)
  - NEVER use -i flag (hangs in non-interactive terminal)
  - NEVER force push to main/master

"Blast radius" in a system prompt for a coding assistant. That's a term from incident response, and its presence tells you that Anthropic thinks about agentic AI through the lens of reliability engineering, not just product design.

The system also implements user-configurable hooks:

Hooks: shell commands that fire on agent events
Configured in settings.json:

  "hooks":
    "after:Edit"       -->  "eslint --fix [edited file]"
    "before:Bash"      -->  "validate-command [command]"
    "after:git-commit" -->  "run-tests"

Hook output is treated as user feedback,
so hooks can effectively override model behavior.
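The `validate-command` hook above is only named, not shown. As an illustration of what such a pre-Bash check could look like, the sketch below turns four of the git safety rules into deterministic checks. According to the source those rules live in the system prompt, so moving them into code is my assumption, and the token splitting is deliberately naive (it does not understand shell quoting). The amend rule is left out because it depends on whether a hook just failed, which a single command string cannot tell you.

```typescript
interface SafetyRule {
  reason: string;
  matches: (cmd: string) => boolean;
}

// Naive whitespace tokenizer; a real validator would need a shell parser.
function tokens(cmd: string): string[] {
  return cmd.trim().split(/\s+/);
}
function isGit(cmd: string, sub?: string): boolean {
  const t = tokens(cmd);
  return t[0] === "git" && (sub === undefined || t[1] === sub);
}
function hasAny(cmd: string, ...words: string[]): boolean {
  const t = tokens(cmd);
  return words.some((w) => t.indexOf(w) !== -1);
}

const GIT_RULES: SafetyRule[] = [
  { reason: "never skip hooks (--no-verify)", matches: (c) => isGit(c) && hasAny(c, "--no-verify") },
  {
    reason: "never git add -A (may stage .env files)",
    matches: (c) => isGit(c, "add") && hasAny(c, "-A", "--all"),
  },
  {
    reason: "never use -i (hangs in a non-interactive terminal)",
    matches: (c) => isGit(c) && hasAny(c, "-i"),
  },
  {
    reason: "never force push to main/master",
    matches: (c) => isGit(c, "push") && hasAny(c, "--force", "-f") && hasAny(c, "main", "master"),
  },
];

// Returns the reasons a command should be blocked; empty means it may run.
function validateCommand(cmd: string): string[] {
  return GIT_RULES.filter((r) => r.matches(cmd)).map((r) => r.reason);
}

console.log(validateCommand("git push --force origin main"));
console.log(validateCommand("git status"));
```

Rules like these are exactly the kind of check that does not need a model: the command either contains `--no-verify` or it doesn't.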

This layered approach (per-tool permissions, named modes, user-configurable hooks, risk-aware behavioral instructions) creates a trust system more sophisticated than anything else I've seen in a production agent. And the design insight underneath all of it is that trust is not binary, and it changes over time. Users don't either trust your agent or not. They trust it to read files but not delete them. They trust it to edit code but not push to production. And that trust boundary shifts as they use the system.

Sub-agents: recursion with a ceiling

Claude Code's Agent tool spawns sub-agents as just another tool call. No special orchestration layer. No separate process model. An agent is a tool that creates more agents.

One important constraint: sub-agents cannot spawn their own sub-agents. This is an explicit architectural limit that prevents infinite nesting. It's a deliberate design choice, not an oversight, and any team building multi-agent systems should think hard about whether they want this boundary.

PARENT AGENT
  Task: "Fix the auth bug and update tests"

  Spawns 3 sub-agents (cannot nest further):

  [Explore Agent]          [General Purpose]      [Plan Agent]
  Role: fast search        Role: code changes     Role: architecture
  Tools: Read, Glob, Grep  Tools: ALL             Tools: Read, Glob, Grep
  Restriction: NO editing  Full capability         Restriction: NO Edit/Write
        |                        |                       |
        v                        v                       v
   foreground               foreground               background
   (blocks parent)          (blocks parent)       (runs independently,
                                                   notifies on completion)

Five agent types exist in the source: general-purpose, Explore (fast codebase search), Plan (architecture design), claude-code-guide (documentation lookups), and statusline-setup (utility configuration). Each type has different tool access. The Explore agent can't edit files. The Plan agent can't write code. The specialization isn't just about prompt differences. It's about capability restriction.
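Capability restriction and the nesting ceiling are both easy to express as data plus one guard. The sketch below covers three of the five agent types and only the tools named in this essay. The `AgentHandle` shape, the depth counter, and stripping the `Agent` tool from children are assumptions of mine about how such a limit could be enforced.

```typescript
type ToolName = "Read" | "Glob" | "Grep" | "Edit" | "Write" | "Bash" | "Agent";
type SubAgentType = "general-purpose" | "Explore" | "Plan";

const READ_ONLY: ToolName[] = ["Read", "Glob", "Grep"];
const EVERYTHING: ToolName[] = ["Read", "Glob", "Grep", "Edit", "Write", "Bash", "Agent"];

// Specialization is a tool allowlist, not just a different prompt.
const TOOL_ACCESS: Record<SubAgentType, ToolName[]> = {
  "general-purpose": EVERYTHING,
  Explore: READ_ONLY,
  Plan: READ_ONLY,
};

interface AgentHandle {
  type: SubAgentType | "root";
  depth: number;
  tools: ToolName[];
}

function spawnSubAgent(spawner: AgentHandle, type: SubAgentType): AgentHandle {
  // The ceiling: only the top-level agent (depth 0) may spawn.
  if (spawner.depth >= 1) throw new Error("sub-agents cannot spawn their own sub-agents");
  // Children never receive the Agent tool, so nesting is also impossible by construction.
  const tools = TOOL_ACCESS[type].filter((t) => t !== "Agent");
  return { type, depth: spawner.depth + 1, tools };
}

function canUse(agent: AgentHandle, tool: ToolName): boolean {
  return agent.tools.indexOf(tool) !== -1;
}

const rootAgent: AgentHandle = { type: "root", depth: 0, tools: EVERYTHING };
const explorer = spawnSubAgent(rootAgent, "Explore");
const worker = spawnSubAgent(rootAgent, "general-purpose");
console.log(canUse(explorer, "Edit"), canUse(worker, "Edit"), canUse(worker, "Agent"));
```

Enforcing the limit twice (a depth check and a missing tool) is belt and braces, but it means a prompt-injected child has no tool to call even if it decides it wants to nest.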

Three execution models appear in the code:

THREE EXECUTION MODELS
---------------------------------------------

Fork:      Parent context --copy--> Child context
           Byte-identical, hits API prompt cache

Teammate:  Agent A <--mailbox--> Agent B
           File-based communication across terminals

Worktree:  Agent A --> own git branch (isolated)
           Agent B --> own git branch (isolated)
           No git conflicts between parallel agents

A mailbox pattern handles coordination for dangerous operations:

MAILBOX PATTERN FOR DANGEROUS OPERATIONS
---------------------------------------------

Worker Agent                 Coordinator Mailbox
    |                              |
    |------- request ------------->|
    |        "I want to delete     |
    |         this file"           |
    |                              | (evaluates risk)
    |                              |
    |<------ approve/reject -------|
    |                              |
    (waits until                   atomic claim prevents
     response arrives)             two workers getting
                                   the same approval

Worker agents can't approve their own high-risk actions. They send requests to a coordinator's mailbox and wait. Worth noting: the coordinator is itself a Claude instance following prompt instructions, so this gatekeeping is prompt-enforced, not code-enforced. It's a softer guarantee than a hard architectural constraint, which is an interesting design choice. The coordinator's system prompt explicitly says "Do not rubber-stamp weak work." An atomic claim mechanism prevents race conditions between workers.
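The claim step is the part that is easy to get wrong, so here is a small in-memory sketch of it. The source describes file-based mailboxes, where atomicity would have to come from the filesystem. This version swaps in a `Map` and relies on JavaScript's single-threaded execution for the check-and-set, and every name in it is hypothetical.

```typescript
type Verdict = "approve" | "reject";
type ClaimResult = Verdict | "pending" | "already-claimed" | "unknown-request";

interface ApprovalRequest {
  worker: string;
  action: string;
  verdict?: Verdict;
  claimedBy?: string;
}

class CoordinatorMailbox {
  private nextId = 1;
  private readonly requests = new Map<number, ApprovalRequest>();

  // Worker side: file a request and get a ticket to poll with.
  submit(worker: string, action: string): number {
    const id = this.nextId++;
    this.requests.set(id, { worker, action });
    return id;
  }

  // Coordinator side: record the decision.
  decide(id: number, verdict: Verdict): void {
    const req = this.requests.get(id);
    if (req !== undefined) req.verdict = verdict;
  }

  // A verdict can be consumed exactly once, by whichever worker claims it first.
  claim(id: number, worker: string): ClaimResult {
    const req = this.requests.get(id);
    if (req === undefined) return "unknown-request";
    if (req.verdict === undefined) return "pending";
    if (req.claimedBy !== undefined) return "already-claimed";
    req.claimedBy = worker; // check-and-set happens in one synchronous step
    return req.verdict;
  }
}

const mailbox = new CoordinatorMailbox();
const ticket = mailbox.submit("worker-a", "delete src/legacy.ts");
console.log(mailbox.claim(ticket, "worker-a")); // "pending"
mailbox.decide(ticket, "approve");
console.log(mailbox.claim(ticket, "worker-a")); // "approve"
console.log(mailbox.claim(ticket, "worker-b")); // "already-claimed"
```

Across real processes the same guarantee needs an atomic primitive such as an exclusive file create or a rename, which this sketch does not attempt.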

Context compression: three problems disguised as one

Published reports mention Claude Code's context management. But they treat it as a single system when it's actually separate systems solving separate problems. (The actual in-session pipeline has four stages if you count tool-result budgeting and context collapse as distinct steps, but the three-tier framing captures the key insight.)

THREE-TIER CONTEXT MANAGEMENT
=============================================

Tier 1: MicroCompact (in-session noise)
  Problem:  Old tool outputs pile up
  Solution: Trim in place, no API call
  Trigger:  Tool output age exceeds threshold
  Cost:     Zero (local operation)
  Example:  500 lines of grep results from 10 turns ago
            --> "Searched for X, found 12 matches
                 in auth/, payments/, tests/"

Tier 2: AutoCompact (approaching context limit)
  Problem:  Conversation too long to continue
  Solution: Model writes itself cliff notes
  Trigger:  Context approaches window ceiling
  Buffer:   Reserves 13,000 tokens
  Output:   Up to 20,000 token structured summary
  Cost:     One API call

Tier 3: MEMORY.md (cross-session continuity)
  Problem:  "What did we do last week?"
  Solution: Persistent index + topic files
  Always:   ~150 char pointers loaded (the index)
  On demand: Full topic files pulled in
  Cost:     Index tokens per call + occasional lookups

What matters is that these three problems (in-session noise, approaching context limits, cross-session continuity) require fundamentally different solutions. MicroCompact is local and cheap. AutoCompact is expensive but thorough. MEMORY.md is persistent but approximate. Trying to solve all three with one mechanism, which is what most teams attempt with some variant of "summarize when full," means you solve none of them well.
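Of the three, MicroCompact is the easiest to picture in code because it is purely local. The sketch below trims old, large tool outputs in place. The age and size thresholds and the placeholder format are assumptions, and it keeps only a line count and first line rather than the richer summary in the example above.

```typescript
interface Turn {
  index: number;
  role: "user" | "assistant" | "tool";
  content: string;
}

// Replaces old, bulky tool outputs with a one-line stub. No API call involved.
function microCompact(turns: Turn[], currentTurn: number, maxAge = 5, minLength = 200): Turn[] {
  return turns.map((t) => {
    const isOld = currentTurn - t.index > maxAge;
    if (t.role !== "tool" || !isOld || t.content.length < minLength) return t;
    const lines = t.content.split("\n");
    return {
      ...t,
      content: `[trimmed tool output: ${lines.length} lines, first line "${lines[0]}"]`,
    };
  });
}

// 500 lines of grep-style output from ten turns ago.
const grepDump = Array.from({ length: 500 }, (_, i) => `auth/login.ts:${i}: match`).join("\n");
const compacted = microCompact(
  [
    { index: 0, role: "tool", content: grepDump },
    { index: 1, role: "user", content: "Now fix the first one" },
    { index: 9, role: "tool", content: grepDump },
  ],
  10,
);
console.log(compacted[0].content);
console.log(compacted[2].content.length === grepDump.length);
```

The turn-zero output collapses to one line while the identical output from turn nine survives, because recency, not content, is what decides.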

One cautionary data point from the source: a comment in autoCompact.ts documented that 1,279 sessions had experienced 50+ consecutive auto-compaction failures, with some sessions reaching 3,272. This was wasting roughly 250,000 API calls per day globally. The fix was adding a retry cap of three. Three lines of code to stop burning a quarter-million API calls daily.

If you're building agents and you don't have monitoring on your context management pipeline, you almost certainly have an equivalent problem burning money that you don't know about.
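The shape of that fix is worth seeing, because it is so small. This is a sketch of a consecutive-failure cap, not the leaked code. The constant name and the `CompactionGuard` class are mine, and only the cap of three comes from the source.

```typescript
const MAX_CONSECUTIVE_FAILURES = 3;

class CompactionGuard {
  private consecutiveFailures = 0;

  // Runs the compaction unless it has already failed too many times in a row.
  attempt(compact: () => boolean): "ok" | "failed" | "skipped" {
    if (this.consecutiveFailures >= MAX_CONSECUTIVE_FAILURES) return "skipped";
    if (compact()) {
      this.consecutiveFailures = 0; // any success resets the streak
      return "ok";
    }
    this.consecutiveFailures += 1;
    return "failed";
  }
}

// Simulate a session where compaction can never succeed.
let apiCalls = 0;
const alwaysFails = (): boolean => {
  apiCalls += 1;
  return false;
};

const guard = new CompactionGuard();
const outcomes: string[] = [];
for (let i = 0; i < 10; i++) outcomes.push(guard.attempt(alwaysFails));

console.log(apiCalls); // 3
console.log(outcomes.slice(0, 4).join(",")); // failed,failed,failed,skipped
```

Ten attempts cost three API calls instead of ten. The counter that makes this possible is also the metric you would want on a dashboard.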

The frustration regex and the philosophy it represents

The leaked code includes a regular expression that detects profanity in user messages, a cheap signal that the user is frustrated. The internet laughed: a multi-billion-dollar AI company using regex for sentiment analysis.

But this is actually the most important engineering philosophy in the entire codebase, and it applies far beyond frustration detection.

DECISION FRAMEWORK (from architectural patterns)
---------------------------------------------

Does this task require reasoning?

  YES --> Use the model
            Planning, code generation, analysis,
            natural language understanding

  NO  --> Use deterministic code
            - Regex for pattern matching
            - Boolean logic for permissions
            - Static schemas for tool validation
            - Hardcoded limits for retries
            - Hash comparison for duplicates
            - Type checks for input validation

Claude Code uses AI for the parts that require intelligence, and boring deterministic code for everything else. The frustration regex isn't a compromise. It's the right engineering decision. Running an LLM inference to detect "what the fuck" would cost real money and add latency. A 200-character regex handles it in microseconds for free.

I see teams fall into this trap repeatedly, treating "AI-powered" as a property that should apply to every component. They use model inference for routing decisions that could be a switch statement. They use embeddings for searches that could be substring matches. The result is systems that are slow, expensive, and non-deterministic in places where they could be instant, free, and guaranteed correct.
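For scale, this is roughly what the deterministic option looks like. The pattern below is my own short illustration and not the regex from the leak.

```typescript
// Illustrative pattern only; the leaked regex is not reproduced here.
const FRUSTRATION = /\b(wtf|ffs|what the (?:fuck|hell)|still (?:broken|not working))\b/i;

// No model call, no network, no variance: the same input always gives the same answer.
function looksFrustrated(message: string): boolean {
  return FRUSTRATION.test(message);
}

console.log(looksFrustrated("what the fuck, it's still broken")); // true
console.log(looksFrustrated("Please refactor the auth module")); // false
```

It will miss sarcasm and creative spelling. For a signal whose only job is to nudge tone or log a metric, that trade is usually fine, and a false negative costs nothing.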

KAIROS: the autonomous agent that already exists

The most significant unreleased feature is KAIROS, referenced over 150 times in the source. It's an autonomous daemon mode where Claude Code runs in the background continuously, receiving periodic heartbeat prompts.

KAIROS (Autonomous Daemon Mode)
=============================================

HEARTBEAT LOOP:
  Every N seconds: "Anything worth doing right now?"
    NO  --> idle, wait for next heartbeat
    YES --> take action:
              - fix errors
              - respond to messages
              - update files
              - run tasks
              - send push notifications

AUTODREAM (runs during idle periods):
  Phase 1: Orient
    Read MEMORY.md, scan memory directory, skim topic files
  Phase 2: Gather Recent Signal
    Scan daily logs, session transcripts, drifted memories
  Phase 3: Consolidate
    Merge duplicates, remove contradictions,
    convert relative dates to absolute
  Phase 4: Prune
    Rebuild MEMORY.md index, enforce 200-line cap,
    remove pointers to files that no longer exist

TRIGGER GATES (all three must pass):
  - 24 hours since last run
  - At least 5 sessions completed
  - Consolidation lock available

INPUTS:  GitHub webhooks, file watchers, CI events
OUTPUTS: append-only daily logs, push notifications
STATUS:  behind feature flags, compiled but OFF

Claude Code literally dreams between sessions to reorganize its understanding of your project. The autoDream subroutine runs as a forked sub-agent (deliberately isolated from the main agent to prevent the consolidation process from corrupting the current conversation) and works through four phases: orient, gather signal, consolidate, and prune.

That fourth phase is interesting. It doesn't just write new memories. It enforces the 200-line and 25KB budget constraints, which means the agent has to decide what's worth remembering and what can go. Memory consolidation isn't just additive. It's editorial.
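The trigger gates and the prune step are both plain deterministic logic. The sketch below assumes a `DreamState` record and an index of `{ file, pointer }` entries, both hypothetical. The numbers (24 hours, 5 sessions, 200 lines) come from the source. Keeping the most recent entries when over budget is my simplification, since the real pruning is an editorial judgment the agent makes.

```typescript
interface DreamState {
  lastRunMs: number;
  sessionsSinceLastRun: number;
  lockHeld: boolean;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// All three gates must pass before consolidation runs.
function shouldDream(s: DreamState, nowMs: number): boolean {
  return nowMs - s.lastRunMs >= DAY_MS && s.sessionsSinceLastRun >= 5 && !s.lockHeld;
}

interface IndexEntry {
  file: string;
  pointer: string;
}

// Drops pointers to files that no longer exist, then enforces the line cap.
function pruneIndex(entries: IndexEntry[], existingFiles: Set<string>, maxLines = 200): IndexEntry[] {
  const live = entries.filter((e) => existingFiles.has(e.file));
  // Assumption: entries are oldest-first, so the tail is the most recent.
  return live.slice(Math.max(0, live.length - maxLines));
}

const dreamIndex: IndexEntry[] = [
  { file: "user_profile.md", pointer: "role, preferences" },
  { file: "project_old.md", pointer: "deadline 1/10" },
  { file: "feedback_testing.md", pointer: "use real DB, not mocks" },
];
const memoryDir = new Set(["user_profile.md", "feedback_testing.md"]);

console.log(shouldDream({ lastRunMs: 0, sessionsSinceLastRun: 5, lockHeld: false }, DAY_MS));
console.log(pruneIndex(dreamIndex, memoryDir).map((e) => e.file));
```

The pointer to `project_old.md` disappears because the file is gone, which is the same "memory claims are not current truth" rule applied to the index itself.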

This is the clearest signal yet of where agentic AI is heading. Not better autocomplete. Not smarter chat. Persistent, autonomous agents that develop and refine their own understanding of your work over time. KAIROS is compiled code behind feature flags, along with 43 other gated capabilities. Not research prototypes. Production code waiting to be turned on.

What's striking is that the capability is built. The bottleneck is trust infrastructure: the safety and evaluation systems that need to exist before you can let an agent run unsupervised overnight.

Anti-distillation: protecting your moat

From claude.ts in the leaked source:

  ANTI_DISTILLATION_CC = true

  When enabled, sends this in API requests:
    anti_distillation: ['fake_tools']

    Effect: API injects decoy tool definitions into
    the system prompt. Fake tools poison training data
    if a competitor records API traffic for distillation.

  Gated behind GrowthBook feature flag:
    "tengu_anti_distill_fake_tool_injection"
    Active only for first-party CLI sessions

  Second mechanism (from betas.ts):
    API buffers assistant text between tool calls,
    summarizes it, returns summary + cryptographic signature.
    Makes it harder to reconstruct exact prompt/response pairs.

This matters beyond Anthropic. If your competitive advantage lives in orchestration logic, prompt engineering, and tool definitions (and in the harness-centric world I've been describing, that's where most of the value is), a motivated competitor can extract it through normal API usage. Record inputs and outputs, use them as training data, approximate your behavior with a cheaper model.

How you protect orchestration IP without degrading the user experience is an open problem. Anthropic's approach is to inject noise that confuses extraction without affecting normal use. It's clever. Whether it's sufficient is another question entirely.

So what do you do with this?

I've been building agentic systems at DevRev for a while now, and most of what I found in the Claude Code source confirmed intuitions I'd developed the hard way. The harness matters more than the model. Tool descriptions need to be treated like API contracts. Memory is useless if the agent trusts it blindly. None of this is new if you've shipped agents in production.

But seeing it laid out across 512,000 lines of TypeScript, with all the specific implementation choices and the scars from real failures (like that 250,000-daily-API-call compaction bug), makes the lessons concrete in a way that blog posts and conference talks never do. You can't hand-wave past "build a good permission system" when you're staring at the actual permission system Anthropic built, with its six modes and per-tool configuration and risk framework borrowed from incident response.

If I had to distill what I took away into a few principles for teams building their own agents, they'd be these. Treat tool descriptions as control plane, not documentation. Inject context mid-conversation instead of front-loading. Defer tool schema loading. Type your memory and make it verify itself against reality. Build domain-specific guardrails from real incidents, not theoretical ones. Use AI only for the parts that genuinely require reasoning, and code for everything else. Design permissions around blast radius, not binary trust. And monitor your context management pipeline the way you'd monitor your database, because silent failures at that layer will eat your budget alive.

The thing I keep coming back to is how little of the code has anything to do with the model itself. Almost all of it is plumbing. Good plumbing, thoughtful plumbing, but plumbing. Context management, permission checks, tool routing, memory indexing, error recovery. That's where the product lives.

Anthropic didn't mean to publish any of this. But I'm glad they did, because it settles a question a lot of teams are still debating: whether the hard part of building AI agents is the AI part. It isn't. Not even close.
