Use These 4 Principles to Improve AI Coding

Tecker Yu
AI Native Cloud Engineer × Part-time Investor

Andrej Karpathy recently shared a sample CLAUDE.md with only four rules.

Someone tested it: coding accuracy reportedly went from 65% to 94%.

The model did not suddenly become smarter. The constraints became better.

Karpathy needs little introduction: former Director of AI at Tesla and a founding member of OpenAI. What he identified was not “how to make AI write better code” in the abstract, but four common failure modes of AI coding assistants, each paired with a behavioral contract.

Those four rules were later expanded into twelve and earned 82,000 stars on GitHub. But the core idea did not change: the goal is not to make AI “smarter”; it is to stop it from falling in the same places again and again.

Why Your AI Keeps Repeating the Same Mistakes

Open your CLAUDE.md, and you may find lines like these:

  • Write high-quality code.
  • Think carefully.
  • Keep it simple.

And then?

The AI nods, says “sure”, and keeps making the same mistakes.

It starts coding before the requirement is clear. It fixes a small bug and casually changes a neighboring area. It assumes the project follows some convention, then writes an implementation that looks reasonable in isolation but does not fit the current codebase at all.

The worst part is not that it makes mistakes. The worst part is that the process of making them can look deceptively plausible. It drifts a little, patches a little, drifts again. You watch it work and may even feel that it is getting close. Only at the end do you realize it was not approaching the answer; it was piling more patches onto a polluted context.

A Real Example

I had an old Go project that managed dependencies through GOPATH, while my local machine had a newer Go version that defaults to Go modules.

Every time I asked an agent to run tests, it executed go test ./... directly. After the error appeared, it began correcting itself: first trying go mod init, then GO111MODULE=off, then manually setting GOPATH to some guessed directory.

It took four or five rounds of self-correction before the tests actually ran.

In reality, one line in CLAUDE.md would have solved it: “This project uses GOPATH mode for dependency management.”

But the agent did not know that information existed, so it had to rediscover it through trial and error every time.

What I wanted to solve was not “how do I make AI smarter?”

It was a simpler question: can there be one place where we write down recurring project-specific problems, so the agent sees them before it starts working?

Memory Systems Do Not Solve Team Collaboration

At first, I used the memory system built into my IDE. I put project conventions, common pitfalls, and personal preferences into it. It helped, but it was not stable enough.

One problem is that memory is too personal. For solo projects, that may be fine. In a team project, it becomes an invisible fork: the same repository, but different developers’ agents work with different memories. One agent knows the project must not introduce a certain library; another does not. One agent knows an old module must not be casually refactored; another does not. The codebase has not forked, but the context already has.

Another problem is uncertain loading. You think the agent remembers something, but you do not know whether it actually saw that memory in the current task. Simple tasks may be fine. In long, complex tasks, critical rules may be pushed out of context or selectively forgotten.

I eventually realized that truly important project rules should not live only in one person’s private memory. They should live in the repository, be version-controlled, and be visible to everyone together with the code. That is what CLAUDE.md means to me: not a longer system prompt, but a repository-level collaboration agreement.

Stop Overusing /init

When I first used Claude Code, I also tried /init. It scans the project and generates a CLAUDE.md, which feels convenient. But I quickly noticed a problem: it loves turning code details into long-term context.

Directory structures, module descriptions, current commands, implementation details: these can help in the short term, but they easily become stale maps. Once the code changes, the description in CLAUDE.md starts to rot. Worse, the agent may trust the stale description instead of reading the code again.

I now prefer to leave code details in the code. AI coding agents are already good at searching on the spot: reading callers, checking exports, inspecting tests, and following helper functions through the local context. The root CLAUDE.md should not take over that job. It only needs to tell the agent what it must know every time, and which boundaries it must not cross.
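One way to catch a rotting map early is to check that the paths the file mentions still exist. A minimal sketch, assuming paths in CLAUDE.md are wrapped in backticks and are relative to the repository root; `find_stale_paths` is a hypothetical helper, not part of any tool:

```python
import re
from pathlib import Path

# Assumed convention: repo paths in CLAUDE.md are wrapped in backticks and
# contain a "/" after at least one character (e.g. `docs/api.md`, `src/billing/`).
PATH_PATTERN = re.compile(r"`([^`\s]+/[^`\s]*)`")


def find_stale_paths(claude_md_text: str, repo_root: Path) -> list[str]:
    """Return backticked paths mentioned in the text that no longer exist."""
    stale = []
    for match in PATH_PATTERN.finditer(claude_md_text):
        candidate = match.group(1)
        if not (repo_root / candidate).exists():
            stale.append(candidate)
    return stale


if __name__ == "__main__":
    root = Path(".")
    claude_md = root / "CLAUDE.md"
    if claude_md.exists():
        for path in find_stale_paths(claude_md.read_text(encoding="utf-8"), root):
            print(f"stale reference: {path}")
```

A check like this only catches renamed or deleted paths. It cannot tell whether a description of a module is still accurate, which is one more reason to keep such descriptions out of the root file.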

Later, I tried clearing CLAUDE.md and writing it again from scratch. I kept only two kinds of content: general coding rules for the AI, and project-specific rules. The result was better than expected. The agent still made mistakes, but it wandered less, changed unrelated code less often, and produced code with better style and taste. It was as if it finally understood that this project was not a blank sandbox for improvisation.

Karpathy’s 4 Rules

Karpathy’s four coding rules are short. So short that you may wonder, “that’s it?”

1. Think before coding. State your assumptions. Surface tradeoffs.
   Ask before guessing. Push back when a simpler approach exists.

2. Simplicity first. Minimum code that solves the problem. No
   speculative features. No abstractions for single-use code.

3. Surgical changes. Touch only what is asked. Do not "improve"
   adjacent code, comments, or formatting. Match existing style.

4. Goal-driven execution. Define success criteria. Loop until
   verified. Do not narrate steps; tell me what success looks like.

In practice, they answer four questions.

Rule 1 - Think before coding. Do not start writing immediately.

The most common AI mistake is to stay silent when the requirement is ambiguous and start coding anyway. When the misunderstanding becomes visible, the work has to be torn down and rebuilt.

The contract is: before non-trivial work, state the assumptions. Ask when details are uncertain. If there is a simpler approach, push back and say so.

Rule 2 - Simplicity first. Do not over-engineer.

To look “professional”, AI often adds abstractions, configuration, and helper utilities. It may look impressive, but future agents will keep building on top of that complexity.

The contract is: write the minimum code needed to solve the current problem. Do not implement features the user did not ask for. Do not abstract logic that is used only once.

Rule 3 - Surgical changes. Do not casually refactor nearby code.

In old codebases, ugly code often remains not because nobody sees it, but because nobody can safely move it. It may carry historical compatibility, customer-specific behavior, or patches added after production incidents.

AI has no such historical burden. It sees duplication and wants to remove it. It sees inconsistency and wants to normalize it. In a bug-fix task, that is a risk.

The contract is: only change the parts required by the task. Do not “improve” neighboring code, comments, or formatting. Match the existing style.

Rule 4 - Goal-driven execution. Do not monologue.

The most dangerous AI response is often not “I don’t know”, but “I think I know.” A confident wrong answer is more dangerous than explicit uncertainty.

The contract is: define success criteria. Keep verifying until they are met. Do not narrate steps; tell me what success looks like.

These four rules are not “I hope you behave this way.” They are “in this situation, you must behave this way.”

That difference matters.

Why 4 Rules Can Be Better Than 20

Karpathy’s four rules were later expanded into twelve. But there is a counterintuitive observation: after the rule count exceeded fourteen, reported compliance dropped from 76% to 52%.

Why?

When there are too many rules, each rule carries less weight. The model averages them away, and eventually none of them remains sharp.

Six rules that fit your project are usually more useful than twenty copied from someone else.

So my current approach is to start with Karpathy’s four rules as a baseline, then add project rules one by one according to real failure modes. Every new rule should map to a specific incident: “last week, the AI fell here.”

When adding a rule, ask yourself: without this rule, in which concrete situation would the AI fail?

If you cannot answer that, do not add it.
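To keep the rule count honest, a small check can count the rules in the file and flag when it grows past a budget. A sketch, assuming each rule is one Markdown bullet or numbered item; the default budget of 14 simply mirrors the threshold reported above and is not a verified constant:

```python
import re

# Assumption: a rule is any line that starts with "-", "*", or "1."-style
# numbering. Continuation lines of a wrapped rule are not counted.
RULE_LINE = re.compile(r"^\s*(?:[-*]|\d+\.)\s+\S")


def count_rules(claude_md_text: str) -> int:
    """Count bullet and numbered-list items in a CLAUDE.md body."""
    return sum(1 for line in claude_md_text.splitlines() if RULE_LINE.match(line))


def over_budget(claude_md_text: str, budget: int = 14) -> bool:
    """Return True when the file holds more rules than the chosen budget."""
    return count_rules(claude_md_text) > budget
```

The count says nothing about whether a rule maps to a real incident. It only prompts the question when the file starts to grow.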

From 4 Rules to 12: When to Expand

Karpathy’s four rules address the moment of writing code. But in real agentic workflows, the failure modes go beyond those four.

The later eight additions cover gaps like these:

  • Token budgets are not suggestions: in long debugging sessions, AI may repeat attempts, pollute context, and drift farther from the problem.
  • Checkpoint after important steps: in multi-step tasks, errors compound through later steps.
  • Tests should verify intent, not only behavior: AI may write shallow tests that pass without validating the business intent, creating false confidence.
  • Surface conflicts instead of averaging them: when two coding styles conflict, AI may average them into a third, stranger style.
  • Match codebase conventions even when you disagree: AI may introduce a new paradigm it considers better, leaving two patterns in one codebase.
  • Use models only for judgment calls: letting an LLM handle deterministic logic such as retries, routing, or status codes is wasteful and risky.
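The point about tests verifying intent is easier to see side by side. A sketch with a hypothetical `apply_discount` function and an assumed business rule (a discount never pushes a price below zero); neither comes from a real project:

```python
def apply_discount(price: float, discount: float) -> float:
    """Hypothetical business rule: a discount can never make a price negative."""
    return max(price - discount, 0.0)


def test_shallow():
    # Passes, but only proves that the function returns a number.
    assert isinstance(apply_discount(100.0, 20.0), float)


def test_intent():
    # Encodes what the business actually cares about.
    assert apply_discount(100.0, 20.0) == 80.0
    assert apply_discount(10.0, 25.0) == 0.0  # never below zero
```

A naive `price - discount` implementation would pass the shallow test and fail the intent test, which is exactly the false confidence the rule warns about.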

These additions do not invalidate the original four. They say: protect the baseline first, then expand only where needed.

My suggestion: use the four rules for two weeks in a new project. Whenever AI fails, ask whether the failure was outside the coverage of those four rules. If yes, add one rule. If not, improve the wording of an existing rule.

Project Rules Should Not Become an Encyclopedia

General rules are only the foundation. What makes an agent work smoothly inside a project is the project’s own rules.

I prioritize writing down things like:

  • the tech stack and dependency manager
  • common commands
  • how to run local validation and tests
  • external dependencies and middleware

The validation loop is especially worth making explicit. For example:

After changes, run pnpm typecheck first.
For API behavior, run pnpm test:api.
For UI changes, start the local dev server and confirm key pages open in a browser.
If any validation command cannot run, stop and explain what was skipped and why.
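The same loop can also live in a script, so the agent has a single entry point. A minimal sketch in Python, assuming the `pnpm typecheck` and `pnpm test:api` scripts from the example exist in the project; the `run_validation` helper and its report format are hypothetical:

```python
import shutil
import subprocess

# Commands taken from the example above; adjust to the project's own scripts.
VALIDATION_STEPS = [
    ["pnpm", "typecheck"],
    ["pnpm", "test:api"],
]


def run_validation(steps, which=shutil.which, run=subprocess.run):
    """Run steps in order; stop at the first one that fails or cannot run.

    Every step that did not run is recorded as skipped, with a reason,
    instead of being silently ignored.
    """
    report = {"passed": [], "failed": [], "skipped": []}
    for index, step in enumerate(steps):
        name = " ".join(step)
        if which(step[0]) is None:
            # The command cannot run: stop and explain why.
            report["skipped"].append((name, f"{step[0]} not found on PATH"))
            stop_reason = f"stopped because {name} could not run"
        elif run(step).returncode == 0:
            report["passed"].append(name)
            continue
        else:
            report["failed"].append(name)
            stop_reason = f"stopped after {name} failed"
        # Record the remaining steps as skipped, then stop.
        for rest in steps[index + 1:]:
            report["skipped"].append((" ".join(rest), stop_reason))
        break
    return report
```

Calling `run_validation(VALIDATION_STEPS)` returns a dictionary the agent can quote directly when it reports what passed, what failed, and what was skipped. The browser check for UI changes stays a manual step in this sketch.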

There is another category of project rules that I do not try to invent upfront. I add them while using the agent. When a task does not go smoothly, I may ask it to review the failure at the end:

Carefully review the mistakes you just made, and summarize the lessons as no more than two concise project development principles.

You can even turn that sentence into a skill and run it after each task. Then a human can decide whether the lesson deserves to be written into CLAUDE.md. Only project rules that grow out of real failures are likely to help next time. Real engineering has no silver bullet.

The Root CLAUDE.md Is More Like Onboarding

In daily work, I treat an AI coding agent like an intern who has just joined the project. It is smart, reads quickly, and is willing to work, but it does not know why the project looks the way it does, where it must not touch, or where the team has been hurt before.

When onboarding a new teammate, you do not dump the company’s history and every module’s implementation details into their lap. You first explain a few things: what the project does, what stack it uses, how to test changes, which directories are high-risk, and where to look when uncertain.

CLAUDE.md is similar. The root file should contain the minimum knowledge needed every time, not details that frequently go stale. Let the agent search code details on the spot. Put optional context in docs/ or Skills, and let the model load it gradually when relevant.

A directly usable structure can be short:

# CLAUDE.md

## Project
This repository is a [project type].
Core goal: [what the system is for].
Main users: [internal operations / customers / developers].

## Behavior Rules
- Think before coding: before non-trivial work, restate the requirement, success criteria, and key assumptions.
- Simplicity first: write the minimum code required for the current problem; avoid speculative abstractions.
- Surgical changes: change only the files required by the task; do not refactor neighboring code without confirmation.
- Read before writing: before adding new code, check existing exports, callers, tests, and local conventions.
- Fail loudly: if a check was skipped or uncertainty remains, state it clearly.

## Project Rules
- Use [package manager] for dependencies and scripts.
- Do not introduce [library / pattern / service] without confirmation.
- Follow existing patterns in [key directory].
- Before changing [module], read [docs/path.md].

## Validation
- After TypeScript changes, run [typecheck command].
- After logic changes, run [test command].
- If any command cannot run, explain what was skipped and why.

## Other
- Architecture details live in docs/architecture.md.
- API contracts live in docs/api.md.
- Historical decisions live in docs/decisions.md.
- Load them only when relevant; do not copy their content into this file.

This template is not for blind copying. It is meant to show the boundary: the root file is responsible for stable facts, behavioral boundaries, project red lines, validation loops, and context routing. The more it looks like an encyclopedia, the more likely it becomes noise.

When to Update It

CLAUDE.md is not something you write once and forget. I prefer to maintain it like tests: when an agent makes a mistake you do not want to see again, ask whether it is a repeatable failure mode. If yes, see whether a short rule can prevent it earlier.

If a piece of information is needed for every task, put it in the root CLAUDE.md. If it is needed only for a certain class of tasks, put it in the corresponding document or skill. If it must be enforced, do not rely only on text; use hooks, tests, or CI.
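For a rule such as "do not introduce a certain library", enforcement can be a few lines run in CI. A sketch, assuming a Node project with a standard `package.json`; the banned package name and the `find_banned_dependencies` helper are hypothetical placeholders:

```python
import json

# Hypothetical project red line: packages that must not be added.
BANNED = {"legacy-http-client"}


def find_banned_dependencies(package_json_text: str, banned=frozenset(BANNED)) -> list[str]:
    """Return banned package names declared in dependencies or devDependencies."""
    manifest = json.loads(package_json_text)
    declared = set(manifest.get("dependencies", {})) | set(manifest.get("devDependencies", {}))
    return sorted(declared & set(banned))
```

A CI job could read `package.json`, call this function, and exit with a non-zero status when the returned list is not empty. The written rule then explains the boundary, and the check guarantees it.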

This is also why I dislike making CLAUDE.md too long. Once there are too many rules, each one carries less weight. Six project-specific rules are usually more useful than twenty copied ones.

Start Now

Open the root of your project.

Create a CLAUDE.md.

Paste in Karpathy’s four rules, then add the most basic project description and validation rules:

# CLAUDE.md

## General Coding Rules

1. Think before coding. State your assumptions. Surface tradeoffs.
   Ask before guessing. Push back when a simpler approach exists.

2. Simplicity first. Minimum code that solves the problem. No
   speculative features. No abstractions for single-use code.

3. Surgical changes. Touch only what is asked. Do not "improve"
   adjacent code, comments, or formatting. Match existing style.

4. Goal-driven execution. Define success criteria. Loop until
   verified. Do not narrate steps; tell me what success looks like.

## Project-Specific Rules

Tech stack:
Dependency management:
Tests:
Lint:

Additional rules:
- XXX
- XXX

To make it compatible with IDEs like Qoder, symlink the file under the name AGENTS.md: ln -s CLAUDE.md AGENTS.md

Then run it for two weeks.

Every time AI makes a mistake, write it down. Decide whether the four rules failed to cover the case, or whether the agent simply did not follow them.

If the case was not covered, add one rule. If the rule existed but was not followed, improve the wording.
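The two-week log can be as simple as a list of incidents with one flag each. A sketch of that triage, where the `Incident` record and its field names are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Incident:
    description: str
    covered_by_existing_rule: bool  # did a rule already address this case?


def next_action(incident: Incident) -> str:
    """Map one logged failure to the follow-up described above."""
    if incident.covered_by_existing_rule:
        return "reword the existing rule"
    return "add one new rule"
```

For example, `next_action(Incident("ran go test in module mode", False))` returns `"add one new rule"`.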

Two weeks later, you will have a CLAUDE.md that truly belongs to your project: not copied from someone else, but grown from your own real failure modes.

Writing CLAUDE.md is not about making AI “remember what I said.”

It will not stop agents from making mistakes. But it can make mistakes less frequent, drift shorter, and rework more controllable.

Karpathy’s four principles are not the destination. They are the starting point.

What really changes is not the AI’s behavior itself, but the fact that you turned invisible context that used to live only in people’s heads into part of the project.
