Production-quality AI-generated code without losing velocity.
I manage a team of 15+ engineers building a product that processes $100M+ daily volume. We use Claude Code for nearly everything. Six months in, I noticed a pattern: AI coding tools are incredibly fast, but they silently accumulate debt that kills you later - unused imports, orphan exports, copy-pasted logic, tests that can't actually fail, and a codebase that grows 3x faster than it should.
The conventional wisdom is "AI code needs heavy human review." That's wrong. The real problem is that AI tools have no feedback loop. They write code, you accept it, and nobody checks whether it's actually wired up, actually tested, or actually necessary.
This framework fixes that by making quality mechanical and automatic - not aspirational.
Deterministic enforcement + disciplined workflow = production-quality AI-generated code at high velocity.
Instead of hoping Claude writes clean code, you make it impossible for Claude to produce dirty code. Hooks block bad output in real time. CI gates block bad merges. Mutation testing proves your tests actually work. Dead code detection ensures nothing accumulates.
The result: your AI-assisted codebase gets cleaner over time instead of rotting.
That handles half the problem. Hooks, CI, and mutation testing make sure your AI can't ship dirty code. But there's a second failure mode nobody talks about: your AI doesn't actually know your codebase. It can't check if a function already exists before writing a new one. It can't see that renaming a utility breaks 14 callers across 6 modules. It doesn't know which module owns what, or whether the thing it just built duplicates something three directories over. It just writes code and hopes.
Enforcement catches bad code. Intelligence prevents bad decisions. This framework handles enforcement. I use Pharaoh for the intelligence side - it turns your codebase into a knowledge graph your AI queries before touching anything. Different problems, same goal: AI code that actually gets better over time.
FRAMEWORK.md - The Master Execution Plan
A 1,200-line document containing 6 self-contained phases, each designed to be executed in a single Claude Code session. Work through them sequentially. Each leaves the codebase strictly better than before.
| Phase | What It Does | Time |
|---|---|---|
| 1. Foundation | Install Biome, Knip, Lefthook/Husky, Claude Code hooks | 1-2 hrs |
| 2. CI/CD + Strictness | GitHub Actions quality gates, TypeScript strict mode | 1-2 hrs |
| 3. Mutation Testing | Stryker integration - prove your tests work | 2-3 hrs |
| 4. Template Repository | Reusable project template with full framework | 2-3 hrs |
| 5. Cleanup | Remove dead code, audit tests, consolidate duplication | 1-2 weeks |
| 6. Workflow Mastery | Daily/weekly/monthly rituals, advanced patterns | Ongoing |
Each phase is written as a PRD-Lite - a self-contained specification you can paste directly into Claude Code. It includes exact file scope (what Claude is allowed to touch and what's forbidden), step-by-step instructions, and acceptance criteria.
template/ - Starter Template
A ready-to-use project template with everything pre-configured. Use GitHub's "Use this template" button or clone it directly.
Includes: Biome config, Knip config, Lefthook pre-commit hooks, Claude Code hooks (.claude/settings.json), Stryker config, Vitest with coverage thresholds, GitHub Actions CI, slash commands (/plan, /plan-review, /wire-check, /health-check, /audit-tests), and a CLAUDE.md with [FILL IN] sections for your project's specifics.
| When | Tool | What It Does |
|---|---|---|
| Before writing | Pharaoh | Query codebase graph - blast radius, function search, dead code, dependency tracing via MCP |
| Every edit | Biome | Lint + format. Fast, opinionated, replaces ESLint + Prettier |
| Every edit | Claude Code hooks | Typecheck + lint after each file change. Instant feedback loop |
| Before commit | Lefthook or Husky | Git hooks. Lefthook: fast parallel execution. Husky: widely adopted |
| Before commit | Knip | Dead code detection - unused exports, files, dependencies |
| Before commit | Orphan detection | Catches exported functions with no callers |
| CI | GitHub Actions | Full gate: typecheck + lint + test + knip + orphan check |
| CI | Stryker | Mutation testing - proves tests actually catch bugs |
| Periodic audit | jscpd | Copy-paste duplication detection |
| Periodic audit | madge | Circular dependency detection |
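The commit- and CI-level checks reduce to a fixed command sequence. Here is a sketch of running the same gate locally in one script - the script name, orphan-check path, and exact flags are illustrative, not the repo's actual files; the real gate lives in the pre-commit hooks and the GitHub Actions workflow:

```ts
// scripts/gate.ts - illustrative sketch of the full quality gate run locally
import { execSync } from "node:child_process";

const steps: Array<[string, string]> = [
  ["typecheck", "npx tsc --noEmit"],
  ["lint + format", "npx biome check ."],
  ["tests + coverage", "npx vitest run --coverage"],
  ["dead code", "npx knip"],
  // Placeholder path - point this at the repo's orphan-detection script.
  ["orphan exports", "node scripts/check-orphans.mjs"],
];

for (const [label, cmd] of steps) {
  console.log(`→ ${label}: ${cmd}`);
  // execSync throws on a non-zero exit, so the first failing step stops the gate.
  execSync(cmd, { stdio: "inherit" });
}
console.log("All gates passed.");
```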
Every metric moves in one direction. You never lower a threshold.
| Metric | Direction | Cadence |
|---|---|---|
| Knip issues | → 0 | Weekly |
| jscpd duplication % | ↓ | Monthly (-0.5%) |
| Coverage % | ↑ | Monthly (+2%) |
| Mutation score | ↑ | Monthly (+2%) |
| Source LOC | ↓ or stable | Monthly |
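For the coverage ratchet specifically, the floor can live directly in the Vitest config so CI fails the moment coverage dips below it. A minimal sketch, assuming Vitest's v8 coverage provider - the numbers are placeholders for whatever your current baseline is:

```ts
// vitest.config.ts - sketch; threshold numbers are placeholders for your baseline
import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    coverage: {
      provider: "v8",
      // The ratchet: raise these ~2% each month, never lower them.
      thresholds: {
        lines: 80,
        functions: 80,
        branches: 75,
        statements: 80,
      },
    },
  },
});
```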
Coverage tells you what code ran. Mutation score tells you what code was verified. The gap between them is the oracle gap - tests that exercise code but don't actually assert anything meaningful. This framework closes that gap with Stryker.
The secret weapon. Three hooks that run automatically:
- Post-edit hook - Typechecks and lints after every file edit. Claude gets instant feedback and fixes issues before moving on.
- Pre-write hook - Blocks writes to .env, lock files, dist/, and other sensitive files. Claude physically cannot modify them.
- Stop hook - Runs typecheck + lint + knip + orphan check when Claude tries to finish. If anything is broken, unused, or unwired, Claude is forced to fix it before completing.
This creates a closed feedback loop that doesn't exist in other AI coding setups.
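As an illustration, the post-edit hook can point at a small script registered as a PostToolUse command in .claude/settings.json. The sketch below assumes Claude Code's hook protocol as I understand it (hook input arrives as JSON on stdin with the edited file under tool_input.file_path, and exit code 2 blocks, feeding stderr back to Claude); check the hooks docs for your version and adjust:

```ts
// scripts/post-edit-check.ts - sketch of a post-edit hook target.
// Assumption: Claude Code passes hook input as JSON on stdin and treats
// exit code 2 as a blocking error whose stderr is shown back to Claude.
import { execSync } from "node:child_process";
import { readFileSync } from "node:fs";

const input = JSON.parse(readFileSync(0, "utf8"));
const file: string | undefined = input?.tool_input?.file_path;

// Only check TypeScript sources; ignore configs, docs, lock files, etc.
if (!file || !/\.(ts|tsx)$/.test(file)) process.exit(0);

try {
  execSync("npx tsc --noEmit", { stdio: "pipe" });          // project-wide typecheck
  execSync(`npx biome check "${file}"`, { stdio: "pipe" }); // lint the touched file
} catch (err: any) {
  // Surface the errors so Claude fixes them before moving on.
  process.stderr.write(String(err.stdout ?? "") + String(err.stderr ?? ""));
  process.exit(2);
}
```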
LLM coding agents have a systematic failure mode: they write a function, export it, mark the task "done," but never wire it into the execution path. This isn't a prompting problem - it's structural to how LLMs optimize for task completion. Next session, different context, they build the same thing again. Over a few weeks your codebase is full of functions nobody calls.
The orphan detection script catches this at three gates: Claude Code Stop hook, pre-commit, and CI. Zero escape paths.
But you can also prevent it from the other direction. After implementing something, have your AI verify every new export is actually reachable from a production entry point. Pharaoh's reachability checking does this in one query - traces the call graph from entry points and flags anything disconnected. Detection at three gates plus prevention via graph means nothing slips through.
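To make the idea concrete, here is a rough heuristic sketch of an orphan check - the repo's actual script and Knip are more precise, and the names and paths here are illustrative. It collects exported function and const names, then flags any name that no other file references:

```ts
// scripts/check-orphans.ts - heuristic sketch, not the repo's real script
import { readdirSync, readFileSync, statSync } from "node:fs";
import { join } from "node:path";

function walk(dir: string): string[] {
  return readdirSync(dir).flatMap((name) => {
    const full = join(dir, name);
    if (statSync(full).isDirectory()) return walk(full);
    return /\.(ts|tsx)$/.test(name) ? [full] : [];
  });
}

const sources = walk("src").map((f) => ({ file: f, text: readFileSync(f, "utf8") }));
const exportRe = /export\s+(?:async\s+)?(?:function|const)\s+([A-Za-z0-9_]+)/g;
const orphans: string[] = [];

for (const { file, text } of sources) {
  for (const match of text.matchAll(exportRe)) {
    const name = match[1];
    // "Used" = the name appears anywhere in another file (import or call site).
    const used = sources.some((s) => s.file !== file && s.text.includes(name));
    if (!used) orphans.push(`${file}: ${name}`);
  }
}

if (orphans.length > 0) {
  console.error("Exported but never referenced:\n" + orphans.join("\n"));
  process.exit(1);
}
```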
For critical features, use two Claude Code sessions:
- Builder implements the feature
- Validator (fresh context) reviews with a security + quality checklist
Fresh context catches things the builder's context has normalized. This is the AI equivalent of code review.
Before implementing any non-trivial change, run /plan-review. It enters plan mode - no code changes, just evaluation. Architecture check, wiring verification, test gap analysis, and structured decision points for every issue found.
Inspired by Garry Tan's planning framework for YC founders, adapted for AI-assisted development and trimmed to what actually matters in a code review. The core idea: force yourself to think before writing. AI makes this worse because writing is so cheap that planning feels like friction. It's not. The 2 minutes you spend in /plan-review saves the 45-minute "oh wait, that already existed" rewrite.
Works standalone with codebase search. Lights up with Pharaoh - blast radius checks, function search, reachability verification all happen automatically during the review.
- Teams using Claude Code or similar AI coding tools for daily development
- React / React Native / TypeScript projects (the configs are opinionated for this stack)
- Engineers who want to move fast without accumulating hidden debt
- Anyone who's noticed their AI-generated codebase growing faster than it should
Option A: Start from scratch with the template
```bash
# Use the GitHub template, then:
git clone <your-new-repo>
cd <your-new-repo>
bash scripts/bootstrap.sh
# Fill in CLAUDE.md [FILL IN] sections
```
Option B: Add to an existing project
- Open FRAMEWORK.md
- Start at Phase 1
- Paste each phase into Claude Code as a task
- Work through sequentially - each phase builds on the last
This framework makes your AI write clean code. Pharaoh makes your AI understand your codebase before it starts writing.
What it answers:
- "What's the blast radius if I change this file?" - traces callers across modules
- "Does a function like this already exist?" - prevents the duplication Knip catches later
- "Is this export reachable from any entry point?" - catches dead code before it lands
- "What breaks if I rename this?" - dependency tracing across repos
Install via GitHub App: github.com/apps/pharaoh-so/installations/new
If you found this repo useful, use code IMHOTEP for 30% off.
More on AI code quality at pharaoh.so/blog.
Does this work with Cursor / Copilot / other AI tools?
The framework doc and toolchain work with anything. The Claude Code hooks (.claude/settings.json) and slash commands are Claude Code-specific, but the principles apply universally.
Is this overkill for a small project?
Phase 1 (Biome + hooks) takes an hour and pays for itself immediately. You can stop there. Phases 2-6 are for projects that will live longer than a weekend.
Won't the hooks slow Claude down?
Typechecking adds ~2-5 seconds per edit. This is a feature, not a bug - it catches errors while Claude still has context to fix them, instead of letting them compound into a broken codebase at the end.
Why mutation testing? Isn't coverage enough?
Coverage measures what code ran. A test that calls a function and asserts true === true gives you coverage but catches nothing. Mutation testing modifies your source code and checks if your tests notice. It's the difference between "the test ran" and "the test works."
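A toy illustration in Vitest (hypothetical function and tests):

```ts
// applyDiscount.ts
export function applyDiscount(price: number, percent: number): number {
  return price - price * (percent / 100);
}

// applyDiscount.test.ts
import { describe, expect, it } from "vitest";
import { applyDiscount } from "./applyDiscount";

describe("applyDiscount", () => {
  // 100% coverage, zero verification: this survives every mutant.
  it("runs", () => {
    applyDiscount(100, 10);
    expect(true).toBe(true);
  });

  // Kills the mutants: if Stryker flips "-" to "+" or "*" to "/", this fails.
  it("takes 10% off 100", () => {
    expect(applyDiscount(100, 10)).toBe(90);
  });
});
```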
MIT - use it, fork it, adapt it.
Built by Dan Greer, battle-tested on a team shipping production code daily with Claude Code.
If this saves you time, a star helps others find it.