diff --git a/README.md b/README.md index 3b9cf99..183091e 100644 --- a/README.md +++ b/README.md @@ -1,167 +1,130 @@ # BaseAgent - SDK 3.0 -High-performance autonomous agent for [Term Challenge](https://term.challenge). **Does NOT use term_sdk** - fully autonomous with litellm. +High-performance autonomous agent for [Term Challenge](https://term.challenge). Supports multiple LLM providers with **Chutes API** (Kimi K2.5-TEE) as the default. -## Installation +## Quick Start ```bash -# Via pyproject.toml -pip install . - -# Via requirements.txt +# 1. Install dependencies pip install -r requirements.txt -``` -## Usage +# 2. Configure Chutes API (default provider) +export CHUTES_API_TOKEN="your-token-from-chutes.ai" -```bash -python agent.py --instruction "Your task here..." +# 3. Run the agent +python3 agent.py --instruction "Your task description here..." ``` -The agent receives the instruction via `--instruction` and executes the task autonomously. - -## Mandatory Architecture - -> **IMPORTANT**: Agents MUST follow these rules to work correctly. +### Alternative: OpenRouter -### 1. Project Structure (MANDATORY) - -Agents **MUST** be structured projects, NOT single files: - -``` -my-agent/ -├── agent.py # Entry point with --instruction -├── src/ # Modules -│ ├── core/ -│ │ ├── loop.py # Main loop -│ │ └── compaction.py # Context management (MANDATORY) -│ ├── llm/ -│ │ └── client.py # LLM client (litellm) -│ └── tools/ -│ └── ... # Available tools -├── requirements.txt # Dependencies -└── pyproject.toml # Project config +```bash +export LLM_PROVIDER="openrouter" +export OPENROUTER_API_KEY="your-openrouter-key" +python3 agent.py --instruction "Your task description here..." ``` -### 2. Session Management (MANDATORY) - -Agents **MUST** maintain complete conversation history: - -```python -messages = [ - {"role": "system", "content": system_prompt}, - {"role": "user", "content": instruction}, -] +## Documentation -# Add each exchange -messages.append({"role": "assistant", "content": response}) -messages.append({"role": "tool", "tool_call_id": id, "content": result}) +📚 **Full documentation available in [docs/](docs/)** + +### Getting Started +- [Overview](docs/overview.md) - What is BaseAgent +- [Installation](docs/installation.md) - Setup instructions +- [Quick Start](docs/quickstart.md) - First task in 5 minutes + +### Core Concepts +- [Architecture](docs/architecture.md) - Technical deep-dive with diagrams +- [Configuration](docs/configuration.md) - All settings explained +- [Usage Guide](docs/usage.md) - CLI commands and examples + +### Reference +- [Tools Reference](docs/tools.md) - Available tools +- [Context Management](docs/context-management.md) - Token optimization +- [Best Practices](docs/best-practices.md) - Performance tips + +### LLM Providers +- [Chutes Integration](docs/chutes-integration.md) - **Default provider setup** + +## Architecture Overview + +```mermaid +graph TB + subgraph User + CLI["python3 agent.py --instruction"] + end + + subgraph Core + Loop["Agent Loop"] + Context["Context Manager"] + end + + subgraph LLM + Chutes["Chutes API (Kimi K2.5)"] + OpenRouter["OpenRouter (fallback)"] + end + + subgraph Tools + Shell["shell_command"] + Files["read/write_file"] + Search["grep_files"] + end + + CLI --> Loop + Loop --> Context + Loop -->|default| Chutes + Loop -->|fallback| OpenRouter + Loop --> Tools ``` -### 3. 
Context Compaction (MANDATORY) - -Compaction is **CRITICAL** for: -- Avoiding "context too long" errors -- Preserving critical information -- Enabling complex multi-step tasks -- Improving response coherence +## Key Features -```python -# Recommended threshold: 85% of context window -AUTO_COMPACT_THRESHOLD = 0.85 +| Feature | Description | +|---------|-------------| +| **Fully Autonomous** | No user confirmation needed | +| **LLM-Driven** | All decisions made by the language model | +| **Chutes API** | Default: Kimi K2.5-TEE (256K context, thinking mode) | +| **Prompt Caching** | 90%+ cache hit rate | +| **Context Management** | Intelligent pruning and compaction | +| **Self-Verification** | Automatic validation before completion | -# 2-step strategy: -# 1. Pruning: Remove old tool outputs -# 2. AI Compaction: Summarize conversation if pruning insufficient -``` - -## Features +## Environment Variables -### LLM Client (litellm) +| Variable | Required | Default | Description | +|----------|----------|---------|-------------| +| `CHUTES_API_TOKEN` | Yes* | - | Chutes API token | +| `LLM_PROVIDER` | No | `chutes` | `chutes` or `openrouter` | +| `LLM_MODEL` | No | `moonshotai/Kimi-K2.5-TEE` | Model identifier | +| `LLM_COST_LIMIT` | No | `10.0` | Max cost in USD | +| `OPENROUTER_API_KEY` | For OpenRouter | - | OpenRouter API key | -```python -from src.llm.client import LiteLLMClient +*\*Required for default Chutes provider* -llm = LiteLLMClient( - model="openrouter/anthropic/claude-opus-4.5", - temperature=0.0, - max_tokens=16384, -) +## Project Structure -response = llm.chat(messages, tools=tool_specs) ``` - -### Prompt Caching - -Caches system and recent messages to reduce costs: -- Cache hit rate: **90%+** on long conversations -- Significant API cost reduction - -### Self-Verification - -Before completing, the agent automatically: -1. Re-reads the original instruction -2. Verifies each requirement -3. 
Only confirms completion if everything is validated - -### Context Management - -- **Token-based overflow detection** (not message count) -- **Tool output pruning** (removes old outputs) -- **AI compaction** (summarizes if needed) -- **Middle-out truncation** for large outputs - -## Available Tools - -| Tool | Description | -|------|-------------| -| `shell_command` | Execute shell commands | -| `read_file` | Read files with pagination | -| `write_file` | Create/overwrite files | -| `apply_patch` | Apply patches | -| `grep_files` | Search with ripgrep | -| `list_dir` | List directories | -| `view_image` | Analyze images | - -## Configuration - -See `src/config/defaults.py`: - -```python -CONFIG = { - "model": "openrouter/anthropic/claude-opus-4.5", - "max_tokens": 16384, - "max_iterations": 200, - "auto_compact_threshold": 0.85, - "prune_protect": 40_000, - "cache_enabled": True, -} +baseagent/ +├── agent.py # Entry point +├── src/ +│ ├── core/ +│ │ ├── loop.py # Main agent loop +│ │ └── compaction.py # Context management +│ ├── llm/ +│ │ └── client.py # LLM client +│ ├── config/ +│ │ └── defaults.py # Configuration +│ ├── tools/ # Tool implementations +│ └── prompts/ # System prompt +├── docs/ # 📚 Full documentation +├── rules/ # Development guidelines +└── astuces/ # Implementation techniques ``` -## Environment Variables - -| Variable | Description | -|----------|-------------| -| `OPENROUTER_API_KEY` | OpenRouter API key | - -## Documentation - -### Rules - Development Guidelines - -See [rules/](rules/) for comprehensive guides: - -- [Architecture Patterns](rules/02-architecture-patterns.md) - **Mandatory project structure** -- [LLM Usage Guide](rules/06-llm-usage-guide.md) - **Using litellm** -- [Best Practices](rules/05-best-practices.md) -- [Error Handling](rules/08-error-handling.md) - -### Tips - Practical Techniques - -See [astuces/](astuces/) for techniques: +## Development Guidelines -- [Prompt Caching](astuces/01-prompt-caching.md) -- [Context Management](astuces/03-context-management.md) -- [Local Testing](astuces/09-local-testing.md) +For agent developers, see: +- [rules/](rules/) - Architecture patterns, best practices, anti-patterns +- [astuces/](astuces/) - Practical techniques (caching, verification, etc.) +- [AGENTS.md](AGENTS.md) - Comprehensive building guide ## License diff --git a/docs/README.md b/docs/README.md new file mode 100644 index 0000000..3700151 --- /dev/null +++ b/docs/README.md @@ -0,0 +1,125 @@ +# BaseAgent Documentation + +> **Professional documentation for the BaseAgent autonomous coding assistant** + +BaseAgent is a high-performance autonomous agent designed for the [Term Challenge](https://term.challenge). It leverages LLM-driven decision making with advanced context management and cost optimization techniques. 
+ +--- + +## Table of Contents + +### Getting Started +- [Overview](./overview.md) - What is BaseAgent and core design principles +- [Installation](./installation.md) - Prerequisites and setup instructions +- [Quick Start](./quickstart.md) - Your first task in 5 minutes + +### Core Concepts +- [Architecture](./architecture.md) - Technical architecture and system design +- [Configuration](./configuration.md) - All configuration options explained +- [Usage Guide](./usage.md) - Command-line interface and options + +### Reference +- [Tools Reference](./tools.md) - Available tools and their parameters +- [Context Management](./context-management.md) - Token management and compaction +- [Best Practices](./best-practices.md) - Optimal usage patterns + +### LLM Providers +- [Chutes API Integration](./chutes-integration.md) - Using Chutes as your LLM provider + +--- + +## Quick Navigation + +| Document | Description | +|----------|-------------| +| [Overview](./overview.md) | High-level introduction and design principles | +| [Installation](./installation.md) | Step-by-step setup guide | +| [Quick Start](./quickstart.md) | Get running in minutes | +| [Architecture](./architecture.md) | Technical deep-dive with diagrams | +| [Configuration](./configuration.md) | Environment variables and settings | +| [Usage](./usage.md) | CLI commands and examples | +| [Tools](./tools.md) | Complete tools reference | +| [Context Management](./context-management.md) | Memory and token optimization | +| [Best Practices](./best-practices.md) | Tips for optimal performance | +| [Chutes Integration](./chutes-integration.md) | Chutes API setup and usage | + +--- + +## Architecture at a Glance + +```mermaid +graph TB + subgraph User["User Interface"] + CLI["CLI (agent.py)"] + end + + subgraph Core["Core Engine"] + Loop["Agent Loop"] + Context["Context Manager"] + Cache["Prompt Cache"] + end + + subgraph LLM["LLM Layer"] + Client["LiteLLM Client"] + Provider["Provider (Chutes/OpenRouter)"] + end + + subgraph Tools["Tool System"] + Registry["Tool Registry"] + Shell["shell_command"] + Files["read_file / write_file"] + Search["grep_files / list_dir"] + end + + CLI --> Loop + Loop --> Context + Loop --> Cache + Loop --> Client + Client --> Provider + Loop --> Registry + Registry --> Shell + Registry --> Files + Registry --> Search +``` + +--- + +## Key Features + +- **Fully Autonomous** - No user confirmation required; makes decisions independently +- **LLM-Driven** - All decisions made by the language model, not hardcoded logic +- **Prompt Caching** - 90%+ cache hit rate for significant cost reduction +- **Context Management** - Intelligent pruning and compaction for long tasks +- **Self-Verification** - Automatic validation before task completion +- **Multi-Provider** - Supports Chutes AI, OpenRouter, and litellm-compatible providers + +--- + +## Project Structure + +``` +baseagent/ +├── agent.py # Entry point +├── src/ +│ ├── core/ +│ │ ├── loop.py # Main agent loop +│ │ └── compaction.py # Context management +│ ├── llm/ +│ │ └── client.py # LLM client (litellm) +│ ├── config/ +│ │ └── defaults.py # Configuration +│ ├── tools/ # Tool implementations +│ ├── prompts/ +│ │ └── system.py # System prompt +│ └── output/ +│ └── jsonl.py # JSONL event emission +├── rules/ # Development guidelines +├── astuces/ # Implementation techniques +└── docs/ # This documentation +``` + +--- + +## License + +MIT License - See [LICENSE](../LICENSE) for details. 
diff --git a/docs/architecture.md b/docs/architecture.md new file mode 100644 index 0000000..772b5ee --- /dev/null +++ b/docs/architecture.md @@ -0,0 +1,435 @@ +# Technical Architecture + +> **Deep dive into BaseAgent's system design, components, and data flow** + +## System Overview + +BaseAgent follows a modular architecture with clear separation of concerns: + +```mermaid +graph TB + subgraph Entry["Entry Layer"] + agent["agent.py
CLI Entry Point"] + end + + subgraph Core["Core Layer"] + loop["loop.py
Agent Loop"] + compact["compaction.py
Context Manager"] + end + + subgraph LLM["LLM Layer"] + client["client.py
LiteLLM Client"] + end + + subgraph Config["Configuration"] + defaults["defaults.py
Settings"] + prompts["system.py
System Prompt"] + end + + subgraph Tools["Tool Layer"] + registry["registry.py
Tool Registry"] + shell["shell.py"] + read["read_file.py"] + write["write_file.py"] + patch["apply_patch.py"] + grep["grep_files.py"] + list["list_dir.py"] + end + + subgraph Output["Output Layer"] + jsonl["jsonl.py
Event Emitter"] + end + + agent --> loop + loop --> compact + loop --> client + loop --> registry + loop --> jsonl + client --> defaults + loop --> prompts + registry --> shell & read & write & patch & grep & list + + style loop fill:#4CAF50,color:#fff + style client fill:#2196F3,color:#fff + style compact fill:#FF9800,color:#fff +``` + +--- + +## Component Diagram + +```mermaid +classDiagram + class AgentContext { + +instruction: str + +cwd: str + +step: int + +is_done: bool + +history: List + +shell(cmd, timeout) ShellResult + +done() + +log(msg) + } + + class LiteLLMClient { + +model: str + +temperature: float + +max_tokens: int + +cost_limit: float + +chat(messages, tools) LLMResponse + +get_stats() Dict + } + + class LLMResponse { + +text: str + +function_calls: List~FunctionCall~ + +tokens: Dict + +has_function_calls() bool + } + + class FunctionCall { + +id: str + +name: str + +arguments: Dict + } + + class ToolRegistry { + +tools: Dict + +execute(ctx, name, args) ToolResult + +get_tools_for_llm() List + } + + class ToolResult { + +success: bool + +output: str + +inject_content: Optional + } + + AgentContext --> LiteLLMClient : uses + LiteLLMClient --> LLMResponse : returns + LLMResponse --> FunctionCall : contains + AgentContext --> ToolRegistry : uses + ToolRegistry --> ToolResult : returns +``` + +--- + +## Agent Loop Workflow + +The heart of BaseAgent is the agent loop in `src/core/loop.py`: + +```mermaid +flowchart TB + Start([Start]) --> Init[Initialize Session] + Init --> BuildMsg[Build Initial Messages] + BuildMsg --> GetState[Get Terminal State] + + GetState --> LoopStart{Iteration < Max?} + + LoopStart -->|Yes| ManageCtx[Manage Context
Prune/Compact if needed] + ManageCtx --> ApplyCache[Apply Prompt Caching] + ApplyCache --> CallLLM[Call LLM with Tools] + + CallLLM --> HasCalls{Has Tool Calls?} + + HasCalls -->|Yes| ResetPending[Reset pending_completion] + ResetPending --> ExecTools[Execute Tool Calls] + ExecTools --> AddResults[Add Results to Messages] + AddResults --> LoopStart + + HasCalls -->|No| CheckPending{pending_completion?} + + CheckPending -->|No| SetPending[Set pending_completion = true] + SetPending --> InjectVerify[Inject Verification Prompt] + InjectVerify --> LoopStart + + CheckPending -->|Yes| Complete[Task Complete] + + LoopStart -->|No| Timeout[Max Iterations Reached] + + Complete --> Emit[Emit turn.completed] + Timeout --> Emit + Emit --> End([End]) + + style ManageCtx fill:#FF9800,color:#fff + style ApplyCache fill:#9C27B0,color:#fff + style CallLLM fill:#2196F3,color:#fff + style ExecTools fill:#4CAF50,color:#fff + style InjectVerify fill:#E91E63,color:#fff +``` + +--- + +## Data Flow + +### Request Flow + +```mermaid +sequenceDiagram + participant User + participant Entry as agent.py + participant Loop as loop.py + participant Context as compaction.py + participant Cache as Prompt Cache + participant LLM as LiteLLM Client + participant Provider as API Provider + participant Tools as Tool Registry + + User->>Entry: --instruction "Create hello.txt" + Entry->>Entry: Initialize AgentContext + Entry->>Entry: Initialize LiteLLMClient + Entry->>Loop: run_agent_loop() + + Loop->>Loop: Build messages [system, user, state] + + rect rgb(255, 240, 220) + Note over Loop,Provider: Iteration Loop + Loop->>Context: manage_context(messages) + Context-->>Loop: Managed messages + + Loop->>Cache: apply_caching(messages) + Cache-->>Loop: Cached messages + + Loop->>LLM: chat(messages, tools) + LLM->>Provider: API Request + Provider-->>LLM: Response + LLM-->>Loop: LLMResponse + + alt Has tool_calls + Loop->>Tools: execute(ctx, tool_name, args) + Tools-->>Loop: ToolResult + Loop->>Loop: Append to messages + end + end + + Loop-->>Entry: Complete + Entry-->>User: JSONL output +``` + +### Message Structure + +Messages accumulate through the session: + +```python +messages = [ + # 1. System prompt (stable, cached) + {"role": "system", "content": SYSTEM_PROMPT}, + + # 2. User instruction + {"role": "user", "content": "Create hello.txt with 'Hello World'"}, + + # 3. Initial state + {"role": "user", "content": "Current directory:\n```\n...\n```"}, + + # 4. Assistant response with tool calls + { + "role": "assistant", + "content": "Creating the file...", + "tool_calls": [ + {"id": "call_1", "type": "function", "function": {...}} + ] + }, + + # 5. Tool result + {"role": "tool", "tool_call_id": "call_1", "content": "File created"}, + + # ... 
continues until completion +] +``` + +--- + +## Module Descriptions + +### `src/core/loop.py` - Agent Loop + +The main orchestration module that: +- Initializes the session and emits JSONL events +- Manages the iterative Observe→Think→Act cycle +- Applies prompt caching for cost optimization +- Handles LLM errors with retry logic +- Triggers self-verification before completion + +### `src/core/compaction.py` - Context Manager + +Intelligent context management that: +- Estimates token usage (4 chars ≈ 1 token) +- Detects context overflow at 85% of usable window +- Prunes old tool outputs (protects last 40K tokens) +- Runs AI compaction when pruning is insufficient +- Preserves critical information through summarization + +### `src/llm/client.py` - LLM Client + +LiteLLM-based client that: +- Supports multiple providers (Chutes, OpenRouter, etc.) +- Tracks token usage and costs +- Handles tool/function calling format +- Enforces cost limits +- Provides usage statistics + +### `src/tools/registry.py` - Tool Registry + +Centralized tool management that: +- Registers all available tools +- Provides tool specs for LLM +- Executes tools with proper context +- Handles tool output truncation +- Manages image injection for `view_image` + +### `src/prompts/system.py` - System Prompt + +System prompt configuration that: +- Defines agent personality and behavior +- Specifies coding guidelines +- Includes AGENTS.md support +- Configures autonomous operation mode +- Provides environment context + +### `src/config/defaults.py` - Configuration + +Central configuration containing: +- Model settings (model name, tokens, temperature) +- Context management thresholds +- Tool output limits +- Prompt caching settings +- Execution limits + +--- + +## Context Management Pipeline + +```mermaid +flowchart LR + subgraph Input + Msgs[Messages
~150K tokens] + end + + subgraph Detection + Est[Estimate Tokens] + Check{> 85% of
168K usable?} + end + + subgraph Pruning + Scan[Scan backwards] + Protect[Protect last 40K
tool tokens]
+        Clear[Clear old outputs]
+    end
+
+    subgraph Compaction
+        CheckAgain{Still > 85%?}
+        Summarize[AI Summarization]
+        NewMsgs[Compacted Messages]
+    end
+
+    subgraph Output
+        Result[Managed Messages]
+    end
+
+    Msgs --> Est --> Check
+    Check -->|No| Result
+    Check -->|Yes| Scan --> Protect --> Clear
+    Clear --> CheckAgain
+    CheckAgain -->|No| Result
+    CheckAgain -->|Yes| Summarize --> NewMsgs --> Result
+```
+
+---
+
+## Tool Execution Flow
+
+```mermaid
+flowchart TB
+    subgraph LLM["LLM Response"]
+        Calls["tool_calls: [<br/>{name: 'shell_command', args: {command: 'ls'}},<br/>{name: 'read_file', args: {file_path: 'README.md'}}<br/>]"]
+    end
+
+    subgraph Registry["Tool Registry"]
+        direction TB
+        Lookup[Lookup Tool]
+        Execute[Execute with Context]
+        Truncate[Truncate Output<br/>max 2500 tokens]
+    end
+
+    subgraph Tools["Tool Implementations"]
+        Shell[shell_command]
+        Read[read_file]
+        Write[write_file]
+        Patch[apply_patch]
+        Grep[grep_files]
+        List[list_dir]
+    end
+
+    subgraph Output["Results"]
+        Results["tool results added<br/>to messages"]
+    end
+
+    Calls --> Lookup
+    Lookup --> Execute
+    Execute --> Shell & Read & Write & Patch & Grep & List
+    Shell & Read & Write & Patch & Grep & List --> Truncate
+    Truncate --> Results
+```
+
+---
+
+## JSONL Event Emission
+
+BaseAgent emits structured JSONL events throughout execution:
+
+```mermaid
+sequenceDiagram
+    participant Loop as Agent Loop
+    participant JSONL as Event Emitter
+    participant stdout as Standard Output
+
+    Loop->>JSONL: emit(ThreadStartedEvent)
+    JSONL->>stdout: {"type": "thread.started", ...}
+
+    Loop->>JSONL: emit(TurnStartedEvent)
+    JSONL->>stdout: {"type": "turn.started", ...}
+
+    loop Each Tool Call
+        Loop->>JSONL: emit(ItemStartedEvent)
+        JSONL->>stdout: {"type": "item.started", ...}
+        Loop->>JSONL: emit(ItemCompletedEvent)
+        JSONL->>stdout: {"type": "item.completed", ...}
+    end
+
+    Loop->>JSONL: emit(TurnCompletedEvent)
+    JSONL->>stdout: {"type": "turn.completed", "usage": {...}}
+```
+
+---
+
+## Error Handling Strategy
+
+```mermaid
+flowchart TB
+    Error[Error Occurs] --> Type{Error Type?}
+
+    Type -->|CostLimitExceeded| Abort[Emit TurnFailed
Abort Session] + + Type -->|Authentication| Abort + + Type -->|Rate Limit| Retry{Attempt < 5?} + Retry -->|Yes| Wait[Wait 10s × attempt] + Wait --> TryAgain[Retry Request] + Retry -->|No| Abort + + Type -->|Timeout/504| Retry + + Type -->|Other| Retry + + TryAgain --> Success{Success?} + Success -->|Yes| Continue[Continue Loop] + Success -->|No| Retry +``` + +--- + +## Next Steps + +- [Configuration Reference](./configuration.md) - All settings explained +- [Tools Reference](./tools.md) - Detailed tool documentation +- [Context Management](./context-management.md) - Deep dive into memory management diff --git a/docs/best-practices.md b/docs/best-practices.md new file mode 100644 index 0000000..7fa098a --- /dev/null +++ b/docs/best-practices.md @@ -0,0 +1,408 @@ +# Best Practices + +> **Strategies for optimal performance, cost efficiency, and reliable results** + +## Core Principles + +BaseAgent follows these fundamental principles: + +1. **Explore First** - Always gather context before acting +2. **Iterate** - Never try to solve everything in one shot +3. **Verify** - Double-confirm before completing +4. **Fail Gracefully** - Handle errors and retry +5. **Stay Focused** - Complete exactly what's asked + +--- + +## Explore-First Pattern + +Before making any changes, always understand the context: + +```mermaid +flowchart LR + subgraph Bad["❌ Bad Pattern"] + B1[Receive Task] --> B2[Start Coding] + B2 --> B3[Hit Problems] + B3 --> B4[Backtrack] + end + + subgraph Good["✅ Good Pattern"] + G1[Receive Task] --> G2[Explore Codebase] + G2 --> G3[Understand Patterns] + G3 --> G4[Plan Approach] + G4 --> G5[Implement] + end +``` + +### Exploration Steps + +1. **Read README** - Understand project purpose +2. **List directory** - See project structure +3. **Find similar code** - Match existing patterns +4. **Check tests** - Understand expected behavior +5. **Review AGENTS.md** - Follow project instructions + +--- + +## Self-Verification + +BaseAgent automatically verifies work before completion: + +```mermaid +sequenceDiagram + participant Agent + participant Verify as Verification + participant LLM as LLM + + Agent->>Agent: No more tool calls + Agent->>Verify: Inject verification prompt + Verify->>LLM: Re-read instruction + LLM->>LLM: List requirements + LLM->>LLM: Verify each requirement + + alt All verified + LLM-->>Agent: Confirm completion + else Something missing + LLM-->>Agent: Continue working + end +``` + +### Verification Checklist + +The agent automatically asks: +- ✅ Did I read the ENTIRE original instruction? +- ✅ Did I list ALL requirements (explicit and implicit)? +- ✅ Did I run commands to VERIFY each requirement? +- ✅ Did I fix any issues found during verification? 
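+
+The double-confirmation described above can be sketched in a few lines. This is illustrative only: `VERIFICATION_PROMPT`, `state`, and `should_complete` are hypothetical names, and the real logic lives in `src/core/loop.py`:
+
+```python
+VERIFICATION_PROMPT = (
+    "Re-read the original instruction, list every explicit and implicit "
+    "requirement, and run commands to verify each one. If anything is "
+    "missing, keep working instead of finishing."
+)
+
+def should_complete(response, state, messages) -> bool:
+    """Complete only on the second consecutive response with no tool calls."""
+    if response.has_function_calls():
+        state["pending_completion"] = False  # still acting, not done
+        return False
+    if not state.get("pending_completion"):
+        state["pending_completion"] = True   # first completion claim
+        messages.append({"role": "user", "content": VERIFICATION_PROMPT})
+        return False  # give the model one pass to self-verify
+    return True  # confirmed twice in a row
+```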
+ +--- + +## Prompt Caching + +Achieve **90%+ cache hit rate** for massive cost savings: + +```mermaid +graph TB + subgraph Strategy["Caching Strategy"] + S1["Cache first 2 system messages"] + S2["Cache last 2 non-system messages"] + S3["Up to 4 breakpoints total"] + end + + subgraph Effect["Effect"] + E1["Request 1: Cache miss (create)"] + E2["Request 2: Cache HIT (90% saved)"] + E3["Request 3: Cache HIT (90% saved)"] + E4["Request N: Cache HIT (90% saved)"] + end + + S1 --> E1 + S2 --> E1 + E1 --> E2 --> E3 --> E4 + + style E2 fill:#4CAF50,color:#fff + style E3 fill:#4CAF50,color:#fff + style E4 fill:#4CAF50,color:#fff +``` + +### How It Works + +```python +# Messages structure +messages = [ + {"role": "system", "content": "...", "cache_control": {"type": "ephemeral"}}, # ✓ Cached + {"role": "user", "content": "original instruction"}, + {"role": "assistant", "content": "...", "tool_calls": [...]}, + {"role": "tool", "content": "..."}, + {"role": "assistant", "content": "...", "cache_control": {"type": "ephemeral"}}, # ✓ Cached + {"role": "user", "content": "verification", "cache_control": {"type": "ephemeral"}}, # ✓ Cached +] +``` + +### Cost Impact + +| Scenario | Cost per 1M tokens | +|----------|-------------------| +| No caching | $3.00 | +| 90% cache hit | $0.30 | +| **Savings** | **90%** | + +--- + +## Cost Optimization + +### Set Cost Limits + +```bash +export LLM_COST_LIMIT="5.0" # Max $5 per session +``` + +### Monitor Usage + +Watch the logs for token counts: +``` +[14:30:17] [loop] Tokens: 50000 input, 45000 cached, 500 output +``` + +### Optimize Instructions + +```bash +# ❌ Vague (causes exploration loops) +python3 agent.py --instruction "Fix the bugs" + +# ✅ Specific (direct action) +python3 agent.py --instruction "Fix the TypeError in src/api/handlers.py:42" +``` + +### Use Targeted Tools + +```bash +# ❌ Wasteful +ls -laR / # Lists entire filesystem + +# ✅ Efficient +list_dir(dir_path="src/", depth=2) +``` + +--- + +## Git Hygiene + +BaseAgent follows strict git rules: + +### ✅ Allowed + +- `git status` - Check current state +- `git log` - View history +- `git blame` - Understand code origins +- `git diff` - Review changes +- `git add` - Stage changes (when asked) +- `git commit` - Commit changes (when asked) + +### ❌ Forbidden + +- `git reset --hard` - Destructive +- `git checkout --` - Loses changes +- Reverting changes you didn't make +- Amending commits without permission +- Pushing without explicit request + +### Safe Practices + +```bash +# Always check state first +git status + +# Review before committing +git diff + +# Stage specific files +git add src/specific_file.py + +# Never force operations +# ❌ git push --force +``` + +--- + +## Writing Effective Instructions + +### Be Specific + +```bash +# ❌ Too vague +"Fix the code" + +# ✅ Specific +"Fix the NullPointerException in UserService.java:85 when user.email is null" +``` + +### Provide Context + +```bash +# ❌ Missing context +"Add authentication" + +# ✅ With context +"Add JWT authentication to the /api/users endpoint using the existing AuthService" +``` + +### Request Verification + +```bash +# ✅ Ask for verification +"Create a sorting algorithm and verify it works with [5, 2, 8, 1, 9]" +``` + +### Break Down Complex Tasks + +```bash +# ❌ Too complex for one instruction +"Build a complete e-commerce platform" + +# ✅ Incremental +"Create the product catalog data model with name, price, and description fields" +``` + +--- + +## Tool Usage Patterns + +### Shell Commands + +```python +# ✅ Use workdir +{"command": 
"ls -la", "workdir": "/workspace/src"} + +# ❌ Avoid cd chains +{"command": "cd /workspace && cd src && ls"} +``` + +### File Reading + +```python +# ✅ Read specific sections +{"file_path": "large.py", "offset": 100, "limit": 50} + +# ❌ Read entire large files +{"file_path": "large.py"} # May overwhelm context +``` + +### Searching + +```python +# ✅ Use grep_files for discovery +{"pattern": "def calculate", "include": "*.py", "path": "src/"} + +# Then read specific files found +{"file_path": "src/billing/calculator.py"} +``` + +### Editing + +```python +# ✅ Use apply_patch for surgical edits +{"patch": "*** Update File: src/utils.py\n@@ def old_func:\n- old\n+ new"} + +# ✅ Use write_file for new files +{"file_path": "new_module.py", "content": "..."} +``` + +--- + +## Handling Long Tasks + +For complex, multi-step tasks: + +### 1. Use update_plan + +```python +{ + "steps": [ + {"description": "Analyze existing code", "status": "completed"}, + {"description": "Design new module", "status": "in_progress"}, + {"description": "Implement core logic", "status": "pending"}, + {"description": "Add unit tests", "status": "pending"}, + {"description": "Update documentation", "status": "pending"} + ] +} +``` + +### 2. Monitor Context + +Watch for compaction events: +``` +[compaction] Context overflow detected, managing... +``` + +### 3. Save Progress + +If context compaction occurs, the summary preserves: +- Current progress +- Key decisions +- Remaining work +- Modified files + +--- + +## Error Handling + +BaseAgent handles errors gracefully: + +### Automatic Retry + +```mermaid +flowchart TB + Error[Error Occurs] --> Type{Error Type} + + Type -->|Rate Limit| Wait[Wait + Retry] + Type -->|Timeout| Wait + Type -->|Server Error| Wait + + Type -->|Auth Error| Fail[Abort] + Type -->|Cost Limit| Fail + + Wait --> Attempt{Attempt < 5?} + Attempt -->|Yes| Retry[Retry Request] + Attempt -->|No| Fail + + Retry --> Success{Success?} + Success -->|Yes| Continue[Continue] + Success -->|No| Attempt +``` + +### Recovery Strategies + +1. **Try alternatives** - If one approach fails, try another +2. **Check documentation** - Read AGENTS.md, README.md +3. **Simplify** - Break complex operations into steps +4. **Report issues** - Note blockers in final message + +--- + +## Performance Tips + +### Reduce Iterations + +1. Give specific, complete instructions +2. Provide necessary context upfront +3. Avoid vague requirements + +### Minimize Token Usage + +1. Search before reading entire files +2. Use targeted directory listings +3. Keep tool outputs focused + +### Maximize Cache Hits + +1. Keep system prompt stable +2. Don't modify early messages +3. 
Let the agent handle caching automatically + +--- + +## Checklist + +Before running the agent: + +- [ ] Clear, specific instruction +- [ ] Necessary context provided +- [ ] API key configured +- [ ] Cost limit set appropriately +- [ ] Working directory correct + +After completion: + +- [ ] Verify output matches requirements +- [ ] Check for any error messages +- [ ] Review modified files +- [ ] Run relevant tests + +--- + +## Next Steps + +- [Configuration](./configuration.md) - Tune settings +- [Context Management](./context-management.md) - Memory optimization +- [Tools Reference](./tools.md) - Detailed tool docs diff --git a/docs/chutes-integration.md b/docs/chutes-integration.md new file mode 100644 index 0000000..75b4955 --- /dev/null +++ b/docs/chutes-integration.md @@ -0,0 +1,378 @@ +# Chutes API Integration + +> **Using Chutes AI as your LLM provider for BaseAgent** + +## Overview + +[Chutes AI](https://chutes.ai) provides access to advanced language models through a simple API. BaseAgent supports Chutes as a first-class provider, offering access to the **Kimi K2.5-TEE** model with its powerful thinking capabilities. + +--- + +## Chutes API Features + +| Feature | Value | +|---------|-------| +| **API Base URL** | `https://llm.chutes.ai/v1` | +| **Default Model** | `moonshotai/Kimi-K2.5-TEE` | +| **Model Parameters** | 1T total, 32B activated | +| **Context Window** | 256K tokens | +| **Thinking Mode** | Enabled by default | + +--- + +## Quick Setup + +### Step 1: Get Your API Token + +1. Visit [chutes.ai](https://chutes.ai) +2. Create an account or sign in +3. Navigate to API settings +4. Generate an API token + +### Step 2: Configure Environment + +```bash +# Required: API token +export CHUTES_API_TOKEN="your-token-from-chutes.ai" + +# Optional: Explicitly set provider and model +export LLM_PROVIDER="chutes" +export LLM_MODEL="moonshotai/Kimi-K2.5-TEE" +``` + +### Step 3: Run BaseAgent + +```bash +python3 agent.py --instruction "Your task description" +``` + +--- + +## Authentication Flow + +```mermaid +sequenceDiagram + participant Agent as BaseAgent + participant Client as LiteLLM Client + participant Chutes as Chutes API + + Agent->>Client: Initialize with CHUTES_API_TOKEN + Client->>Client: Configure litellm + + loop Each Request + Agent->>Client: chat(messages, tools) + Client->>Chutes: POST /v1/chat/completions + Note over Client,Chutes: Authorization: Bearer $CHUTES_API_TOKEN + Chutes-->>Client: Response with tokens + Client-->>Agent: LLMResponse + end +``` + +--- + +## Model Details: Kimi K2.5-TEE + +The **moonshotai/Kimi-K2.5-TEE** model offers: + +### Architecture +- **Total Parameters**: 1 Trillion (1T) +- **Activated Parameters**: 32 Billion (32B) +- **Architecture**: Mixture of Experts (MoE) +- **Context Length**: 256,000 tokens + +### Thinking Mode + +Kimi K2.5-TEE supports a "thinking mode" where the model shows its reasoning process: + +```mermaid +sequenceDiagram + participant User + participant Model as Kimi K2.5-TEE + participant Response + + User->>Model: Complex task instruction + + rect rgb(230, 240, 255) + Note over Model: Thinking Mode Active + Model->>Model: Analyze problem + Model->>Model: Consider approaches + Model->>Model: Evaluate options + end + + Model->>Response: Reasoning process... 
+    Model->>Response: Final answer/action
+```
+
+### Temperature Settings
+
+| Mode | Temperature | Top-p | Description |
+|------|-------------|-------|-------------|
+| **Thinking** | 1.0 | 0.95 | More exploratory reasoning |
+| **Instant** | 0.6 | 0.95 | Faster, more deterministic |
+
+---
+
+## Configuration Options
+
+### Basic Configuration
+
+```python
+# src/config/defaults.py
+CONFIG = {
+    "model": os.environ.get("LLM_MODEL", "moonshotai/Kimi-K2.5-TEE"),
+    "provider": "chutes",
+    "temperature": 1.0,  # For thinking mode
+    "max_tokens": 16384,
+}
+```
+
+### Environment Variables
+
+| Variable | Required | Default | Description |
+|----------|----------|---------|-------------|
+| `CHUTES_API_TOKEN` | Yes | - | API token from chutes.ai |
+| `LLM_PROVIDER` | No | `chutes` | `chutes` (default) or `openrouter` |
+| `LLM_MODEL` | No | `moonshotai/Kimi-K2.5-TEE` | Model identifier |
+| `LLM_COST_LIMIT` | No | `10.0` | Max cost in USD |
+
+---
+
+## Thinking Mode Processing
+
+When thinking mode is enabled, responses include `<think>` tags:
+
+```xml
+<think>
+The user wants to create a file with specific content.
+I should:
+1. Check if the file already exists
+2. Create the file with the requested content
+3. Verify the file was created correctly
+</think>
+
+I'll create the file for you now.
+```
+
+BaseAgent can be configured to:
+- **Parse and strip** the thinking tags (show only final answer)
+- **Keep** the thinking content (useful for debugging)
+- **Log** thinking to stderr while showing final answer
+
+### Parsing Example
+
+```python
+import re
+
+def parse_thinking(response_text: str) -> tuple[str, str]:
+    """Extract thinking and final response."""
+    think_pattern = r'<think>(.*?)</think>'
+    match = re.search(think_pattern, response_text, re.DOTALL)
+
+    if match:
+        thinking = match.group(1).strip()
+        final = re.sub(think_pattern, '', response_text, flags=re.DOTALL).strip()
+        return thinking, final
+
+    return "", response_text
+```
+
+---
+
+## API Request Format
+
+Chutes API follows OpenAI-compatible format:
+
+```bash
+curl -X POST https://llm.chutes.ai/v1/chat/completions \
+  -H "Authorization: Bearer $CHUTES_API_TOKEN" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "moonshotai/Kimi-K2.5-TEE",
+    "messages": [
+      {"role": "system", "content": "You are a helpful assistant."},
+      {"role": "user", "content": "Hello!"}
+    ],
+    "max_tokens": 1024,
+    "temperature": 1.0,
+    "top_p": 0.95
+  }'
+```
+
+---
+
+## Fallback to OpenRouter
+
+If Chutes is unavailable, BaseAgent can fall back to OpenRouter:
+
+```mermaid
+flowchart TB
+    Start[API Request] --> Check{Chutes Available?}
+
+    Check -->|Yes| Chutes[Send to Chutes API]
+    Chutes --> Success{Success?}
+    Success -->|Yes| Done[Return Response]
+    Success -->|No| Retry{Retry Count < 3?}
+
+    Retry -->|Yes| Chutes
+    Retry -->|No| Fallback[Use OpenRouter]
+
+    Check -->|No| Fallback
+    Fallback --> Done
+```
+
+### Configuration for Fallback
+
+```bash
+# Primary: Chutes
+export CHUTES_API_TOKEN="..."
+export LLM_PROVIDER="chutes"
+
+# Fallback: OpenRouter
+export OPENROUTER_API_KEY="..."
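+
+# Per the fallback flow above: Chutes is tried first, and OpenRouter is
+# used only when Chutes is unavailable or its retries are exhausted.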
+``` + +### Switching Providers + +```bash +# Switch to OpenRouter +export LLM_PROVIDER="openrouter" +export LLM_MODEL="openrouter/anthropic/claude-sonnet-4-20250514" + +# Switch back to Chutes +export LLM_PROVIDER="chutes" +export LLM_MODEL="moonshotai/Kimi-K2.5-TEE" +``` + +--- + +## Cost Considerations + +### Pricing (Approximate) + +| Metric | Cost | +|--------|------| +| Input tokens | Varies by model | +| Output tokens | Varies by model | +| Cached input | Reduced rate | + +### Cost Management + +```bash +# Set cost limit +export LLM_COST_LIMIT="5.0" # Max $5.00 per session +``` + +BaseAgent tracks costs and will abort if the limit is exceeded: + +```python +# In src/llm/client.py +if self._total_cost >= self.cost_limit: + raise CostLimitExceeded( + f"Cost limit exceeded: ${self._total_cost:.4f}", + used=self._total_cost, + limit=self.cost_limit, + ) +``` + +--- + +## Troubleshooting + +### Authentication Errors + +``` +LLMError: authentication_error +``` + +**Solution**: Verify your token is correct and exported: + +```bash +echo $CHUTES_API_TOKEN # Should show your token +export CHUTES_API_TOKEN="correct-token" +``` + +### Rate Limiting + +``` +LLMError: rate_limit +``` + +**Solution**: BaseAgent automatically retries with exponential backoff. You can also: +- Wait a few minutes before retrying +- Reduce request frequency +- Check your API plan limits + +### Model Not Found + +``` +LLMError: Model 'xyz' not found +``` + +**Solution**: Use the correct model identifier: + +```bash +export LLM_MODEL="moonshotai/Kimi-K2.5-TEE" +``` + +### Connection Timeouts + +``` +LLMError: timeout +``` + +**Solution**: BaseAgent retries automatically. If persistent: +- Check your internet connection +- Verify Chutes API status +- Consider using OpenRouter as fallback + +--- + +## Integration with LiteLLM + +BaseAgent uses [LiteLLM](https://docs.litellm.ai/) for provider abstraction: + +```python +# src/llm/client.py +import litellm + +# For Chutes, configure base URL +litellm.api_base = "https://llm.chutes.ai/v1" + +# Make request +response = litellm.completion( + model="moonshotai/Kimi-K2.5-TEE", + messages=messages, + api_key=os.environ.get("CHUTES_API_TOKEN"), +) +``` + +--- + +## Best Practices + +### For Optimal Performance + +1. **Enable thinking mode** for complex reasoning tasks +2. **Use appropriate temperature** (1.0 for exploration, 0.6 for precision) +3. **Leverage the 256K context** for large codebases +4. **Monitor costs** with `LLM_COST_LIMIT` + +### For Reliability + +1. **Set up fallback** to OpenRouter +2. **Handle rate limits** gracefully (automatic in BaseAgent) +3. **Log responses** for debugging complex tasks + +### For Cost Efficiency + +1. **Enable prompt caching** (reduces costs by 90%) +2. **Use context management** to avoid token waste +3. **Set reasonable cost limits** for testing + +--- + +## Next Steps + +- [Configuration Reference](./configuration.md) - All settings explained +- [Best Practices](./best-practices.md) - Optimization tips +- [Usage Guide](./usage.md) - Command-line options diff --git a/docs/configuration.md b/docs/configuration.md new file mode 100644 index 0000000..492f074 --- /dev/null +++ b/docs/configuration.md @@ -0,0 +1,304 @@ +# Configuration Reference + +> **Complete guide to all configuration options in BaseAgent** + +## Overview + +BaseAgent configuration is centralized in `src/config/defaults.py`. Settings can be customized via environment variables or by modifying the configuration file directly. 
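+
+As a minimal sketch of that layering (assuming the merge happens at import time in `src/config/defaults.py`; the actual loading code may differ), an environment variable simply overrides the hardcoded default:
+
+```python
+import os
+
+# Hardcoded defaults (mirrors the CONFIG dictionary below).
+CONFIG = {
+    "model": "openrouter/anthropic/claude-sonnet-4-20250514",
+    "provider": "openrouter",
+}
+
+# Environment variables take precedence over the defaults;
+# LLM_COST_LIMIT and the API keys are read the same way.
+CONFIG["model"] = os.environ.get("LLM_MODEL", CONFIG["model"])
+CONFIG["provider"] = os.environ.get("LLM_PROVIDER", CONFIG["provider"])
+```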
+ +--- + +## Configuration File + +The main configuration is stored in the `CONFIG` dictionary: + +```python +# src/config/defaults.py +CONFIG = { + # Model Settings + "model": "openrouter/anthropic/claude-sonnet-4-20250514", + "provider": "openrouter", + "temperature": 0.0, + "max_tokens": 16384, + "reasoning_effort": "none", + + # Agent Execution + "max_iterations": 200, + "max_output_tokens": 2500, + "shell_timeout": 60, + + # Context Management + "model_context_limit": 200_000, + "output_token_max": 32_000, + "auto_compact_threshold": 0.85, + "prune_protect": 40_000, + "prune_minimum": 20_000, + + # Prompt Caching + "cache_enabled": True, + + # Execution Flags + "bypass_approvals": True, + "bypass_sandbox": True, + "skip_git_check": True, + "unified_exec": True, + "json_output": True, + + # Completion + "require_completion_confirmation": False, +} +``` + +--- + +## Environment Variables + +### LLM Provider Settings + +| Variable | Default | Description | +|----------|---------|-------------| +| `LLM_MODEL` | `openrouter/anthropic/claude-sonnet-4-20250514` | Model identifier | +| `LLM_PROVIDER` | `openrouter` | Provider name (`chutes`, `openrouter`, etc.) | +| `LLM_COST_LIMIT` | `10.0` | Maximum cost in USD before aborting | + +### API Keys + +| Variable | Provider | Description | +|----------|----------|-------------| +| `CHUTES_API_TOKEN` | Chutes AI | Token from chutes.ai | +| `OPENROUTER_API_KEY` | OpenRouter | API key from openrouter.ai | +| `ANTHROPIC_API_KEY` | Anthropic | Direct Anthropic API key | +| `OPENAI_API_KEY` | OpenAI | OpenAI API key | + +### Example Setup + +```bash +# For Chutes AI +export CHUTES_API_TOKEN="your-token" +export LLM_PROVIDER="chutes" +export LLM_MODEL="moonshotai/Kimi-K2.5-TEE" + +# For OpenRouter +export OPENROUTER_API_KEY="sk-or-v1-..." +export LLM_MODEL="openrouter/anthropic/claude-sonnet-4-20250514" +``` + +--- + +## Configuration Sections + +### Model Settings + +```mermaid +graph LR + subgraph Model["Model Configuration"] + M1["model
Model identifier"] + M2["provider
API provider"] + M3["temperature
Response randomness"] + M4["max_tokens
Max output tokens"] + M5["reasoning_effort
Reasoning depth"] + end +``` + +| Setting | Type | Default | Description | +|---------|------|---------|-------------| +| `model` | `str` | `openrouter/anthropic/claude-sonnet-4-20250514` | Full model identifier with provider prefix | +| `provider` | `str` | `openrouter` | LLM provider name | +| `temperature` | `float` | `0.0` | Response randomness (0 = deterministic) | +| `max_tokens` | `int` | `16384` | Maximum tokens in LLM response | +| `reasoning_effort` | `str` | `none` | Reasoning depth: `none`, `minimal`, `low`, `medium`, `high`, `xhigh` | + +### Agent Execution Settings + +```mermaid +graph LR + subgraph Execution["Execution Limits"] + E1["max_iterations
200 iterations"] + E2["max_output_tokens
2500 tokens"] + E3["shell_timeout
60 seconds"] + end +``` + +| Setting | Type | Default | Description | +|---------|------|---------|-------------| +| `max_iterations` | `int` | `200` | Maximum loop iterations before stopping | +| `max_output_tokens` | `int` | `2500` | Max tokens for tool output truncation | +| `shell_timeout` | `int` | `60` | Shell command timeout in seconds | + +### Context Management + +```mermaid +graph TB + subgraph Context["Context Window Management"] + C1["model_context_limit: 200K"] + C2["output_token_max: 32K"] + C3["Usable: 168K"] + C4["auto_compact_threshold: 85%"] + C5["Trigger: ~143K"] + end + + C1 --> C3 + C2 --> C3 + C3 --> C4 + C4 --> C5 +``` + +| Setting | Type | Default | Description | +|---------|------|---------|-------------| +| `model_context_limit` | `int` | `200000` | Total model context window (tokens) | +| `output_token_max` | `int` | `32000` | Tokens reserved for output | +| `auto_compact_threshold` | `float` | `0.85` | Trigger compaction at this % of usable context | +| `prune_protect` | `int` | `40000` | Protect this many tokens of recent tool output | +| `prune_minimum` | `int` | `20000` | Only prune if recovering at least this many tokens | + +### Prompt Caching + +| Setting | Type | Default | Description | +|---------|------|---------|-------------| +| `cache_enabled` | `bool` | `True` | Enable Anthropic prompt caching | + +> **Note**: Prompt caching requires minimum token thresholds per breakpoint: +> - Claude Opus 4.5 on Bedrock: 4096 tokens +> - Claude Sonnet/other: 1024 tokens + +### Execution Flags + +| Setting | Type | Default | Description | +|---------|------|---------|-------------| +| `bypass_approvals` | `bool` | `True` | Skip user approval prompts | +| `bypass_sandbox` | `bool` | `True` | Bypass sandbox restrictions | +| `skip_git_check` | `bool` | `True` | Skip git repository validation | +| `unified_exec` | `bool` | `True` | Enable unified execution mode | +| `json_output` | `bool` | `True` | Always emit JSONL output | +| `require_completion_confirmation` | `bool` | `False` | Require double-confirm before completing | + +--- + +## Provider-Specific Configuration + +### Chutes AI + +```python +# Environment +CHUTES_API_TOKEN="your-token" +LLM_PROVIDER="chutes" +LLM_MODEL="moonshotai/Kimi-K2.5-TEE" + +# Model features +# - 1T parameters, 32B activated +# - 256K context window +# - Thinking mode enabled by default +# - Temperature: 1.0 (thinking), 0.6 (instant) +``` + +### OpenRouter + +```python +# Environment +OPENROUTER_API_KEY="sk-or-v1-..." +LLM_MODEL="openrouter/anthropic/claude-sonnet-4-20250514" + +# Requires openrouter/ prefix for litellm +``` + +### Direct Anthropic + +```python +# Environment +ANTHROPIC_API_KEY="sk-ant-..." 
+LLM_MODEL="claude-3-5-sonnet-20241022" + +# No prefix needed for direct API +``` + +--- + +## Configuration Workflow + +```mermaid +flowchart TB + subgraph Load["Configuration Loading"] + Env[Environment Variables] + File[defaults.py] + Merge[Merged Config] + end + + subgraph Apply["Configuration Application"] + Loop[Agent Loop] + LLM[LLM Client] + Context[Context Manager] + Tools[Tool Registry] + end + + Env --> Merge + File --> Merge + Merge --> Loop + Merge --> LLM + Merge --> Context + Merge --> Tools +``` + +--- + +## Computed Values + +Some values are computed from configuration: + +```python +# Usable context window +usable_context = model_context_limit - output_token_max +# Default: 200,000 - 32,000 = 168,000 tokens + +# Compaction trigger threshold +compaction_trigger = usable_context * auto_compact_threshold +# Default: 168,000 * 0.85 = 142,800 tokens + +# Token estimation +chars_per_token = 4 # Heuristic +tokens = len(text) // 4 +``` + +--- + +## Best Practices + +### For Cost Optimization + +```bash +# Lower cost limit for testing +export LLM_COST_LIMIT="1.0" + +# Use smaller context for simple tasks +# (edit defaults.py) +"model_context_limit": 100_000 +``` + +### For Long Tasks + +```bash +# Increase iterations +# (edit defaults.py) +"max_iterations": 500 + +# Lower compaction threshold for aggressive memory management +"auto_compact_threshold": 0.70 +``` + +### For Debugging + +```bash +# Disable caching to see full API calls +# (edit defaults.py) +"cache_enabled": False + +# Increase output limits for more context +"max_output_tokens": 5000 +``` + +--- + +## Next Steps + +- [Chutes Integration](./chutes-integration.md) - Configure Chutes API +- [Context Management](./context-management.md) - Understand memory management +- [Best Practices](./best-practices.md) - Optimization tips diff --git a/docs/context-management.md b/docs/context-management.md new file mode 100644 index 0000000..2f26e75 --- /dev/null +++ b/docs/context-management.md @@ -0,0 +1,412 @@ +# Context Management + +> **How BaseAgent manages memory and prevents token overflow** + +## Why Context Management Matters + +Large Language Models have finite context windows. Without proper management: +- "Context too long" errors terminate sessions +- Critical information gets lost +- Response quality degrades +- Costs increase unnecessarily + +BaseAgent implements sophisticated context management inspired by OpenCode and Codex. + +--- + +## Context Window Overview + +```mermaid +graph TB + subgraph Window["Claude Opus 4.5 Context Window (200K tokens)"] + Output["Reserved for Output
32K tokens"] + Usable["Usable Context
168K tokens"] + end + + subgraph Thresholds["Management Thresholds"] + Safe["Safe Zone
< 85% (143K)"] + Warning["Warning Zone
85-100%"] + Overflow["Overflow
> 168K"] + end + + Usable --> Safe + Usable --> Warning + Usable --> Overflow + + style Safe fill:#4CAF50,color:#fff + style Warning fill:#FF9800,color:#fff + style Overflow fill:#F44336,color:#fff +``` + +### Key Numbers + +| Metric | Value | Description | +|--------|-------|-------------| +| Total context | 200,000 | Model's full context window | +| Output reserve | 32,000 | Reserved for LLM response | +| Usable context | 168,000 | Available for messages | +| Compaction threshold | 85% | Trigger at 142,800 tokens | +| Prune protect | 40,000 | Recent tool output to keep | +| Prune minimum | 20,000 | Minimum savings to prune | + +--- + +## Token Estimation + +BaseAgent estimates tokens using a simple heuristic: + +```python +# 1 token ≈ 4 characters +def estimate_tokens(text: str) -> int: + return len(text) // 4 +``` + +### Message Token Components + +```mermaid +graph LR + subgraph Message["Message Token Estimation"] + Content["Content
(text / 4)"] + Images["Images
(~1000 each)"] + ToolCalls["Tool Calls
(name + args)"] + Overhead["Role Overhead
(~4 tokens)"] + end + + Content --> Total["Total Tokens"] + Images --> Total + ToolCalls --> Total + Overhead --> Total +``` + +--- + +## Context Management Pipeline + +```mermaid +flowchart TB + subgraph Input["Every Iteration"] + Messages["Current Messages"] + end + + subgraph Detection["1. Detection"] + Estimate["Estimate Total Tokens"] + Check{"Above 85%
Threshold?"} + end + + subgraph Pruning["2. Pruning (First Pass)"] + Scan["Scan Backwards"] + Protect["Protect Last 40K
Tool Output Tokens"] + Clear["Clear Old Tool Outputs"] + CheckAgain{"Still Above
Threshold?"} + end + + subgraph Compaction["3. AI Compaction (Second Pass)"] + Summary["Generate Summary
via LLM"] + Rebuild["Rebuild Messages:
System + Summary"] + end + + subgraph Output["Continue Loop"] + Managed["Managed Messages"] + end + + Messages --> Estimate --> Check + Check -->|No| Managed + Check -->|Yes| Scan --> Protect --> Clear --> CheckAgain + CheckAgain -->|No| Managed + CheckAgain -->|Yes| Summary --> Rebuild --> Managed + + style Pruning fill:#FF9800,color:#fff + style Compaction fill:#9C27B0,color:#fff +``` + +--- + +## Stage 1: Tool Output Pruning + +The first defense against context overflow is pruning old tool outputs. + +### Strategy + +1. Scan messages **backwards** (most recent first) +2. Skip the first 2 user turns (most recent) +3. Accumulate tool output tokens +4. After 40K tokens accumulated, mark older outputs for pruning +5. Only prune if savings exceed 20K tokens + +### Implementation + +```python +def prune_old_tool_outputs(messages, protect_last_turns=2): + total = 0 # Total tool output tokens seen + pruned = 0 # Tokens to be pruned + to_prune = [] + turns = 0 + + for i in range(len(messages) - 1, -1, -1): + msg = messages[i] + + if msg["role"] == "user": + turns += 1 + + if turns < protect_last_turns: + continue + + if msg["role"] == "tool": + content = msg.get("content", "") + estimate = len(content) // 4 + total += estimate + + if total > PRUNE_PROTECT: # 40K + pruned += estimate + to_prune.append(i) + + if pruned > PRUNE_MINIMUM: # 20K + # Replace content with marker + for idx in to_prune: + messages[idx]["content"] = "[Old tool result content cleared]" + + return messages +``` + +### Visual Example + +```mermaid +graph TB + subgraph Before["Before Pruning (150K tokens)"] + S1["System Prompt
5K tokens"] + U1["User Instruction
1K tokens"] + A1["Assistant + Tools
10K tokens"] + T1["Tool Results (old)
50K tokens"] + A2["Assistant + Tools
10K tokens"] + T2["Tool Results (old)
40K tokens"] + A3["Assistant + Tools
10K tokens"] + T3["Tool Results (recent)
24K tokens"] + end + + subgraph After["After Pruning (60K tokens)"] + S2["System Prompt
5K tokens"] + U2["User Instruction
1K tokens"] + A4["Assistant + Tools
10K tokens"] + T4["[cleared]
~0 tokens"] + A5["Assistant + Tools
10K tokens"] + T5["[cleared]
~0 tokens"] + A6["Assistant + Tools
10K tokens"] + T6["Tool Results (protected)
24K tokens"] + end + + T1 -.-> T4 + T2 -.-> T5 + T3 --> T6 + + style T4 fill:#FF9800,color:#fff + style T5 fill:#FF9800,color:#fff + style T6 fill:#4CAF50,color:#fff +``` + +--- + +## Stage 2: AI Compaction + +When pruning isn't enough, BaseAgent uses the LLM to summarize the conversation. + +### Compaction Process + +```mermaid +sequenceDiagram + participant Loop as Agent Loop + participant Compact as Compaction + participant LLM as LLM API + + Loop->>Compact: Context still too large + Compact->>Compact: Add compaction prompt + Compact->>LLM: Request summary + LLM-->>Compact: Summary response + Compact->>Compact: Build new messages + Compact-->>Loop: [System, Summary] +``` + +### Compaction Prompt + +```python +COMPACTION_PROMPT = """ +You are performing a CONTEXT CHECKPOINT COMPACTION. +Create a handoff summary for another LLM that will resume the task. + +Include: +- Current progress and key decisions made +- Important context, constraints, or user preferences +- What remains to be done (clear next steps) +- Any critical data, examples, or references needed to continue +- Which files were modified and how +- Any errors encountered and how they were resolved + +Be concise, structured, and focused on helping the next LLM +seamlessly continue the work. Use bullet points and clear sections. +""" +``` + +### Result + +The compacted messages are: + +```python +compacted = [ + {"role": "system", "content": original_system_prompt}, + {"role": "user", "content": SUMMARY_PREFIX + llm_summary}, +] +``` + +### Summary Prefix + +```python +SUMMARY_PREFIX = """ +Another language model started to solve this problem and produced +a summary of its thinking process. You also have access to the state +of the tools that were used. Use this to build on the work that has +already been done and avoid duplicating work. + +Here is the summary from the previous context: + +""" +``` + +--- + +## Middle-Out Truncation + +For individual tool outputs, BaseAgent uses middle-out truncation: + +```mermaid +graph LR + subgraph Original["Original Output"] + O1["Start
(headers, definitions)"] + O2["Middle
(repetitive data)"] + O3["End
(results, errors)"] + end + + subgraph Truncated["Truncated Output"] + T1["Start
(preserved)"] + T2["[...truncated...]"] + T3["End
(preserved)"] + end + + O1 --> T1 + O2 -.-> T2 + O3 --> T3 + + style O2 fill:#FF9800,color:#fff + style T2 fill:#FF9800,color:#fff +``` + +### Implementation + +```python +def middle_out_truncate(text: str, max_tokens: int = 2500) -> str: + max_chars = max_tokens * 4 # 4 chars per token + + if len(text) <= max_chars: + return text + + keep = max_chars // 2 - 50 # Room for marker + return f"{text[:keep]}\n\n[...truncated...]\n\n{text[-keep:]}" +``` + +### Why Middle-Out? + +| Section | Contains | Value | +|---------|----------|-------| +| **Start** | Headers, imports, definitions | High | +| **Middle** | Repetitive data, logs | Low | +| **End** | Results, errors, summaries | High | + +--- + +## Configuration Options + +| Setting | Default | Description | +|---------|---------|-------------| +| `model_context_limit` | 200,000 | Total context window | +| `output_token_max` | 32,000 | Reserved for output | +| `auto_compact_threshold` | 0.85 | Trigger threshold | +| `prune_protect` | 40,000 | Recent tool tokens to keep | +| `prune_minimum` | 20,000 | Minimum savings to prune | +| `max_output_tokens` | 2,500 | Per-tool output limit | + +### Tuning Guidelines + +**For Long Tasks:** +```python +"auto_compact_threshold": 0.70, # More aggressive +"prune_protect": 30_000, # Protect less +``` + +**For Complex Tasks (need more context):** +```python +"auto_compact_threshold": 0.90, # Less aggressive +"prune_protect": 60_000, # Protect more +``` + +--- + +## Monitoring Context Usage + +BaseAgent logs context status each iteration: + +``` +[14:30:16] [compaction] Context: 45000 tokens (26.8% of 168000) +[14:35:22] [compaction] Context: 125000 tokens (74.4% of 168000) +[14:38:45] [compaction] Context: 148000 tokens (88.1% of 168000) +[14:38:45] [compaction] Context overflow detected, managing... +[14:38:45] [compaction] Prune scan: 95000 total tokens, 55000 prunable +[14:38:45] [compaction] Pruning 12 tool outputs, recovering ~55000 tokens +[14:38:46] [compaction] Pruning sufficient: 148000 -> 93000 tokens +``` + +--- + +## Best Practices + +### 1. Keep Tool Outputs Focused + +```bash +# ❌ Too much output +ls -laR / # Lists entire filesystem + +# ✅ Targeted +ls -la /workspace/src/ # Just what's needed +``` + +### 2. Use Appropriate Search Patterns + +```bash +# ❌ Too broad +grep "function" # Matches everything + +# ✅ Specific +grep "def calculate_total" src/billing.py +``` + +### 3. Read Sections, Not Entire Files + +```json +// ❌ Entire large file +{"name": "read_file", "arguments": {"file_path": "huge.py"}} + +// ✅ Specific section +{"name": "read_file", "arguments": {"file_path": "huge.py", "offset": 100, "limit": 50}} +``` + +### 4. 
Monitor Long Sessions + +For tasks exceeding 50 iterations, watch for: +- Repeated compaction events +- Context oscillating near threshold +- Loss of important context after compaction + +--- + +## Next Steps + +- [Best Practices](./best-practices.md) - Optimization strategies +- [Configuration](./configuration.md) - Tuning options +- [Architecture](./architecture.md) - System design diff --git a/docs/installation.md b/docs/installation.md new file mode 100644 index 0000000..24d6700 --- /dev/null +++ b/docs/installation.md @@ -0,0 +1,249 @@ +# Installation Guide + +> **Step-by-step instructions for setting up BaseAgent** + +## Prerequisites + +Before installing BaseAgent, ensure you have: + +| Requirement | Version | Notes | +|-------------|---------|-------| +| Python | 3.9+ | Python 3.11+ recommended | +| pip | Latest | Python package manager | +| Git | 2.x | For cloning the repository | + +### Optional but Recommended + +| Tool | Purpose | +|------|---------| +| `ripgrep` (`rg`) | Fast file searching (used by `grep_files` tool) | +| `tree` | Directory visualization | + +--- + +## Installation Methods + +### Method 1: Using pyproject.toml (Recommended) + +```bash +# Clone the repository +git clone https://github.com/your-org/baseagent.git +cd baseagent + +# Install with pip +pip install . +``` + +This installs BaseAgent as a package with all dependencies. + +### Method 2: Using requirements.txt + +```bash +# Clone the repository +git clone https://github.com/your-org/baseagent.git +cd baseagent + +# Install dependencies +pip install -r requirements.txt +``` + +### Method 3: Development Installation + +For development with editable installs: + +```bash +git clone https://github.com/your-org/baseagent.git +cd baseagent + +# Editable install +pip install -e . +``` + +--- + +## Dependencies + +BaseAgent requires these Python packages: + +``` +litellm>=1.0.0 # LLM API abstraction +httpx>=0.24.0 # HTTP client +pydantic>=2.0.0 # Data validation +``` + +These are automatically installed via pip. + +--- + +## Environment Setup + +### 1. Choose Your LLM Provider + +BaseAgent supports multiple LLM providers. Choose one: + +#### Option A: Chutes AI (Recommended) + +```bash +# Set your Chutes API token +export CHUTES_API_TOKEN="your-token-from-chutes.ai" + +# Configure provider +export LLM_PROVIDER="chutes" +export LLM_MODEL="moonshotai/Kimi-K2.5-TEE" +``` + +Get your token at [chutes.ai](https://chutes.ai) + +#### Option B: OpenRouter + +```bash +# Set your OpenRouter API key +export OPENROUTER_API_KEY="sk-or-v1-..." + +# Model is auto-configured for OpenRouter +``` + +Get your key at [openrouter.ai](https://openrouter.ai) + +#### Option C: Direct Provider APIs + +```bash +# For Anthropic +export ANTHROPIC_API_KEY="sk-ant-..." + +# For OpenAI +export OPENAI_API_KEY="sk-..." +``` + +### 2. 
Create a Configuration File (Optional) + +Create `.env` in the project root: + +```bash +# .env file +CHUTES_API_TOKEN=your-token-here +LLM_PROVIDER=chutes +LLM_MODEL=moonshotai/Kimi-K2.5-TEE +LLM_COST_LIMIT=10.0 +``` + +--- + +## Verification + +### Step 1: Verify Python Installation + +```bash +python3 --version +# Expected: Python 3.11.x or higher +``` + +### Step 2: Verify Dependencies + +```bash +python3 -c "import litellm; print('litellm:', litellm.__version__)" +python3 -c "import httpx; print('httpx:', httpx.__version__)" +python3 -c "import pydantic; print('pydantic:', pydantic.__version__)" +``` + +### Step 3: Verify BaseAgent Installation + +```bash +python3 -c "from src.core.loop import run_agent_loop; print('BaseAgent: OK')" +``` + +### Step 4: Test Run + +```bash +python3 agent.py --instruction "Print 'Hello, BaseAgent!'" +``` + +Expected output: JSONL events showing the agent executing your instruction. + +--- + +## Directory Structure After Installation + +``` +baseagent/ +├── agent.py # ✓ Entry point +├── src/ +│ ├── core/ +│ │ ├── loop.py # ✓ Agent loop +│ │ └── compaction.py # ✓ Context manager +│ ├── llm/ +│ │ └── client.py # ✓ LLM client +│ ├── config/ +│ │ └── defaults.py # ✓ Configuration +│ ├── tools/ # ✓ Tool implementations +│ ├── prompts/ +│ │ └── system.py # ✓ System prompt +│ └── output/ +│ └── jsonl.py # ✓ Event emission +├── requirements.txt # ✓ Dependencies +├── pyproject.toml # ✓ Package config +├── docs/ # ✓ Documentation +├── rules/ # Development guidelines +└── astuces/ # Implementation techniques +``` + +--- + +## Troubleshooting + +### Issue: `ModuleNotFoundError: No module named 'litellm'` + +**Solution**: Install dependencies + +```bash +pip install -r requirements.txt +# or +pip install litellm httpx pydantic +``` + +### Issue: `ImportError: cannot import name 'run_agent_loop'` + +**Solution**: Ensure you're in the project root directory + +```bash +cd /path/to/baseagent +python3 agent.py --instruction "..." +``` + +### Issue: API Key Errors + +**Solution**: Verify your environment variables are set + +```bash +# Check if variables are set +echo $CHUTES_API_TOKEN +echo $OPENROUTER_API_KEY + +# Re-export if needed +export CHUTES_API_TOKEN="your-token" +``` + +### Issue: `rg` (ripgrep) Not Found + +The `grep_files` tool will fall back to `grep` if `rg` is not available, but ripgrep is much faster. + +**Solution**: Install ripgrep + +```bash +# Ubuntu/Debian +apt-get install ripgrep + +# macOS +brew install ripgrep + +# Or via cargo +cargo install ripgrep +``` + +--- + +## Next Steps + +- [Quick Start](./quickstart.md) - Run your first task +- [Configuration](./configuration.md) - Customize settings +- [Chutes Integration](./chutes-integration.md) - Set up Chutes API diff --git a/docs/overview.md b/docs/overview.md new file mode 100644 index 0000000..c05a533 --- /dev/null +++ b/docs/overview.md @@ -0,0 +1,214 @@ +# BaseAgent Overview + +> **A high-performance autonomous coding agent built for generalist problem-solving** + +## What is BaseAgent? + +BaseAgent is an autonomous coding agent designed for the [Term Challenge](https://term.challenge). Unlike traditional scripted automation, BaseAgent uses Large Language Models (LLMs) to reason about tasks and make decisions dynamically. + +The agent receives natural language instructions and autonomously: +- Explores the codebase +- Plans and executes solutions +- Validates its own work +- Handles errors and edge cases + +--- + +## Core Design Principles + +### 1. 
No Hardcoding + +BaseAgent follows the **Golden Rule**: all decisions are made by the LLM, not by conditional logic. + +```python +# ❌ FORBIDDEN - Hardcoded task routing +if "file" in instruction: + create_file() +elif "compile" in instruction: + compile_code() + +# ✅ REQUIRED - LLM-driven decisions +response = llm.chat(messages, tools=tools) +execute(response.tool_calls) +``` + +### 2. Single Code Path + +Every task, regardless of complexity or domain, flows through the same agent loop: + +```mermaid +graph LR + A[Receive Instruction] --> B[Build Context] + B --> C[LLM Decides] + C --> D[Execute Tools] + D --> E{Complete?} + E -->|No| C + E -->|Yes| F[Verify & Return] +``` + +### 3. Iterative Execution + +BaseAgent never tries to solve everything in one shot. Instead, it: +- Observes the current state +- Thinks about the next step +- Acts by calling tools +- Repeats until the task is complete + +### 4. Self-Verification + +Before declaring a task complete, the agent automatically: +1. Re-reads the original instruction +2. Lists all requirements (explicit and implicit) +3. Verifies each requirement with actual commands +4. Only completes if all verifications pass + +--- + +## High-Level Architecture + +```mermaid +graph TB + subgraph Interface["User Interface"] + CLI["python agent.py --instruction '...'"] + end + + subgraph Engine["Core Engine"] + direction TB + Loop["Agent Loop
(src/core/loop.py)"] + Context["Context Manager
(src/core/compaction.py)"] + Prompt["System Prompt
(src/prompts/system.py)"] + end + + subgraph LLM["LLM Layer"] + Client["LiteLLM Client
(src/llm/client.py)"] + API["Provider API
(Chutes/OpenRouter)"] + end + + subgraph Tools["Tool System"] + Registry["Tool Registry"] + Exec["Execution Engine"] + end + + CLI --> Loop + Loop --> Context + Loop --> Prompt + Loop --> Client + Client --> API + Loop --> Registry + Registry --> Exec + + style Loop fill:#4CAF50,color:#fff + style Client fill:#2196F3,color:#fff +``` + +--- + +## Key Features + +### Autonomous Operation + +BaseAgent runs in **fully autonomous mode**: +- No user confirmations required +- Makes reasonable decisions when faced with ambiguity +- Handles errors by trying alternative approaches +- Never asks questions - just executes + +### Prompt Caching + +Achieves **90%+ cache hit rate** using Anthropic's prompt caching: +- System prompt cached for stability +- Last 2 messages cached to extend prefix +- Reduces API costs by 90% + +### Context Management + +Intelligent memory management for long tasks: +- Token-based overflow detection +- Tool output pruning (protects recent outputs) +- AI-powered compaction when needed +- Middle-out truncation for large outputs + +### Comprehensive Tooling + +Eight specialized tools for coding tasks: + +| Tool | Purpose | +|------|---------| +| `shell_command` | Execute shell commands | +| `read_file` | Read files with line numbers | +| `write_file` | Create or overwrite files | +| `apply_patch` | Surgical file modifications | +| `grep_files` | Fast file content search | +| `list_dir` | Directory exploration | +| `view_image` | Image analysis | +| `update_plan` | Progress tracking | + +--- + +## Workflow Overview + +```mermaid +sequenceDiagram + participant User + participant CLI as agent.py + participant Loop as Agent Loop + participant LLM as LLM (Chutes/OpenRouter) + participant Tools as Tool Registry + + User->>CLI: python agent.py --instruction "..." + CLI->>Loop: Initialize session + + loop Until task complete + Loop->>Loop: Manage context (prune/compact) + Loop->>Loop: Apply prompt caching + Loop->>LLM: Send messages + tools + LLM-->>Loop: Response (text + tool_calls) + + alt Has tool calls + Loop->>Tools: Execute tool calls + Tools-->>Loop: Tool results + else No tool calls + Loop->>Loop: Self-verification check + end + end + + Loop-->>CLI: Task complete + CLI-->>User: JSONL output +``` + +--- + +## What Makes BaseAgent a "Generalist"? + +| Characteristic | Description | +|----------------|-------------| +| **Single code path** | Same logic handles ALL tasks | +| **LLM-driven decisions** | LLM chooses actions, not if-statements | +| **No task keywords** | Zero references to specific task content | +| **Iterative execution** | Observe → Think → Act loop | + +### The Generalist Test + +Ask yourself: *"Would this code behave differently if I changed the task instruction?"* + +If **YES** and it's not because of LLM reasoning → it's hardcoding → **FORBIDDEN** + +--- + +## Design Philosophy + +BaseAgent is built on these principles: + +1. **Explore First** - Always gather context before acting +2. **Iterate** - Never try to do everything in one shot +3. **Verify** - Double-confirm before completing +4. **Fail Gracefully** - Handle errors and retry +5. 
**Stay Focused** - Complete the task, nothing more + +--- + +## Next Steps + +- [Installation Guide](./installation.md) - Set up BaseAgent +- [Quick Start](./quickstart.md) - Run your first task +- [Architecture](./architecture.md) - Deep dive into the system design diff --git a/docs/quickstart.md b/docs/quickstart.md new file mode 100644 index 0000000..f8a9326 --- /dev/null +++ b/docs/quickstart.md @@ -0,0 +1,242 @@ +# Quick Start Guide + +> **Get BaseAgent running in 5 minutes** + +## Prerequisites + +Before starting, ensure you have: +- Python 3.9+ installed +- An LLM API key (Chutes, OpenRouter, or Anthropic) +- BaseAgent installed (see [Installation](./installation.md)) + +--- + +## Step 1: Set Up Your API Key + +Choose your provider and set the environment variable: + +```bash +# For Chutes AI (recommended) +export CHUTES_API_TOKEN="your-token-from-chutes.ai" + +# OR for OpenRouter +export OPENROUTER_API_KEY="sk-or-v1-..." +``` + +--- + +## Step 2: Run Your First Task + +Navigate to the BaseAgent directory and run: + +```bash +python3 agent.py --instruction "Create a file called hello.txt with the content 'Hello, World!'" +``` + +### Expected Output + +You'll see JSONL events as the agent works: + +```json +{"type": "thread.started", "thread_id": "sess_1234567890"} +{"type": "turn.started"} +{"type": "item.started", "item": {"type": "command_execution", "command": "write_file"}} +{"type": "item.completed", "item": {"type": "command_execution", "status": "completed"}} +{"type": "turn.completed", "usage": {"input_tokens": 5000, "output_tokens": 200}} +``` + +And the file `hello.txt` will be created: + +```bash +cat hello.txt +# Output: Hello, World! +``` + +--- + +## Step 3: Try More Examples + +### Example: Explore a Codebase + +```bash +python3 agent.py --instruction "Explore this repository and describe its structure" +``` + +### Example: Find and Read Files + +```bash +python3 agent.py --instruction "Find all Python files and show me the main entry point" +``` + +### Example: Create a Simple Script + +```bash +python3 agent.py --instruction "Create a Python script that prints the Fibonacci sequence up to 100" +``` + +### Example: Modify Existing Code + +```bash +python3 agent.py --instruction "Add a docstring to all functions in src/core/loop.py" +``` + +--- + +## Understanding the Output + +BaseAgent emits JSONL (JSON Lines) format for machine-readable output: + +```mermaid +sequenceDiagram + participant User + participant Agent + participant stdout as Output + + User->>Agent: --instruction "..." + Agent->>stdout: {"type": "thread.started", ...} + Agent->>stdout: {"type": "turn.started"} + + loop Tool Execution + Agent->>stdout: {"type": "item.started", ...} + Agent->>stdout: {"type": "item.completed", ...} + end + + Agent->>stdout: {"type": "turn.completed", "usage": {...}} +``` + +### Key Event Types + +| Event | Description | +|-------|-------------| +| `thread.started` | Session begins with unique ID | +| `turn.started` | Agent begins processing | +| `item.started` | Tool execution begins | +| `item.completed` | Tool execution finished | +| `turn.completed` | Agent finished with usage stats | +| `turn.failed` | Error occurred | + +--- + +## Quick Command Reference + +```bash +# Basic usage +python3 agent.py --instruction "Your task description" + +# With environment variables inline +CHUTES_API_TOKEN="..." python3 agent.py --instruction "..." + +# Redirect output to file +python3 agent.py --instruction "..." 
> output.jsonl 2>&1 +``` + +--- + +## Agent Workflow + +Here's what happens when you run a task: + +```mermaid +flowchart TB + subgraph Input + Cmd["python3 agent.py --instruction '...'"] + end + + subgraph Init["Initialization"] + Parse[Parse Arguments] + Config[Load Configuration] + LLM[Initialize LLM Client] + Tools[Register Tools] + end + + subgraph Loop["Agent Loop"] + Context[Manage Context] + Cache[Apply Caching] + Call[Call LLM] + Execute[Execute Tools] + Verify[Self-Verify] + end + + subgraph Output + JSONL[Emit JSONL Events] + Done[Task Complete] + end + + Cmd --> Parse --> Config --> LLM --> Tools + Tools --> Context --> Cache --> Call + Call --> Execute --> Context + Execute --> Verify --> Done + Context & Call & Execute --> JSONL +``` + +--- + +## Tips for Effective Instructions + +### Be Specific + +```bash +# ❌ Too vague +python3 agent.py --instruction "Fix the bug" + +# ✅ Specific +python3 agent.py --instruction "Fix the TypeError in src/utils.py line 42 where x is None" +``` + +### Provide Context + +```bash +# ❌ Missing context +python3 agent.py --instruction "Add tests" + +# ✅ With context +python3 agent.py --instruction "Add unit tests for the calculate_total function in src/billing.py" +``` + +### Request Verification + +```bash +# ✅ Ask for verification +python3 agent.py --instruction "Create a Python script for sorting and verify it works with sample data" +``` + +--- + +## Troubleshooting + +### Agent Not Finding Files + +The agent starts in the current directory. Ensure you're in the right location: + +```bash +pwd # Check current directory +ls # List files +cd /path/to/project +python3 /path/to/baseagent/agent.py --instruction "..." +``` + +### API Rate Limits + +If you hit rate limits, the agent will automatically retry with exponential backoff. You can also: + +```bash +# Set a cost limit +export LLM_COST_LIMIT="5.0" +``` + +### Long-Running Tasks + +For complex tasks, the agent may iterate many times. Monitor progress through the JSONL output: + +```bash +python3 agent.py --instruction "..." 2>&1 | grep "item.completed" +``` + +--- + +## Next Steps + +- [Usage Guide](./usage.md) - Detailed command-line options +- [Configuration](./configuration.md) - Customize behavior +- [Tools Reference](./tools.md) - Available tools +- [Best Practices](./best-practices.md) - Optimization tips diff --git a/docs/tools.md b/docs/tools.md new file mode 100644 index 0000000..78cd143 --- /dev/null +++ b/docs/tools.md @@ -0,0 +1,509 @@ +# Tools Reference + +> **Complete documentation for all available tools in BaseAgent** + +## Overview + +BaseAgent provides eight specialized tools for autonomous task execution. Each tool is designed for a specific purpose and follows consistent patterns for input and output. 
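+
+All tool calls flow through a registry and come back in a uniform result envelope, the `ToolResult(success=...)` shape shown in the architecture diagram below. A minimal Python sketch of that pattern follows; the field names mirror the diagrams on this page, while the `TOOL_REGISTRY` mapping and dispatch logic are illustrative assumptions, not the exact implementation:
+
+```python
+from dataclasses import dataclass
+from typing import Any, Callable, Dict, Optional
+
+@dataclass
+class ToolResult:
+    success: bool                         # True if the tool ran without error
+    output: str = ""                      # text fed back to the LLM
+    inject_content: Optional[Any] = None  # extra message parts (e.g. images)
+
+# Hypothetical registry: tool name -> callable returning the tool's output
+TOOL_REGISTRY: Dict[str, Callable[..., str]] = {
+    "echo": lambda text: text,
+}
+
+def run_tool(name: str, arguments: dict) -> ToolResult:
+    """Dispatch a tool call and wrap the outcome in a uniform envelope."""
+    try:
+        handler = TOOL_REGISTRY[name]
+        return ToolResult(success=True, output=handler(**arguments))
+    except Exception as exc:
+        return ToolResult(success=False, output=f"{type(exc).__name__}: {exc}")
+```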
+ +--- + +## Tool Summary + +| Tool | Purpose | Key Parameters | +|------|---------|----------------| +| `shell_command` | Execute shell commands | `command`, `workdir`, `timeout_ms` | +| `read_file` | Read file contents | `file_path`, `offset`, `limit` | +| `write_file` | Create/overwrite files | `file_path`, `content` | +| `apply_patch` | Surgical file edits | `patch` | +| `grep_files` | Search file contents | `pattern`, `include`, `path` | +| `list_dir` | List directory contents | `dir_path`, `depth`, `limit` | +| `view_image` | Analyze images | `path` | +| `update_plan` | Track progress | `steps`, `explanation` | + +--- + +## Tool Architecture + +```mermaid +graph TB + subgraph Registry["Tool Registry (registry.py)"] + Lookup["Tool Lookup"] + Execute["Execution Engine"] + Truncate["Output Truncation"] + end + + subgraph Tools["Tool Implementations"] + Shell["shell_command"] + Read["read_file"] + Write["write_file"] + Patch["apply_patch"] + Grep["grep_files"] + List["list_dir"] + Image["view_image"] + Plan["update_plan"] + end + + subgraph Output["Results"] + Success["ToolResult(success=True)"] + Failure["ToolResult(success=False)"] + end + + Lookup --> Shell & Read & Write & Patch & Grep & List & Image & Plan + Shell & Read & Write & Patch & Grep & List & Image & Plan --> Execute + Execute --> Truncate + Truncate --> Success & Failure +``` + +--- + +## shell_command + +Execute shell commands in the terminal. + +### Parameters + +| Parameter | Type | Required | Default | Description | +|-----------|------|----------|---------|-------------| +| `command` | string | Yes | - | Shell command to execute | +| `workdir` | string | No | Current dir | Working directory | +| `timeout_ms` | number | No | 60000 | Timeout in milliseconds | + +### Example Usage + +```json +{ + "name": "shell_command", + "arguments": { + "command": "ls -la", + "workdir": "/workspace", + "timeout_ms": 30000 + } +} +``` + +### Best Practices + +- Always set `workdir` to avoid directory confusion +- Use `rg` (ripgrep) instead of `grep` for faster searches +- Set appropriate timeouts for long-running commands +- Prefer specific commands over `cd && command` + +### Output Format + +``` +total 40 +drwxr-xr-x 7 root root 4096 Feb 3 13:16 . +drwxr-xr-x 1 root root 4096 Feb 3 12:00 .. +-rw-r--r-- 1 root root 5432 Feb 3 13:16 agent.py +drwxr-xr-x 4 root root 4096 Feb 3 13:16 src +``` + +--- + +## read_file + +Read file contents with line numbers. + +### Parameters + +| Parameter | Type | Required | Default | Description | +|-----------|------|----------|---------|-------------| +| `file_path` | string | Yes | - | Path to the file | +| `offset` | number | No | 1 | Starting line (1-indexed) | +| `limit` | number | No | 2000 | Maximum lines to return | + +### Example Usage + +```json +{ + "name": "read_file", + "arguments": { + "file_path": "src/core/loop.py", + "offset": 1, + "limit": 100 + } +} +``` + +### Output Format + +``` +L1: """ +L2: Main agent loop - the heart of the SuperAgent system. +L3: """ +L4: +L5: from __future__ import annotations +L6: import time +``` + +### Best Practices + +- Use `offset` and `limit` for large files +- Prefer `grep_files` to find specific content first +- Read relevant sections, not entire large files + +--- + +## write_file + +Create or overwrite a file. 
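+
+A minimal sketch of the tool's semantics (a full overwrite, with automatic parent-directory creation as noted under Best Practices below). The helper is illustrative, not the actual implementation:
+
+```python
+from pathlib import Path
+
+def write_file(file_path: str, content: str) -> str:
+    """Create or overwrite a file, creating parent directories as needed."""
+    path = Path(file_path)
+    path.parent.mkdir(parents=True, exist_ok=True)  # documented behavior
+    path.write_text(content, encoding="utf-8")      # overwrite, never append
+    return f"Wrote {len(content)} characters to {path}"
+```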
+ +### Parameters + +| Parameter | Type | Required | Default | Description | +|-----------|------|----------|---------|-------------| +| `file_path` | string | Yes | - | Path to the file | +| `content` | string | Yes | - | Content to write | + +### Example Usage + +```json +{ + "name": "write_file", + "arguments": { + "file_path": "hello.txt", + "content": "Hello, World!\n" + } +} +``` + +### Best Practices + +- Use for new files or complete rewrites +- Prefer `apply_patch` for surgical edits +- Parent directories are created automatically +- Include trailing newlines for proper file endings + +--- + +## apply_patch + +Apply surgical file modifications using patch format. + +### Parameters + +| Parameter | Type | Required | Default | Description | +|-----------|------|----------|---------|-------------| +| `patch` | string | Yes | - | Patch content | + +### Patch Format + +``` +*** Begin Patch +*** Add File: path/to/new/file.py ++line 1 ++line 2 +*** Update File: path/to/existing/file.py +@@ def existing_function(): +- old_line ++ new_line +*** Delete File: path/to/delete.py +*** End Patch +``` + +### Example Usage + +```json +{ + "name": "apply_patch", + "arguments": { + "patch": "*** Begin Patch\n*** Update File: src/utils.py\n@@ def calculate(x):\n- return x\n+ return x * 2\n*** End Patch" + } +} +``` + +### Patch Rules + +1. Use `@@ context line` to identify location +2. Prefix new lines with `+` +3. Prefix removed lines with `-` +4. Include 3 lines of context before and after changes +5. File paths must be relative (never absolute) + +### Operations + +| Operation | Format | Description | +|-----------|--------|-------------| +| Add file | `*** Add File: path` | Create new file | +| Update file | `*** Update File: path` | Modify existing file | +| Delete file | `*** Delete File: path` | Remove file | + +--- + +## grep_files + +Search file contents using patterns. + +### Parameters + +| Parameter | Type | Required | Default | Description | +|-----------|------|----------|---------|-------------| +| `pattern` | string | Yes | - | Regex pattern to search | +| `include` | string | No | - | Glob filter (e.g., `*.py`) | +| `path` | string | No | Current dir | Search path | +| `limit` | number | No | 100 | Max files to return | + +### Example Usage + +```json +{ + "name": "grep_files", + "arguments": { + "pattern": "def.*token", + "include": "*.py", + "path": "src/", + "limit": 50 + } +} +``` + +### Output Format + +``` +src/llm/client.py +src/core/compaction.py +src/utils/truncate.py +``` + +### Best Practices + +- Use ripgrep regex syntax +- Filter with `include` for faster searches +- Search specific directories when possible +- Results sorted by modification time + +--- + +## list_dir + +List directory contents with type indicators. 
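+
+The indicators follow `ls -F`-style conventions (see the table below). A purely illustrative sketch of how one directory level might be rendered:
+
+```python
+import os
+
+def type_indicator(entry: os.DirEntry) -> str:
+    """Return the suffix: '@' for symlinks, '/' for directories, '' otherwise."""
+    if entry.is_symlink():
+        return "@"
+    if entry.is_dir(follow_symlinks=False):
+        return "/"
+    return ""
+
+def render_dir(dir_path: str) -> str:
+    """List a single directory level with type indicators appended."""
+    entries = sorted(os.scandir(dir_path), key=lambda e: e.name)
+    return "\n".join(f"{e.name}{type_indicator(e)}" for e in entries)
+```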
+ +### Parameters + +| Parameter | Type | Required | Default | Description | +|-----------|------|----------|---------|-------------| +| `dir_path` | string | Yes | - | Directory path | +| `offset` | number | No | 1 | Starting entry (1-indexed) | +| `limit` | number | No | 50 | Max entries to return | +| `depth` | number | No | 2 | Max directory depth | + +### Example Usage + +```json +{ + "name": "list_dir", + "arguments": { + "dir_path": "src/", + "depth": 3, + "limit": 100 + } +} +``` + +### Output Format + +``` +src/ + core/ + loop.py + compaction.py + llm/ + client.py + tools/ + shell.py + read_file.py +``` + +### Type Indicators + +| Indicator | Meaning | +|-----------|---------| +| `/` | Directory | +| `@` | Symbolic link | +| (none) | Regular file | + +--- + +## view_image + +Load and analyze an image from the filesystem. + +### Parameters + +| Parameter | Type | Required | Default | Description | +|-----------|------|----------|---------|-------------| +| `path` | string | Yes | - | Path to image file | + +### Supported Formats + +- PNG +- JPEG +- GIF +- WebP +- BMP + +### Example Usage + +```json +{ + "name": "view_image", + "arguments": { + "path": "screenshots/error.png" + } +} +``` + +### How It Works + +```mermaid +sequenceDiagram + participant Agent + participant Tool as view_image + participant LLM as LLM API + + Agent->>Tool: view_image(path) + Tool->>Tool: Load image file + Tool->>Tool: Encode as base64 + Tool-->>Agent: ToolResult with inject_content + Agent->>Agent: Add image to messages + Agent->>LLM: Messages with image content + LLM-->>Agent: Analysis response +``` + +### Best Practices + +- Only use for images the user mentioned +- Don't use if image is already in conversation +- Large images are automatically resized +- Count as ~1000 tokens in context + +--- + +## update_plan + +Track task progress with a visible plan. + +### Parameters + +| Parameter | Type | Required | Default | Description | +|-----------|------|----------|---------|-------------| +| `steps` | array | Yes | - | List of step objects | +| `explanation` | string | No | - | Why the plan changed | + +### Step Object + +```json +{ + "description": "Create helper functions", + "status": "completed" +} +``` + +### Status Values + +| Status | Description | +|--------|-------------| +| `pending` | Not started | +| `in_progress` | Currently working | +| `completed` | Finished | + +### Example Usage + +```json +{ + "name": "update_plan", + "arguments": { + "steps": [ + {"description": "Read existing code", "status": "completed"}, + {"description": "Create helper module", "status": "in_progress"}, + {"description": "Write unit tests", "status": "pending"}, + {"description": "Update documentation", "status": "pending"} + ], + "explanation": "Starting implementation after code review" + } +} +``` + +### Best Practices + +- Keep descriptions to 5-7 words +- Mark steps completed as you go +- Update plan when approach changes +- Use for complex multi-step tasks + +--- + +## Tool Output Limits + +All tool outputs are truncated to prevent context overflow: + +| Setting | Default | Description | +|---------|---------|-------------| +| `max_output_tokens` | 2500 | Maximum tokens per tool output | +| Truncation strategy | Middle-out | Keeps start and end, removes middle | + +### Middle-Out Truncation + +```mermaid +graph LR + subgraph Original["Original Output (10K tokens)"] + Start["First 1250 tokens"] + Middle["Middle section
(removed)"] + End["Last 1250 tokens"] + end + + subgraph Truncated["Truncated Output (2500 tokens)"] + TStart["First 1250 tokens"] + Marker["[...truncated...]"] + TEnd["Last 1250 tokens"] + end + + Start --> TStart + End --> TEnd +``` + +**Why middle-out?** +- Start contains headers, definitions +- End contains results, errors +- Middle is often repetitive + +--- + +## Tool Execution Flow + +```mermaid +flowchart TB + subgraph Request["LLM Request"] + ToolCall["tool_call: {name, arguments}"] + end + + subgraph Registry["Tool Registry"] + Lookup["Lookup Tool"] + Validate["Validate Arguments"] + Execute["Execute Tool"] + end + + subgraph Processing["Post-Processing"] + Truncate["Truncate Output"] + Format["Format Result"] + end + + subgraph Response["Tool Result"] + Success["success: true/false"] + Output["output: string"] + Inject["inject_content (images)"] + end + + ToolCall --> Lookup --> Validate --> Execute + Execute --> Truncate --> Format + Format --> Success & Output & Inject +``` + +--- + +## Next Steps + +- [Usage Guide](./usage.md) - How to use the agent +- [Context Management](./context-management.md) - Memory optimization +- [Best Practices](./best-practices.md) - Effective tool usage diff --git a/docs/usage.md b/docs/usage.md new file mode 100644 index 0000000..d234c54 --- /dev/null +++ b/docs/usage.md @@ -0,0 +1,341 @@ +# Agent Usage Guide + +> **Complete guide to running BaseAgent and interpreting its output** + +## Command-Line Interface + +### Basic Syntax + +```bash +python3 agent.py --instruction "Your task description" +``` + +### Required Arguments + +| Argument | Type | Description | +|----------|------|-------------| +| `--instruction` | string | The task for the agent to complete | + +--- + +## Running the Agent + +### Simple Tasks + +```bash +# Create a file +python3 agent.py --instruction "Create a file called hello.txt with 'Hello, World!'" + +# Read and explain code +python3 agent.py --instruction "Read src/core/loop.py and explain what it does" + +# Find files +python3 agent.py --instruction "Find all Python files that contain 'import json'" +``` + +### Complex Tasks + +```bash +# Multi-step task +python3 agent.py --instruction "Create a Python module in src/utils/helpers.py with functions for string manipulation, then write tests for it" + +# Code modification +python3 agent.py --instruction "Add error handling to all functions in src/api/client.py that make HTTP requests" + +# Investigation task +python3 agent.py --instruction "Find the bug causing the TypeError in the test output and fix it" +``` + +--- + +## Environment Variables + +Configure the agent's behavior with environment variables: + +```bash +# LLM Provider (Chutes) +export CHUTES_API_TOKEN="your-token" +export LLM_PROVIDER="chutes" +export LLM_MODEL="moonshotai/Kimi-K2.5-TEE" + +# LLM Provider (OpenRouter) +export OPENROUTER_API_KEY="sk-or-v1-..." +export LLM_MODEL="openrouter/anthropic/claude-sonnet-4-20250514" + +# Cost management +export LLM_COST_LIMIT="10.0" + +# Run with inline variables +LLM_COST_LIMIT="5.0" python3 agent.py --instruction "..." 
+``` + +--- + +## Output Format + +BaseAgent emits JSONL (JSON Lines) events to stdout: + +```mermaid +sequenceDiagram + participant Agent + participant stdout as Standard Output + + Agent->>stdout: {"type": "thread.started", "thread_id": "sess_..."} + Agent->>stdout: {"type": "turn.started"} + + loop Tool Execution + Agent->>stdout: {"type": "item.started", "item": {...}} + Agent->>stdout: {"type": "item.completed", "item": {...}} + end + + Agent->>stdout: {"type": "turn.completed", "usage": {...}} +``` + +### Event Types + +| Event | Description | +|-------|-------------| +| `thread.started` | Session begins, includes unique thread ID | +| `turn.started` | Agent begins processing the instruction | +| `item.started` | A tool call is starting | +| `item.completed` | A tool call has completed | +| `turn.completed` | Agent finished, includes token usage | +| `turn.failed` | An error occurred | + +### Example Output + +```json +{"type": "thread.started", "thread_id": "sess_1706890123456"} +{"type": "turn.started"} +{"type": "item.started", "item": {"type": "command_execution", "id": "1", "command": "shell_command({command: 'ls -la'})", "status": "in_progress"}} +{"type": "item.completed", "item": {"type": "command_execution", "id": "1", "command": "shell_command", "status": "completed", "aggregated_output": "total 40\ndrwxr-xr-x...", "exit_code": 0}} +{"type": "item.completed", "item": {"type": "agent_message", "id": "2", "content": "I found the files. Now creating hello.txt..."}} +{"type": "item.started", "item": {"type": "command_execution", "id": "3", "command": "write_file({file_path: 'hello.txt', content: 'Hello, World!'})", "status": "in_progress"}} +{"type": "item.completed", "item": {"type": "command_execution", "id": "3", "command": "write_file", "status": "completed", "exit_code": 0}} +{"type": "turn.completed", "usage": {"input_tokens": 5432, "cached_input_tokens": 4890, "output_tokens": 256}} +``` + +--- + +## Logging Output + +Agent logs go to stderr: + +``` +[14:30:15] [superagent] ============================================================ +[14:30:15] [superagent] SuperAgent Starting (SDK 3.0 - litellm) +[14:30:15] [superagent] ============================================================ +[14:30:15] [superagent] Model: openrouter/anthropic/claude-sonnet-4-20250514 +[14:30:15] [superagent] Instruction: Create hello.txt with 'Hello World'... +[14:30:15] [loop] Getting initial state... +[14:30:16] [loop] Iteration 1/200 +[14:30:16] [compaction] Context: 5432 tokens (3.2% of 168000) +[14:30:16] [loop] Prompt caching: 1 system + 2 final messages marked (3 breakpoints) +[14:30:17] [loop] Executing tool: write_file +[14:30:17] [loop] Iteration 2/200 +[14:30:18] [loop] No tool calls in response +[14:30:18] [loop] Requesting self-verification before completion +``` + +### Separating Output Streams + +```bash +# Send JSONL to file, logs to terminal +python3 agent.py --instruction "..." > output.jsonl + +# Send logs to file, JSONL to terminal +python3 agent.py --instruction "..." 2> agent.log + +# Both to separate files +python3 agent.py --instruction "..." > output.jsonl 2> agent.log +``` + +--- + +## Processing Output + +### Parse JSONL with jq + +```bash +# Get all completed items +python3 agent.py --instruction "..." | jq 'select(.type == "item.completed")' + +# Get final usage stats +python3 agent.py --instruction "..." | jq 'select(.type == "turn.completed") | .usage' + +# Get all agent messages +python3 agent.py --instruction "..." 
| jq 'select(.item.type == "agent_message") | .item.content' +``` + +### Parse with Python + +```python +import json +import subprocess + +# Run agent and capture output +result = subprocess.run( + ["python3", "agent.py", "--instruction", "Your task"], + capture_output=True, + text=True +) + +# Parse JSONL output +events = [json.loads(line) for line in result.stdout.strip().split('\n') if line] + +# Find usage stats +for event in events: + if event.get("type") == "turn.completed": + print(f"Input tokens: {event['usage']['input_tokens']}") + print(f"Output tokens: {event['usage']['output_tokens']}") +``` + +--- + +## Agent Workflow + +```mermaid +flowchart TB + subgraph Input["Input Phase"] + Cmd["python3 agent.py --instruction '...'"] + Parse["Parse Arguments"] + Init["Initialize Components"] + end + + subgraph Explore["Exploration Phase"] + State["Get Current State"] + Context["Build Initial Context"] + end + + subgraph Execute["Execution Phase"] + Loop["Agent Loop"] + Tools["Execute Tools"] + Verify["Self-Verification"] + end + + subgraph Output["Output Phase"] + JSONL["Emit JSONL Events"] + Stats["Report Statistics"] + end + + Cmd --> Parse --> Init + Init --> State --> Context + Context --> Loop + Loop --> Tools --> Loop + Loop --> Verify + Verify --> Stats + Loop --> JSONL +``` + +--- + +## Example Tasks + +### File Operations + +```bash +# Create a file +python3 agent.py --instruction "Create config.yaml with database settings for PostgreSQL" + +# Read and summarize +python3 agent.py --instruction "Read README.md and create a one-paragraph summary" + +# Modify a file +python3 agent.py --instruction "Add a new function to src/utils.py that validates email addresses" +``` + +### Code Analysis + +```bash +# Explain code +python3 agent.py --instruction "Explain how the authentication system works in src/auth/" + +# Find patterns +python3 agent.py --instruction "Find all API endpoints and list them with their HTTP methods" + +# Review code +python3 agent.py --instruction "Review src/api/handlers.py for potential security issues" +``` + +### Debugging + +```bash +# Investigate error +python3 agent.py --instruction "Find why 'test_user_creation' is failing and fix it" + +# Trace behavior +python3 agent.py --instruction "Trace the data flow from user input to database in the signup process" +``` + +### Project Tasks + +```bash +# Setup +python3 agent.py --instruction "Create a Python project structure with src/, tests/, and setup.py" + +# Add feature +python3 agent.py --instruction "Add logging to all functions in src/core/ using Python's logging module" + +# Refactor +python3 agent.py --instruction "Refactor src/utils.py to follow the single responsibility principle" +``` + +--- + +## Session Management + +Each agent run creates a new session with a unique ID: + +```json +{"type": "thread.started", "thread_id": "sess_1706890123456"} +``` + +### Session Lifecycle + +```mermaid +stateDiagram-v2 + [*] --> Initializing: python3 agent.py + Initializing --> Running: thread.started + Running --> Iterating: turn.started + Iterating --> Executing: item.started + Executing --> Iterating: item.completed + Iterating --> Verifying: No tool calls + Verifying --> Iterating: Needs more work + Verifying --> Complete: Verified + Iterating --> Failed: Error + Complete --> [*]: turn.completed + Failed --> [*]: turn.failed +``` + +--- + +## Performance Tips + +### Optimize Token Usage + +```bash +# Set lower cost limit for testing +export LLM_COST_LIMIT="2.0" +``` + +### Monitor Progress + +```bash +# Watch 
+# tool executions in real-time
+python3 agent.py --instruction "..." 2>&1 | grep -E "Executing tool|Iteration"
+```
+
+### Debug Issues
+
+```bash
+# Full verbose output
+python3 agent.py --instruction "..." 2>&1 | tee agent_debug.log
+```
+
+---
+
+## Next Steps
+
+- [Tools Reference](./tools.md) - Available tools and their parameters
+- [Configuration](./configuration.md) - Customize agent behavior
+- [Best Practices](./best-practices.md) - Tips for effective usage