diff --git a/README.md b/README.md
index 3b9cf99..183091e 100644
--- a/README.md
+++ b/README.md
@@ -1,167 +1,130 @@
# BaseAgent - SDK 3.0
-High-performance autonomous agent for [Term Challenge](https://term.challenge). **Does NOT use term_sdk** - fully autonomous with litellm.
+High-performance autonomous agent for [Term Challenge](https://term.challenge). Supports multiple LLM providers with **Chutes API** (Kimi K2.5-TEE) as the default.
-## Installation
+## Quick Start
```bash
-# Via pyproject.toml
-pip install .
-
-# Via requirements.txt
+# 1. Install dependencies
pip install -r requirements.txt
-```
-## Usage
+# 2. Configure Chutes API (default provider)
+export CHUTES_API_TOKEN="your-token-from-chutes.ai"
-```bash
-python agent.py --instruction "Your task here..."
+# 3. Run the agent
+python3 agent.py --instruction "Your task description here..."
```
-The agent receives the instruction via `--instruction` and executes the task autonomously.
-
-## Mandatory Architecture
-
-> **IMPORTANT**: Agents MUST follow these rules to work correctly.
+### Alternative: OpenRouter
-### 1. Project Structure (MANDATORY)
-
-Agents **MUST** be structured projects, NOT single files:
-
-```
-my-agent/
-├── agent.py # Entry point with --instruction
-├── src/ # Modules
-│ ├── core/
-│ │ ├── loop.py # Main loop
-│ │ └── compaction.py # Context management (MANDATORY)
-│ ├── llm/
-│ │ └── client.py # LLM client (litellm)
-│ └── tools/
-│ └── ... # Available tools
-├── requirements.txt # Dependencies
-└── pyproject.toml # Project config
+```bash
+export LLM_PROVIDER="openrouter"
+export OPENROUTER_API_KEY="your-openrouter-key"
+python3 agent.py --instruction "Your task description here..."
```
-### 2. Session Management (MANDATORY)
-
-Agents **MUST** maintain complete conversation history:
-
-```python
-messages = [
- {"role": "system", "content": system_prompt},
- {"role": "user", "content": instruction},
-]
+## Documentation
-# Add each exchange
-messages.append({"role": "assistant", "content": response})
-messages.append({"role": "tool", "tool_call_id": id, "content": result})
+📚 **Full documentation available in [docs/](docs/)**
+
+### Getting Started
+- [Overview](docs/overview.md) - What is BaseAgent
+- [Installation](docs/installation.md) - Setup instructions
+- [Quick Start](docs/quickstart.md) - First task in 5 minutes
+
+### Core Concepts
+- [Architecture](docs/architecture.md) - Technical deep-dive with diagrams
+- [Configuration](docs/configuration.md) - All settings explained
+- [Usage Guide](docs/usage.md) - CLI commands and examples
+
+### Reference
+- [Tools Reference](docs/tools.md) - Available tools
+- [Context Management](docs/context-management.md) - Token optimization
+- [Best Practices](docs/best-practices.md) - Performance tips
+
+### LLM Providers
+- [Chutes Integration](docs/chutes-integration.md) - **Default provider setup**
+
+## Architecture Overview
+
+```mermaid
+graph TB
+ subgraph User
+ CLI["python3 agent.py --instruction"]
+ end
+
+ subgraph Core
+ Loop["Agent Loop"]
+ Context["Context Manager"]
+ end
+
+ subgraph LLM
+ Chutes["Chutes API (Kimi K2.5)"]
+ OpenRouter["OpenRouter (fallback)"]
+ end
+
+ subgraph Tools
+ Shell["shell_command"]
+ Files["read/write_file"]
+ Search["grep_files"]
+ end
+
+ CLI --> Loop
+ Loop --> Context
+ Loop -->|default| Chutes
+ Loop -->|fallback| OpenRouter
+ Loop --> Tools
```
-### 3. Context Compaction (MANDATORY)
-
-Compaction is **CRITICAL** for:
-- Avoiding "context too long" errors
-- Preserving critical information
-- Enabling complex multi-step tasks
-- Improving response coherence
+## Key Features
-```python
-# Recommended threshold: 85% of context window
-AUTO_COMPACT_THRESHOLD = 0.85
+| Feature | Description |
+|---------|-------------|
+| **Fully Autonomous** | No user confirmation needed |
+| **LLM-Driven** | All decisions made by the language model |
+| **Chutes API** | Default: Kimi K2.5-TEE (256K context, thinking mode) |
+| **Prompt Caching** | 90%+ cache hit rate |
+| **Context Management** | Intelligent pruning and compaction |
+| **Self-Verification** | Automatic validation before completion |
-# 2-step strategy:
-# 1. Pruning: Remove old tool outputs
-# 2. AI Compaction: Summarize conversation if pruning insufficient
-```
-
-## Features
+## Environment Variables
-### LLM Client (litellm)
+| Variable | Required | Default | Description |
+|----------|----------|---------|-------------|
+| `CHUTES_API_TOKEN` | Yes* | - | Chutes API token |
+| `LLM_PROVIDER` | No | `chutes` | `chutes` or `openrouter` |
+| `LLM_MODEL` | No | `moonshotai/Kimi-K2.5-TEE` | Model identifier |
+| `LLM_COST_LIMIT` | No | `10.0` | Max cost in USD |
+| `OPENROUTER_API_KEY` | For OpenRouter | - | OpenRouter API key |
-```python
-from src.llm.client import LiteLLMClient
+*\* Required for the default Chutes provider.*
-llm = LiteLLMClient(
- model="openrouter/anthropic/claude-opus-4.5",
- temperature=0.0,
- max_tokens=16384,
-)
+## Project Structure
-response = llm.chat(messages, tools=tool_specs)
```
-
-### Prompt Caching
-
-Caches system and recent messages to reduce costs:
-- Cache hit rate: **90%+** on long conversations
-- Significant API cost reduction
-
-### Self-Verification
-
-Before completing, the agent automatically:
-1. Re-reads the original instruction
-2. Verifies each requirement
-3. Only confirms completion if everything is validated
-
-### Context Management
-
-- **Token-based overflow detection** (not message count)
-- **Tool output pruning** (removes old outputs)
-- **AI compaction** (summarizes if needed)
-- **Middle-out truncation** for large outputs
-
-## Available Tools
-
-| Tool | Description |
-|------|-------------|
-| `shell_command` | Execute shell commands |
-| `read_file` | Read files with pagination |
-| `write_file` | Create/overwrite files |
-| `apply_patch` | Apply patches |
-| `grep_files` | Search with ripgrep |
-| `list_dir` | List directories |
-| `view_image` | Analyze images |
-
-## Configuration
-
-See `src/config/defaults.py`:
-
-```python
-CONFIG = {
- "model": "openrouter/anthropic/claude-opus-4.5",
- "max_tokens": 16384,
- "max_iterations": 200,
- "auto_compact_threshold": 0.85,
- "prune_protect": 40_000,
- "cache_enabled": True,
-}
+baseagent/
+├── agent.py # Entry point
+├── src/
+│ ├── core/
+│ │ ├── loop.py # Main agent loop
+│ │ └── compaction.py # Context management
+│ ├── llm/
+│ │ └── client.py # LLM client
+│ ├── config/
+│ │ └── defaults.py # Configuration
+│ ├── tools/ # Tool implementations
+│ └── prompts/ # System prompt
+├── docs/ # 📚 Full documentation
+├── rules/ # Development guidelines
+└── astuces/ # Implementation techniques
```
-## Environment Variables
-
-| Variable | Description |
-|----------|-------------|
-| `OPENROUTER_API_KEY` | OpenRouter API key |
-
-## Documentation
-
-### Rules - Development Guidelines
-
-See [rules/](rules/) for comprehensive guides:
-
-- [Architecture Patterns](rules/02-architecture-patterns.md) - **Mandatory project structure**
-- [LLM Usage Guide](rules/06-llm-usage-guide.md) - **Using litellm**
-- [Best Practices](rules/05-best-practices.md)
-- [Error Handling](rules/08-error-handling.md)
-
-### Tips - Practical Techniques
-
-See [astuces/](astuces/) for techniques:
+## Development Guidelines
-- [Prompt Caching](astuces/01-prompt-caching.md)
-- [Context Management](astuces/03-context-management.md)
-- [Local Testing](astuces/09-local-testing.md)
+For agent developers, see:
+- [rules/](rules/) - Architecture patterns, best practices, anti-patterns
+- [astuces/](astuces/) - Practical techniques (caching, verification, etc.)
+- [AGENTS.md](AGENTS.md) - Comprehensive building guide
## License
diff --git a/docs/README.md b/docs/README.md
new file mode 100644
index 0000000..3700151
--- /dev/null
+++ b/docs/README.md
@@ -0,0 +1,125 @@
+# BaseAgent Documentation
+
+> **Professional documentation for the BaseAgent autonomous coding assistant**
+
+BaseAgent is a high-performance autonomous agent designed for the [Term Challenge](https://term.challenge). It leverages LLM-driven decision making with advanced context management and cost optimization techniques.
+
+---
+
+## Table of Contents
+
+### Getting Started
+- [Overview](./overview.md) - What is BaseAgent and core design principles
+- [Installation](./installation.md) - Prerequisites and setup instructions
+- [Quick Start](./quickstart.md) - Your first task in 5 minutes
+
+### Core Concepts
+- [Architecture](./architecture.md) - Technical architecture and system design
+- [Configuration](./configuration.md) - All configuration options explained
+- [Usage Guide](./usage.md) - Command-line interface and options
+
+### Reference
+- [Tools Reference](./tools.md) - Available tools and their parameters
+- [Context Management](./context-management.md) - Token management and compaction
+- [Best Practices](./best-practices.md) - Optimal usage patterns
+
+### LLM Providers
+- [Chutes API Integration](./chutes-integration.md) - Using Chutes as your LLM provider
+
+---
+
+## Quick Navigation
+
+| Document | Description |
+|----------|-------------|
+| [Overview](./overview.md) | High-level introduction and design principles |
+| [Installation](./installation.md) | Step-by-step setup guide |
+| [Quick Start](./quickstart.md) | Get running in minutes |
+| [Architecture](./architecture.md) | Technical deep-dive with diagrams |
+| [Configuration](./configuration.md) | Environment variables and settings |
+| [Usage](./usage.md) | CLI commands and examples |
+| [Tools](./tools.md) | Complete tools reference |
+| [Context Management](./context-management.md) | Memory and token optimization |
+| [Best Practices](./best-practices.md) | Tips for optimal performance |
+| [Chutes Integration](./chutes-integration.md) | Chutes API setup and usage |
+
+---
+
+## Architecture at a Glance
+
+```mermaid
+graph TB
+ subgraph User["User Interface"]
+ CLI["CLI (agent.py)"]
+ end
+
+ subgraph Core["Core Engine"]
+ Loop["Agent Loop"]
+ Context["Context Manager"]
+ Cache["Prompt Cache"]
+ end
+
+ subgraph LLM["LLM Layer"]
+ Client["LiteLLM Client"]
+ Provider["Provider (Chutes/OpenRouter)"]
+ end
+
+ subgraph Tools["Tool System"]
+ Registry["Tool Registry"]
+ Shell["shell_command"]
+ Files["read_file / write_file"]
+ Search["grep_files / list_dir"]
+ end
+
+ CLI --> Loop
+ Loop --> Context
+ Loop --> Cache
+ Loop --> Client
+ Client --> Provider
+ Loop --> Registry
+ Registry --> Shell
+ Registry --> Files
+ Registry --> Search
+```
+
+---
+
+## Key Features
+
+- **Fully Autonomous** - No user confirmation required; makes decisions independently
+- **LLM-Driven** - All decisions made by the language model, not hardcoded logic
+- **Prompt Caching** - 90%+ cache hit rate for significant cost reduction
+- **Context Management** - Intelligent pruning and compaction for long tasks
+- **Self-Verification** - Automatic validation before task completion
+- **Multi-Provider** - Supports Chutes AI, OpenRouter, and litellm-compatible providers
+
+---
+
+## Project Structure
+
+```
+baseagent/
+├── agent.py # Entry point
+├── src/
+│ ├── core/
+│ │ ├── loop.py # Main agent loop
+│ │ └── compaction.py # Context management
+│ ├── llm/
+│ │ └── client.py # LLM client (litellm)
+│ ├── config/
+│ │ └── defaults.py # Configuration
+│ ├── tools/ # Tool implementations
+│ ├── prompts/
+│ │ └── system.py # System prompt
+│ └── output/
+│ └── jsonl.py # JSONL event emission
+├── rules/ # Development guidelines
+├── astuces/ # Implementation techniques
+└── docs/ # This documentation
+```
+
+---
+
+## License
+
+MIT License - See [LICENSE](../LICENSE) for details.
diff --git a/docs/architecture.md b/docs/architecture.md
new file mode 100644
index 0000000..772b5ee
--- /dev/null
+++ b/docs/architecture.md
@@ -0,0 +1,435 @@
+# Technical Architecture
+
+> **Deep dive into BaseAgent's system design, components, and data flow**
+
+## System Overview
+
+BaseAgent follows a modular architecture with clear separation of concerns:
+
+```mermaid
+graph TB
+ subgraph Entry["Entry Layer"]
+ agent["agent.py
CLI Entry Point"]
+ end
+
+ subgraph Core["Core Layer"]
+ loop["loop.py
Agent Loop"]
+ compact["compaction.py
Context Manager"]
+ end
+
+ subgraph LLM["LLM Layer"]
+ client["client.py
LiteLLM Client"]
+ end
+
+ subgraph Config["Configuration"]
+ defaults["defaults.py
Settings"]
+ prompts["system.py
System Prompt"]
+ end
+
+ subgraph Tools["Tool Layer"]
+ registry["registry.py
Tool Registry"]
+ shell["shell.py"]
+ read["read_file.py"]
+ write["write_file.py"]
+ patch["apply_patch.py"]
+ grep["grep_files.py"]
+ list["list_dir.py"]
+ end
+
+ subgraph Output["Output Layer"]
+ jsonl["jsonl.py
Event Emitter"]
+ end
+
+ agent --> loop
+ loop --> compact
+ loop --> client
+ loop --> registry
+ loop --> jsonl
+ client --> defaults
+ loop --> prompts
+ registry --> shell & read & write & patch & grep & list
+
+ style loop fill:#4CAF50,color:#fff
+ style client fill:#2196F3,color:#fff
+ style compact fill:#FF9800,color:#fff
+```
+
+---
+
+## Component Diagram
+
+```mermaid
+classDiagram
+ class AgentContext {
+ +instruction: str
+ +cwd: str
+ +step: int
+ +is_done: bool
+ +history: List
+ +shell(cmd, timeout) ShellResult
+ +done()
+ +log(msg)
+ }
+
+ class LiteLLMClient {
+ +model: str
+ +temperature: float
+ +max_tokens: int
+ +cost_limit: float
+ +chat(messages, tools) LLMResponse
+ +get_stats() Dict
+ }
+
+ class LLMResponse {
+ +text: str
+ +function_calls: List~FunctionCall~
+ +tokens: Dict
+ +has_function_calls() bool
+ }
+
+ class FunctionCall {
+ +id: str
+ +name: str
+ +arguments: Dict
+ }
+
+ class ToolRegistry {
+ +tools: Dict
+ +execute(ctx, name, args) ToolResult
+ +get_tools_for_llm() List
+ }
+
+ class ToolResult {
+ +success: bool
+ +output: str
+ +inject_content: Optional
+ }
+
+ AgentContext --> LiteLLMClient : uses
+ LiteLLMClient --> LLMResponse : returns
+ LLMResponse --> FunctionCall : contains
+ AgentContext --> ToolRegistry : uses
+ ToolRegistry --> ToolResult : returns
+```
+
+---
+
+## Agent Loop Workflow
+
+The heart of BaseAgent is the agent loop in `src/core/loop.py`:
+
+```mermaid
+flowchart TB
+ Start([Start]) --> Init[Initialize Session]
+ Init --> BuildMsg[Build Initial Messages]
+ BuildMsg --> GetState[Get Terminal State]
+
+ GetState --> LoopStart{Iteration < Max?}
+
+    LoopStart -->|Yes| ManageCtx["Manage Context<br/>Prune/Compact if needed"]
+ ManageCtx --> ApplyCache[Apply Prompt Caching]
+ ApplyCache --> CallLLM[Call LLM with Tools]
+
+ CallLLM --> HasCalls{Has Tool Calls?}
+
+ HasCalls -->|Yes| ResetPending[Reset pending_completion]
+ ResetPending --> ExecTools[Execute Tool Calls]
+ ExecTools --> AddResults[Add Results to Messages]
+ AddResults --> LoopStart
+
+ HasCalls -->|No| CheckPending{pending_completion?}
+
+ CheckPending -->|No| SetPending[Set pending_completion = true]
+ SetPending --> InjectVerify[Inject Verification Prompt]
+ InjectVerify --> LoopStart
+
+ CheckPending -->|Yes| Complete[Task Complete]
+
+ LoopStart -->|No| Timeout[Max Iterations Reached]
+
+ Complete --> Emit[Emit turn.completed]
+ Timeout --> Emit
+ Emit --> End([End])
+
+ style ManageCtx fill:#FF9800,color:#fff
+ style ApplyCache fill:#9C27B0,color:#fff
+ style CallLLM fill:#2196F3,color:#fff
+ style ExecTools fill:#4CAF50,color:#fff
+ style InjectVerify fill:#E91E63,color:#fff
+```
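+
+In code, the cycle above reduces to a small state machine. A minimal sketch, assuming the `LLMResponse` and `ToolRegistry` interfaces from the component diagram; the prompt text and helper names are illustrative, not the exact contents of `src/core/loop.py`:
+
+```python
+# Sketch of the double-confirmation loop (illustrative names).
+VERIFICATION_PROMPT = (
+    "Re-read the original instruction, list every requirement, and "
+    "verify each one with a command before confirming completion."
+)
+
+def run_loop(ctx, llm, registry, messages, max_iterations=200):
+    pending_completion = False
+    for _ in range(max_iterations):
+        response = llm.chat(messages, tools=registry.get_tools_for_llm())
+        # The real loop also records response.tool_calls on this message.
+        messages.append({"role": "assistant", "content": response.text})
+        if response.has_function_calls():
+            pending_completion = False  # new work resets the flag
+            for call in response.function_calls:
+                result = registry.execute(ctx, call.name, call.arguments)
+                messages.append({"role": "tool", "tool_call_id": call.id,
+                                 "content": result.output})
+            continue
+        if not pending_completion:
+            pending_completion = True  # first "done": inject verification
+            messages.append({"role": "user", "content": VERIFICATION_PROMPT})
+            continue
+        return messages  # second "done" in a row: task complete
+    return messages  # max iterations reached
+```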
+
+---
+
+## Data Flow
+
+### Request Flow
+
+```mermaid
+sequenceDiagram
+ participant User
+ participant Entry as agent.py
+ participant Loop as loop.py
+ participant Context as compaction.py
+ participant Cache as Prompt Cache
+ participant LLM as LiteLLM Client
+ participant Provider as API Provider
+ participant Tools as Tool Registry
+
+ User->>Entry: --instruction "Create hello.txt"
+ Entry->>Entry: Initialize AgentContext
+ Entry->>Entry: Initialize LiteLLMClient
+ Entry->>Loop: run_agent_loop()
+
+ Loop->>Loop: Build messages [system, user, state]
+
+ rect rgb(255, 240, 220)
+ Note over Loop,Provider: Iteration Loop
+ Loop->>Context: manage_context(messages)
+ Context-->>Loop: Managed messages
+
+ Loop->>Cache: apply_caching(messages)
+ Cache-->>Loop: Cached messages
+
+ Loop->>LLM: chat(messages, tools)
+ LLM->>Provider: API Request
+ Provider-->>LLM: Response
+ LLM-->>Loop: LLMResponse
+
+ alt Has tool_calls
+ Loop->>Tools: execute(ctx, tool_name, args)
+ Tools-->>Loop: ToolResult
+ Loop->>Loop: Append to messages
+ end
+ end
+
+ Loop-->>Entry: Complete
+ Entry-->>User: JSONL output
+```
+
+### Message Structure
+
+Messages accumulate through the session:
+
+```python
+messages = [
+ # 1. System prompt (stable, cached)
+ {"role": "system", "content": SYSTEM_PROMPT},
+
+ # 2. User instruction
+ {"role": "user", "content": "Create hello.txt with 'Hello World'"},
+
+ # 3. Initial state
+ {"role": "user", "content": "Current directory:\n```\n...\n```"},
+
+ # 4. Assistant response with tool calls
+ {
+ "role": "assistant",
+ "content": "Creating the file...",
+ "tool_calls": [
+ {"id": "call_1", "type": "function", "function": {...}}
+ ]
+ },
+
+ # 5. Tool result
+ {"role": "tool", "tool_call_id": "call_1", "content": "File created"},
+
+ # ... continues until completion
+]
+```
+
+---
+
+## Module Descriptions
+
+### `src/core/loop.py` - Agent Loop
+
+The main orchestration module that:
+- Initializes the session and emits JSONL events
+- Manages the iterative Observe→Think→Act cycle
+- Applies prompt caching for cost optimization
+- Handles LLM errors with retry logic
+- Triggers self-verification before completion
+
+### `src/core/compaction.py` - Context Manager
+
+Intelligent context management that:
+- Estimates token usage (4 chars ≈ 1 token)
+- Detects context overflow at 85% of usable window
+- Prunes old tool outputs (protects last 40K tokens)
+- Runs AI compaction when pruning is insufficient
+- Preserves critical information through summarization
+
+### `src/llm/client.py` - LLM Client
+
+LiteLLM-based client that:
+- Supports multiple providers (Chutes, OpenRouter, etc.)
+- Tracks token usage and costs
+- Handles tool/function calling format
+- Enforces cost limits
+- Provides usage statistics
+
+### `src/tools/registry.py` - Tool Registry
+
+Centralized tool management that:
+- Registers all available tools
+- Provides tool specs for LLM
+- Executes tools with proper context
+- Handles tool output truncation
+- Manages image injection for `view_image`
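+
+A minimal sketch of such a registry, assuming the `ToolResult` shape from the component diagram (truncation and image injection omitted):
+
+```python
+from dataclasses import dataclass
+from typing import Any, Callable, Dict, List
+
+@dataclass
+class ToolResult:
+    success: bool
+    output: str
+
+class ToolRegistry:
+    def __init__(self) -> None:
+        self.tools: Dict[str, Dict[str, Any]] = {}
+
+    def register(self, name: str, spec: Dict[str, Any],
+                 fn: Callable[..., ToolResult]) -> None:
+        self.tools[name] = {"spec": spec, "fn": fn}
+
+    def get_tools_for_llm(self) -> List[Dict[str, Any]]:
+        # OpenAI-style function specs handed to the LLM
+        return [t["spec"] for t in self.tools.values()]
+
+    def execute(self, ctx, name: str, args: Dict[str, Any]) -> ToolResult:
+        if name not in self.tools:
+            return ToolResult(False, f"Unknown tool: {name}")
+        try:
+            return self.tools[name]["fn"](ctx, **args)
+        except Exception as exc:  # tool errors become results, not crashes
+            return ToolResult(False, f"{type(exc).__name__}: {exc}")
+```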
+
+### `src/prompts/system.py` - System Prompt
+
+System prompt configuration that:
+- Defines agent personality and behavior
+- Specifies coding guidelines
+- Includes AGENTS.md support
+- Configures autonomous operation mode
+- Provides environment context
+
+### `src/config/defaults.py` - Configuration
+
+Central configuration containing:
+- Model settings (model name, tokens, temperature)
+- Context management thresholds
+- Tool output limits
+- Prompt caching settings
+- Execution limits
+
+---
+
+## Context Management Pipeline
+
+```mermaid
+flowchart LR
+ subgraph Input
+        Msgs["Messages<br/>~150K tokens"]
+ end
+
+ subgraph Detection
+ Est[Estimate Tokens]
+        Check{"> 85% of<br/>168K usable?"}
+ end
+
+ subgraph Pruning
+ Scan[Scan backwards]
+        Protect["Protect last 40K<br/>tool tokens"]
+ Clear[Clear old outputs]
+ end
+
+ subgraph Compaction
+ CheckAgain{Still > 85%?}
+ Summarize[AI Summarization]
+ NewMsgs[Compacted Messages]
+ end
+
+ subgraph Output
+ Result[Managed Messages]
+ end
+
+ Msgs --> Est --> Check
+ Check -->|No| Result
+ Check -->|Yes| Scan --> Protect --> Clear
+ Clear --> CheckAgain
+ CheckAgain -->|No| Result
+ CheckAgain -->|Yes| Summarize --> NewMsgs --> Result
+```
+
+---
+
+## Tool Execution Flow
+
+```mermaid
+flowchart TB
+ subgraph LLM["LLM Response"]
+ Calls["tool_calls: [
{name: 'shell_command', args: {command: 'ls'}},
{name: 'read_file', args: {file_path: 'README.md'}}
]"]
+ end
+
+ subgraph Registry["Tool Registry"]
+ direction TB
+ Lookup[Lookup Tool]
+ Execute[Execute with Context]
+        Truncate["Truncate Output<br/>max 2500 tokens"]
+ end
+
+ subgraph Tools["Tool Implementations"]
+ Shell[shell_command]
+ Read[read_file]
+ Write[write_file]
+ Patch[apply_patch]
+ Grep[grep_files]
+ List[list_dir]
+ end
+
+ subgraph Output["Results"]
+ Results["tool results added
to messages"]
+ end
+
+ Calls --> Lookup
+ Lookup --> Execute
+ Execute --> Shell & Read & Write & Patch & Grep & List
+ Shell & Read & Write & Patch & Grep & List --> Truncate
+ Truncate --> Results
+```
+
+---
+
+## JSONL Event Emission
+
+BaseAgent emits structured JSONL events throughout execution:
+
+```mermaid
+sequenceDiagram
+ participant Loop as Agent Loop
+ participant JSONL as Event Emitter
+ participant stdout as Standard Output
+
+ Loop->>JSONL: emit(ThreadStartedEvent)
+ JSONL->>stdout: {"type": "thread.started", ...}
+
+ Loop->>JSONL: emit(TurnStartedEvent)
+ JSONL->>stdout: {"type": "turn.started", ...}
+
+ loop Each Tool Call
+ Loop->>JSONL: emit(ItemStartedEvent)
+ JSONL->>stdout: {"type": "item.started", ...}
+ Loop->>JSONL: emit(ItemCompletedEvent)
+ JSONL->>stdout: {"type": "item.completed", ...}
+ end
+
+ Loop->>JSONL: emit(TurnCompletedEvent)
+ JSONL->>stdout: {"type": "turn.completed", "usage": {...}}
+```
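+
+Emitting these events needs little more than one `json.dumps` per line. A minimal sketch using the event names from the diagram; the field layout is an assumption:
+
+```python
+import json
+import sys
+import time
+
+def emit(event_type: str, **fields) -> None:
+    record = {"type": event_type, "ts": time.time(), **fields}
+    sys.stdout.write(json.dumps(record) + "\n")
+    sys.stdout.flush()  # keep downstream consumers in sync
+
+# Usage
+emit("thread.started", thread_id="t1")
+emit("turn.completed", usage={"input_tokens": 50_000, "output_tokens": 500})
+```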
+
+---
+
+## Error Handling Strategy
+
+```mermaid
+flowchart TB
+ Error[Error Occurs] --> Type{Error Type?}
+
+    Type -->|CostLimitExceeded| Abort["Emit TurnFailed<br/>Abort Session"]
+
+ Type -->|Authentication| Abort
+
+ Type -->|Rate Limit| Retry{Attempt < 5?}
+ Retry -->|Yes| Wait[Wait 10s × attempt]
+ Wait --> TryAgain[Retry Request]
+ Retry -->|No| Abort
+
+ Type -->|Timeout/504| Retry
+
+ Type -->|Other| Retry
+
+ TryAgain --> Success{Success?}
+ Success -->|Yes| Continue[Continue Loop]
+ Success -->|No| Retry
+```
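+
+A minimal retry sketch matching this flow, assuming up to 5 attempts with a 10s × attempt backoff as shown; the exception names are illustrative:
+
+```python
+import time
+
+FATAL = ("CostLimitExceeded", "AuthenticationError")  # abort immediately
+
+def chat_with_retry(llm, messages, tools, max_attempts=5):
+    for attempt in range(1, max_attempts + 1):
+        try:
+            return llm.chat(messages, tools=tools)
+        except Exception as exc:
+            if type(exc).__name__ in FATAL:
+                raise  # emit TurnFailed and abort the session
+            if attempt == max_attempts:
+                raise  # retries exhausted
+            time.sleep(10 * attempt)  # rate limit / timeout / server error
+```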
+
+---
+
+## Next Steps
+
+- [Configuration Reference](./configuration.md) - All settings explained
+- [Tools Reference](./tools.md) - Detailed tool documentation
+- [Context Management](./context-management.md) - Deep dive into memory management
diff --git a/docs/best-practices.md b/docs/best-practices.md
new file mode 100644
index 0000000..7fa098a
--- /dev/null
+++ b/docs/best-practices.md
@@ -0,0 +1,408 @@
+# Best Practices
+
+> **Strategies for optimal performance, cost efficiency, and reliable results**
+
+## Core Principles
+
+BaseAgent follows these fundamental principles:
+
+1. **Explore First** - Always gather context before acting
+2. **Iterate** - Never try to solve everything in one shot
+3. **Verify** - Double-confirm before completing
+4. **Fail Gracefully** - Handle errors and retry
+5. **Stay Focused** - Complete exactly what's asked
+
+---
+
+## Explore-First Pattern
+
+Before making any changes, always understand the context:
+
+```mermaid
+flowchart LR
+ subgraph Bad["❌ Bad Pattern"]
+ B1[Receive Task] --> B2[Start Coding]
+ B2 --> B3[Hit Problems]
+ B3 --> B4[Backtrack]
+ end
+
+ subgraph Good["✅ Good Pattern"]
+ G1[Receive Task] --> G2[Explore Codebase]
+ G2 --> G3[Understand Patterns]
+ G3 --> G4[Plan Approach]
+ G4 --> G5[Implement]
+ end
+```
+
+### Exploration Steps
+
+1. **Read README** - Understand project purpose
+2. **List directory** - See project structure
+3. **Find similar code** - Match existing patterns
+4. **Check tests** - Understand expected behavior
+5. **Review AGENTS.md** - Follow project instructions
+
+---
+
+## Self-Verification
+
+BaseAgent automatically verifies work before completion:
+
+```mermaid
+sequenceDiagram
+ participant Agent
+ participant Verify as Verification
+ participant LLM as LLM
+
+ Agent->>Agent: No more tool calls
+ Agent->>Verify: Inject verification prompt
+ Verify->>LLM: Re-read instruction
+ LLM->>LLM: List requirements
+ LLM->>LLM: Verify each requirement
+
+ alt All verified
+ LLM-->>Agent: Confirm completion
+ else Something missing
+ LLM-->>Agent: Continue working
+ end
+```
+
+### Verification Checklist
+
+The agent automatically asks:
+- ✅ Did I read the ENTIRE original instruction?
+- ✅ Did I list ALL requirements (explicit and implicit)?
+- ✅ Did I run commands to VERIFY each requirement?
+- ✅ Did I fix any issues found during verification?
+
+---
+
+## Prompt Caching
+
+Achieve **90%+ cache hit rate** for massive cost savings:
+
+```mermaid
+graph TB
+ subgraph Strategy["Caching Strategy"]
+ S1["Cache first 2 system messages"]
+ S2["Cache last 2 non-system messages"]
+ S3["Up to 4 breakpoints total"]
+ end
+
+ subgraph Effect["Effect"]
+ E1["Request 1: Cache miss (create)"]
+ E2["Request 2: Cache HIT (90% saved)"]
+ E3["Request 3: Cache HIT (90% saved)"]
+ E4["Request N: Cache HIT (90% saved)"]
+ end
+
+ S1 --> E1
+ S2 --> E1
+ E1 --> E2 --> E3 --> E4
+
+ style E2 fill:#4CAF50,color:#fff
+ style E3 fill:#4CAF50,color:#fff
+ style E4 fill:#4CAF50,color:#fff
+```
+
+### How It Works
+
+```python
+# Messages structure
+messages = [
+ {"role": "system", "content": "...", "cache_control": {"type": "ephemeral"}}, # ✓ Cached
+ {"role": "user", "content": "original instruction"},
+ {"role": "assistant", "content": "...", "tool_calls": [...]},
+ {"role": "tool", "content": "..."},
+ {"role": "assistant", "content": "...", "cache_control": {"type": "ephemeral"}}, # ✓ Cached
+ {"role": "user", "content": "verification", "cache_control": {"type": "ephemeral"}}, # ✓ Cached
+]
+```
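+
+A sketch of a helper applying that strategy at the message level, as in the example above; the function name and breakpoint bookkeeping are illustrative:
+
+```python
+def apply_caching(messages, max_breakpoints=4):
+    marked = 0
+    # Cache the first two system messages
+    for msg in messages:
+        if msg["role"] == "system" and marked < 2:
+            msg["cache_control"] = {"type": "ephemeral"}
+            marked += 1
+    # Cache the last two non-system messages
+    for msg in reversed(messages):
+        if msg["role"] != "system" and marked < max_breakpoints:
+            msg["cache_control"] = {"type": "ephemeral"}
+            marked += 1
+    return messages
+```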
+
+### Cost Impact
+
+| Scenario | Cost per 1M tokens |
+|----------|-------------------|
+| No caching | $3.00 |
+| 90% cache hit | $0.30 |
+| **Savings** | **90%** |
+
+---
+
+## Cost Optimization
+
+### Set Cost Limits
+
+```bash
+export LLM_COST_LIMIT="5.0" # Max $5 per session
+```
+
+### Monitor Usage
+
+Watch the logs for token counts:
+```
+[14:30:17] [loop] Tokens: 50000 input, 45000 cached, 500 output
+```
+
+### Optimize Instructions
+
+```bash
+# ❌ Vague (causes exploration loops)
+python3 agent.py --instruction "Fix the bugs"
+
+# ✅ Specific (direct action)
+python3 agent.py --instruction "Fix the TypeError in src/api/handlers.py:42"
+```
+
+### Use Targeted Tools
+
+```bash
+# ❌ Wasteful
+ls -laR / # Lists entire filesystem
+
+# ✅ Efficient
+list_dir(dir_path="src/", depth=2)
+```
+
+---
+
+## Git Hygiene
+
+BaseAgent follows strict git rules:
+
+### ✅ Allowed
+
+- `git status` - Check current state
+- `git log` - View history
+- `git blame` - Understand code origins
+- `git diff` - Review changes
+- `git add` - Stage changes (when asked)
+- `git commit` - Commit changes (when asked)
+
+### ❌ Forbidden
+
+- `git reset --hard` - Destructive
+- `git checkout --` - Loses changes
+- Reverting changes you didn't make
+- Amending commits without permission
+- Pushing without explicit request
+
+### Safe Practices
+
+```bash
+# Always check state first
+git status
+
+# Review before committing
+git diff
+
+# Stage specific files
+git add src/specific_file.py
+
+# Never force operations
+# ❌ git push --force
+```
+
+---
+
+## Writing Effective Instructions
+
+### Be Specific
+
+```bash
+# ❌ Too vague
+"Fix the code"
+
+# ✅ Specific
+"Fix the NullPointerException in UserService.java:85 when user.email is null"
+```
+
+### Provide Context
+
+```bash
+# ❌ Missing context
+"Add authentication"
+
+# ✅ With context
+"Add JWT authentication to the /api/users endpoint using the existing AuthService"
+```
+
+### Request Verification
+
+```bash
+# ✅ Ask for verification
+"Create a sorting algorithm and verify it works with [5, 2, 8, 1, 9]"
+```
+
+### Break Down Complex Tasks
+
+```bash
+# ❌ Too complex for one instruction
+"Build a complete e-commerce platform"
+
+# ✅ Incremental
+"Create the product catalog data model with name, price, and description fields"
+```
+
+---
+
+## Tool Usage Patterns
+
+### Shell Commands
+
+```python
+# ✅ Use workdir
+{"command": "ls -la", "workdir": "/workspace/src"}
+
+# ❌ Avoid cd chains
+{"command": "cd /workspace && cd src && ls"}
+```
+
+### File Reading
+
+```python
+# ✅ Read specific sections
+{"file_path": "large.py", "offset": 100, "limit": 50}
+
+# ❌ Read entire large files
+{"file_path": "large.py"} # May overwhelm context
+```
+
+### Searching
+
+```python
+# ✅ Use grep_files for discovery
+{"pattern": "def calculate", "include": "*.py", "path": "src/"}
+
+# Then read specific files found
+{"file_path": "src/billing/calculator.py"}
+```
+
+### Editing
+
+```python
+# ✅ Use apply_patch for surgical edits
+{"patch": "*** Update File: src/utils.py\n@@ def old_func:\n- old\n+ new"}
+
+# ✅ Use write_file for new files
+{"file_path": "new_module.py", "content": "..."}
+```
+
+---
+
+## Handling Long Tasks
+
+For complex, multi-step tasks:
+
+### 1. Use update_plan
+
+```python
+{
+ "steps": [
+ {"description": "Analyze existing code", "status": "completed"},
+ {"description": "Design new module", "status": "in_progress"},
+ {"description": "Implement core logic", "status": "pending"},
+ {"description": "Add unit tests", "status": "pending"},
+ {"description": "Update documentation", "status": "pending"}
+ ]
+}
+```
+
+### 2. Monitor Context
+
+Watch for compaction events:
+```
+[compaction] Context overflow detected, managing...
+```
+
+### 3. Save Progress
+
+If context compaction occurs, the summary preserves:
+- Current progress
+- Key decisions
+- Remaining work
+- Modified files
+
+---
+
+## Error Handling
+
+BaseAgent handles errors gracefully:
+
+### Automatic Retry
+
+```mermaid
+flowchart TB
+ Error[Error Occurs] --> Type{Error Type}
+
+ Type -->|Rate Limit| Wait[Wait + Retry]
+ Type -->|Timeout| Wait
+ Type -->|Server Error| Wait
+
+ Type -->|Auth Error| Fail[Abort]
+ Type -->|Cost Limit| Fail
+
+ Wait --> Attempt{Attempt < 5?}
+ Attempt -->|Yes| Retry[Retry Request]
+ Attempt -->|No| Fail
+
+ Retry --> Success{Success?}
+ Success -->|Yes| Continue[Continue]
+ Success -->|No| Attempt
+```
+
+### Recovery Strategies
+
+1. **Try alternatives** - If one approach fails, try another
+2. **Check documentation** - Read AGENTS.md, README.md
+3. **Simplify** - Break complex operations into steps
+4. **Report issues** - Note blockers in final message
+
+---
+
+## Performance Tips
+
+### Reduce Iterations
+
+1. Give specific, complete instructions
+2. Provide necessary context upfront
+3. Avoid vague requirements
+
+### Minimize Token Usage
+
+1. Search before reading entire files
+2. Use targeted directory listings
+3. Keep tool outputs focused
+
+### Maximize Cache Hits
+
+1. Keep system prompt stable
+2. Don't modify early messages
+3. Let the agent handle caching automatically
+
+---
+
+## Checklist
+
+Before running the agent:
+
+- [ ] Clear, specific instruction
+- [ ] Necessary context provided
+- [ ] API key configured
+- [ ] Cost limit set appropriately
+- [ ] Working directory correct
+
+After completion:
+
+- [ ] Verify output matches requirements
+- [ ] Check for any error messages
+- [ ] Review modified files
+- [ ] Run relevant tests
+
+---
+
+## Next Steps
+
+- [Configuration](./configuration.md) - Tune settings
+- [Context Management](./context-management.md) - Memory optimization
+- [Tools Reference](./tools.md) - Detailed tool docs
diff --git a/docs/chutes-integration.md b/docs/chutes-integration.md
new file mode 100644
index 0000000..75b4955
--- /dev/null
+++ b/docs/chutes-integration.md
@@ -0,0 +1,378 @@
+# Chutes API Integration
+
+> **Using Chutes AI as your LLM provider for BaseAgent**
+
+## Overview
+
+[Chutes AI](https://chutes.ai) provides access to advanced language models through a simple API. BaseAgent supports Chutes as a first-class provider, offering access to the **Kimi K2.5-TEE** model with its powerful thinking capabilities.
+
+---
+
+## Chutes API Features
+
+| Feature | Value |
+|---------|-------|
+| **API Base URL** | `https://llm.chutes.ai/v1` |
+| **Default Model** | `moonshotai/Kimi-K2.5-TEE` |
+| **Model Parameters** | 1T total, 32B activated |
+| **Context Window** | 256K tokens |
+| **Thinking Mode** | Enabled by default |
+
+---
+
+## Quick Setup
+
+### Step 1: Get Your API Token
+
+1. Visit [chutes.ai](https://chutes.ai)
+2. Create an account or sign in
+3. Navigate to API settings
+4. Generate an API token
+
+### Step 2: Configure Environment
+
+```bash
+# Required: API token
+export CHUTES_API_TOKEN="your-token-from-chutes.ai"
+
+# Optional: Explicitly set provider and model
+export LLM_PROVIDER="chutes"
+export LLM_MODEL="moonshotai/Kimi-K2.5-TEE"
+```
+
+### Step 3: Run BaseAgent
+
+```bash
+python3 agent.py --instruction "Your task description"
+```
+
+---
+
+## Authentication Flow
+
+```mermaid
+sequenceDiagram
+ participant Agent as BaseAgent
+ participant Client as LiteLLM Client
+ participant Chutes as Chutes API
+
+ Agent->>Client: Initialize with CHUTES_API_TOKEN
+ Client->>Client: Configure litellm
+
+ loop Each Request
+ Agent->>Client: chat(messages, tools)
+ Client->>Chutes: POST /v1/chat/completions
+ Note over Client,Chutes: Authorization: Bearer $CHUTES_API_TOKEN
+ Chutes-->>Client: Response with tokens
+ Client-->>Agent: LLMResponse
+ end
+```
+
+---
+
+## Model Details: Kimi K2.5-TEE
+
+The **moonshotai/Kimi-K2.5-TEE** model offers:
+
+### Architecture
+- **Total Parameters**: 1 Trillion (1T)
+- **Activated Parameters**: 32 Billion (32B)
+- **Architecture**: Mixture of Experts (MoE)
+- **Context Length**: 256,000 tokens
+
+### Thinking Mode
+
+Kimi K2.5-TEE supports a "thinking mode" where the model shows its reasoning process:
+
+```mermaid
+sequenceDiagram
+ participant User
+ participant Model as Kimi K2.5-TEE
+ participant Response
+
+ User->>Model: Complex task instruction
+
+ rect rgb(230, 240, 255)
+ Note over Model: Thinking Mode Active
+ Model->>Model: Analyze problem
+ Model->>Model: Consider approaches
+ Model->>Model: Evaluate options
+ end
+
+ Model->>Response: Reasoning process...
+ Model->>Response: Final answer/action
+```
+
+### Temperature Settings
+
+| Mode | Temperature | Top-p | Description |
+|------|-------------|-------|-------------|
+| **Thinking** | 1.0 | 0.95 | More exploratory reasoning |
+| **Instant** | 0.6 | 0.95 | Faster, more deterministic |
+
+---
+
+## Configuration Options
+
+### Basic Configuration
+
+```python
+# src/config/defaults.py
+CONFIG = {
+ "model": os.environ.get("LLM_MODEL", "moonshotai/Kimi-K2.5-TEE"),
+ "provider": "chutes",
+ "temperature": 1.0, # For thinking mode
+ "max_tokens": 16384,
+}
+```
+
+### Environment Variables
+
+| Variable | Required | Default | Description |
+|----------|----------|---------|-------------|
+| `CHUTES_API_TOKEN` | Yes | - | API token from chutes.ai |
+| `LLM_PROVIDER` | No | `chutes` | `chutes` (default) or `openrouter` |
+| `LLM_MODEL` | No | `moonshotai/Kimi-K2.5-TEE` | Model identifier |
+| `LLM_COST_LIMIT` | No | `10.0` | Max cost in USD |
+
+---
+
+## Thinking Mode Processing
+
+When thinking mode is enabled, responses include `<think>` tags:
+
+```xml
+<think>
+The user wants to create a file with specific content.
+I should:
+1. Check if the file already exists
+2. Create the file with the requested content
+3. Verify the file was created correctly
+</think>
+
+I'll create the file for you now.
+```
+
+BaseAgent can be configured to:
+- **Parse and strip** the thinking tags (show only final answer)
+- **Keep** the thinking content (useful for debugging)
+- **Log** thinking to stderr while showing final answer
+
+### Parsing Example
+
+```python
+import re
+
+def parse_thinking(response_text: str) -> tuple[str, str]:
+ """Extract thinking and final response."""
+    think_pattern = r'<think>(.*?)</think>'
+ match = re.search(think_pattern, response_text, re.DOTALL)
+
+ if match:
+ thinking = match.group(1).strip()
+ final = re.sub(think_pattern, '', response_text, flags=re.DOTALL).strip()
+ return thinking, final
+
+ return "", response_text
+```
+
+---
+
+## API Request Format
+
+Chutes API follows OpenAI-compatible format:
+
+```bash
+curl -X POST https://llm.chutes.ai/v1/chat/completions \
+ -H "Authorization: Bearer $CHUTES_API_TOKEN" \
+ -H "Content-Type: application/json" \
+ -d '{
+ "model": "moonshotai/Kimi-K2.5-TEE",
+ "messages": [
+ {"role": "system", "content": "You are a helpful assistant."},
+ {"role": "user", "content": "Hello!"}
+ ],
+ "max_tokens": 1024,
+ "temperature": 1.0,
+ "top_p": 0.95
+ }'
+```
+
+---
+
+## Fallback to OpenRouter
+
+If Chutes is unavailable, BaseAgent can fall back to OpenRouter:
+
+```mermaid
+flowchart TB
+ Start[API Request] --> Check{Chutes Available?}
+
+ Check -->|Yes| Chutes[Send to Chutes API]
+ Chutes --> Success{Success?}
+ Success -->|Yes| Done[Return Response]
+ Success -->|No| Retry{Retry Count < 3?}
+
+ Retry -->|Yes| Chutes
+ Retry -->|No| Fallback[Use OpenRouter]
+
+ Check -->|No| Fallback
+ Fallback --> Done
+```
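+
+A sketch of this behavior, assuming litellm's `openai/` prefix for OpenAI-compatible endpoints; the retry and fallback wiring is illustrative, not the exact client code:
+
+```python
+import os
+
+import litellm
+
+def complete_with_fallback(messages, max_retries=3):
+    for attempt in range(max_retries):
+        try:
+            return litellm.completion(
+                model="openai/moonshotai/Kimi-K2.5-TEE",  # OpenAI-compatible route
+                messages=messages,
+                api_base="https://llm.chutes.ai/v1",
+                api_key=os.environ["CHUTES_API_TOKEN"],
+            )
+        except Exception:
+            if attempt == max_retries - 1:
+                break  # fall through to OpenRouter
+    return litellm.completion(
+        model="openrouter/anthropic/claude-sonnet-4-20250514",
+        messages=messages,
+        api_key=os.environ["OPENROUTER_API_KEY"],
+    )
+```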
+
+### Configuration for Fallback
+
+```bash
+# Primary: Chutes
+export CHUTES_API_TOKEN="..."
+export LLM_PROVIDER="chutes"
+
+# Fallback: OpenRouter
+export OPENROUTER_API_KEY="..."
+```
+
+### Switching Providers
+
+```bash
+# Switch to OpenRouter
+export LLM_PROVIDER="openrouter"
+export LLM_MODEL="openrouter/anthropic/claude-sonnet-4-20250514"
+
+# Switch back to Chutes
+export LLM_PROVIDER="chutes"
+export LLM_MODEL="moonshotai/Kimi-K2.5-TEE"
+```
+
+---
+
+## Cost Considerations
+
+### Pricing (Approximate)
+
+| Metric | Cost |
+|--------|------|
+| Input tokens | Varies by model |
+| Output tokens | Varies by model |
+| Cached input | Reduced rate |
+
+### Cost Management
+
+```bash
+# Set cost limit
+export LLM_COST_LIMIT="5.0" # Max $5.00 per session
+```
+
+BaseAgent tracks costs and will abort if the limit is exceeded:
+
+```python
+# In src/llm/client.py
+if self._total_cost >= self.cost_limit:
+ raise CostLimitExceeded(
+ f"Cost limit exceeded: ${self._total_cost:.4f}",
+ used=self._total_cost,
+ limit=self.cost_limit,
+ )
+```
+
+---
+
+## Troubleshooting
+
+### Authentication Errors
+
+```
+LLMError: authentication_error
+```
+
+**Solution**: Verify your token is correct and exported:
+
+```bash
+echo $CHUTES_API_TOKEN # Should show your token
+export CHUTES_API_TOKEN="correct-token"
+```
+
+### Rate Limiting
+
+```
+LLMError: rate_limit
+```
+
+**Solution**: BaseAgent automatically retries with exponential backoff. You can also:
+- Wait a few minutes before retrying
+- Reduce request frequency
+- Check your API plan limits
+
+### Model Not Found
+
+```
+LLMError: Model 'xyz' not found
+```
+
+**Solution**: Use the correct model identifier:
+
+```bash
+export LLM_MODEL="moonshotai/Kimi-K2.5-TEE"
+```
+
+### Connection Timeouts
+
+```
+LLMError: timeout
+```
+
+**Solution**: BaseAgent retries automatically. If persistent:
+- Check your internet connection
+- Verify Chutes API status
+- Consider using OpenRouter as fallback
+
+---
+
+## Integration with LiteLLM
+
+BaseAgent uses [LiteLLM](https://docs.litellm.ai/) for provider abstraction:
+
+```python
+# src/llm/client.py
+import os
+
+import litellm
+
+# For Chutes, configure base URL
+litellm.api_base = "https://llm.chutes.ai/v1"
+
+# Make request
+response = litellm.completion(
+ model="moonshotai/Kimi-K2.5-TEE",
+ messages=messages,
+ api_key=os.environ.get("CHUTES_API_TOKEN"),
+)
+```
+
+---
+
+## Best Practices
+
+### For Optimal Performance
+
+1. **Enable thinking mode** for complex reasoning tasks
+2. **Use appropriate temperature** (1.0 for exploration, 0.6 for precision)
+3. **Leverage the 256K context** for large codebases
+4. **Monitor costs** with `LLM_COST_LIMIT`
+
+### For Reliability
+
+1. **Set up fallback** to OpenRouter
+2. **Handle rate limits** gracefully (automatic in BaseAgent)
+3. **Log responses** for debugging complex tasks
+
+### For Cost Efficiency
+
+1. **Enable prompt caching** (reduces costs by 90%)
+2. **Use context management** to avoid token waste
+3. **Set reasonable cost limits** for testing
+
+---
+
+## Next Steps
+
+- [Configuration Reference](./configuration.md) - All settings explained
+- [Best Practices](./best-practices.md) - Optimization tips
+- [Usage Guide](./usage.md) - Command-line options
diff --git a/docs/configuration.md b/docs/configuration.md
new file mode 100644
index 0000000..492f074
--- /dev/null
+++ b/docs/configuration.md
@@ -0,0 +1,304 @@
+# Configuration Reference
+
+> **Complete guide to all configuration options in BaseAgent**
+
+## Overview
+
+BaseAgent configuration is centralized in `src/config/defaults.py`. Settings can be customized via environment variables or by modifying the configuration file directly.
+
+---
+
+## Configuration File
+
+The main configuration is stored in the `CONFIG` dictionary:
+
+```python
+# src/config/defaults.py
+CONFIG = {
+ # Model Settings
+ "model": "openrouter/anthropic/claude-sonnet-4-20250514",
+ "provider": "openrouter",
+ "temperature": 0.0,
+ "max_tokens": 16384,
+ "reasoning_effort": "none",
+
+ # Agent Execution
+ "max_iterations": 200,
+ "max_output_tokens": 2500,
+ "shell_timeout": 60,
+
+ # Context Management
+ "model_context_limit": 200_000,
+ "output_token_max": 32_000,
+ "auto_compact_threshold": 0.85,
+ "prune_protect": 40_000,
+ "prune_minimum": 20_000,
+
+ # Prompt Caching
+ "cache_enabled": True,
+
+ # Execution Flags
+ "bypass_approvals": True,
+ "bypass_sandbox": True,
+ "skip_git_check": True,
+ "unified_exec": True,
+ "json_output": True,
+
+ # Completion
+ "require_completion_confirmation": False,
+}
+```
+
+---
+
+## Environment Variables
+
+### LLM Provider Settings
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `LLM_MODEL` | `openrouter/anthropic/claude-sonnet-4-20250514` | Model identifier |
+| `LLM_PROVIDER` | `openrouter` | Provider name (`chutes`, `openrouter`, etc.) |
+| `LLM_COST_LIMIT` | `10.0` | Maximum cost in USD before aborting |
+
+### API Keys
+
+| Variable | Provider | Description |
+|----------|----------|-------------|
+| `CHUTES_API_TOKEN` | Chutes AI | Token from chutes.ai |
+| `OPENROUTER_API_KEY` | OpenRouter | API key from openrouter.ai |
+| `ANTHROPIC_API_KEY` | Anthropic | Direct Anthropic API key |
+| `OPENAI_API_KEY` | OpenAI | OpenAI API key |
+
+### Example Setup
+
+```bash
+# For Chutes AI
+export CHUTES_API_TOKEN="your-token"
+export LLM_PROVIDER="chutes"
+export LLM_MODEL="moonshotai/Kimi-K2.5-TEE"
+
+# For OpenRouter
+export OPENROUTER_API_KEY="sk-or-v1-..."
+export LLM_MODEL="openrouter/anthropic/claude-sonnet-4-20250514"
+```
+
+---
+
+## Configuration Sections
+
+### Model Settings
+
+```mermaid
+graph LR
+ subgraph Model["Model Configuration"]
+ M1["model
Model identifier"]
+ M2["provider
API provider"]
+ M3["temperature
Response randomness"]
+ M4["max_tokens
Max output tokens"]
+ M5["reasoning_effort
Reasoning depth"]
+ end
+```
+
+| Setting | Type | Default | Description |
+|---------|------|---------|-------------|
+| `model` | `str` | `openrouter/anthropic/claude-sonnet-4-20250514` | Full model identifier with provider prefix |
+| `provider` | `str` | `openrouter` | LLM provider name |
+| `temperature` | `float` | `0.0` | Response randomness (0 = deterministic) |
+| `max_tokens` | `int` | `16384` | Maximum tokens in LLM response |
+| `reasoning_effort` | `str` | `none` | Reasoning depth: `none`, `minimal`, `low`, `medium`, `high`, `xhigh` |
+
+### Agent Execution Settings
+
+```mermaid
+graph LR
+ subgraph Execution["Execution Limits"]
+ E1["max_iterations
200 iterations"]
+ E2["max_output_tokens
2500 tokens"]
+ E3["shell_timeout
60 seconds"]
+ end
+```
+
+| Setting | Type | Default | Description |
+|---------|------|---------|-------------|
+| `max_iterations` | `int` | `200` | Maximum loop iterations before stopping |
+| `max_output_tokens` | `int` | `2500` | Max tokens for tool output truncation |
+| `shell_timeout` | `int` | `60` | Shell command timeout in seconds |
+
+### Context Management
+
+```mermaid
+graph TB
+ subgraph Context["Context Window Management"]
+ C1["model_context_limit: 200K"]
+ C2["output_token_max: 32K"]
+ C3["Usable: 168K"]
+ C4["auto_compact_threshold: 85%"]
+ C5["Trigger: ~143K"]
+ end
+
+ C1 --> C3
+ C2 --> C3
+ C3 --> C4
+ C4 --> C5
+```
+
+| Setting | Type | Default | Description |
+|---------|------|---------|-------------|
+| `model_context_limit` | `int` | `200000` | Total model context window (tokens) |
+| `output_token_max` | `int` | `32000` | Tokens reserved for output |
+| `auto_compact_threshold` | `float` | `0.85` | Trigger compaction at this % of usable context |
+| `prune_protect` | `int` | `40000` | Protect this many tokens of recent tool output |
+| `prune_minimum` | `int` | `20000` | Only prune if recovering at least this many tokens |
+
+### Prompt Caching
+
+| Setting | Type | Default | Description |
+|---------|------|---------|-------------|
+| `cache_enabled` | `bool` | `True` | Enable Anthropic prompt caching |
+
+> **Note**: Prompt caching requires minimum token thresholds per breakpoint:
+> - Claude Opus 4.5 on Bedrock: 4096 tokens
+> - Claude Sonnet/other: 1024 tokens
+
+### Execution Flags
+
+| Setting | Type | Default | Description |
+|---------|------|---------|-------------|
+| `bypass_approvals` | `bool` | `True` | Skip user approval prompts |
+| `bypass_sandbox` | `bool` | `True` | Bypass sandbox restrictions |
+| `skip_git_check` | `bool` | `True` | Skip git repository validation |
+| `unified_exec` | `bool` | `True` | Enable unified execution mode |
+| `json_output` | `bool` | `True` | Always emit JSONL output |
+| `require_completion_confirmation` | `bool` | `False` | Require double-confirm before completing |
+
+---
+
+## Provider-Specific Configuration
+
+### Chutes AI
+
+```python
+# Environment
+CHUTES_API_TOKEN="your-token"
+LLM_PROVIDER="chutes"
+LLM_MODEL="moonshotai/Kimi-K2.5-TEE"
+
+# Model features
+# - 1T parameters, 32B activated
+# - 256K context window
+# - Thinking mode enabled by default
+# - Temperature: 1.0 (thinking), 0.6 (instant)
+```
+
+### OpenRouter
+
+```python
+# Environment
+OPENROUTER_API_KEY="sk-or-v1-..."
+LLM_MODEL="openrouter/anthropic/claude-sonnet-4-20250514"
+
+# Requires openrouter/ prefix for litellm
+```
+
+### Direct Anthropic
+
+```python
+# Environment
+ANTHROPIC_API_KEY="sk-ant-..."
+LLM_MODEL="claude-3-5-sonnet-20241022"
+
+# No prefix needed for direct API
+```
+
+---
+
+## Configuration Workflow
+
+```mermaid
+flowchart TB
+ subgraph Load["Configuration Loading"]
+ Env[Environment Variables]
+ File[defaults.py]
+ Merge[Merged Config]
+ end
+
+ subgraph Apply["Configuration Application"]
+ Loop[Agent Loop]
+ LLM[LLM Client]
+ Context[Context Manager]
+ Tools[Tool Registry]
+ end
+
+ Env --> Merge
+ File --> Merge
+ Merge --> Loop
+ Merge --> LLM
+ Merge --> Context
+ Merge --> Tools
+```
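+
+A sketch of that merge, using variable names from this page; the helper itself is illustrative:
+
+```python
+import os
+
+DEFAULTS = {
+    "model": "openrouter/anthropic/claude-sonnet-4-20250514",
+    "provider": "openrouter",
+    "cost_limit": 10.0,
+}
+
+def load_config() -> dict:
+    cfg = dict(DEFAULTS)  # values from defaults.py
+    if "LLM_MODEL" in os.environ:
+        cfg["model"] = os.environ["LLM_MODEL"]
+    if "LLM_PROVIDER" in os.environ:
+        cfg["provider"] = os.environ["LLM_PROVIDER"]
+    if "LLM_COST_LIMIT" in os.environ:
+        cfg["cost_limit"] = float(os.environ["LLM_COST_LIMIT"])
+    return cfg
+```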
+
+---
+
+## Computed Values
+
+Some values are computed from configuration:
+
+```python
+# Usable context window
+usable_context = model_context_limit - output_token_max
+# Default: 200,000 - 32,000 = 168,000 tokens
+
+# Compaction trigger threshold
+compaction_trigger = usable_context * auto_compact_threshold
+# Default: 168,000 * 0.85 = 142,800 tokens
+
+# Token estimation
+chars_per_token = 4 # Heuristic
+tokens = len(text) // 4
+```
+
+---
+
+## Best Practices
+
+### For Cost Optimization
+
+```bash
+# Lower cost limit for testing
+export LLM_COST_LIMIT="1.0"
+
+# Use smaller context for simple tasks
+# (edit defaults.py)
+"model_context_limit": 100_000
+```
+
+### For Long Tasks
+
+```bash
+# Increase iterations
+# (edit defaults.py)
+"max_iterations": 500
+
+# Lower compaction threshold for aggressive memory management
+"auto_compact_threshold": 0.70
+```
+
+### For Debugging
+
+```bash
+# Disable caching to see full API calls
+# (edit defaults.py)
+"cache_enabled": False
+
+# Increase output limits for more context
+"max_output_tokens": 5000
+```
+
+---
+
+## Next Steps
+
+- [Chutes Integration](./chutes-integration.md) - Configure Chutes API
+- [Context Management](./context-management.md) - Understand memory management
+- [Best Practices](./best-practices.md) - Optimization tips
diff --git a/docs/context-management.md b/docs/context-management.md
new file mode 100644
index 0000000..2f26e75
--- /dev/null
+++ b/docs/context-management.md
@@ -0,0 +1,412 @@
+# Context Management
+
+> **How BaseAgent manages memory and prevents token overflow**
+
+## Why Context Management Matters
+
+Large Language Models have finite context windows. Without proper management:
+- "Context too long" errors terminate sessions
+- Critical information gets lost
+- Response quality degrades
+- Costs increase unnecessarily
+
+BaseAgent implements sophisticated context management inspired by OpenCode and Codex.
+
+---
+
+## Context Window Overview
+
+```mermaid
+graph TB
+ subgraph Window["Claude Opus 4.5 Context Window (200K tokens)"]
+ Output["Reserved for Output
32K tokens"]
+ Usable["Usable Context
168K tokens"]
+ end
+
+ subgraph Thresholds["Management Thresholds"]
+ Safe["Safe Zone
< 85% (143K)"]
+ Warning["Warning Zone
85-100%"]
+ Overflow["Overflow
> 168K"]
+ end
+
+ Usable --> Safe
+ Usable --> Warning
+ Usable --> Overflow
+
+ style Safe fill:#4CAF50,color:#fff
+ style Warning fill:#FF9800,color:#fff
+ style Overflow fill:#F44336,color:#fff
+```
+
+### Key Numbers
+
+| Metric | Value | Description |
+|--------|-------|-------------|
+| Total context | 200,000 | Model's full context window |
+| Output reserve | 32,000 | Reserved for LLM response |
+| Usable context | 168,000 | Available for messages |
+| Compaction threshold | 85% | Trigger at 142,800 tokens |
+| Prune protect | 40,000 | Recent tool output to keep |
+| Prune minimum | 20,000 | Minimum savings to prune |
+
+---
+
+## Token Estimation
+
+BaseAgent estimates tokens using a simple heuristic:
+
+```python
+# 1 token ≈ 4 characters
+def estimate_tokens(text: str) -> int:
+ return len(text) // 4
+```
+
+### Message Token Components
+
+```mermaid
+graph LR
+ subgraph Message["Message Token Estimation"]
+ Content["Content
(text / 4)"]
+ Images["Images
(~1000 each)"]
+ ToolCalls["Tool Calls
(name + args)"]
+ Overhead["Role Overhead
(~4 tokens)"]
+ end
+
+ Content --> Total["Total Tokens"]
+ Images --> Total
+ ToolCalls --> Total
+ Overhead --> Total
+```
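+
+Combining those components gives a per-message estimate. A sketch using this page's heuristics; the `images` field is a hypothetical stand-in for however image attachments are stored:
+
+```python
+def estimate_message_tokens(msg: dict) -> int:
+    tokens = 4  # role/formatting overhead
+    content = msg.get("content") or ""
+    if isinstance(content, str):
+        tokens += len(content) // 4  # ~4 chars per token
+    for call in msg.get("tool_calls") or []:
+        fn = call.get("function", {})
+        tokens += len(fn.get("name", "")) // 4
+        tokens += len(str(fn.get("arguments", ""))) // 4
+    tokens += 1000 * len(msg.get("images") or [])  # ~1000 per image (hypothetical field)
+    return tokens
+```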
+
+---
+
+## Context Management Pipeline
+
+```mermaid
+flowchart TB
+ subgraph Input["Every Iteration"]
+ Messages["Current Messages"]
+ end
+
+ subgraph Detection["1. Detection"]
+ Estimate["Estimate Total Tokens"]
+ Check{"Above 85%
Threshold?"}
+ end
+
+ subgraph Pruning["2. Pruning (First Pass)"]
+ Scan["Scan Backwards"]
+ Protect["Protect Last 40K
Tool Output Tokens"]
+ Clear["Clear Old Tool Outputs"]
+ CheckAgain{"Still Above
Threshold?"}
+ end
+
+ subgraph Compaction["3. AI Compaction (Second Pass)"]
+ Summary["Generate Summary
via LLM"]
+ Rebuild["Rebuild Messages:
System + Summary"]
+ end
+
+ subgraph Output["Continue Loop"]
+ Managed["Managed Messages"]
+ end
+
+ Messages --> Estimate --> Check
+ Check -->|No| Managed
+ Check -->|Yes| Scan --> Protect --> Clear --> CheckAgain
+ CheckAgain -->|No| Managed
+ CheckAgain -->|Yes| Summary --> Rebuild --> Managed
+
+ style Pruning fill:#FF9800,color:#fff
+ style Compaction fill:#9C27B0,color:#fff
+```
+
+---
+
+## Stage 1: Tool Output Pruning
+
+The first defense against context overflow is pruning old tool outputs.
+
+### Strategy
+
+1. Scan messages **backwards** (most recent first)
+2. Skip the first 2 user turns (most recent)
+3. Accumulate tool output tokens
+4. After 40K tokens accumulated, mark older outputs for pruning
+5. Only prune if savings exceed 20K tokens
+
+### Implementation
+
+```python
+def prune_old_tool_outputs(messages, protect_last_turns=2):
+ total = 0 # Total tool output tokens seen
+ pruned = 0 # Tokens to be pruned
+ to_prune = []
+ turns = 0
+
+ for i in range(len(messages) - 1, -1, -1):
+ msg = messages[i]
+
+ if msg["role"] == "user":
+ turns += 1
+
+ if turns < protect_last_turns:
+ continue
+
+ if msg["role"] == "tool":
+ content = msg.get("content", "")
+ estimate = len(content) // 4
+ total += estimate
+
+ if total > PRUNE_PROTECT: # 40K
+ pruned += estimate
+ to_prune.append(i)
+
+ if pruned > PRUNE_MINIMUM: # 20K
+ # Replace content with marker
+ for idx in to_prune:
+ messages[idx]["content"] = "[Old tool result content cleared]"
+
+ return messages
+```
+
+### Visual Example
+
+```mermaid
+graph TB
+ subgraph Before["Before Pruning (150K tokens)"]
+ S1["System Prompt
5K tokens"]
+ U1["User Instruction
1K tokens"]
+ A1["Assistant + Tools
10K tokens"]
+ T1["Tool Results (old)
50K tokens"]
+ A2["Assistant + Tools
10K tokens"]
+ T2["Tool Results (old)
40K tokens"]
+ A3["Assistant + Tools
10K tokens"]
+ T3["Tool Results (recent)
24K tokens"]
+ end
+
+ subgraph After["After Pruning (60K tokens)"]
+ S2["System Prompt
5K tokens"]
+ U2["User Instruction
1K tokens"]
+ A4["Assistant + Tools
10K tokens"]
+ T4["[cleared]
~0 tokens"]
+ A5["Assistant + Tools
10K tokens"]
+ T5["[cleared]
~0 tokens"]
+ A6["Assistant + Tools
10K tokens"]
+ T6["Tool Results (protected)
24K tokens"]
+ end
+
+ T1 -.-> T4
+ T2 -.-> T5
+ T3 --> T6
+
+ style T4 fill:#FF9800,color:#fff
+ style T5 fill:#FF9800,color:#fff
+ style T6 fill:#4CAF50,color:#fff
+```
+
+---
+
+## Stage 2: AI Compaction
+
+When pruning isn't enough, BaseAgent uses the LLM to summarize the conversation.
+
+### Compaction Process
+
+```mermaid
+sequenceDiagram
+ participant Loop as Agent Loop
+ participant Compact as Compaction
+ participant LLM as LLM API
+
+ Loop->>Compact: Context still too large
+ Compact->>Compact: Add compaction prompt
+ Compact->>LLM: Request summary
+ LLM-->>Compact: Summary response
+ Compact->>Compact: Build new messages
+ Compact-->>Loop: [System, Summary]
+```
+
+### Compaction Prompt
+
+```python
+COMPACTION_PROMPT = """
+You are performing a CONTEXT CHECKPOINT COMPACTION.
+Create a handoff summary for another LLM that will resume the task.
+
+Include:
+- Current progress and key decisions made
+- Important context, constraints, or user preferences
+- What remains to be done (clear next steps)
+- Any critical data, examples, or references needed to continue
+- Which files were modified and how
+- Any errors encountered and how they were resolved
+
+Be concise, structured, and focused on helping the next LLM
+seamlessly continue the work. Use bullet points and clear sections.
+"""
+```
+
+### Result
+
+The compacted messages are:
+
+```python
+compacted = [
+ {"role": "system", "content": original_system_prompt},
+ {"role": "user", "content": SUMMARY_PREFIX + llm_summary},
+]
+```
+
+### Summary Prefix
+
+```python
+SUMMARY_PREFIX = """
+Another language model started to solve this problem and produced
+a summary of its thinking process. You also have access to the state
+of the tools that were used. Use this to build on the work that has
+already been done and avoid duplicating work.
+
+Here is the summary from the previous context:
+
+"""
+```
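+
+A sketch of how these pieces combine, assuming the `COMPACTION_PROMPT` and `SUMMARY_PREFIX` above and the client's `chat()` interface:
+
+```python
+def compact_context(llm, messages):
+    request = messages + [{"role": "user", "content": COMPACTION_PROMPT}]
+    summary = llm.chat(request).text  # LLM writes the handoff summary
+    return [
+        messages[0],  # original system prompt, kept verbatim
+        {"role": "user", "content": SUMMARY_PREFIX + summary},
+    ]
+```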
+
+---
+
+## Middle-Out Truncation
+
+For individual tool outputs, BaseAgent uses middle-out truncation:
+
+```mermaid
+graph LR
+ subgraph Original["Original Output"]
+ O1["Start
(headers, definitions)"]
+ O2["Middle
(repetitive data)"]
+ O3["End
(results, errors)"]
+ end
+
+ subgraph Truncated["Truncated Output"]
+ T1["Start
(preserved)"]
+ T2["[...truncated...]"]
+ T3["End
(preserved)"]
+ end
+
+ O1 --> T1
+ O2 -.-> T2
+ O3 --> T3
+
+ style O2 fill:#FF9800,color:#fff
+ style T2 fill:#FF9800,color:#fff
+```
+
+### Implementation
+
+```python
+def middle_out_truncate(text: str, max_tokens: int = 2500) -> str:
+ max_chars = max_tokens * 4 # 4 chars per token
+
+ if len(text) <= max_chars:
+ return text
+
+ keep = max_chars // 2 - 50 # Room for marker
+ return f"{text[:keep]}\n\n[...truncated...]\n\n{text[-keep:]}"
+```
+
+### Why Middle-Out?
+
+| Section | Contains | Value |
+|---------|----------|-------|
+| **Start** | Headers, imports, definitions | High |
+| **Middle** | Repetitive data, logs | Low |
+| **End** | Results, errors, summaries | High |
+
+---
+
+## Configuration Options
+
+| Setting | Default | Description |
+|---------|---------|-------------|
+| `model_context_limit` | 200,000 | Total context window |
+| `output_token_max` | 32,000 | Reserved for output |
+| `auto_compact_threshold` | 0.85 | Trigger threshold |
+| `prune_protect` | 40,000 | Recent tool tokens to keep |
+| `prune_minimum` | 20,000 | Minimum savings to prune |
+| `max_output_tokens` | 2,500 | Per-tool output limit |
+
+### Tuning Guidelines
+
+**For Long Tasks:**
+```python
+"auto_compact_threshold": 0.70, # More aggressive
+"prune_protect": 30_000, # Protect less
+```
+
+**For Complex Tasks (need more context):**
+```python
+"auto_compact_threshold": 0.90, # Less aggressive
+"prune_protect": 60_000, # Protect more
+```
+
+---
+
+## Monitoring Context Usage
+
+BaseAgent logs context status each iteration:
+
+```
+[14:30:16] [compaction] Context: 45000 tokens (26.8% of 168000)
+[14:35:22] [compaction] Context: 125000 tokens (74.4% of 168000)
+[14:38:45] [compaction] Context: 148000 tokens (88.1% of 168000)
+[14:38:45] [compaction] Context overflow detected, managing...
+[14:38:45] [compaction] Prune scan: 95000 total tokens, 55000 prunable
+[14:38:45] [compaction] Pruning 12 tool outputs, recovering ~55000 tokens
+[14:38:46] [compaction] Pruning sufficient: 148000 -> 93000 tokens
+```
+
+---
+
+## Best Practices
+
+### 1. Keep Tool Outputs Focused
+
+```bash
+# ❌ Too much output
+ls -laR / # Lists entire filesystem
+
+# ✅ Targeted
+ls -la /workspace/src/ # Just what's needed
+```
+
+### 2. Use Appropriate Search Patterns
+
+```bash
+# ❌ Too broad
+grep "function" # Matches everything
+
+# ✅ Specific
+grep "def calculate_total" src/billing.py
+```
+
+### 3. Read Sections, Not Entire Files
+
+```json
+// ❌ Entire large file
+{"name": "read_file", "arguments": {"file_path": "huge.py"}}
+
+// ✅ Specific section
+{"name": "read_file", "arguments": {"file_path": "huge.py", "offset": 100, "limit": 50}}
+```
+
+### 4. Monitor Long Sessions
+
+For tasks exceeding 50 iterations, watch for:
+- Repeated compaction events
+- Context oscillating near threshold
+- Loss of important context after compaction
+
+---
+
+## Next Steps
+
+- [Best Practices](./best-practices.md) - Optimization strategies
+- [Configuration](./configuration.md) - Tuning options
+- [Architecture](./architecture.md) - System design
diff --git a/docs/installation.md b/docs/installation.md
new file mode 100644
index 0000000..24d6700
--- /dev/null
+++ b/docs/installation.md
@@ -0,0 +1,249 @@
+# Installation Guide
+
+> **Step-by-step instructions for setting up BaseAgent**
+
+## Prerequisites
+
+Before installing BaseAgent, ensure you have:
+
+| Requirement | Version | Notes |
+|-------------|---------|-------|
+| Python | 3.9+ | Python 3.11+ recommended |
+| pip | Latest | Python package manager |
+| Git | 2.x | For cloning the repository |
+
+### Optional but Recommended
+
+| Tool | Purpose |
+|------|---------|
+| `ripgrep` (`rg`) | Fast file searching (used by `grep_files` tool) |
+| `tree` | Directory visualization |
+
+---
+
+## Installation Methods
+
+### Method 1: Using pyproject.toml (Recommended)
+
+```bash
+# Clone the repository
+git clone https://github.com/your-org/baseagent.git
+cd baseagent
+
+# Install with pip
+pip install .
+```
+
+This installs BaseAgent as a package with all dependencies.
+
+### Method 2: Using requirements.txt
+
+```bash
+# Clone the repository
+git clone https://github.com/your-org/baseagent.git
+cd baseagent
+
+# Install dependencies
+pip install -r requirements.txt
+```
+
+### Method 3: Development Installation
+
+For development, use an editable install so code changes take effect immediately:
+
+```bash
+git clone https://github.com/your-org/baseagent.git
+cd baseagent
+
+# Editable install
+pip install -e .
+```
+
+---
+
+## Dependencies
+
+BaseAgent requires these Python packages:
+
+```
+litellm>=1.0.0 # LLM API abstraction
+httpx>=0.24.0 # HTTP client
+pydantic>=2.0.0 # Data validation
+```
+
+These are automatically installed via pip.
+
+---
+
+## Environment Setup
+
+### 1. Choose Your LLM Provider
+
+BaseAgent supports multiple LLM providers. Choose one:
+
+#### Option A: Chutes AI (Recommended)
+
+```bash
+# Set your Chutes API token
+export CHUTES_API_TOKEN="your-token-from-chutes.ai"
+
+# Configure provider
+export LLM_PROVIDER="chutes"
+export LLM_MODEL="moonshotai/Kimi-K2.5-TEE"
+```
+
+Get your token at [chutes.ai](https://chutes.ai).
+
+#### Option B: OpenRouter
+
+```bash
+# Set your OpenRouter API key
+export OPENROUTER_API_KEY="sk-or-v1-..."
+
+# Model is auto-configured for OpenRouter
+```
+
+Get your key at [openrouter.ai](https://openrouter.ai).
+
+#### Option C: Direct Provider APIs
+
+```bash
+# For Anthropic
+export ANTHROPIC_API_KEY="sk-ant-..."
+
+# For OpenAI
+export OPENAI_API_KEY="sk-..."
+```
+
+### 2. Create a Configuration File (Optional)
+
+Create `.env` in the project root:
+
+```bash
+# .env file
+CHUTES_API_TOKEN=your-token-here
+LLM_PROVIDER=chutes
+LLM_MODEL=moonshotai/Kimi-K2.5-TEE
+LLM_COST_LIMIT=10.0
+```
+
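+Note that a `.env` file is not read by the shell on its own: something has to load it. If BaseAgent's config layer does not load it automatically, a stdlib-only loader is a few lines (a sketch, not part of the package):
+
+```python
+import os
+from pathlib import Path
+
+def load_env_file(path: str = ".env") -> None:
+    """Populate os.environ from KEY=VALUE lines, skipping comments."""
+    for line in Path(path).read_text().splitlines():
+        line = line.strip()
+        if not line or line.startswith("#") or "=" not in line:
+            continue
+        key, _, value = line.partition("=")
+        os.environ.setdefault(key.strip(), value.strip())
+```
+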
+---
+
+## Verification
+
+### Step 1: Verify Python Installation
+
+```bash
+python3 --version
+# Expected: Python 3.9 or higher (3.11+ recommended)
+```
+
+### Step 2: Verify Dependencies
+
+```bash
+python3 -c "import litellm; print('litellm:', litellm.__version__)"
+python3 -c "import httpx; print('httpx:', httpx.__version__)"
+python3 -c "import pydantic; print('pydantic:', pydantic.__version__)"
+```
+
+### Step 3: Verify BaseAgent Installation
+
+```bash
+python3 -c "from src.core.loop import run_agent_loop; print('BaseAgent: OK')"
+```
+
+### Step 4: Test Run
+
+```bash
+python3 agent.py --instruction "Print 'Hello, BaseAgent!'"
+```
+
+Expected output: JSONL events showing the agent executing your instruction.
+
+---
+
+## Directory Structure After Installation
+
+```
+baseagent/
+├── agent.py # ✓ Entry point
+├── src/
+│ ├── core/
+│ │ ├── loop.py # ✓ Agent loop
+│ │ └── compaction.py # ✓ Context manager
+│ ├── llm/
+│ │ └── client.py # ✓ LLM client
+│ ├── config/
+│ │ └── defaults.py # ✓ Configuration
+│ ├── tools/ # ✓ Tool implementations
+│ ├── prompts/
+│ │ └── system.py # ✓ System prompt
+│ └── output/
+│ └── jsonl.py # ✓ Event emission
+├── requirements.txt # ✓ Dependencies
+├── pyproject.toml # ✓ Package config
+├── docs/ # ✓ Documentation
+├── rules/ # Development guidelines
+└── astuces/ # Implementation techniques
+```
+
+---
+
+## Troubleshooting
+
+### Issue: `ModuleNotFoundError: No module named 'litellm'`
+
+**Solution**: Install dependencies
+
+```bash
+pip install -r requirements.txt
+# or
+pip install litellm httpx pydantic
+```
+
+### Issue: `ImportError: cannot import name 'run_agent_loop'`
+
+**Solution**: Ensure you're in the project root directory
+
+```bash
+cd /path/to/baseagent
+python3 agent.py --instruction "..."
+```
+
+### Issue: API Key Errors
+
+**Solution**: Verify your environment variables are set
+
+```bash
+# Check if variables are set
+echo $CHUTES_API_TOKEN
+echo $OPENROUTER_API_KEY
+
+# Re-export if needed
+export CHUTES_API_TOKEN="your-token"
+```
+
+### Issue: `rg` (ripgrep) Not Found
+
+The `grep_files` tool will fall back to `grep` if `rg` is not available, but ripgrep is much faster.
+
+**Solution**: Install ripgrep
+
+```bash
+# Ubuntu/Debian
+apt-get install ripgrep
+
+# macOS
+brew install ripgrep
+
+# Or via cargo
+cargo install ripgrep
+```
+
+---
+
+## Next Steps
+
+- [Quick Start](./quickstart.md) - Run your first task
+- [Configuration](./configuration.md) - Customize settings
+- [Chutes Integration](./chutes-integration.md) - Set up Chutes API
diff --git a/docs/overview.md b/docs/overview.md
new file mode 100644
index 0000000..c05a533
--- /dev/null
+++ b/docs/overview.md
@@ -0,0 +1,214 @@
+# BaseAgent Overview
+
+> **A high-performance autonomous coding agent built for generalist problem-solving**
+
+## What is BaseAgent?
+
+BaseAgent is an autonomous coding agent designed for the [Term Challenge](https://term.challenge). Unlike traditional scripted automation, BaseAgent uses Large Language Models (LLMs) to reason about tasks and make decisions dynamically.
+
+The agent receives natural language instructions and autonomously:
+- Explores the codebase
+- Plans and executes solutions
+- Validates its own work
+- Handles errors and edge cases
+
+---
+
+## Core Design Principles
+
+### 1. No Hardcoding
+
+BaseAgent follows the **Golden Rule**: all decisions are made by the LLM, not by conditional logic.
+
+```python
+# ❌ FORBIDDEN - Hardcoded task routing
+if "file" in instruction:
+ create_file()
+elif "compile" in instruction:
+ compile_code()
+
+# ✅ REQUIRED - LLM-driven decisions
+response = llm.chat(messages, tools=tools)
+execute(response.tool_calls)
+```
+
+### 2. Single Code Path
+
+Every task, regardless of complexity or domain, flows through the same agent loop:
+
+```mermaid
+graph LR
+ A[Receive Instruction] --> B[Build Context]
+ B --> C[LLM Decides]
+ C --> D[Execute Tools]
+ D --> E{Complete?}
+ E -->|No| C
+ E -->|Yes| F[Verify & Return]
+```
+
+### 3. Iterative Execution
+
+BaseAgent never tries to solve everything in one shot. Instead, it:
+- Observes the current state
+- Thinks about the next step
+- Acts by calling tools
+- Repeats until the task is complete
+
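+A stripped-down sketch of that observe-think-act loop (the names are illustrative, not the actual `src/core/loop.py` API):
+
+```python
+def run_loop(llm, tools, messages: list, max_iterations: int = 200) -> str:
+    """Iterate until the LLM stops requesting tools."""
+    for _ in range(max_iterations):
+        response = llm.chat(messages, tools=tools.schemas())  # think
+        messages.append(response.as_message())
+        if not response.tool_calls:                           # complete?
+            return response.text
+        for call in response.tool_calls:                      # act
+            result = tools.execute(call)                      # observe
+            messages.append(
+                {"role": "tool", "tool_call_id": call.id, "content": result.output}
+            )
+    raise RuntimeError("max iterations reached")
+```
+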
+### 4. Self-Verification
+
+Before declaring a task complete, the agent automatically:
+1. Re-reads the original instruction
+2. Lists all requirements (explicit and implicit)
+3. Verifies each requirement with actual commands
+4. Only completes if all verifications pass
+
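+One way this is commonly wired in: when the LLM returns no tool calls, inject a verification prompt once before accepting completion (the prompt text below is illustrative):
+
+```python
+VERIFICATION_PROMPT = (
+    "Before finishing: re-read the original instruction, list every explicit "
+    "and implicit requirement, and verify each one with a concrete command. "
+    "If anything is unverified, keep working instead of completing."
+)
+
+def request_verification(messages: list) -> list:
+    """Append the self-check prompt before the loop accepts completion."""
+    return messages + [{"role": "user", "content": VERIFICATION_PROMPT}]
+```
+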
+---
+
+## High-Level Architecture
+
+```mermaid
+graph TB
+ subgraph Interface["User Interface"]
+        CLI["python3 agent.py --instruction '...'"]
+ end
+
+ subgraph Engine["Core Engine"]
+ direction TB
+        Loop["Agent Loop<br/>(src/core/loop.py)"]
+        Context["Context Manager<br/>(src/core/compaction.py)"]
+        Prompt["System Prompt<br/>(src/prompts/system.py)"]
+ end
+
+ subgraph LLM["LLM Layer"]
+        Client["LiteLLM Client<br/>(src/llm/client.py)"]
+        API["Provider API<br/>(Chutes/OpenRouter)"]
+ end
+
+ subgraph Tools["Tool System"]
+ Registry["Tool Registry"]
+ Exec["Execution Engine"]
+ end
+
+ CLI --> Loop
+ Loop --> Context
+ Loop --> Prompt
+ Loop --> Client
+ Client --> API
+ Loop --> Registry
+ Registry --> Exec
+
+ style Loop fill:#4CAF50,color:#fff
+ style Client fill:#2196F3,color:#fff
+```
+
+---
+
+## Key Features
+
+### Autonomous Operation
+
+BaseAgent runs in **fully autonomous mode**:
+- No user confirmations required
+- Makes reasonable decisions when faced with ambiguity
+- Handles errors by trying alternative approaches
+- Never asks questions - just executes
+
+### Prompt Caching
+
+Achieves a **90%+ cache hit rate** using Anthropic-style prompt caching:
+- System prompt cached for stability
+- Last 2 messages cached to extend the cached prefix
+- Cached input tokens are billed at a steep discount, cutting their cost by roughly 90%
+
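+With Anthropic-style caching through litellm, breakpoints are expressed as `cache_control` markers on message content blocks. A sketch of marking the system prompt (where BaseAgent places its remaining breakpoints may differ):
+
+```python
+system_prompt = "You are BaseAgent ..."  # stable across iterations
+instruction = "Create hello.txt"         # varies per run
+
+messages = [
+    {
+        "role": "system",
+        "content": [
+            {
+                "type": "text",
+                "text": system_prompt,
+                # Cache everything up to and including this block.
+                "cache_control": {"type": "ephemeral"},
+            }
+        ],
+    },
+    {"role": "user", "content": instruction},
+]
+```
+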
+### Context Management
+
+Intelligent memory management for long tasks:
+- Token-based overflow detection
+- Tool output pruning (protects recent outputs)
+- AI-powered compaction when needed
+- Middle-out truncation for large outputs
+
+### Comprehensive Tooling
+
+Eight specialized tools for coding tasks:
+
+| Tool | Purpose |
+|------|---------|
+| `shell_command` | Execute shell commands |
+| `read_file` | Read files with line numbers |
+| `write_file` | Create or overwrite files |
+| `apply_patch` | Surgical file modifications |
+| `grep_files` | Fast file content search |
+| `list_dir` | Directory exploration |
+| `view_image` | Image analysis |
+| `update_plan` | Progress tracking |
+
+---
+
+## Workflow Overview
+
+```mermaid
+sequenceDiagram
+ participant User
+ participant CLI as agent.py
+ participant Loop as Agent Loop
+ participant LLM as LLM (Chutes/OpenRouter)
+ participant Tools as Tool Registry
+
+    User->>CLI: python3 agent.py --instruction "..."
+ CLI->>Loop: Initialize session
+
+ loop Until task complete
+ Loop->>Loop: Manage context (prune/compact)
+ Loop->>Loop: Apply prompt caching
+ Loop->>LLM: Send messages + tools
+ LLM-->>Loop: Response (text + tool_calls)
+
+ alt Has tool calls
+ Loop->>Tools: Execute tool calls
+ Tools-->>Loop: Tool results
+ else No tool calls
+ Loop->>Loop: Self-verification check
+ end
+ end
+
+ Loop-->>CLI: Task complete
+ CLI-->>User: JSONL output
+```
+
+---
+
+## What Makes BaseAgent a "Generalist"?
+
+| Characteristic | Description |
+|----------------|-------------|
+| **Single code path** | Same logic handles ALL tasks |
+| **LLM-driven decisions** | LLM chooses actions, not if-statements |
+| **No task keywords** | Zero references to specific task content |
+| **Iterative execution** | Observe → Think → Act loop |
+
+### The Generalist Test
+
+Ask yourself: *"Would this code behave differently if I changed the task instruction?"*
+
+If **YES** and it's not because of LLM reasoning → it's hardcoding → **FORBIDDEN**
+
+---
+
+## Design Philosophy
+
+BaseAgent is built on these principles:
+
+1. **Explore First** - Always gather context before acting
+2. **Iterate** - Never try to do everything in one shot
+3. **Verify** - Double-confirm before completing
+4. **Fail Gracefully** - Handle errors and retry
+5. **Stay Focused** - Complete the task, nothing more
+
+---
+
+## Next Steps
+
+- [Installation Guide](./installation.md) - Set up BaseAgent
+- [Quick Start](./quickstart.md) - Run your first task
+- [Architecture](./architecture.md) - Deep dive into the system design
diff --git a/docs/quickstart.md b/docs/quickstart.md
new file mode 100644
index 0000000..f8a9326
--- /dev/null
+++ b/docs/quickstart.md
@@ -0,0 +1,242 @@
+# Quick Start Guide
+
+> **Get BaseAgent running in 5 minutes**
+
+## Prerequisites
+
+Before starting, ensure you have:
+- Python 3.9+ installed
+- An LLM API key (Chutes, OpenRouter, or Anthropic)
+- BaseAgent installed (see [Installation](./installation.md))
+
+---
+
+## Step 1: Set Up Your API Key
+
+Choose your provider and set the environment variable:
+
+```bash
+# For Chutes AI (recommended)
+export CHUTES_API_TOKEN="your-token-from-chutes.ai"
+
+# OR for OpenRouter
+export OPENROUTER_API_KEY="sk-or-v1-..."
+```
+
+---
+
+## Step 2: Run Your First Task
+
+Navigate to the BaseAgent directory and run:
+
+```bash
+python3 agent.py --instruction "Create a file called hello.txt with the content 'Hello, World!'"
+```
+
+### Expected Output
+
+You'll see JSONL events as the agent works:
+
+```json
+{"type": "thread.started", "thread_id": "sess_1234567890"}
+{"type": "turn.started"}
+{"type": "item.started", "item": {"type": "command_execution", "command": "write_file"}}
+{"type": "item.completed", "item": {"type": "command_execution", "status": "completed"}}
+{"type": "turn.completed", "usage": {"input_tokens": 5000, "output_tokens": 200}}
+```
+
+And the file `hello.txt` will be created:
+
+```bash
+cat hello.txt
+# Output: Hello, World!
+```
+
+---
+
+## Step 3: Try More Examples
+
+### Example: Explore a Codebase
+
+```bash
+python3 agent.py --instruction "Explore this repository and describe its structure"
+```
+
+### Example: Find and Read Files
+
+```bash
+python3 agent.py --instruction "Find all Python files and show me the main entry point"
+```
+
+### Example: Create a Simple Script
+
+```bash
+python3 agent.py --instruction "Create a Python script that prints the Fibonacci sequence up to 100"
+```
+
+### Example: Modify Existing Code
+
+```bash
+python3 agent.py --instruction "Add a docstring to all functions in src/core/loop.py"
+```
+
+---
+
+## Understanding the Output
+
+BaseAgent emits JSONL (JSON Lines) format for machine-readable output:
+
+```mermaid
+sequenceDiagram
+ participant User
+ participant Agent
+ participant stdout as Output
+
+ User->>Agent: --instruction "..."
+ Agent->>stdout: {"type": "thread.started", ...}
+ Agent->>stdout: {"type": "turn.started"}
+
+ loop Tool Execution
+ Agent->>stdout: {"type": "item.started", ...}
+ Agent->>stdout: {"type": "item.completed", ...}
+ end
+
+ Agent->>stdout: {"type": "turn.completed", "usage": {...}}
+```
+
+### Key Event Types
+
+| Event | Description |
+|-------|-------------|
+| `thread.started` | Session begins with unique ID |
+| `turn.started` | Agent begins processing |
+| `item.started` | Tool execution begins |
+| `item.completed` | Tool execution finished |
+| `turn.completed` | Agent finished with usage stats |
+| `turn.failed` | Error occurred |
+
+---
+
+## Quick Command Reference
+
+```bash
+# Basic usage
+python3 agent.py --instruction "Your task description"
+
+# With environment variables inline
+CHUTES_API_TOKEN="..." python3 agent.py --instruction "..."
+
+# Redirect JSONL to a file (logs stay on stderr)
+python3 agent.py --instruction "..." > output.jsonl 2> agent.log
+```
+
+---
+
+## Agent Workflow
+
+Here's what happens when you run a task:
+
+```mermaid
+flowchart TB
+ subgraph Input
+ Cmd["python3 agent.py --instruction '...'"]
+ end
+
+ subgraph Init["Initialization"]
+ Parse[Parse Arguments]
+ Config[Load Configuration]
+ LLM[Initialize LLM Client]
+ Tools[Register Tools]
+ end
+
+ subgraph Loop["Agent Loop"]
+ Context[Manage Context]
+ Cache[Apply Caching]
+ Call[Call LLM]
+ Execute[Execute Tools]
+ Verify[Self-Verify]
+ end
+
+ subgraph Output
+ JSONL[Emit JSONL Events]
+ Done[Task Complete]
+ end
+
+ Cmd --> Parse --> Config --> LLM --> Tools
+ Tools --> Context --> Cache --> Call
+ Call --> Execute --> Context
+ Execute --> Verify --> Done
+ Context & Call & Execute --> JSONL
+```
+
+---
+
+## Tips for Effective Instructions
+
+### Be Specific
+
+```bash
+# ❌ Too vague
+python3 agent.py --instruction "Fix the bug"
+
+# ✅ Specific
+python3 agent.py --instruction "Fix the TypeError in src/utils.py line 42 where x is None"
+```
+
+### Provide Context
+
+```bash
+# ❌ Missing context
+python3 agent.py --instruction "Add tests"
+
+# ✅ With context
+python3 agent.py --instruction "Add unit tests for the calculate_total function in src/billing.py"
+```
+
+### Request Verification
+
+```bash
+# ✅ Ask for verification
+python3 agent.py --instruction "Create a Python script for sorting and verify it works with sample data"
+```
+
+---
+
+## Troubleshooting
+
+### Agent Not Finding Files
+
+The agent starts in the current directory. Ensure you're in the right location:
+
+```bash
+pwd # Check current directory
+ls # List files
+cd /path/to/project
+python3 /path/to/baseagent/agent.py --instruction "..."
+```
+
+### API Rate Limits
+
+If you hit rate limits, the agent will automatically retry with exponential backoff. You can also:
+
+```bash
+# Set a cost limit
+export LLM_COST_LIMIT="5.0"
+```
+
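+For reference, exponential backoff in its simplest form looks like this (a generic sketch, not BaseAgent's actual retry code):
+
+```python
+import random
+import time
+
+def with_backoff(call, max_retries: int = 5):
+    """Retry `call`, doubling the wait each time and adding jitter."""
+    for attempt in range(max_retries):
+        try:
+            return call()
+        except Exception:
+            if attempt == max_retries - 1:
+                raise
+            time.sleep(2 ** attempt + random.random())
+```
+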
+### Long-Running Tasks
+
+For complex tasks, the agent may iterate many times. Monitor progress through the JSONL output:
+
+```bash
+python3 agent.py --instruction "..." 2>&1 | grep "item.completed"
+```
+
+---
+
+## Next Steps
+
+- [Usage Guide](./usage.md) - Detailed command-line options
+- [Configuration](./configuration.md) - Customize behavior
+- [Tools Reference](./tools.md) - Available tools
+- [Best Practices](./best-practices.md) - Optimization tips
diff --git a/docs/tools.md b/docs/tools.md
new file mode 100644
index 0000000..78cd143
--- /dev/null
+++ b/docs/tools.md
@@ -0,0 +1,509 @@
+# Tools Reference
+
+> **Complete documentation for all available tools in BaseAgent**
+
+## Overview
+
+BaseAgent provides eight specialized tools for autonomous task execution. Each tool is designed for a specific purpose and follows consistent patterns for input and output.
+
+---
+
+## Tool Summary
+
+| Tool | Purpose | Key Parameters |
+|------|---------|----------------|
+| `shell_command` | Execute shell commands | `command`, `workdir`, `timeout_ms` |
+| `read_file` | Read file contents | `file_path`, `offset`, `limit` |
+| `write_file` | Create/overwrite files | `file_path`, `content` |
+| `apply_patch` | Surgical file edits | `patch` |
+| `grep_files` | Search file contents | `pattern`, `include`, `path` |
+| `list_dir` | List directory contents | `dir_path`, `depth`, `limit` |
+| `view_image` | Analyze images | `path` |
+| `update_plan` | Track progress | `steps`, `explanation` |
+
+---
+
+## Tool Architecture
+
+```mermaid
+graph TB
+ subgraph Registry["Tool Registry (registry.py)"]
+ Lookup["Tool Lookup"]
+ Execute["Execution Engine"]
+ Truncate["Output Truncation"]
+ end
+
+ subgraph Tools["Tool Implementations"]
+ Shell["shell_command"]
+ Read["read_file"]
+ Write["write_file"]
+ Patch["apply_patch"]
+ Grep["grep_files"]
+ List["list_dir"]
+ Image["view_image"]
+ Plan["update_plan"]
+ end
+
+ subgraph Output["Results"]
+ Success["ToolResult(success=True)"]
+ Failure["ToolResult(success=False)"]
+ end
+
+ Lookup --> Shell & Read & Write & Patch & Grep & List & Image & Plan
+ Shell & Read & Write & Patch & Grep & List & Image & Plan --> Execute
+ Execute --> Truncate
+ Truncate --> Success & Failure
+```
+
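+The `ToolResult` shape referenced above might look like the following sketch; the actual dataclass in the registry may carry more fields:
+
+```python
+from dataclasses import dataclass
+from typing import Any, Optional
+
+@dataclass
+class ToolResult:
+    """Uniform result returned by every tool."""
+    success: bool                          # True on normal completion
+    output: str                            # text shown to the LLM (middle-out truncated)
+    inject_content: Optional[Any] = None   # extra message content, e.g. image blocks
+```
+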
+---
+
+## shell_command
+
+Execute shell commands in the terminal.
+
+### Parameters
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| `command` | string | Yes | - | Shell command to execute |
+| `workdir` | string | No | Current dir | Working directory |
+| `timeout_ms` | number | No | 60000 | Timeout in milliseconds |
+
+### Example Usage
+
+```json
+{
+ "name": "shell_command",
+ "arguments": {
+ "command": "ls -la",
+ "workdir": "/workspace",
+ "timeout_ms": 30000
+ }
+}
+```
+
+### Best Practices
+
+- Always set `workdir` to avoid directory confusion
+- Use `rg` (ripgrep) instead of `grep` for faster searches
+- Set appropriate timeouts for long-running commands
+- Prefer setting `workdir` over chaining `cd && command`
+
+### Output Format
+
+```
+total 40
+drwxr-xr-x 7 root root 4096 Feb 3 13:16 .
+drwxr-xr-x 1 root root 4096 Feb 3 12:00 ..
+-rw-r--r-- 1 root root 5432 Feb 3 13:16 agent.py
+drwxr-xr-x 4 root root 4096 Feb 3 13:16 src
+```
+
+---
+
+## read_file
+
+Read file contents with line numbers.
+
+### Parameters
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| `file_path` | string | Yes | - | Path to the file |
+| `offset` | number | No | 1 | Starting line (1-indexed) |
+| `limit` | number | No | 2000 | Maximum lines to return |
+
+### Example Usage
+
+```json
+{
+ "name": "read_file",
+ "arguments": {
+ "file_path": "src/core/loop.py",
+ "offset": 1,
+ "limit": 100
+ }
+}
+```
+
+### Output Format
+
+```
+L1: """
+L2: Main agent loop - the heart of the SuperAgent system.
+L3: """
+L4:
+L5: from __future__ import annotations
+L6: import time
+```
+
+### Best Practices
+
+- Use `offset` and `limit` for large files
+- Prefer `grep_files` to find specific content first
+- Read relevant sections, not entire large files
+
+---
+
+## write_file
+
+Create or overwrite a file.
+
+### Parameters
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| `file_path` | string | Yes | - | Path to the file |
+| `content` | string | Yes | - | Content to write |
+
+### Example Usage
+
+```json
+{
+ "name": "write_file",
+ "arguments": {
+ "file_path": "hello.txt",
+ "content": "Hello, World!\n"
+ }
+}
+```
+
+### Best Practices
+
+- Use for new files or complete rewrites
+- Prefer `apply_patch` for surgical edits
+- Parent directories are created automatically
+- End files with a trailing newline
+
+---
+
+## apply_patch
+
+Apply surgical file modifications using patch format.
+
+### Parameters
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| `patch` | string | Yes | - | Patch content |
+
+### Patch Format
+
+```
+*** Begin Patch
+*** Add File: path/to/new/file.py
++line 1
++line 2
+*** Update File: path/to/existing/file.py
+@@ def existing_function():
+- old_line
++ new_line
+*** Delete File: path/to/delete.py
+*** End Patch
+```
+
+### Example Usage
+
+```json
+{
+ "name": "apply_patch",
+ "arguments": {
+ "patch": "*** Begin Patch\n*** Update File: src/utils.py\n@@ def calculate(x):\n- return x\n+ return x * 2\n*** End Patch"
+ }
+}
+```
+
+### Patch Rules
+
+1. Use `@@ context line` to identify location
+2. Prefix new lines with `+`
+3. Prefix removed lines with `-`
+4. Include 3 lines of context before and after changes
+5. File paths must be relative (never absolute)
+
+### Operations
+
+| Operation | Format | Description |
+|-----------|--------|-------------|
+| Add file | `*** Add File: path` | Create new file |
+| Update file | `*** Update File: path` | Modify existing file |
+| Delete file | `*** Delete File: path` | Remove file |
+
+---
+
+## grep_files
+
+Search file contents using patterns.
+
+### Parameters
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| `pattern` | string | Yes | - | Regex pattern to search |
+| `include` | string | No | - | Glob filter (e.g., `*.py`) |
+| `path` | string | No | Current dir | Search path |
+| `limit` | number | No | 100 | Max files to return |
+
+### Example Usage
+
+```json
+{
+ "name": "grep_files",
+ "arguments": {
+ "pattern": "def.*token",
+ "include": "*.py",
+ "path": "src/",
+ "limit": 50
+ }
+}
+```
+
+### Output Format
+
+```
+src/llm/client.py
+src/core/compaction.py
+src/utils/truncate.py
+```
+
+### Best Practices
+
+- Use ripgrep regex syntax
+- Filter with `include` for faster searches
+- Search specific directories when possible
+- Results are sorted by modification time
+
+---
+
+## list_dir
+
+List directory contents with type indicators.
+
+### Parameters
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| `dir_path` | string | Yes | - | Directory path |
+| `offset` | number | No | 1 | Starting entry (1-indexed) |
+| `limit` | number | No | 50 | Max entries to return |
+| `depth` | number | No | 2 | Max directory depth |
+
+### Example Usage
+
+```json
+{
+ "name": "list_dir",
+ "arguments": {
+ "dir_path": "src/",
+ "depth": 3,
+ "limit": 100
+ }
+}
+```
+
+### Output Format
+
+```
+src/
+ core/
+ loop.py
+ compaction.py
+ llm/
+ client.py
+ tools/
+ shell.py
+ read_file.py
+```
+
+### Type Indicators
+
+| Indicator | Meaning |
+|-----------|---------|
+| `/` | Directory |
+| `@` | Symbolic link |
+| (none) | Regular file |
+
+---
+
+## view_image
+
+Load and analyze an image from the filesystem.
+
+### Parameters
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| `path` | string | Yes | - | Path to image file |
+
+### Supported Formats
+
+- PNG
+- JPEG
+- GIF
+- WebP
+- BMP
+
+### Example Usage
+
+```json
+{
+ "name": "view_image",
+ "arguments": {
+ "path": "screenshots/error.png"
+ }
+}
+```
+
+### How It Works
+
+```mermaid
+sequenceDiagram
+ participant Agent
+ participant Tool as view_image
+ participant LLM as LLM API
+
+ Agent->>Tool: view_image(path)
+ Tool->>Tool: Load image file
+ Tool->>Tool: Encode as base64
+ Tool-->>Agent: ToolResult with inject_content
+ Agent->>Agent: Add image to messages
+ Agent->>LLM: Messages with image content
+ LLM-->>Agent: Analysis response
+```
+
+### Best Practices
+
+- Only use for images the user mentioned
+- Don't use if image is already in conversation
+- Large images are automatically resized
+- Images count as roughly 1,000 tokens of context
+
+---
+
+## update_plan
+
+Track task progress with a visible plan.
+
+### Parameters
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| `steps` | array | Yes | - | List of step objects |
+| `explanation` | string | No | - | Why the plan changed |
+
+### Step Object
+
+```json
+{
+ "description": "Create helper functions",
+ "status": "completed"
+}
+```
+
+### Status Values
+
+| Status | Description |
+|--------|-------------|
+| `pending` | Not started |
+| `in_progress` | Currently working |
+| `completed` | Finished |
+
+### Example Usage
+
+```json
+{
+ "name": "update_plan",
+ "arguments": {
+ "steps": [
+ {"description": "Read existing code", "status": "completed"},
+ {"description": "Create helper module", "status": "in_progress"},
+ {"description": "Write unit tests", "status": "pending"},
+ {"description": "Update documentation", "status": "pending"}
+ ],
+ "explanation": "Starting implementation after code review"
+ }
+}
+```
+
+### Best Practices
+
+- Keep descriptions to 5-7 words
+- Mark steps completed as you go
+- Update plan when approach changes
+- Use for complex multi-step tasks
+
+---
+
+## Tool Output Limits
+
+All tool outputs are truncated to prevent context overflow:
+
+| Setting | Default | Description |
+|---------|---------|-------------|
+| `max_output_tokens` | 2500 | Maximum tokens per tool output |
+| Truncation strategy | Middle-out | Keeps start and end, removes middle |
+
+### Middle-Out Truncation
+
+```mermaid
+graph LR
+ subgraph Original["Original Output (10K tokens)"]
+ Start["First 1250 tokens"]
+        Middle["Middle section<br/>(removed)"]
+ End["Last 1250 tokens"]
+ end
+
+ subgraph Truncated["Truncated Output (2500 tokens)"]
+ TStart["First 1250 tokens"]
+ Marker["[...truncated...]"]
+ TEnd["Last 1250 tokens"]
+ end
+
+ Start --> TStart
+ End --> TEnd
+```
+
+**Why middle-out?**
+- Start contains headers, definitions
+- End contains results, errors
+- Middle is often repetitive
+
+---
+
+## Tool Execution Flow
+
+```mermaid
+flowchart TB
+ subgraph Request["LLM Request"]
+ ToolCall["tool_call: {name, arguments}"]
+ end
+
+ subgraph Registry["Tool Registry"]
+ Lookup["Lookup Tool"]
+ Validate["Validate Arguments"]
+ Execute["Execute Tool"]
+ end
+
+ subgraph Processing["Post-Processing"]
+ Truncate["Truncate Output"]
+ Format["Format Result"]
+ end
+
+ subgraph Response["Tool Result"]
+ Success["success: true/false"]
+ Output["output: string"]
+ Inject["inject_content (images)"]
+ end
+
+ ToolCall --> Lookup --> Validate --> Execute
+ Execute --> Truncate --> Format
+ Format --> Success & Output & Inject
+```
+
+---
+
+## Next Steps
+
+- [Usage Guide](./usage.md) - How to use the agent
+- [Context Management](./context-management.md) - Memory optimization
+- [Best Practices](./best-practices.md) - Effective tool usage
diff --git a/docs/usage.md b/docs/usage.md
new file mode 100644
index 0000000..d234c54
--- /dev/null
+++ b/docs/usage.md
@@ -0,0 +1,341 @@
+# Agent Usage Guide
+
+> **Complete guide to running BaseAgent and interpreting its output**
+
+## Command-Line Interface
+
+### Basic Syntax
+
+```bash
+python3 agent.py --instruction "Your task description"
+```
+
+### Required Arguments
+
+| Argument | Type | Description |
+|----------|------|-------------|
+| `--instruction` | string | The task for the agent to complete |
+
+---
+
+## Running the Agent
+
+### Simple Tasks
+
+```bash
+# Create a file
+python3 agent.py --instruction "Create a file called hello.txt with 'Hello, World!'"
+
+# Read and explain code
+python3 agent.py --instruction "Read src/core/loop.py and explain what it does"
+
+# Find files
+python3 agent.py --instruction "Find all Python files that contain 'import json'"
+```
+
+### Complex Tasks
+
+```bash
+# Multi-step task
+python3 agent.py --instruction "Create a Python module in src/utils/helpers.py with functions for string manipulation, then write tests for it"
+
+# Code modification
+python3 agent.py --instruction "Add error handling to all functions in src/api/client.py that make HTTP requests"
+
+# Investigation task
+python3 agent.py --instruction "Find the bug causing the TypeError in the test output and fix it"
+```
+
+---
+
+## Environment Variables
+
+Configure the agent's behavior with environment variables:
+
+```bash
+# LLM Provider (Chutes)
+export CHUTES_API_TOKEN="your-token"
+export LLM_PROVIDER="chutes"
+export LLM_MODEL="moonshotai/Kimi-K2.5-TEE"
+
+# LLM Provider (OpenRouter)
+export OPENROUTER_API_KEY="sk-or-v1-..."
+export LLM_MODEL="openrouter/anthropic/claude-sonnet-4-20250514"
+
+# Cost management
+export LLM_COST_LIMIT="10.0"
+
+# Run with inline variables
+LLM_COST_LIMIT="5.0" python3 agent.py --instruction "..."
+```
+
+---
+
+## Output Format
+
+BaseAgent emits JSONL (JSON Lines) events to stdout:
+
+```mermaid
+sequenceDiagram
+ participant Agent
+ participant stdout as Standard Output
+
+ Agent->>stdout: {"type": "thread.started", "thread_id": "sess_..."}
+ Agent->>stdout: {"type": "turn.started"}
+
+ loop Tool Execution
+ Agent->>stdout: {"type": "item.started", "item": {...}}
+ Agent->>stdout: {"type": "item.completed", "item": {...}}
+ end
+
+ Agent->>stdout: {"type": "turn.completed", "usage": {...}}
+```
+
+### Event Types
+
+| Event | Description |
+|-------|-------------|
+| `thread.started` | Session begins, includes unique thread ID |
+| `turn.started` | Agent begins processing the instruction |
+| `item.started` | A tool call is starting |
+| `item.completed` | A tool call has completed |
+| `turn.completed` | Agent finished, includes token usage |
+| `turn.failed` | An error occurred |
+
+### Example Output
+
+```json
+{"type": "thread.started", "thread_id": "sess_1706890123456"}
+{"type": "turn.started"}
+{"type": "item.started", "item": {"type": "command_execution", "id": "1", "command": "shell_command({command: 'ls -la'})", "status": "in_progress"}}
+{"type": "item.completed", "item": {"type": "command_execution", "id": "1", "command": "shell_command", "status": "completed", "aggregated_output": "total 40\ndrwxr-xr-x...", "exit_code": 0}}
+{"type": "item.completed", "item": {"type": "agent_message", "id": "2", "content": "I found the files. Now creating hello.txt..."}}
+{"type": "item.started", "item": {"type": "command_execution", "id": "3", "command": "write_file({file_path: 'hello.txt', content: 'Hello, World!'})", "status": "in_progress"}}
+{"type": "item.completed", "item": {"type": "command_execution", "id": "3", "command": "write_file", "status": "completed", "exit_code": 0}}
+{"type": "turn.completed", "usage": {"input_tokens": 5432, "cached_input_tokens": 4890, "output_tokens": 256}}
+```
+
+---
+
+## Logging Output
+
+Agent logs go to stderr:
+
+```
+[14:30:15] [superagent] ============================================================
+[14:30:15] [superagent] SuperAgent Starting (SDK 3.0 - litellm)
+[14:30:15] [superagent] ============================================================
+[14:30:15] [superagent] Model: openrouter/anthropic/claude-sonnet-4-20250514
+[14:30:15] [superagent] Instruction: Create hello.txt with 'Hello World'...
+[14:30:15] [loop] Getting initial state...
+[14:30:16] [loop] Iteration 1/200
+[14:30:16] [compaction] Context: 5432 tokens (3.2% of 168000)
+[14:30:16] [loop] Prompt caching: 1 system + 2 final messages marked (3 breakpoints)
+[14:30:17] [loop] Executing tool: write_file
+[14:30:17] [loop] Iteration 2/200
+[14:30:18] [loop] No tool calls in response
+[14:30:18] [loop] Requesting self-verification before completion
+```
+
+### Separating Output Streams
+
+```bash
+# Send JSONL to file, logs to terminal
+python3 agent.py --instruction "..." > output.jsonl
+
+# Send logs to file, JSONL to terminal
+python3 agent.py --instruction "..." 2> agent.log
+
+# Both to separate files
+python3 agent.py --instruction "..." > output.jsonl 2> agent.log
+```
+
+---
+
+## Processing Output
+
+### Parse JSONL with jq
+
+```bash
+# Get all completed items
+python3 agent.py --instruction "..." | jq 'select(.type == "item.completed")'
+
+# Get final usage stats
+python3 agent.py --instruction "..." | jq 'select(.type == "turn.completed") | .usage'
+
+# Get all agent messages
+python3 agent.py --instruction "..." | jq 'select(.item.type == "agent_message") | .item.content'
+```
+
+### Parse with Python
+
+```python
+import json
+import subprocess
+
+# Run agent and capture output
+result = subprocess.run(
+ ["python3", "agent.py", "--instruction", "Your task"],
+ capture_output=True,
+ text=True
+)
+
+# Parse JSONL output
+events = [json.loads(line) for line in result.stdout.strip().split('\n') if line]
+
+# Find usage stats
+for event in events:
+ if event.get("type") == "turn.completed":
+ print(f"Input tokens: {event['usage']['input_tokens']}")
+ print(f"Output tokens: {event['usage']['output_tokens']}")
+```
+
+---
+
+## Agent Workflow
+
+```mermaid
+flowchart TB
+ subgraph Input["Input Phase"]
+ Cmd["python3 agent.py --instruction '...'"]
+ Parse["Parse Arguments"]
+ Init["Initialize Components"]
+ end
+
+ subgraph Explore["Exploration Phase"]
+ State["Get Current State"]
+ Context["Build Initial Context"]
+ end
+
+ subgraph Execute["Execution Phase"]
+ Loop["Agent Loop"]
+ Tools["Execute Tools"]
+ Verify["Self-Verification"]
+ end
+
+ subgraph Output["Output Phase"]
+ JSONL["Emit JSONL Events"]
+ Stats["Report Statistics"]
+ end
+
+ Cmd --> Parse --> Init
+ Init --> State --> Context
+ Context --> Loop
+ Loop --> Tools --> Loop
+ Loop --> Verify
+ Verify --> Stats
+ Loop --> JSONL
+```
+
+---
+
+## Example Tasks
+
+### File Operations
+
+```bash
+# Create a file
+python3 agent.py --instruction "Create config.yaml with database settings for PostgreSQL"
+
+# Read and summarize
+python3 agent.py --instruction "Read README.md and create a one-paragraph summary"
+
+# Modify a file
+python3 agent.py --instruction "Add a new function to src/utils.py that validates email addresses"
+```
+
+### Code Analysis
+
+```bash
+# Explain code
+python3 agent.py --instruction "Explain how the authentication system works in src/auth/"
+
+# Find patterns
+python3 agent.py --instruction "Find all API endpoints and list them with their HTTP methods"
+
+# Review code
+python3 agent.py --instruction "Review src/api/handlers.py for potential security issues"
+```
+
+### Debugging
+
+```bash
+# Investigate error
+python3 agent.py --instruction "Find why 'test_user_creation' is failing and fix it"
+
+# Trace behavior
+python3 agent.py --instruction "Trace the data flow from user input to database in the signup process"
+```
+
+### Project Tasks
+
+```bash
+# Setup
+python3 agent.py --instruction "Create a Python project structure with src/, tests/, and setup.py"
+
+# Add feature
+python3 agent.py --instruction "Add logging to all functions in src/core/ using Python's logging module"
+
+# Refactor
+python3 agent.py --instruction "Refactor src/utils.py to follow the single responsibility principle"
+```
+
+---
+
+## Session Management
+
+Each agent run creates a new session with a unique ID:
+
+```json
+{"type": "thread.started", "thread_id": "sess_1706890123456"}
+```
+
+### Session Lifecycle
+
+```mermaid
+stateDiagram-v2
+ [*] --> Initializing: python3 agent.py
+ Initializing --> Running: thread.started
+ Running --> Iterating: turn.started
+ Iterating --> Executing: item.started
+ Executing --> Iterating: item.completed
+ Iterating --> Verifying: No tool calls
+ Verifying --> Iterating: Needs more work
+ Verifying --> Complete: Verified
+ Iterating --> Failed: Error
+ Complete --> [*]: turn.completed
+ Failed --> [*]: turn.failed
+```
+
+---
+
+## Performance Tips
+
+### Optimize Token Usage
+
+```bash
+# Set lower cost limit for testing
+export LLM_COST_LIMIT="2.0"
+```
+
+### Monitor Progress
+
+```bash
+# Watch tool executions in real-time
+python3 agent.py --instruction "..." 2>&1 | grep -E "Executing tool|Iteration"
+```
+
+### Debug Issues
+
+```bash
+# Full verbose output
+python3 agent.py --instruction "..." 2>&1 | tee agent_debug.log
+```
+
+---
+
+## Next Steps
+
+- [Tools Reference](./tools.md) - Available tools and their parameters
+- [Configuration](./configuration.md) - Customize agent behavior
+- [Best Practices](./best-practices.md) - Tips for effective usage