diff --git a/README.md b/README.md index 3b9cf99..183091e 100644 --- a/README.md +++ b/README.md @@ -1,167 +1,130 @@ # BaseAgent - SDK 3.0 -High-performance autonomous agent for [Term Challenge](https://term.challenge). **Does NOT use term_sdk** - fully autonomous with litellm. +High-performance autonomous agent for [Term Challenge](https://term.challenge). Supports multiple LLM providers with **Chutes API** (Kimi K2.5-TEE) as the default. -## Installation +## Quick Start ```bash -# Via pyproject.toml -pip install . - -# Via requirements.txt +# 1. Install dependencies pip install -r requirements.txt -``` -## Usage +# 2. Configure Chutes API (default provider) +export CHUTES_API_TOKEN="your-token-from-chutes.ai" -```bash -python agent.py --instruction "Your task here..." +# 3. Run the agent +python3 agent.py --instruction "Your task description here..." ``` -The agent receives the instruction via `--instruction` and executes the task autonomously. - -## Mandatory Architecture - -> **IMPORTANT**: Agents MUST follow these rules to work correctly. +### Alternative: OpenRouter -### 1. Project Structure (MANDATORY) - -Agents **MUST** be structured projects, NOT single files: - -``` -my-agent/ -├── agent.py # Entry point with --instruction -├── src/ # Modules -│ ├── core/ -│ │ ├── loop.py # Main loop -│ │ └── compaction.py # Context management (MANDATORY) -│ ├── llm/ -│ │ └── client.py # LLM client (litellm) -│ └── tools/ -│ └── ... # Available tools -├── requirements.txt # Dependencies -└── pyproject.toml # Project config +```bash +export LLM_PROVIDER="openrouter" +export OPENROUTER_API_KEY="your-openrouter-key" +python3 agent.py --instruction "Your task description here..." ``` -### 2. Session Management (MANDATORY) - -Agents **MUST** maintain complete conversation history: - -```python -messages = [ - {"role": "system", "content": system_prompt}, - {"role": "user", "content": instruction}, -] +## Documentation -# Add each exchange -messages.append({"role": "assistant", "content": response}) -messages.append({"role": "tool", "tool_call_id": id, "content": result}) +📚 **Full documentation available in [docs/](docs/)** + +### Getting Started +- [Overview](docs/overview.md) - What is BaseAgent +- [Installation](docs/installation.md) - Setup instructions +- [Quick Start](docs/quickstart.md) - First task in 5 minutes + +### Core Concepts +- [Architecture](docs/architecture.md) - Technical deep-dive with diagrams +- [Configuration](docs/configuration.md) - All settings explained +- [Usage Guide](docs/usage.md) - CLI commands and examples + +### Reference +- [Tools Reference](docs/tools.md) - Available tools +- [Context Management](docs/context-management.md) - Token optimization +- [Best Practices](docs/best-practices.md) - Performance tips + +### LLM Providers +- [Chutes Integration](docs/chutes-integration.md) - **Default provider setup** + +## Architecture Overview + +```mermaid +graph TB + subgraph User + CLI["python3 agent.py --instruction"] + end + + subgraph Core + Loop["Agent Loop"] + Context["Context Manager"] + end + + subgraph LLM + Chutes["Chutes API (Kimi K2.5)"] + OpenRouter["OpenRouter (fallback)"] + end + + subgraph Tools + Shell["shell_command"] + Files["read/write_file"] + Search["grep_files"] + end + + CLI --> Loop + Loop --> Context + Loop -->|default| Chutes + Loop -->|fallback| OpenRouter + Loop --> Tools ``` -### 3. 
Context Compaction (MANDATORY) - -Compaction is **CRITICAL** for: -- Avoiding "context too long" errors -- Preserving critical information -- Enabling complex multi-step tasks -- Improving response coherence +## Key Features -```python -# Recommended threshold: 85% of context window -AUTO_COMPACT_THRESHOLD = 0.85 +| Feature | Description | +|---------|-------------| +| **Fully Autonomous** | No user confirmation needed | +| **LLM-Driven** | All decisions made by the language model | +| **Chutes API** | Default: Kimi K2.5-TEE (256K context, thinking mode) | +| **Prompt Caching** | 90%+ cache hit rate | +| **Context Management** | Intelligent pruning and compaction | +| **Self-Verification** | Automatic validation before completion | -# 2-step strategy: -# 1. Pruning: Remove old tool outputs -# 2. AI Compaction: Summarize conversation if pruning insufficient -``` - -## Features +## Environment Variables -### LLM Client (litellm) +| Variable | Required | Default | Description | +|----------|----------|---------|-------------| +| `CHUTES_API_TOKEN` | Yes* | - | Chutes API token | +| `LLM_PROVIDER` | No | `chutes` | `chutes` or `openrouter` | +| `LLM_MODEL` | No | `moonshotai/Kimi-K2.5-TEE` | Model identifier | +| `LLM_COST_LIMIT` | No | `10.0` | Max cost in USD | +| `OPENROUTER_API_KEY` | For OpenRouter | - | OpenRouter API key | -```python -from src.llm.client import LiteLLMClient +*\*Required for default Chutes provider* -llm = LiteLLMClient( - model="openrouter/anthropic/claude-opus-4.5", - temperature=0.0, - max_tokens=16384, -) +## Project Structure -response = llm.chat(messages, tools=tool_specs) ``` - -### Prompt Caching - -Caches system and recent messages to reduce costs: -- Cache hit rate: **90%+** on long conversations -- Significant API cost reduction - -### Self-Verification - -Before completing, the agent automatically: -1. Re-reads the original instruction -2. Verifies each requirement -3. 
Only confirms completion if everything is validated - -### Context Management - -- **Token-based overflow detection** (not message count) -- **Tool output pruning** (removes old outputs) -- **AI compaction** (summarizes if needed) -- **Middle-out truncation** for large outputs - -## Available Tools - -| Tool | Description | -|------|-------------| -| `shell_command` | Execute shell commands | -| `read_file` | Read files with pagination | -| `write_file` | Create/overwrite files | -| `apply_patch` | Apply patches | -| `grep_files` | Search with ripgrep | -| `list_dir` | List directories | -| `view_image` | Analyze images | - -## Configuration - -See `src/config/defaults.py`: - -```python -CONFIG = { - "model": "openrouter/anthropic/claude-opus-4.5", - "max_tokens": 16384, - "max_iterations": 200, - "auto_compact_threshold": 0.85, - "prune_protect": 40_000, - "cache_enabled": True, -} +baseagent/ +├── agent.py # Entry point +├── src/ +│ ├── core/ +│ │ ├── loop.py # Main agent loop +│ │ └── compaction.py # Context management +│ ├── llm/ +│ │ └── client.py # LLM client +│ ├── config/ +│ │ └── defaults.py # Configuration +│ ├── tools/ # Tool implementations +│ └── prompts/ # System prompt +├── docs/ # 📚 Full documentation +├── rules/ # Development guidelines +└── astuces/ # Implementation techniques ``` -## Environment Variables - -| Variable | Description | -|----------|-------------| -| `OPENROUTER_API_KEY` | OpenRouter API key | - -## Documentation - -### Rules - Development Guidelines - -See [rules/](rules/) for comprehensive guides: - -- [Architecture Patterns](rules/02-architecture-patterns.md) - **Mandatory project structure** -- [LLM Usage Guide](rules/06-llm-usage-guide.md) - **Using litellm** -- [Best Practices](rules/05-best-practices.md) -- [Error Handling](rules/08-error-handling.md) - -### Tips - Practical Techniques - -See [astuces/](astuces/) for techniques: +## Development Guidelines -- [Prompt Caching](astuces/01-prompt-caching.md) -- [Context Management](astuces/03-context-management.md) -- [Local Testing](astuces/09-local-testing.md) +For agent developers, see: +- [rules/](rules/) - Architecture patterns, best practices, anti-patterns +- [astuces/](astuces/) - Practical techniques (caching, verification, etc.) +- [AGENTS.md](AGENTS.md) - Comprehensive building guide ## License diff --git a/docs/README.md b/docs/README.md new file mode 100644 index 0000000..3700151 --- /dev/null +++ b/docs/README.md @@ -0,0 +1,125 @@ +# BaseAgent Documentation + +> **Professional documentation for the BaseAgent autonomous coding assistant** + +BaseAgent is a high-performance autonomous agent designed for the [Term Challenge](https://term.challenge). It leverages LLM-driven decision making with advanced context management and cost optimization techniques. 
+ +--- + +## Table of Contents + +### Getting Started +- [Overview](./overview.md) - What is BaseAgent and core design principles +- [Installation](./installation.md) - Prerequisites and setup instructions +- [Quick Start](./quickstart.md) - Your first task in 5 minutes + +### Core Concepts +- [Architecture](./architecture.md) - Technical architecture and system design +- [Configuration](./configuration.md) - All configuration options explained +- [Usage Guide](./usage.md) - Command-line interface and options + +### Reference +- [Tools Reference](./tools.md) - Available tools and their parameters +- [Context Management](./context-management.md) - Token management and compaction +- [Best Practices](./best-practices.md) - Optimal usage patterns + +### LLM Providers +- [Chutes API Integration](./chutes-integration.md) - Using Chutes as your LLM provider + +--- + +## Quick Navigation + +| Document | Description | +|----------|-------------| +| [Overview](./overview.md) | High-level introduction and design principles | +| [Installation](./installation.md) | Step-by-step setup guide | +| [Quick Start](./quickstart.md) | Get running in minutes | +| [Architecture](./architecture.md) | Technical deep-dive with diagrams | +| [Configuration](./configuration.md) | Environment variables and settings | +| [Usage](./usage.md) | CLI commands and examples | +| [Tools](./tools.md) | Complete tools reference | +| [Context Management](./context-management.md) | Memory and token optimization | +| [Best Practices](./best-practices.md) | Tips for optimal performance | +| [Chutes Integration](./chutes-integration.md) | Chutes API setup and usage | + +--- + +## Architecture at a Glance + +```mermaid +graph TB + subgraph User["User Interface"] + CLI["CLI (agent.py)"] + end + + subgraph Core["Core Engine"] + Loop["Agent Loop"] + Context["Context Manager"] + Cache["Prompt Cache"] + end + + subgraph LLM["LLM Layer"] + Client["LiteLLM Client"] + Provider["Provider (Chutes/OpenRouter)"] + end + + subgraph Tools["Tool System"] + Registry["Tool Registry"] + Shell["shell_command"] + Files["read_file / write_file"] + Search["grep_files / list_dir"] + end + + CLI --> Loop + Loop --> Context + Loop --> Cache + Loop --> Client + Client --> Provider + Loop --> Registry + Registry --> Shell + Registry --> Files + Registry --> Search +``` + +--- + +## Key Features + +- **Fully Autonomous** - No user confirmation required; makes decisions independently +- **LLM-Driven** - All decisions made by the language model, not hardcoded logic +- **Prompt Caching** - 90%+ cache hit rate for significant cost reduction +- **Context Management** - Intelligent pruning and compaction for long tasks +- **Self-Verification** - Automatic validation before task completion +- **Multi-Provider** - Supports Chutes AI, OpenRouter, and litellm-compatible providers + +--- + +## Project Structure + +``` +baseagent/ +├── agent.py # Entry point +├── src/ +│ ├── core/ +│ │ ├── loop.py # Main agent loop +│ │ └── compaction.py # Context management +│ ├── llm/ +│ │ └── client.py # LLM client (litellm) +│ ├── config/ +│ │ └── defaults.py # Configuration +│ ├── tools/ # Tool implementations +│ ├── prompts/ +│ │ └── system.py # System prompt +│ └── output/ +│ └── jsonl.py # JSONL event emission +├── rules/ # Development guidelines +├── astuces/ # Implementation techniques +└── docs/ # This documentation +``` + +--- + +## License + +MIT License - See [LICENSE](../LICENSE) for details. 
diff --git a/docs/architecture.md b/docs/architecture.md new file mode 100644 index 0000000..772b5ee --- /dev/null +++ b/docs/architecture.md @@ -0,0 +1,435 @@ +# Technical Architecture + +> **Deep dive into BaseAgent's system design, components, and data flow** + +## System Overview + +BaseAgent follows a modular architecture with clear separation of concerns: + +```mermaid +graph TB + subgraph Entry["Entry Layer"] + agent["agent.py
CLI Entry Point"] + end + + subgraph Core["Core Layer"] + loop["loop.py
Agent Loop"] + compact["compaction.py
Context Manager"] + end + + subgraph LLM["LLM Layer"] + client["client.py
LiteLLM Client"] + end + + subgraph Config["Configuration"] + defaults["defaults.py
Settings"] + prompts["system.py
System Prompt"] + end + + subgraph Tools["Tool Layer"] + registry["registry.py
Tool Registry"] + shell["shell.py"] + read["read_file.py"] + write["write_file.py"] + patch["apply_patch.py"] + grep["grep_files.py"] + list["list_dir.py"] + end + + subgraph Output["Output Layer"] + jsonl["jsonl.py
Event Emitter"] + end + + agent --> loop + loop --> compact + loop --> client + loop --> registry + loop --> jsonl + client --> defaults + loop --> prompts + registry --> shell & read & write & patch & grep & list + + style loop fill:#4CAF50,color:#fff + style client fill:#2196F3,color:#fff + style compact fill:#FF9800,color:#fff +``` + +--- + +## Component Diagram + +```mermaid +classDiagram + class AgentContext { + +instruction: str + +cwd: str + +step: int + +is_done: bool + +history: List + +shell(cmd, timeout) ShellResult + +done() + +log(msg) + } + + class LiteLLMClient { + +model: str + +temperature: float + +max_tokens: int + +cost_limit: float + +chat(messages, tools) LLMResponse + +get_stats() Dict + } + + class LLMResponse { + +text: str + +function_calls: List~FunctionCall~ + +tokens: Dict + +has_function_calls() bool + } + + class FunctionCall { + +id: str + +name: str + +arguments: Dict + } + + class ToolRegistry { + +tools: Dict + +execute(ctx, name, args) ToolResult + +get_tools_for_llm() List + } + + class ToolResult { + +success: bool + +output: str + +inject_content: Optional + } + + AgentContext --> LiteLLMClient : uses + LiteLLMClient --> LLMResponse : returns + LLMResponse --> FunctionCall : contains + AgentContext --> ToolRegistry : uses + ToolRegistry --> ToolResult : returns +``` + +--- + +## Agent Loop Workflow + +The heart of BaseAgent is the agent loop in `src/core/loop.py`: + +```mermaid +flowchart TB + Start([Start]) --> Init[Initialize Session] + Init --> BuildMsg[Build Initial Messages] + BuildMsg --> GetState[Get Terminal State] + + GetState --> LoopStart{Iteration < Max?} + + LoopStart -->|Yes| ManageCtx[Manage Context
Prune/Compact if needed] + ManageCtx --> ApplyCache[Apply Prompt Caching] + ApplyCache --> CallLLM[Call LLM with Tools] + + CallLLM --> HasCalls{Has Tool Calls?} + + HasCalls -->|Yes| ResetPending[Reset pending_completion] + ResetPending --> ExecTools[Execute Tool Calls] + ExecTools --> AddResults[Add Results to Messages] + AddResults --> LoopStart + + HasCalls -->|No| CheckPending{pending_completion?} + + CheckPending -->|No| SetPending[Set pending_completion = true] + SetPending --> InjectVerify[Inject Verification Prompt] + InjectVerify --> LoopStart + + CheckPending -->|Yes| Complete[Task Complete] + + LoopStart -->|No| Timeout[Max Iterations Reached] + + Complete --> Emit[Emit turn.completed] + Timeout --> Emit + Emit --> End([End]) + + style ManageCtx fill:#FF9800,color:#fff + style ApplyCache fill:#9C27B0,color:#fff + style CallLLM fill:#2196F3,color:#fff + style ExecTools fill:#4CAF50,color:#fff + style InjectVerify fill:#E91E63,color:#fff +``` + +--- + +## Data Flow + +### Request Flow + +```mermaid +sequenceDiagram + participant User + participant Entry as agent.py + participant Loop as loop.py + participant Context as compaction.py + participant Cache as Prompt Cache + participant LLM as LiteLLM Client + participant Provider as API Provider + participant Tools as Tool Registry + + User->>Entry: --instruction "Create hello.txt" + Entry->>Entry: Initialize AgentContext + Entry->>Entry: Initialize LiteLLMClient + Entry->>Loop: run_agent_loop() + + Loop->>Loop: Build messages [system, user, state] + + rect rgb(255, 240, 220) + Note over Loop,Provider: Iteration Loop + Loop->>Context: manage_context(messages) + Context-->>Loop: Managed messages + + Loop->>Cache: apply_caching(messages) + Cache-->>Loop: Cached messages + + Loop->>LLM: chat(messages, tools) + LLM->>Provider: API Request + Provider-->>LLM: Response + LLM-->>Loop: LLMResponse + + alt Has tool_calls + Loop->>Tools: execute(ctx, tool_name, args) + Tools-->>Loop: ToolResult + Loop->>Loop: Append to messages + end + end + + Loop-->>Entry: Complete + Entry-->>User: JSONL output +``` + +### Message Structure + +Messages accumulate through the session: + +```python +messages = [ + # 1. System prompt (stable, cached) + {"role": "system", "content": SYSTEM_PROMPT}, + + # 2. User instruction + {"role": "user", "content": "Create hello.txt with 'Hello World'"}, + + # 3. Initial state + {"role": "user", "content": "Current directory:\n```\n...\n```"}, + + # 4. Assistant response with tool calls + { + "role": "assistant", + "content": "Creating the file...", + "tool_calls": [ + {"id": "call_1", "type": "function", "function": {...}} + ] + }, + + # 5. Tool result + {"role": "tool", "tool_call_id": "call_1", "content": "File created"}, + + # ... 
continues until completion +] +``` + +--- + +## Module Descriptions + +### `src/core/loop.py` - Agent Loop + +The main orchestration module that: +- Initializes the session and emits JSONL events +- Manages the iterative Observe→Think→Act cycle +- Applies prompt caching for cost optimization +- Handles LLM errors with retry logic +- Triggers self-verification before completion + +### `src/core/compaction.py` - Context Manager + +Intelligent context management that: +- Estimates token usage (4 chars ≈ 1 token) +- Detects context overflow at 85% of usable window +- Prunes old tool outputs (protects last 40K tokens) +- Runs AI compaction when pruning is insufficient +- Preserves critical information through summarization + +### `src/llm/client.py` - LLM Client + +LiteLLM-based client that: +- Supports multiple providers (Chutes, OpenRouter, etc.) +- Tracks token usage and costs +- Handles tool/function calling format +- Enforces cost limits +- Provides usage statistics + +### `src/tools/registry.py` - Tool Registry + +Centralized tool management that: +- Registers all available tools +- Provides tool specs for LLM +- Executes tools with proper context +- Handles tool output truncation +- Manages image injection for `view_image` + +### `src/prompts/system.py` - System Prompt + +System prompt configuration that: +- Defines agent personality and behavior +- Specifies coding guidelines +- Includes AGENTS.md support +- Configures autonomous operation mode +- Provides environment context + +### `src/config/defaults.py` - Configuration + +Central configuration containing: +- Model settings (model name, tokens, temperature) +- Context management thresholds +- Tool output limits +- Prompt caching settings +- Execution limits + +--- + +## Context Management Pipeline + +```mermaid +flowchart LR + subgraph Input + Msgs[Messages
~150K tokens] + end + + subgraph Detection + Est[Estimate Tokens] + Check{> 85% of
168K usable?} + end + + subgraph Pruning + Scan[Scan backwards] + Protect[Protect last 40K
tool tokens]
+        Clear[Clear old outputs]
+    end
+
+    subgraph Compaction
+        CheckAgain{Still > 85%?}
+        Summarize[AI Summarization]
+        NewMsgs[Compacted Messages]
+    end
+
+    subgraph Output
+        Result[Managed Messages]
+    end
+
+    Msgs --> Est --> Check
+    Check -->|No| Result
+    Check -->|Yes| Scan --> Protect --> Clear
+    Clear --> CheckAgain
+    CheckAgain -->|No| Result
+    CheckAgain -->|Yes| Summarize --> NewMsgs --> Result
+```
+
+---
+
+## Tool Execution Flow
+
+```mermaid
+flowchart TB
+    subgraph LLM["LLM Response"]
+        Calls["tool_calls: [<br/>{name: 'shell_command', args: {command: 'ls'}},<br/>{name: 'read_file', args: {file_path: 'README.md'}}<br/>]"]
+    end
+
+    subgraph Registry["Tool Registry"]
+        direction TB
+        Lookup[Lookup Tool]
+        Execute[Execute with Context]
+        Truncate[Truncate Output<br/>max 2500 tokens]
+    end
+
+    subgraph Tools["Tool Implementations"]
+        Shell[shell_command]
+        Read[read_file]
+        Write[write_file]
+        Patch[apply_patch]
+        Grep[grep_files]
+        List[list_dir]
+    end
+
+    subgraph Output["Results"]
+        Results["tool results added<br/>to messages"]
+    end
+
+    Calls --> Lookup
+    Lookup --> Execute
+    Execute --> Shell & Read & Write & Patch & Grep & List
+    Shell & Read & Write & Patch & Grep & List --> Truncate
+    Truncate --> Results
+```
+
+---
+
+## JSONL Event Emission
+
+BaseAgent emits structured JSONL events throughout execution:
+
+```mermaid
+sequenceDiagram
+    participant Loop as Agent Loop
+    participant JSONL as Event Emitter
+    participant stdout as Standard Output
+
+    Loop->>JSONL: emit(ThreadStartedEvent)
+    JSONL->>stdout: {"type": "thread.started", ...}
+
+    Loop->>JSONL: emit(TurnStartedEvent)
+    JSONL->>stdout: {"type": "turn.started", ...}
+
+    loop Each Tool Call
+        Loop->>JSONL: emit(ItemStartedEvent)
+        JSONL->>stdout: {"type": "item.started", ...}
+        Loop->>JSONL: emit(ItemCompletedEvent)
+        JSONL->>stdout: {"type": "item.completed", ...}
+    end
+
+    Loop->>JSONL: emit(TurnCompletedEvent)
+    JSONL->>stdout: {"type": "turn.completed", "usage": {...}}
+```
+
+---
+
+## Error Handling Strategy
+
+```mermaid
+flowchart TB
+    Error[Error Occurs] --> Type{Error Type?}
+
+    Type -->|CostLimitExceeded| Abort[Emit TurnFailed
Abort Session] + + Type -->|Authentication| Abort + + Type -->|Rate Limit| Retry{Attempt < 5?} + Retry -->|Yes| Wait[Wait 10s × attempt] + Wait --> TryAgain[Retry Request] + Retry -->|No| Abort + + Type -->|Timeout/504| Retry + + Type -->|Other| Retry + + TryAgain --> Success{Success?} + Success -->|Yes| Continue[Continue Loop] + Success -->|No| Retry +``` + +--- + +## Next Steps + +- [Configuration Reference](./configuration.md) - All settings explained +- [Tools Reference](./tools.md) - Detailed tool documentation +- [Context Management](./context-management.md) - Deep dive into memory management diff --git a/docs/best-practices.md b/docs/best-practices.md new file mode 100644 index 0000000..7fa098a --- /dev/null +++ b/docs/best-practices.md @@ -0,0 +1,408 @@ +# Best Practices + +> **Strategies for optimal performance, cost efficiency, and reliable results** + +## Core Principles + +BaseAgent follows these fundamental principles: + +1. **Explore First** - Always gather context before acting +2. **Iterate** - Never try to solve everything in one shot +3. **Verify** - Double-confirm before completing +4. **Fail Gracefully** - Handle errors and retry +5. **Stay Focused** - Complete exactly what's asked + +--- + +## Explore-First Pattern + +Before making any changes, always understand the context: + +```mermaid +flowchart LR + subgraph Bad["❌ Bad Pattern"] + B1[Receive Task] --> B2[Start Coding] + B2 --> B3[Hit Problems] + B3 --> B4[Backtrack] + end + + subgraph Good["✅ Good Pattern"] + G1[Receive Task] --> G2[Explore Codebase] + G2 --> G3[Understand Patterns] + G3 --> G4[Plan Approach] + G4 --> G5[Implement] + end +``` + +### Exploration Steps + +1. **Read README** - Understand project purpose +2. **List directory** - See project structure +3. **Find similar code** - Match existing patterns +4. **Check tests** - Understand expected behavior +5. **Review AGENTS.md** - Follow project instructions + +--- + +## Self-Verification + +BaseAgent automatically verifies work before completion: + +```mermaid +sequenceDiagram + participant Agent + participant Verify as Verification + participant LLM as LLM + + Agent->>Agent: No more tool calls + Agent->>Verify: Inject verification prompt + Verify->>LLM: Re-read instruction + LLM->>LLM: List requirements + LLM->>LLM: Verify each requirement + + alt All verified + LLM-->>Agent: Confirm completion + else Something missing + LLM-->>Agent: Continue working + end +``` + +### Verification Checklist + +The agent automatically asks: +- ✅ Did I read the ENTIRE original instruction? +- ✅ Did I list ALL requirements (explicit and implicit)? +- ✅ Did I run commands to VERIFY each requirement? +- ✅ Did I fix any issues found during verification? 
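+
+The double-confirmation described above can be sketched in a few lines. This is illustrative only: `VERIFICATION_PROMPT`, `state`, and `should_complete` are hypothetical names, and the real logic lives in `src/core/loop.py`:
+
+```python
+VERIFICATION_PROMPT = (
+    "Re-read the original instruction, list every explicit and implicit "
+    "requirement, and run commands to verify each one. If anything is "
+    "missing, keep working instead of finishing."
+)
+
+def should_complete(response, state, messages) -> bool:
+    """Complete only on the second consecutive response with no tool calls."""
+    if response.has_function_calls():
+        state["pending_completion"] = False  # still acting, not done
+        return False
+    if not state.get("pending_completion"):
+        state["pending_completion"] = True   # first completion claim
+        messages.append({"role": "user", "content": VERIFICATION_PROMPT})
+        return False  # give the model one pass to self-verify
+    return True  # confirmed twice in a row
+```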
+ +--- + +## Prompt Caching + +Achieve **90%+ cache hit rate** for massive cost savings: + +```mermaid +graph TB + subgraph Strategy["Caching Strategy"] + S1["Cache first 2 system messages"] + S2["Cache last 2 non-system messages"] + S3["Up to 4 breakpoints total"] + end + + subgraph Effect["Effect"] + E1["Request 1: Cache miss (create)"] + E2["Request 2: Cache HIT (90% saved)"] + E3["Request 3: Cache HIT (90% saved)"] + E4["Request N: Cache HIT (90% saved)"] + end + + S1 --> E1 + S2 --> E1 + E1 --> E2 --> E3 --> E4 + + style E2 fill:#4CAF50,color:#fff + style E3 fill:#4CAF50,color:#fff + style E4 fill:#4CAF50,color:#fff +``` + +### How It Works + +```python +# Messages structure +messages = [ + {"role": "system", "content": "...", "cache_control": {"type": "ephemeral"}}, # ✓ Cached + {"role": "user", "content": "original instruction"}, + {"role": "assistant", "content": "...", "tool_calls": [...]}, + {"role": "tool", "content": "..."}, + {"role": "assistant", "content": "...", "cache_control": {"type": "ephemeral"}}, # ✓ Cached + {"role": "user", "content": "verification", "cache_control": {"type": "ephemeral"}}, # ✓ Cached +] +``` + +### Cost Impact + +| Scenario | Cost per 1M tokens | +|----------|-------------------| +| No caching | $3.00 | +| 90% cache hit | $0.30 | +| **Savings** | **90%** | + +--- + +## Cost Optimization + +### Set Cost Limits + +```bash +export LLM_COST_LIMIT="5.0" # Max $5 per session +``` + +### Monitor Usage + +Watch the logs for token counts: +``` +[14:30:17] [loop] Tokens: 50000 input, 45000 cached, 500 output +``` + +### Optimize Instructions + +```bash +# ❌ Vague (causes exploration loops) +python3 agent.py --instruction "Fix the bugs" + +# ✅ Specific (direct action) +python3 agent.py --instruction "Fix the TypeError in src/api/handlers.py:42" +``` + +### Use Targeted Tools + +```bash +# ❌ Wasteful +ls -laR / # Lists entire filesystem + +# ✅ Efficient +list_dir(dir_path="src/", depth=2) +``` + +--- + +## Git Hygiene + +BaseAgent follows strict git rules: + +### ✅ Allowed + +- `git status` - Check current state +- `git log` - View history +- `git blame` - Understand code origins +- `git diff` - Review changes +- `git add` - Stage changes (when asked) +- `git commit` - Commit changes (when asked) + +### ❌ Forbidden + +- `git reset --hard` - Destructive +- `git checkout --` - Loses changes +- Reverting changes you didn't make +- Amending commits without permission +- Pushing without explicit request + +### Safe Practices + +```bash +# Always check state first +git status + +# Review before committing +git diff + +# Stage specific files +git add src/specific_file.py + +# Never force operations +# ❌ git push --force +``` + +--- + +## Writing Effective Instructions + +### Be Specific + +```bash +# ❌ Too vague +"Fix the code" + +# ✅ Specific +"Fix the NullPointerException in UserService.java:85 when user.email is null" +``` + +### Provide Context + +```bash +# ❌ Missing context +"Add authentication" + +# ✅ With context +"Add JWT authentication to the /api/users endpoint using the existing AuthService" +``` + +### Request Verification + +```bash +# ✅ Ask for verification +"Create a sorting algorithm and verify it works with [5, 2, 8, 1, 9]" +``` + +### Break Down Complex Tasks + +```bash +# ❌ Too complex for one instruction +"Build a complete e-commerce platform" + +# ✅ Incremental +"Create the product catalog data model with name, price, and description fields" +``` + +--- + +## Tool Usage Patterns + +### Shell Commands + +```python +# ✅ Use workdir +{"command": 
"ls -la", "workdir": "/workspace/src"} + +# ❌ Avoid cd chains +{"command": "cd /workspace && cd src && ls"} +``` + +### File Reading + +```python +# ✅ Read specific sections +{"file_path": "large.py", "offset": 100, "limit": 50} + +# ❌ Read entire large files +{"file_path": "large.py"} # May overwhelm context +``` + +### Searching + +```python +# ✅ Use grep_files for discovery +{"pattern": "def calculate", "include": "*.py", "path": "src/"} + +# Then read specific files found +{"file_path": "src/billing/calculator.py"} +``` + +### Editing + +```python +# ✅ Use apply_patch for surgical edits +{"patch": "*** Update File: src/utils.py\n@@ def old_func:\n- old\n+ new"} + +# ✅ Use write_file for new files +{"file_path": "new_module.py", "content": "..."} +``` + +--- + +## Handling Long Tasks + +For complex, multi-step tasks: + +### 1. Use update_plan + +```python +{ + "steps": [ + {"description": "Analyze existing code", "status": "completed"}, + {"description": "Design new module", "status": "in_progress"}, + {"description": "Implement core logic", "status": "pending"}, + {"description": "Add unit tests", "status": "pending"}, + {"description": "Update documentation", "status": "pending"} + ] +} +``` + +### 2. Monitor Context + +Watch for compaction events: +``` +[compaction] Context overflow detected, managing... +``` + +### 3. Save Progress + +If context compaction occurs, the summary preserves: +- Current progress +- Key decisions +- Remaining work +- Modified files + +--- + +## Error Handling + +BaseAgent handles errors gracefully: + +### Automatic Retry + +```mermaid +flowchart TB + Error[Error Occurs] --> Type{Error Type} + + Type -->|Rate Limit| Wait[Wait + Retry] + Type -->|Timeout| Wait + Type -->|Server Error| Wait + + Type -->|Auth Error| Fail[Abort] + Type -->|Cost Limit| Fail + + Wait --> Attempt{Attempt < 5?} + Attempt -->|Yes| Retry[Retry Request] + Attempt -->|No| Fail + + Retry --> Success{Success?} + Success -->|Yes| Continue[Continue] + Success -->|No| Attempt +``` + +### Recovery Strategies + +1. **Try alternatives** - If one approach fails, try another +2. **Check documentation** - Read AGENTS.md, README.md +3. **Simplify** - Break complex operations into steps +4. **Report issues** - Note blockers in final message + +--- + +## Performance Tips + +### Reduce Iterations + +1. Give specific, complete instructions +2. Provide necessary context upfront +3. Avoid vague requirements + +### Minimize Token Usage + +1. Search before reading entire files +2. Use targeted directory listings +3. Keep tool outputs focused + +### Maximize Cache Hits + +1. Keep system prompt stable +2. Don't modify early messages +3. 
Let the agent handle caching automatically + +--- + +## Checklist + +Before running the agent: + +- [ ] Clear, specific instruction +- [ ] Necessary context provided +- [ ] API key configured +- [ ] Cost limit set appropriately +- [ ] Working directory correct + +After completion: + +- [ ] Verify output matches requirements +- [ ] Check for any error messages +- [ ] Review modified files +- [ ] Run relevant tests + +--- + +## Next Steps + +- [Configuration](./configuration.md) - Tune settings +- [Context Management](./context-management.md) - Memory optimization +- [Tools Reference](./tools.md) - Detailed tool docs diff --git a/docs/chutes-integration.md b/docs/chutes-integration.md new file mode 100644 index 0000000..75b4955 --- /dev/null +++ b/docs/chutes-integration.md @@ -0,0 +1,378 @@ +# Chutes API Integration + +> **Using Chutes AI as your LLM provider for BaseAgent** + +## Overview + +[Chutes AI](https://chutes.ai) provides access to advanced language models through a simple API. BaseAgent supports Chutes as a first-class provider, offering access to the **Kimi K2.5-TEE** model with its powerful thinking capabilities. + +--- + +## Chutes API Features + +| Feature | Value | +|---------|-------| +| **API Base URL** | `https://llm.chutes.ai/v1` | +| **Default Model** | `moonshotai/Kimi-K2.5-TEE` | +| **Model Parameters** | 1T total, 32B activated | +| **Context Window** | 256K tokens | +| **Thinking Mode** | Enabled by default | + +--- + +## Quick Setup + +### Step 1: Get Your API Token + +1. Visit [chutes.ai](https://chutes.ai) +2. Create an account or sign in +3. Navigate to API settings +4. Generate an API token + +### Step 2: Configure Environment + +```bash +# Required: API token +export CHUTES_API_TOKEN="your-token-from-chutes.ai" + +# Optional: Explicitly set provider and model +export LLM_PROVIDER="chutes" +export LLM_MODEL="moonshotai/Kimi-K2.5-TEE" +``` + +### Step 3: Run BaseAgent + +```bash +python3 agent.py --instruction "Your task description" +``` + +--- + +## Authentication Flow + +```mermaid +sequenceDiagram + participant Agent as BaseAgent + participant Client as LiteLLM Client + participant Chutes as Chutes API + + Agent->>Client: Initialize with CHUTES_API_TOKEN + Client->>Client: Configure litellm + + loop Each Request + Agent->>Client: chat(messages, tools) + Client->>Chutes: POST /v1/chat/completions + Note over Client,Chutes: Authorization: Bearer $CHUTES_API_TOKEN + Chutes-->>Client: Response with tokens + Client-->>Agent: LLMResponse + end +``` + +--- + +## Model Details: Kimi K2.5-TEE + +The **moonshotai/Kimi-K2.5-TEE** model offers: + +### Architecture +- **Total Parameters**: 1 Trillion (1T) +- **Activated Parameters**: 32 Billion (32B) +- **Architecture**: Mixture of Experts (MoE) +- **Context Length**: 256,000 tokens + +### Thinking Mode + +Kimi K2.5-TEE supports a "thinking mode" where the model shows its reasoning process: + +```mermaid +sequenceDiagram + participant User + participant Model as Kimi K2.5-TEE + participant Response + + User->>Model: Complex task instruction + + rect rgb(230, 240, 255) + Note over Model: Thinking Mode Active + Model->>Model: Analyze problem + Model->>Model: Consider approaches + Model->>Model: Evaluate options + end + + Model->>Response: Reasoning process... 
+    Model->>Response: Final answer/action
+```
+
+### Temperature Settings
+
+| Mode | Temperature | Top-p | Description |
+|------|-------------|-------|-------------|
+| **Thinking** | 1.0 | 0.95 | More exploratory reasoning |
+| **Instant** | 0.6 | 0.95 | Faster, more deterministic |
+
+---
+
+## Configuration Options
+
+### Basic Configuration
+
+```python
+# src/config/defaults.py
+CONFIG = {
+    "model": os.environ.get("LLM_MODEL", "moonshotai/Kimi-K2.5-TEE"),
+    "provider": "chutes",
+    "temperature": 1.0,  # For thinking mode
+    "max_tokens": 16384,
+}
+```
+
+### Environment Variables
+
+| Variable | Required | Default | Description |
+|----------|----------|---------|-------------|
+| `CHUTES_API_TOKEN` | Yes | - | API token from chutes.ai |
+| `LLM_PROVIDER` | No | `chutes` | `chutes` (default) or `openrouter` |
+| `LLM_MODEL` | No | `moonshotai/Kimi-K2.5-TEE` | Model identifier |
+| `LLM_COST_LIMIT` | No | `10.0` | Max cost in USD |
+
+---
+
+## Thinking Mode Processing
+
+When thinking mode is enabled, responses include `<think>` tags:
+
+```xml
+<think>
+The user wants to create a file with specific content.
+I should:
+1. Check if the file already exists
+2. Create the file with the requested content
+3. Verify the file was created correctly
+</think>
+
+I'll create the file for you now.
+```
+
+BaseAgent can be configured to:
+- **Parse and strip** the thinking tags (show only final answer)
+- **Keep** the thinking content (useful for debugging)
+- **Log** thinking to stderr while showing final answer
+
+### Parsing Example
+
+```python
+import re
+
+def parse_thinking(response_text: str) -> tuple[str, str]:
+    """Extract thinking and final response."""
+    think_pattern = r'<think>(.*?)</think>'
+    match = re.search(think_pattern, response_text, re.DOTALL)
+
+    if match:
+        thinking = match.group(1).strip()
+        final = re.sub(think_pattern, '', response_text, flags=re.DOTALL).strip()
+        return thinking, final
+
+    return "", response_text
+```
+
+---
+
+## API Request Format
+
+Chutes API follows OpenAI-compatible format:
+
+```bash
+curl -X POST https://llm.chutes.ai/v1/chat/completions \
+  -H "Authorization: Bearer $CHUTES_API_TOKEN" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "moonshotai/Kimi-K2.5-TEE",
+    "messages": [
+      {"role": "system", "content": "You are a helpful assistant."},
+      {"role": "user", "content": "Hello!"}
+    ],
+    "max_tokens": 1024,
+    "temperature": 1.0,
+    "top_p": 0.95
+  }'
+```
+
+---
+
+## Fallback to OpenRouter
+
+If Chutes is unavailable, BaseAgent can fall back to OpenRouter:
+
+```mermaid
+flowchart TB
+    Start[API Request] --> Check{Chutes Available?}
+
+    Check -->|Yes| Chutes[Send to Chutes API]
+    Chutes --> Success{Success?}
+    Success -->|Yes| Done[Return Response]
+    Success -->|No| Retry{Retry Count < 3?}
+
+    Retry -->|Yes| Chutes
+    Retry -->|No| Fallback[Use OpenRouter]
+
+    Check -->|No| Fallback
+    Fallback --> Done
+```
+
+### Configuration for Fallback
+
+```bash
+# Primary: Chutes
+export CHUTES_API_TOKEN="..."
+export LLM_PROVIDER="chutes"
+
+# Fallback: OpenRouter
+export OPENROUTER_API_KEY="..."
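+
+# Per the fallback flow above: Chutes is tried first, and OpenRouter is
+# used only when Chutes is unavailable or its retries are exhausted.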
+``` + +### Switching Providers + +```bash +# Switch to OpenRouter +export LLM_PROVIDER="openrouter" +export LLM_MODEL="openrouter/anthropic/claude-sonnet-4-20250514" + +# Switch back to Chutes +export LLM_PROVIDER="chutes" +export LLM_MODEL="moonshotai/Kimi-K2.5-TEE" +``` + +--- + +## Cost Considerations + +### Pricing (Approximate) + +| Metric | Cost | +|--------|------| +| Input tokens | Varies by model | +| Output tokens | Varies by model | +| Cached input | Reduced rate | + +### Cost Management + +```bash +# Set cost limit +export LLM_COST_LIMIT="5.0" # Max $5.00 per session +``` + +BaseAgent tracks costs and will abort if the limit is exceeded: + +```python +# In src/llm/client.py +if self._total_cost >= self.cost_limit: + raise CostLimitExceeded( + f"Cost limit exceeded: ${self._total_cost:.4f}", + used=self._total_cost, + limit=self.cost_limit, + ) +``` + +--- + +## Troubleshooting + +### Authentication Errors + +``` +LLMError: authentication_error +``` + +**Solution**: Verify your token is correct and exported: + +```bash +echo $CHUTES_API_TOKEN # Should show your token +export CHUTES_API_TOKEN="correct-token" +``` + +### Rate Limiting + +``` +LLMError: rate_limit +``` + +**Solution**: BaseAgent automatically retries with exponential backoff. You can also: +- Wait a few minutes before retrying +- Reduce request frequency +- Check your API plan limits + +### Model Not Found + +``` +LLMError: Model 'xyz' not found +``` + +**Solution**: Use the correct model identifier: + +```bash +export LLM_MODEL="moonshotai/Kimi-K2.5-TEE" +``` + +### Connection Timeouts + +``` +LLMError: timeout +``` + +**Solution**: BaseAgent retries automatically. If persistent: +- Check your internet connection +- Verify Chutes API status +- Consider using OpenRouter as fallback + +--- + +## Integration with LiteLLM + +BaseAgent uses [LiteLLM](https://docs.litellm.ai/) for provider abstraction: + +```python +# src/llm/client.py +import litellm + +# For Chutes, configure base URL +litellm.api_base = "https://llm.chutes.ai/v1" + +# Make request +response = litellm.completion( + model="moonshotai/Kimi-K2.5-TEE", + messages=messages, + api_key=os.environ.get("CHUTES_API_TOKEN"), +) +``` + +--- + +## Best Practices + +### For Optimal Performance + +1. **Enable thinking mode** for complex reasoning tasks +2. **Use appropriate temperature** (1.0 for exploration, 0.6 for precision) +3. **Leverage the 256K context** for large codebases +4. **Monitor costs** with `LLM_COST_LIMIT` + +### For Reliability + +1. **Set up fallback** to OpenRouter +2. **Handle rate limits** gracefully (automatic in BaseAgent) +3. **Log responses** for debugging complex tasks + +### For Cost Efficiency + +1. **Enable prompt caching** (reduces costs by 90%) +2. **Use context management** to avoid token waste +3. **Set reasonable cost limits** for testing + +--- + +## Next Steps + +- [Configuration Reference](./configuration.md) - All settings explained +- [Best Practices](./best-practices.md) - Optimization tips +- [Usage Guide](./usage.md) - Command-line options diff --git a/docs/configuration.md b/docs/configuration.md new file mode 100644 index 0000000..492f074 --- /dev/null +++ b/docs/configuration.md @@ -0,0 +1,304 @@ +# Configuration Reference + +> **Complete guide to all configuration options in BaseAgent** + +## Overview + +BaseAgent configuration is centralized in `src/config/defaults.py`. Settings can be customized via environment variables or by modifying the configuration file directly. 
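+
+As a minimal sketch of that layering (assuming the merge happens at import time in `src/config/defaults.py`; the actual loading code may differ), an environment variable simply overrides the hardcoded default:
+
+```python
+import os
+
+# Hardcoded defaults (mirrors the CONFIG dictionary below).
+CONFIG = {
+    "model": "openrouter/anthropic/claude-sonnet-4-20250514",
+    "provider": "openrouter",
+}
+
+# Environment variables take precedence over the defaults;
+# LLM_COST_LIMIT and the API keys are read the same way.
+CONFIG["model"] = os.environ.get("LLM_MODEL", CONFIG["model"])
+CONFIG["provider"] = os.environ.get("LLM_PROVIDER", CONFIG["provider"])
+```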
+ +--- + +## Configuration File + +The main configuration is stored in the `CONFIG` dictionary: + +```python +# src/config/defaults.py +CONFIG = { + # Model Settings + "model": "openrouter/anthropic/claude-sonnet-4-20250514", + "provider": "openrouter", + "temperature": 0.0, + "max_tokens": 16384, + "reasoning_effort": "none", + + # Agent Execution + "max_iterations": 200, + "max_output_tokens": 2500, + "shell_timeout": 60, + + # Context Management + "model_context_limit": 200_000, + "output_token_max": 32_000, + "auto_compact_threshold": 0.85, + "prune_protect": 40_000, + "prune_minimum": 20_000, + + # Prompt Caching + "cache_enabled": True, + + # Execution Flags + "bypass_approvals": True, + "bypass_sandbox": True, + "skip_git_check": True, + "unified_exec": True, + "json_output": True, + + # Completion + "require_completion_confirmation": False, +} +``` + +--- + +## Environment Variables + +### LLM Provider Settings + +| Variable | Default | Description | +|----------|---------|-------------| +| `LLM_MODEL` | `openrouter/anthropic/claude-sonnet-4-20250514` | Model identifier | +| `LLM_PROVIDER` | `openrouter` | Provider name (`chutes`, `openrouter`, etc.) | +| `LLM_COST_LIMIT` | `10.0` | Maximum cost in USD before aborting | + +### API Keys + +| Variable | Provider | Description | +|----------|----------|-------------| +| `CHUTES_API_TOKEN` | Chutes AI | Token from chutes.ai | +| `OPENROUTER_API_KEY` | OpenRouter | API key from openrouter.ai | +| `ANTHROPIC_API_KEY` | Anthropic | Direct Anthropic API key | +| `OPENAI_API_KEY` | OpenAI | OpenAI API key | + +### Example Setup + +```bash +# For Chutes AI +export CHUTES_API_TOKEN="your-token" +export LLM_PROVIDER="chutes" +export LLM_MODEL="moonshotai/Kimi-K2.5-TEE" + +# For OpenRouter +export OPENROUTER_API_KEY="sk-or-v1-..." +export LLM_MODEL="openrouter/anthropic/claude-sonnet-4-20250514" +``` + +--- + +## Configuration Sections + +### Model Settings + +```mermaid +graph LR + subgraph Model["Model Configuration"] + M1["model
Model identifier"] + M2["provider
API provider"] + M3["temperature
Response randomness"] + M4["max_tokens
Max output tokens"] + M5["reasoning_effort
Reasoning depth"] + end +``` + +| Setting | Type | Default | Description | +|---------|------|---------|-------------| +| `model` | `str` | `openrouter/anthropic/claude-sonnet-4-20250514` | Full model identifier with provider prefix | +| `provider` | `str` | `openrouter` | LLM provider name | +| `temperature` | `float` | `0.0` | Response randomness (0 = deterministic) | +| `max_tokens` | `int` | `16384` | Maximum tokens in LLM response | +| `reasoning_effort` | `str` | `none` | Reasoning depth: `none`, `minimal`, `low`, `medium`, `high`, `xhigh` | + +### Agent Execution Settings + +```mermaid +graph LR + subgraph Execution["Execution Limits"] + E1["max_iterations
200 iterations"] + E2["max_output_tokens
2500 tokens"] + E3["shell_timeout
60 seconds"] + end +``` + +| Setting | Type | Default | Description | +|---------|------|---------|-------------| +| `max_iterations` | `int` | `200` | Maximum loop iterations before stopping | +| `max_output_tokens` | `int` | `2500` | Max tokens for tool output truncation | +| `shell_timeout` | `int` | `60` | Shell command timeout in seconds | + +### Context Management + +```mermaid +graph TB + subgraph Context["Context Window Management"] + C1["model_context_limit: 200K"] + C2["output_token_max: 32K"] + C3["Usable: 168K"] + C4["auto_compact_threshold: 85%"] + C5["Trigger: ~143K"] + end + + C1 --> C3 + C2 --> C3 + C3 --> C4 + C4 --> C5 +``` + +| Setting | Type | Default | Description | +|---------|------|---------|-------------| +| `model_context_limit` | `int` | `200000` | Total model context window (tokens) | +| `output_token_max` | `int` | `32000` | Tokens reserved for output | +| `auto_compact_threshold` | `float` | `0.85` | Trigger compaction at this % of usable context | +| `prune_protect` | `int` | `40000` | Protect this many tokens of recent tool output | +| `prune_minimum` | `int` | `20000` | Only prune if recovering at least this many tokens | + +### Prompt Caching + +| Setting | Type | Default | Description | +|---------|------|---------|-------------| +| `cache_enabled` | `bool` | `True` | Enable Anthropic prompt caching | + +> **Note**: Prompt caching requires minimum token thresholds per breakpoint: +> - Claude Opus 4.5 on Bedrock: 4096 tokens +> - Claude Sonnet/other: 1024 tokens + +### Execution Flags + +| Setting | Type | Default | Description | +|---------|------|---------|-------------| +| `bypass_approvals` | `bool` | `True` | Skip user approval prompts | +| `bypass_sandbox` | `bool` | `True` | Bypass sandbox restrictions | +| `skip_git_check` | `bool` | `True` | Skip git repository validation | +| `unified_exec` | `bool` | `True` | Enable unified execution mode | +| `json_output` | `bool` | `True` | Always emit JSONL output | +| `require_completion_confirmation` | `bool` | `False` | Require double-confirm before completing | + +--- + +## Provider-Specific Configuration + +### Chutes AI + +```python +# Environment +CHUTES_API_TOKEN="your-token" +LLM_PROVIDER="chutes" +LLM_MODEL="moonshotai/Kimi-K2.5-TEE" + +# Model features +# - 1T parameters, 32B activated +# - 256K context window +# - Thinking mode enabled by default +# - Temperature: 1.0 (thinking), 0.6 (instant) +``` + +### OpenRouter + +```python +# Environment +OPENROUTER_API_KEY="sk-or-v1-..." +LLM_MODEL="openrouter/anthropic/claude-sonnet-4-20250514" + +# Requires openrouter/ prefix for litellm +``` + +### Direct Anthropic + +```python +# Environment +ANTHROPIC_API_KEY="sk-ant-..." 
+LLM_MODEL="claude-3-5-sonnet-20241022" + +# No prefix needed for direct API +``` + +--- + +## Configuration Workflow + +```mermaid +flowchart TB + subgraph Load["Configuration Loading"] + Env[Environment Variables] + File[defaults.py] + Merge[Merged Config] + end + + subgraph Apply["Configuration Application"] + Loop[Agent Loop] + LLM[LLM Client] + Context[Context Manager] + Tools[Tool Registry] + end + + Env --> Merge + File --> Merge + Merge --> Loop + Merge --> LLM + Merge --> Context + Merge --> Tools +``` + +--- + +## Computed Values + +Some values are computed from configuration: + +```python +# Usable context window +usable_context = model_context_limit - output_token_max +# Default: 200,000 - 32,000 = 168,000 tokens + +# Compaction trigger threshold +compaction_trigger = usable_context * auto_compact_threshold +# Default: 168,000 * 0.85 = 142,800 tokens + +# Token estimation +chars_per_token = 4 # Heuristic +tokens = len(text) // 4 +``` + +--- + +## Best Practices + +### For Cost Optimization + +```bash +# Lower cost limit for testing +export LLM_COST_LIMIT="1.0" + +# Use smaller context for simple tasks +# (edit defaults.py) +"model_context_limit": 100_000 +``` + +### For Long Tasks + +```bash +# Increase iterations +# (edit defaults.py) +"max_iterations": 500 + +# Lower compaction threshold for aggressive memory management +"auto_compact_threshold": 0.70 +``` + +### For Debugging + +```bash +# Disable caching to see full API calls +# (edit defaults.py) +"cache_enabled": False + +# Increase output limits for more context +"max_output_tokens": 5000 +``` + +--- + +## Next Steps + +- [Chutes Integration](./chutes-integration.md) - Configure Chutes API +- [Context Management](./context-management.md) - Understand memory management +- [Best Practices](./best-practices.md) - Optimization tips diff --git a/docs/context-management.md b/docs/context-management.md new file mode 100644 index 0000000..2f26e75 --- /dev/null +++ b/docs/context-management.md @@ -0,0 +1,412 @@ +# Context Management + +> **How BaseAgent manages memory and prevents token overflow** + +## Why Context Management Matters + +Large Language Models have finite context windows. Without proper management: +- "Context too long" errors terminate sessions +- Critical information gets lost +- Response quality degrades +- Costs increase unnecessarily + +BaseAgent implements sophisticated context management inspired by OpenCode and Codex. + +--- + +## Context Window Overview + +```mermaid +graph TB + subgraph Window["Claude Opus 4.5 Context Window (200K tokens)"] + Output["Reserved for Output
32K tokens"] + Usable["Usable Context
168K tokens"] + end + + subgraph Thresholds["Management Thresholds"] + Safe["Safe Zone
< 85% (143K)"] + Warning["Warning Zone
85-100%"] + Overflow["Overflow
> 168K"] + end + + Usable --> Safe + Usable --> Warning + Usable --> Overflow + + style Safe fill:#4CAF50,color:#fff + style Warning fill:#FF9800,color:#fff + style Overflow fill:#F44336,color:#fff +``` + +### Key Numbers + +| Metric | Value | Description | +|--------|-------|-------------| +| Total context | 200,000 | Model's full context window | +| Output reserve | 32,000 | Reserved for LLM response | +| Usable context | 168,000 | Available for messages | +| Compaction threshold | 85% | Trigger at 142,800 tokens | +| Prune protect | 40,000 | Recent tool output to keep | +| Prune minimum | 20,000 | Minimum savings to prune | + +--- + +## Token Estimation + +BaseAgent estimates tokens using a simple heuristic: + +```python +# 1 token ≈ 4 characters +def estimate_tokens(text: str) -> int: + return len(text) // 4 +``` + +### Message Token Components + +```mermaid +graph LR + subgraph Message["Message Token Estimation"] + Content["Content
(text / 4)"] + Images["Images
(~1000 each)"] + ToolCalls["Tool Calls
(name + args)"] + Overhead["Role Overhead
(~4 tokens)"] + end + + Content --> Total["Total Tokens"] + Images --> Total + ToolCalls --> Total + Overhead --> Total +``` + +--- + +## Context Management Pipeline + +```mermaid +flowchart TB + subgraph Input["Every Iteration"] + Messages["Current Messages"] + end + + subgraph Detection["1. Detection"] + Estimate["Estimate Total Tokens"] + Check{"Above 85%
Threshold?"} + end + + subgraph Pruning["2. Pruning (First Pass)"] + Scan["Scan Backwards"] + Protect["Protect Last 40K
Tool Output Tokens"] + Clear["Clear Old Tool Outputs"] + CheckAgain{"Still Above
Threshold?"} + end + + subgraph Compaction["3. AI Compaction (Second Pass)"] + Summary["Generate Summary
via LLM"] + Rebuild["Rebuild Messages:
System + Summary"] + end + + subgraph Output["Continue Loop"] + Managed["Managed Messages"] + end + + Messages --> Estimate --> Check + Check -->|No| Managed + Check -->|Yes| Scan --> Protect --> Clear --> CheckAgain + CheckAgain -->|No| Managed + CheckAgain -->|Yes| Summary --> Rebuild --> Managed + + style Pruning fill:#FF9800,color:#fff + style Compaction fill:#9C27B0,color:#fff +``` + +--- + +## Stage 1: Tool Output Pruning + +The first defense against context overflow is pruning old tool outputs. + +### Strategy + +1. Scan messages **backwards** (most recent first) +2. Skip the first 2 user turns (most recent) +3. Accumulate tool output tokens +4. After 40K tokens accumulated, mark older outputs for pruning +5. Only prune if savings exceed 20K tokens + +### Implementation + +```python +def prune_old_tool_outputs(messages, protect_last_turns=2): + total = 0 # Total tool output tokens seen + pruned = 0 # Tokens to be pruned + to_prune = [] + turns = 0 + + for i in range(len(messages) - 1, -1, -1): + msg = messages[i] + + if msg["role"] == "user": + turns += 1 + + if turns < protect_last_turns: + continue + + if msg["role"] == "tool": + content = msg.get("content", "") + estimate = len(content) // 4 + total += estimate + + if total > PRUNE_PROTECT: # 40K + pruned += estimate + to_prune.append(i) + + if pruned > PRUNE_MINIMUM: # 20K + # Replace content with marker + for idx in to_prune: + messages[idx]["content"] = "[Old tool result content cleared]" + + return messages +``` + +### Visual Example + +```mermaid +graph TB + subgraph Before["Before Pruning (150K tokens)"] + S1["System Prompt
5K tokens"] + U1["User Instruction
1K tokens"] + A1["Assistant + Tools
10K tokens"] + T1["Tool Results (old)
50K tokens"] + A2["Assistant + Tools
10K tokens"] + T2["Tool Results (old)
40K tokens"] + A3["Assistant + Tools
10K tokens"] + T3["Tool Results (recent)
24K tokens"] + end + + subgraph After["After Pruning (60K tokens)"] + S2["System Prompt
5K tokens"] + U2["User Instruction
1K tokens"] + A4["Assistant + Tools
10K tokens"] + T4["[cleared]
~0 tokens"] + A5["Assistant + Tools
10K tokens"] + T5["[cleared]
~0 tokens"] + A6["Assistant + Tools
10K tokens"] + T6["Tool Results (protected)
24K tokens"] + end + + T1 -.-> T4 + T2 -.-> T5 + T3 --> T6 + + style T4 fill:#FF9800,color:#fff + style T5 fill:#FF9800,color:#fff + style T6 fill:#4CAF50,color:#fff +``` + +--- + +## Stage 2: AI Compaction + +When pruning isn't enough, BaseAgent uses the LLM to summarize the conversation. + +### Compaction Process + +```mermaid +sequenceDiagram + participant Loop as Agent Loop + participant Compact as Compaction + participant LLM as LLM API + + Loop->>Compact: Context still too large + Compact->>Compact: Add compaction prompt + Compact->>LLM: Request summary + LLM-->>Compact: Summary response + Compact->>Compact: Build new messages + Compact-->>Loop: [System, Summary] +``` + +### Compaction Prompt + +```python +COMPACTION_PROMPT = """ +You are performing a CONTEXT CHECKPOINT COMPACTION. +Create a handoff summary for another LLM that will resume the task. + +Include: +- Current progress and key decisions made +- Important context, constraints, or user preferences +- What remains to be done (clear next steps) +- Any critical data, examples, or references needed to continue +- Which files were modified and how +- Any errors encountered and how they were resolved + +Be concise, structured, and focused on helping the next LLM +seamlessly continue the work. Use bullet points and clear sections. +""" +``` + +### Result + +The compacted messages are: + +```python +compacted = [ + {"role": "system", "content": original_system_prompt}, + {"role": "user", "content": SUMMARY_PREFIX + llm_summary}, +] +``` + +### Summary Prefix + +```python +SUMMARY_PREFIX = """ +Another language model started to solve this problem and produced +a summary of its thinking process. You also have access to the state +of the tools that were used. Use this to build on the work that has +already been done and avoid duplicating work. + +Here is the summary from the previous context: + +""" +``` + +--- + +## Middle-Out Truncation + +For individual tool outputs, BaseAgent uses middle-out truncation: + +```mermaid +graph LR + subgraph Original["Original Output"] + O1["Start
(headers, definitions)"] + O2["Middle
(repetitive data)"] + O3["End
(results, errors)"] + end + + subgraph Truncated["Truncated Output"] + T1["Start
(preserved)"] + T2["[...truncated...]"] + T3["End
(preserved)"] + end + + O1 --> T1 + O2 -.-> T2 + O3 --> T3 + + style O2 fill:#FF9800,color:#fff + style T2 fill:#FF9800,color:#fff +``` + +### Implementation + +```python +def middle_out_truncate(text: str, max_tokens: int = 2500) -> str: + max_chars = max_tokens * 4 # 4 chars per token + + if len(text) <= max_chars: + return text + + keep = max_chars // 2 - 50 # Room for marker + return f"{text[:keep]}\n\n[...truncated...]\n\n{text[-keep:]}" +``` + +### Why Middle-Out? + +| Section | Contains | Value | +|---------|----------|-------| +| **Start** | Headers, imports, definitions | High | +| **Middle** | Repetitive data, logs | Low | +| **End** | Results, errors, summaries | High | + +--- + +## Configuration Options + +| Setting | Default | Description | +|---------|---------|-------------| +| `model_context_limit` | 200,000 | Total context window | +| `output_token_max` | 32,000 | Reserved for output | +| `auto_compact_threshold` | 0.85 | Trigger threshold | +| `prune_protect` | 40,000 | Recent tool tokens to keep | +| `prune_minimum` | 20,000 | Minimum savings to prune | +| `max_output_tokens` | 2,500 | Per-tool output limit | + +### Tuning Guidelines + +**For Long Tasks:** +```python +"auto_compact_threshold": 0.70, # More aggressive +"prune_protect": 30_000, # Protect less +``` + +**For Complex Tasks (need more context):** +```python +"auto_compact_threshold": 0.90, # Less aggressive +"prune_protect": 60_000, # Protect more +``` + +--- + +## Monitoring Context Usage + +BaseAgent logs context status each iteration: + +``` +[14:30:16] [compaction] Context: 45000 tokens (26.8% of 168000) +[14:35:22] [compaction] Context: 125000 tokens (74.4% of 168000) +[14:38:45] [compaction] Context: 148000 tokens (88.1% of 168000) +[14:38:45] [compaction] Context overflow detected, managing... +[14:38:45] [compaction] Prune scan: 95000 total tokens, 55000 prunable +[14:38:45] [compaction] Pruning 12 tool outputs, recovering ~55000 tokens +[14:38:46] [compaction] Pruning sufficient: 148000 -> 93000 tokens +``` + +--- + +## Best Practices + +### 1. Keep Tool Outputs Focused + +```bash +# ❌ Too much output +ls -laR / # Lists entire filesystem + +# ✅ Targeted +ls -la /workspace/src/ # Just what's needed +``` + +### 2. Use Appropriate Search Patterns + +```bash +# ❌ Too broad +grep "function" # Matches everything + +# ✅ Specific +grep "def calculate_total" src/billing.py +``` + +### 3. Read Sections, Not Entire Files + +```json +// ❌ Entire large file +{"name": "read_file", "arguments": {"file_path": "huge.py"}} + +// ✅ Specific section +{"name": "read_file", "arguments": {"file_path": "huge.py", "offset": 100, "limit": 50}} +``` + +### 4. 
Monitor Long Sessions + +For tasks exceeding 50 iterations, watch for: +- Repeated compaction events +- Context oscillating near threshold +- Loss of important context after compaction + +--- + +## Next Steps + +- [Best Practices](./best-practices.md) - Optimization strategies +- [Configuration](./configuration.md) - Tuning options +- [Architecture](./architecture.md) - System design diff --git a/docs/installation.md b/docs/installation.md new file mode 100644 index 0000000..24d6700 --- /dev/null +++ b/docs/installation.md @@ -0,0 +1,249 @@ +# Installation Guide + +> **Step-by-step instructions for setting up BaseAgent** + +## Prerequisites + +Before installing BaseAgent, ensure you have: + +| Requirement | Version | Notes | +|-------------|---------|-------| +| Python | 3.9+ | Python 3.11+ recommended | +| pip | Latest | Python package manager | +| Git | 2.x | For cloning the repository | + +### Optional but Recommended + +| Tool | Purpose | +|------|---------| +| `ripgrep` (`rg`) | Fast file searching (used by `grep_files` tool) | +| `tree` | Directory visualization | + +--- + +## Installation Methods + +### Method 1: Using pyproject.toml (Recommended) + +```bash +# Clone the repository +git clone https://github.com/your-org/baseagent.git +cd baseagent + +# Install with pip +pip install . +``` + +This installs BaseAgent as a package with all dependencies. + +### Method 2: Using requirements.txt + +```bash +# Clone the repository +git clone https://github.com/your-org/baseagent.git +cd baseagent + +# Install dependencies +pip install -r requirements.txt +``` + +### Method 3: Development Installation + +For development with editable installs: + +```bash +git clone https://github.com/your-org/baseagent.git +cd baseagent + +# Editable install +pip install -e . +``` + +--- + +## Dependencies + +BaseAgent requires these Python packages: + +``` +litellm>=1.0.0 # LLM API abstraction +httpx>=0.24.0 # HTTP client +pydantic>=2.0.0 # Data validation +``` + +These are automatically installed via pip. + +--- + +## Environment Setup + +### 1. Choose Your LLM Provider + +BaseAgent supports multiple LLM providers. Choose one: + +#### Option A: Chutes AI (Recommended) + +```bash +# Set your Chutes API token +export CHUTES_API_TOKEN="your-token-from-chutes.ai" + +# Configure provider +export LLM_PROVIDER="chutes" +export LLM_MODEL="moonshotai/Kimi-K2.5-TEE" +``` + +Get your token at [chutes.ai](https://chutes.ai) + +#### Option B: OpenRouter + +```bash +# Set your OpenRouter API key +export OPENROUTER_API_KEY="sk-or-v1-..." + +# Model is auto-configured for OpenRouter +``` + +Get your key at [openrouter.ai](https://openrouter.ai) + +#### Option C: Direct Provider APIs + +```bash +# For Anthropic +export ANTHROPIC_API_KEY="sk-ant-..." + +# For OpenAI +export OPENAI_API_KEY="sk-..." +``` + +### 2. 
Create a Configuration File (Optional) + +Create `.env` in the project root: + +```bash +# .env file +CHUTES_API_TOKEN=your-token-here +LLM_PROVIDER=chutes +LLM_MODEL=moonshotai/Kimi-K2.5-TEE +LLM_COST_LIMIT=10.0 +``` + +--- + +## Verification + +### Step 1: Verify Python Installation + +```bash +python3 --version +# Expected: Python 3.11.x or higher +``` + +### Step 2: Verify Dependencies + +```bash +python3 -c "import litellm; print('litellm:', litellm.__version__)" +python3 -c "import httpx; print('httpx:', httpx.__version__)" +python3 -c "import pydantic; print('pydantic:', pydantic.__version__)" +``` + +### Step 3: Verify BaseAgent Installation + +```bash +python3 -c "from src.core.loop import run_agent_loop; print('BaseAgent: OK')" +``` + +### Step 4: Test Run + +```bash +python3 agent.py --instruction "Print 'Hello, BaseAgent!'" +``` + +Expected output: JSONL events showing the agent executing your instruction. + +--- + +## Directory Structure After Installation + +``` +baseagent/ +├── agent.py # ✓ Entry point +├── src/ +│ ├── core/ +│ │ ├── loop.py # ✓ Agent loop +│ │ └── compaction.py # ✓ Context manager +│ ├── llm/ +│ │ └── client.py # ✓ LLM client +│ ├── config/ +│ │ └── defaults.py # ✓ Configuration +│ ├── tools/ # ✓ Tool implementations +│ ├── prompts/ +│ │ └── system.py # ✓ System prompt +│ └── output/ +│ └── jsonl.py # ✓ Event emission +├── requirements.txt # ✓ Dependencies +├── pyproject.toml # ✓ Package config +├── docs/ # ✓ Documentation +├── rules/ # Development guidelines +└── astuces/ # Implementation techniques +``` + +--- + +## Troubleshooting + +### Issue: `ModuleNotFoundError: No module named 'litellm'` + +**Solution**: Install dependencies + +```bash +pip install -r requirements.txt +# or +pip install litellm httpx pydantic +``` + +### Issue: `ImportError: cannot import name 'run_agent_loop'` + +**Solution**: Ensure you're in the project root directory + +```bash +cd /path/to/baseagent +python3 agent.py --instruction "..." +``` + +### Issue: API Key Errors + +**Solution**: Verify your environment variables are set + +```bash +# Check if variables are set +echo $CHUTES_API_TOKEN +echo $OPENROUTER_API_KEY + +# Re-export if needed +export CHUTES_API_TOKEN="your-token" +``` + +### Issue: `rg` (ripgrep) Not Found + +The `grep_files` tool will fall back to `grep` if `rg` is not available, but ripgrep is much faster. + +**Solution**: Install ripgrep + +```bash +# Ubuntu/Debian +apt-get install ripgrep + +# macOS +brew install ripgrep + +# Or via cargo +cargo install ripgrep +``` + +--- + +## Next Steps + +- [Quick Start](./quickstart.md) - Run your first task +- [Configuration](./configuration.md) - Customize settings +- [Chutes Integration](./chutes-integration.md) - Set up Chutes API diff --git a/docs/overview.md b/docs/overview.md new file mode 100644 index 0000000..c05a533 --- /dev/null +++ b/docs/overview.md @@ -0,0 +1,214 @@ +# BaseAgent Overview + +> **A high-performance autonomous coding agent built for generalist problem-solving** + +## What is BaseAgent? + +BaseAgent is an autonomous coding agent designed for the [Term Challenge](https://term.challenge). Unlike traditional scripted automation, BaseAgent uses Large Language Models (LLMs) to reason about tasks and make decisions dynamically. + +The agent receives natural language instructions and autonomously: +- Explores the codebase +- Plans and executes solutions +- Validates its own work +- Handles errors and edge cases + +--- + +## Core Design Principles + +### 1. 
No Hardcoding + +BaseAgent follows the **Golden Rule**: all decisions are made by the LLM, not by conditional logic. + +```python +# ❌ FORBIDDEN - Hardcoded task routing +if "file" in instruction: + create_file() +elif "compile" in instruction: + compile_code() + +# ✅ REQUIRED - LLM-driven decisions +response = llm.chat(messages, tools=tools) +execute(response.tool_calls) +``` + +### 2. Single Code Path + +Every task, regardless of complexity or domain, flows through the same agent loop: + +```mermaid +graph LR + A[Receive Instruction] --> B[Build Context] + B --> C[LLM Decides] + C --> D[Execute Tools] + D --> E{Complete?} + E -->|No| C + E -->|Yes| F[Verify & Return] +``` + +### 3. Iterative Execution + +BaseAgent never tries to solve everything in one shot. Instead, it: +- Observes the current state +- Thinks about the next step +- Acts by calling tools +- Repeats until the task is complete + +### 4. Self-Verification + +Before declaring a task complete, the agent automatically: +1. Re-reads the original instruction +2. Lists all requirements (explicit and implicit) +3. Verifies each requirement with actual commands +4. Only completes if all verifications pass + +--- + +## High-Level Architecture + +```mermaid +graph TB + subgraph Interface["User Interface"] + CLI["python agent.py --instruction '...'"] + end + + subgraph Engine["Core Engine"] + direction TB + Loop["Agent Loop
(src/core/loop.py)"] + Context["Context Manager
(src/core/compaction.py)"] + Prompt["System Prompt
(src/prompts/system.py)"] + end + + subgraph LLM["LLM Layer"] + Client["LiteLLM Client
(src/llm/client.py)"] + API["Provider API
(Chutes/OpenRouter)"] + end + + subgraph Tools["Tool System"] + Registry["Tool Registry"] + Exec["Execution Engine"] + end + + CLI --> Loop + Loop --> Context + Loop --> Prompt + Loop --> Client + Client --> API + Loop --> Registry + Registry --> Exec + + style Loop fill:#4CAF50,color:#fff + style Client fill:#2196F3,color:#fff +``` + +--- + +## Key Features + +### Autonomous Operation + +BaseAgent runs in **fully autonomous mode**: +- No user confirmations required +- Makes reasonable decisions when faced with ambiguity +- Handles errors by trying alternative approaches +- Never asks questions - just executes + +### Prompt Caching + +Achieves **90%+ cache hit rate** using Anthropic's prompt caching: +- System prompt cached for stability +- Last 2 messages cached to extend prefix +- Reduces API costs by 90% + +### Context Management + +Intelligent memory management for long tasks: +- Token-based overflow detection +- Tool output pruning (protects recent outputs) +- AI-powered compaction when needed +- Middle-out truncation for large outputs + +### Comprehensive Tooling + +Eight specialized tools for coding tasks: + +| Tool | Purpose | +|------|---------| +| `shell_command` | Execute shell commands | +| `read_file` | Read files with line numbers | +| `write_file` | Create or overwrite files | +| `apply_patch` | Surgical file modifications | +| `grep_files` | Fast file content search | +| `list_dir` | Directory exploration | +| `view_image` | Image analysis | +| `update_plan` | Progress tracking | + +--- + +## Workflow Overview + +```mermaid +sequenceDiagram + participant User + participant CLI as agent.py + participant Loop as Agent Loop + participant LLM as LLM (Chutes/OpenRouter) + participant Tools as Tool Registry + + User->>CLI: python agent.py --instruction "..." + CLI->>Loop: Initialize session + + loop Until task complete + Loop->>Loop: Manage context (prune/compact) + Loop->>Loop: Apply prompt caching + Loop->>LLM: Send messages + tools + LLM-->>Loop: Response (text + tool_calls) + + alt Has tool calls + Loop->>Tools: Execute tool calls + Tools-->>Loop: Tool results + else No tool calls + Loop->>Loop: Self-verification check + end + end + + Loop-->>CLI: Task complete + CLI-->>User: JSONL output +``` + +--- + +## What Makes BaseAgent a "Generalist"? + +| Characteristic | Description | +|----------------|-------------| +| **Single code path** | Same logic handles ALL tasks | +| **LLM-driven decisions** | LLM chooses actions, not if-statements | +| **No task keywords** | Zero references to specific task content | +| **Iterative execution** | Observe → Think → Act loop | + +### The Generalist Test + +Ask yourself: *"Would this code behave differently if I changed the task instruction?"* + +If **YES** and it's not because of LLM reasoning → it's hardcoding → **FORBIDDEN** + +--- + +## Design Philosophy + +BaseAgent is built on these principles: + +1. **Explore First** - Always gather context before acting +2. **Iterate** - Never try to do everything in one shot +3. **Verify** - Double-confirm before completing +4. **Fail Gracefully** - Handle errors and retry +5. 
**Stay Focused** - Complete the task, nothing more + +--- + +## Next Steps + +- [Installation Guide](./installation.md) - Set up BaseAgent +- [Quick Start](./quickstart.md) - Run your first task +- [Architecture](./architecture.md) - Deep dive into the system design diff --git a/docs/quickstart.md b/docs/quickstart.md new file mode 100644 index 0000000..f8a9326 --- /dev/null +++ b/docs/quickstart.md @@ -0,0 +1,242 @@ +# Quick Start Guide + +> **Get BaseAgent running in 5 minutes** + +## Prerequisites + +Before starting, ensure you have: +- Python 3.9+ installed +- An LLM API key (Chutes, OpenRouter, or Anthropic) +- BaseAgent installed (see [Installation](./installation.md)) + +--- + +## Step 1: Set Up Your API Key + +Choose your provider and set the environment variable: + +```bash +# For Chutes AI (recommended) +export CHUTES_API_TOKEN="your-token-from-chutes.ai" + +# OR for OpenRouter +export OPENROUTER_API_KEY="sk-or-v1-..." +``` + +--- + +## Step 2: Run Your First Task + +Navigate to the BaseAgent directory and run: + +```bash +python3 agent.py --instruction "Create a file called hello.txt with the content 'Hello, World!'" +``` + +### Expected Output + +You'll see JSONL events as the agent works: + +```json +{"type": "thread.started", "thread_id": "sess_1234567890"} +{"type": "turn.started"} +{"type": "item.started", "item": {"type": "command_execution", "command": "write_file"}} +{"type": "item.completed", "item": {"type": "command_execution", "status": "completed"}} +{"type": "turn.completed", "usage": {"input_tokens": 5000, "output_tokens": 200}} +``` + +And the file `hello.txt` will be created: + +```bash +cat hello.txt +# Output: Hello, World! +``` + +--- + +## Step 3: Try More Examples + +### Example: Explore a Codebase + +```bash +python3 agent.py --instruction "Explore this repository and describe its structure" +``` + +### Example: Find and Read Files + +```bash +python3 agent.py --instruction "Find all Python files and show me the main entry point" +``` + +### Example: Create a Simple Script + +```bash +python3 agent.py --instruction "Create a Python script that prints the Fibonacci sequence up to 100" +``` + +### Example: Modify Existing Code + +```bash +python3 agent.py --instruction "Add a docstring to all functions in src/core/loop.py" +``` + +--- + +## Understanding the Output + +BaseAgent emits JSONL (JSON Lines) format for machine-readable output: + +```mermaid +sequenceDiagram + participant User + participant Agent + participant stdout as Output + + User->>Agent: --instruction "..." + Agent->>stdout: {"type": "thread.started", ...} + Agent->>stdout: {"type": "turn.started"} + + loop Tool Execution + Agent->>stdout: {"type": "item.started", ...} + Agent->>stdout: {"type": "item.completed", ...} + end + + Agent->>stdout: {"type": "turn.completed", "usage": {...}} +``` + +### Key Event Types + +| Event | Description | +|-------|-------------| +| `thread.started` | Session begins with unique ID | +| `turn.started` | Agent begins processing | +| `item.started` | Tool execution begins | +| `item.completed` | Tool execution finished | +| `turn.completed` | Agent finished with usage stats | +| `turn.failed` | Error occurred | + +--- + +## Quick Command Reference + +```bash +# Basic usage +python3 agent.py --instruction "Your task description" + +# With environment variables inline +CHUTES_API_TOKEN="..." python3 agent.py --instruction "..." + +# Redirect output to file +python3 agent.py --instruction "..." 
> output.jsonl 2>&1 +``` + +--- + +## Agent Workflow + +Here's what happens when you run a task: + +```mermaid +flowchart TB + subgraph Input + Cmd["python3 agent.py --instruction '...'"] + end + + subgraph Init["Initialization"] + Parse[Parse Arguments] + Config[Load Configuration] + LLM[Initialize LLM Client] + Tools[Register Tools] + end + + subgraph Loop["Agent Loop"] + Context[Manage Context] + Cache[Apply Caching] + Call[Call LLM] + Execute[Execute Tools] + Verify[Self-Verify] + end + + subgraph Output + JSONL[Emit JSONL Events] + Done[Task Complete] + end + + Cmd --> Parse --> Config --> LLM --> Tools + Tools --> Context --> Cache --> Call + Call --> Execute --> Context + Execute --> Verify --> Done + Context & Call & Execute --> JSONL +``` + +--- + +## Tips for Effective Instructions + +### Be Specific + +```bash +# ❌ Too vague +python3 agent.py --instruction "Fix the bug" + +# ✅ Specific +python3 agent.py --instruction "Fix the TypeError in src/utils.py line 42 where x is None" +``` + +### Provide Context + +```bash +# ❌ Missing context +python3 agent.py --instruction "Add tests" + +# ✅ With context +python3 agent.py --instruction "Add unit tests for the calculate_total function in src/billing.py" +``` + +### Request Verification + +```bash +# ✅ Ask for verification +python3 agent.py --instruction "Create a Python script for sorting and verify it works with sample data" +``` + +--- + +## Troubleshooting + +### Agent Not Finding Files + +The agent starts in the current directory. Ensure you're in the right location: + +```bash +pwd # Check current directory +ls # List files +cd /path/to/project +python3 /path/to/baseagent/agent.py --instruction "..." +``` + +### API Rate Limits + +If you hit rate limits, the agent will automatically retry with exponential backoff. You can also: + +```bash +# Set a cost limit +export LLM_COST_LIMIT="5.0" +``` + +### Long-Running Tasks + +For complex tasks, the agent may iterate many times. Monitor progress through the JSONL output: + +```bash +python3 agent.py --instruction "..." 2>&1 | grep "item.completed" +``` + +--- + +## Next Steps + +- [Usage Guide](./usage.md) - Detailed command-line options +- [Configuration](./configuration.md) - Customize behavior +- [Tools Reference](./tools.md) - Available tools +- [Best Practices](./best-practices.md) - Optimization tips diff --git a/docs/tools.md b/docs/tools.md new file mode 100644 index 0000000..78cd143 --- /dev/null +++ b/docs/tools.md @@ -0,0 +1,509 @@ +# Tools Reference + +> **Complete documentation for all available tools in BaseAgent** + +## Overview + +BaseAgent provides eight specialized tools for autonomous task execution. Each tool is designed for a specific purpose and follows consistent patterns for input and output. 
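+
+All tool calls flow through a registry and come back in a uniform result envelope, the `ToolResult(success=...)` shape shown in the architecture diagram below. A minimal Python sketch of that pattern follows; the field names mirror the diagrams on this page, while the `TOOL_REGISTRY` mapping and dispatch logic are illustrative assumptions, not the exact implementation:
+
+```python
+from dataclasses import dataclass
+from typing import Any, Callable, Dict, Optional
+
+@dataclass
+class ToolResult:
+    success: bool                         # True if the tool ran without error
+    output: str = ""                      # text fed back to the LLM
+    inject_content: Optional[Any] = None  # extra message parts (e.g. images)
+
+# Hypothetical registry: tool name -> callable returning the tool's output
+TOOL_REGISTRY: Dict[str, Callable[..., str]] = {
+    "echo": lambda text: text,
+}
+
+def run_tool(name: str, arguments: dict) -> ToolResult:
+    """Dispatch a tool call and wrap the outcome in a uniform envelope."""
+    try:
+        handler = TOOL_REGISTRY[name]
+        return ToolResult(success=True, output=handler(**arguments))
+    except Exception as exc:
+        return ToolResult(success=False, output=f"{type(exc).__name__}: {exc}")
+```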
+ +--- + +## Tool Summary + +| Tool | Purpose | Key Parameters | +|------|---------|----------------| +| `shell_command` | Execute shell commands | `command`, `workdir`, `timeout_ms` | +| `read_file` | Read file contents | `file_path`, `offset`, `limit` | +| `write_file` | Create/overwrite files | `file_path`, `content` | +| `apply_patch` | Surgical file edits | `patch` | +| `grep_files` | Search file contents | `pattern`, `include`, `path` | +| `list_dir` | List directory contents | `dir_path`, `depth`, `limit` | +| `view_image` | Analyze images | `path` | +| `update_plan` | Track progress | `steps`, `explanation` | + +--- + +## Tool Architecture + +```mermaid +graph TB + subgraph Registry["Tool Registry (registry.py)"] + Lookup["Tool Lookup"] + Execute["Execution Engine"] + Truncate["Output Truncation"] + end + + subgraph Tools["Tool Implementations"] + Shell["shell_command"] + Read["read_file"] + Write["write_file"] + Patch["apply_patch"] + Grep["grep_files"] + List["list_dir"] + Image["view_image"] + Plan["update_plan"] + end + + subgraph Output["Results"] + Success["ToolResult(success=True)"] + Failure["ToolResult(success=False)"] + end + + Lookup --> Shell & Read & Write & Patch & Grep & List & Image & Plan + Shell & Read & Write & Patch & Grep & List & Image & Plan --> Execute + Execute --> Truncate + Truncate --> Success & Failure +``` + +--- + +## shell_command + +Execute shell commands in the terminal. + +### Parameters + +| Parameter | Type | Required | Default | Description | +|-----------|------|----------|---------|-------------| +| `command` | string | Yes | - | Shell command to execute | +| `workdir` | string | No | Current dir | Working directory | +| `timeout_ms` | number | No | 60000 | Timeout in milliseconds | + +### Example Usage + +```json +{ + "name": "shell_command", + "arguments": { + "command": "ls -la", + "workdir": "/workspace", + "timeout_ms": 30000 + } +} +``` + +### Best Practices + +- Always set `workdir` to avoid directory confusion +- Use `rg` (ripgrep) instead of `grep` for faster searches +- Set appropriate timeouts for long-running commands +- Prefer specific commands over `cd && command` + +### Output Format + +``` +total 40 +drwxr-xr-x 7 root root 4096 Feb 3 13:16 . +drwxr-xr-x 1 root root 4096 Feb 3 12:00 .. +-rw-r--r-- 1 root root 5432 Feb 3 13:16 agent.py +drwxr-xr-x 4 root root 4096 Feb 3 13:16 src +``` + +--- + +## read_file + +Read file contents with line numbers. + +### Parameters + +| Parameter | Type | Required | Default | Description | +|-----------|------|----------|---------|-------------| +| `file_path` | string | Yes | - | Path to the file | +| `offset` | number | No | 1 | Starting line (1-indexed) | +| `limit` | number | No | 2000 | Maximum lines to return | + +### Example Usage + +```json +{ + "name": "read_file", + "arguments": { + "file_path": "src/core/loop.py", + "offset": 1, + "limit": 100 + } +} +``` + +### Output Format + +``` +L1: """ +L2: Main agent loop - the heart of the SuperAgent system. +L3: """ +L4: +L5: from __future__ import annotations +L6: import time +``` + +### Best Practices + +- Use `offset` and `limit` for large files +- Prefer `grep_files` to find specific content first +- Read relevant sections, not entire large files + +--- + +## write_file + +Create or overwrite a file. 
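+
+A minimal sketch of the tool's semantics (a full overwrite, with automatic parent-directory creation as noted under Best Practices below). The helper is illustrative, not the actual implementation:
+
+```python
+from pathlib import Path
+
+def write_file(file_path: str, content: str) -> str:
+    """Create or overwrite a file, creating parent directories as needed."""
+    path = Path(file_path)
+    path.parent.mkdir(parents=True, exist_ok=True)  # documented behavior
+    path.write_text(content, encoding="utf-8")      # overwrite, never append
+    return f"Wrote {len(content)} characters to {path}"
+```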
+ +### Parameters + +| Parameter | Type | Required | Default | Description | +|-----------|------|----------|---------|-------------| +| `file_path` | string | Yes | - | Path to the file | +| `content` | string | Yes | - | Content to write | + +### Example Usage + +```json +{ + "name": "write_file", + "arguments": { + "file_path": "hello.txt", + "content": "Hello, World!\n" + } +} +``` + +### Best Practices + +- Use for new files or complete rewrites +- Prefer `apply_patch` for surgical edits +- Parent directories are created automatically +- Include trailing newlines for proper file endings + +--- + +## apply_patch + +Apply surgical file modifications using patch format. + +### Parameters + +| Parameter | Type | Required | Default | Description | +|-----------|------|----------|---------|-------------| +| `patch` | string | Yes | - | Patch content | + +### Patch Format + +``` +*** Begin Patch +*** Add File: path/to/new/file.py ++line 1 ++line 2 +*** Update File: path/to/existing/file.py +@@ def existing_function(): +- old_line ++ new_line +*** Delete File: path/to/delete.py +*** End Patch +``` + +### Example Usage + +```json +{ + "name": "apply_patch", + "arguments": { + "patch": "*** Begin Patch\n*** Update File: src/utils.py\n@@ def calculate(x):\n- return x\n+ return x * 2\n*** End Patch" + } +} +``` + +### Patch Rules + +1. Use `@@ context line` to identify location +2. Prefix new lines with `+` +3. Prefix removed lines with `-` +4. Include 3 lines of context before and after changes +5. File paths must be relative (never absolute) + +### Operations + +| Operation | Format | Description | +|-----------|--------|-------------| +| Add file | `*** Add File: path` | Create new file | +| Update file | `*** Update File: path` | Modify existing file | +| Delete file | `*** Delete File: path` | Remove file | + +--- + +## grep_files + +Search file contents using patterns. + +### Parameters + +| Parameter | Type | Required | Default | Description | +|-----------|------|----------|---------|-------------| +| `pattern` | string | Yes | - | Regex pattern to search | +| `include` | string | No | - | Glob filter (e.g., `*.py`) | +| `path` | string | No | Current dir | Search path | +| `limit` | number | No | 100 | Max files to return | + +### Example Usage + +```json +{ + "name": "grep_files", + "arguments": { + "pattern": "def.*token", + "include": "*.py", + "path": "src/", + "limit": 50 + } +} +``` + +### Output Format + +``` +src/llm/client.py +src/core/compaction.py +src/utils/truncate.py +``` + +### Best Practices + +- Use ripgrep regex syntax +- Filter with `include` for faster searches +- Search specific directories when possible +- Results sorted by modification time + +--- + +## list_dir + +List directory contents with type indicators. 
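+
+The indicators follow `ls -F`-style conventions (see the table below). A purely illustrative sketch of how one directory level might be rendered:
+
+```python
+import os
+
+def type_indicator(entry: os.DirEntry) -> str:
+    """Return the suffix: '@' for symlinks, '/' for directories, '' otherwise."""
+    if entry.is_symlink():
+        return "@"
+    if entry.is_dir(follow_symlinks=False):
+        return "/"
+    return ""
+
+def render_dir(dir_path: str) -> str:
+    """List a single directory level with type indicators appended."""
+    entries = sorted(os.scandir(dir_path), key=lambda e: e.name)
+    return "\n".join(f"{e.name}{type_indicator(e)}" for e in entries)
+```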
+ +### Parameters + +| Parameter | Type | Required | Default | Description | +|-----------|------|----------|---------|-------------| +| `dir_path` | string | Yes | - | Directory path | +| `offset` | number | No | 1 | Starting entry (1-indexed) | +| `limit` | number | No | 50 | Max entries to return | +| `depth` | number | No | 2 | Max directory depth | + +### Example Usage + +```json +{ + "name": "list_dir", + "arguments": { + "dir_path": "src/", + "depth": 3, + "limit": 100 + } +} +``` + +### Output Format + +``` +src/ + core/ + loop.py + compaction.py + llm/ + client.py + tools/ + shell.py + read_file.py +``` + +### Type Indicators + +| Indicator | Meaning | +|-----------|---------| +| `/` | Directory | +| `@` | Symbolic link | +| (none) | Regular file | + +--- + +## view_image + +Load and analyze an image from the filesystem. + +### Parameters + +| Parameter | Type | Required | Default | Description | +|-----------|------|----------|---------|-------------| +| `path` | string | Yes | - | Path to image file | + +### Supported Formats + +- PNG +- JPEG +- GIF +- WebP +- BMP + +### Example Usage + +```json +{ + "name": "view_image", + "arguments": { + "path": "screenshots/error.png" + } +} +``` + +### How It Works + +```mermaid +sequenceDiagram + participant Agent + participant Tool as view_image + participant LLM as LLM API + + Agent->>Tool: view_image(path) + Tool->>Tool: Load image file + Tool->>Tool: Encode as base64 + Tool-->>Agent: ToolResult with inject_content + Agent->>Agent: Add image to messages + Agent->>LLM: Messages with image content + LLM-->>Agent: Analysis response +``` + +### Best Practices + +- Only use for images the user mentioned +- Don't use if image is already in conversation +- Large images are automatically resized +- Count as ~1000 tokens in context + +--- + +## update_plan + +Track task progress with a visible plan. + +### Parameters + +| Parameter | Type | Required | Default | Description | +|-----------|------|----------|---------|-------------| +| `steps` | array | Yes | - | List of step objects | +| `explanation` | string | No | - | Why the plan changed | + +### Step Object + +```json +{ + "description": "Create helper functions", + "status": "completed" +} +``` + +### Status Values + +| Status | Description | +|--------|-------------| +| `pending` | Not started | +| `in_progress` | Currently working | +| `completed` | Finished | + +### Example Usage + +```json +{ + "name": "update_plan", + "arguments": { + "steps": [ + {"description": "Read existing code", "status": "completed"}, + {"description": "Create helper module", "status": "in_progress"}, + {"description": "Write unit tests", "status": "pending"}, + {"description": "Update documentation", "status": "pending"} + ], + "explanation": "Starting implementation after code review" + } +} +``` + +### Best Practices + +- Keep descriptions to 5-7 words +- Mark steps completed as you go +- Update plan when approach changes +- Use for complex multi-step tasks + +--- + +## Tool Output Limits + +All tool outputs are truncated to prevent context overflow: + +| Setting | Default | Description | +|---------|---------|-------------| +| `max_output_tokens` | 2500 | Maximum tokens per tool output | +| Truncation strategy | Middle-out | Keeps start and end, removes middle | + +### Middle-Out Truncation + +```mermaid +graph LR + subgraph Original["Original Output (10K tokens)"] + Start["First 1250 tokens"] + Middle["Middle section
(removed)"] + End["Last 1250 tokens"] + end + + subgraph Truncated["Truncated Output (2500 tokens)"] + TStart["First 1250 tokens"] + Marker["[...truncated...]"] + TEnd["Last 1250 tokens"] + end + + Start --> TStart + End --> TEnd +``` + +**Why middle-out?** +- Start contains headers, definitions +- End contains results, errors +- Middle is often repetitive + +--- + +## Tool Execution Flow + +```mermaid +flowchart TB + subgraph Request["LLM Request"] + ToolCall["tool_call: {name, arguments}"] + end + + subgraph Registry["Tool Registry"] + Lookup["Lookup Tool"] + Validate["Validate Arguments"] + Execute["Execute Tool"] + end + + subgraph Processing["Post-Processing"] + Truncate["Truncate Output"] + Format["Format Result"] + end + + subgraph Response["Tool Result"] + Success["success: true/false"] + Output["output: string"] + Inject["inject_content (images)"] + end + + ToolCall --> Lookup --> Validate --> Execute + Execute --> Truncate --> Format + Format --> Success & Output & Inject +``` + +--- + +## Next Steps + +- [Usage Guide](./usage.md) - How to use the agent +- [Context Management](./context-management.md) - Memory optimization +- [Best Practices](./best-practices.md) - Effective tool usage diff --git a/docs/usage.md b/docs/usage.md new file mode 100644 index 0000000..d234c54 --- /dev/null +++ b/docs/usage.md @@ -0,0 +1,341 @@ +# Agent Usage Guide + +> **Complete guide to running BaseAgent and interpreting its output** + +## Command-Line Interface + +### Basic Syntax + +```bash +python3 agent.py --instruction "Your task description" +``` + +### Required Arguments + +| Argument | Type | Description | +|----------|------|-------------| +| `--instruction` | string | The task for the agent to complete | + +--- + +## Running the Agent + +### Simple Tasks + +```bash +# Create a file +python3 agent.py --instruction "Create a file called hello.txt with 'Hello, World!'" + +# Read and explain code +python3 agent.py --instruction "Read src/core/loop.py and explain what it does" + +# Find files +python3 agent.py --instruction "Find all Python files that contain 'import json'" +``` + +### Complex Tasks + +```bash +# Multi-step task +python3 agent.py --instruction "Create a Python module in src/utils/helpers.py with functions for string manipulation, then write tests for it" + +# Code modification +python3 agent.py --instruction "Add error handling to all functions in src/api/client.py that make HTTP requests" + +# Investigation task +python3 agent.py --instruction "Find the bug causing the TypeError in the test output and fix it" +``` + +--- + +## Environment Variables + +Configure the agent's behavior with environment variables: + +```bash +# LLM Provider (Chutes) +export CHUTES_API_TOKEN="your-token" +export LLM_PROVIDER="chutes" +export LLM_MODEL="moonshotai/Kimi-K2.5-TEE" + +# LLM Provider (OpenRouter) +export OPENROUTER_API_KEY="sk-or-v1-..." +export LLM_MODEL="openrouter/anthropic/claude-sonnet-4-20250514" + +# Cost management +export LLM_COST_LIMIT="10.0" + +# Run with inline variables +LLM_COST_LIMIT="5.0" python3 agent.py --instruction "..." 
+``` + +--- + +## Output Format + +BaseAgent emits JSONL (JSON Lines) events to stdout: + +```mermaid +sequenceDiagram + participant Agent + participant stdout as Standard Output + + Agent->>stdout: {"type": "thread.started", "thread_id": "sess_..."} + Agent->>stdout: {"type": "turn.started"} + + loop Tool Execution + Agent->>stdout: {"type": "item.started", "item": {...}} + Agent->>stdout: {"type": "item.completed", "item": {...}} + end + + Agent->>stdout: {"type": "turn.completed", "usage": {...}} +``` + +### Event Types + +| Event | Description | +|-------|-------------| +| `thread.started` | Session begins, includes unique thread ID | +| `turn.started` | Agent begins processing the instruction | +| `item.started` | A tool call is starting | +| `item.completed` | A tool call has completed | +| `turn.completed` | Agent finished, includes token usage | +| `turn.failed` | An error occurred | + +### Example Output + +```json +{"type": "thread.started", "thread_id": "sess_1706890123456"} +{"type": "turn.started"} +{"type": "item.started", "item": {"type": "command_execution", "id": "1", "command": "shell_command({command: 'ls -la'})", "status": "in_progress"}} +{"type": "item.completed", "item": {"type": "command_execution", "id": "1", "command": "shell_command", "status": "completed", "aggregated_output": "total 40\ndrwxr-xr-x...", "exit_code": 0}} +{"type": "item.completed", "item": {"type": "agent_message", "id": "2", "content": "I found the files. Now creating hello.txt..."}} +{"type": "item.started", "item": {"type": "command_execution", "id": "3", "command": "write_file({file_path: 'hello.txt', content: 'Hello, World!'})", "status": "in_progress"}} +{"type": "item.completed", "item": {"type": "command_execution", "id": "3", "command": "write_file", "status": "completed", "exit_code": 0}} +{"type": "turn.completed", "usage": {"input_tokens": 5432, "cached_input_tokens": 4890, "output_tokens": 256}} +``` + +--- + +## Logging Output + +Agent logs go to stderr: + +``` +[14:30:15] [superagent] ============================================================ +[14:30:15] [superagent] SuperAgent Starting (SDK 3.0 - litellm) +[14:30:15] [superagent] ============================================================ +[14:30:15] [superagent] Model: openrouter/anthropic/claude-sonnet-4-20250514 +[14:30:15] [superagent] Instruction: Create hello.txt with 'Hello World'... +[14:30:15] [loop] Getting initial state... +[14:30:16] [loop] Iteration 1/200 +[14:30:16] [compaction] Context: 5432 tokens (3.2% of 168000) +[14:30:16] [loop] Prompt caching: 1 system + 2 final messages marked (3 breakpoints) +[14:30:17] [loop] Executing tool: write_file +[14:30:17] [loop] Iteration 2/200 +[14:30:18] [loop] No tool calls in response +[14:30:18] [loop] Requesting self-verification before completion +``` + +### Separating Output Streams + +```bash +# Send JSONL to file, logs to terminal +python3 agent.py --instruction "..." > output.jsonl + +# Send logs to file, JSONL to terminal +python3 agent.py --instruction "..." 2> agent.log + +# Both to separate files +python3 agent.py --instruction "..." > output.jsonl 2> agent.log +``` + +--- + +## Processing Output + +### Parse JSONL with jq + +```bash +# Get all completed items +python3 agent.py --instruction "..." | jq 'select(.type == "item.completed")' + +# Get final usage stats +python3 agent.py --instruction "..." | jq 'select(.type == "turn.completed") | .usage' + +# Get all agent messages +python3 agent.py --instruction "..." 
| jq 'select(.item.type == "agent_message") | .item.content' +``` + +### Parse with Python + +```python +import json +import subprocess + +# Run agent and capture output +result = subprocess.run( + ["python3", "agent.py", "--instruction", "Your task"], + capture_output=True, + text=True +) + +# Parse JSONL output +events = [json.loads(line) for line in result.stdout.strip().split('\n') if line] + +# Find usage stats +for event in events: + if event.get("type") == "turn.completed": + print(f"Input tokens: {event['usage']['input_tokens']}") + print(f"Output tokens: {event['usage']['output_tokens']}") +``` + +--- + +## Agent Workflow + +```mermaid +flowchart TB + subgraph Input["Input Phase"] + Cmd["python3 agent.py --instruction '...'"] + Parse["Parse Arguments"] + Init["Initialize Components"] + end + + subgraph Explore["Exploration Phase"] + State["Get Current State"] + Context["Build Initial Context"] + end + + subgraph Execute["Execution Phase"] + Loop["Agent Loop"] + Tools["Execute Tools"] + Verify["Self-Verification"] + end + + subgraph Output["Output Phase"] + JSONL["Emit JSONL Events"] + Stats["Report Statistics"] + end + + Cmd --> Parse --> Init + Init --> State --> Context + Context --> Loop + Loop --> Tools --> Loop + Loop --> Verify + Verify --> Stats + Loop --> JSONL +``` + +--- + +## Example Tasks + +### File Operations + +```bash +# Create a file +python3 agent.py --instruction "Create config.yaml with database settings for PostgreSQL" + +# Read and summarize +python3 agent.py --instruction "Read README.md and create a one-paragraph summary" + +# Modify a file +python3 agent.py --instruction "Add a new function to src/utils.py that validates email addresses" +``` + +### Code Analysis + +```bash +# Explain code +python3 agent.py --instruction "Explain how the authentication system works in src/auth/" + +# Find patterns +python3 agent.py --instruction "Find all API endpoints and list them with their HTTP methods" + +# Review code +python3 agent.py --instruction "Review src/api/handlers.py for potential security issues" +``` + +### Debugging + +```bash +# Investigate error +python3 agent.py --instruction "Find why 'test_user_creation' is failing and fix it" + +# Trace behavior +python3 agent.py --instruction "Trace the data flow from user input to database in the signup process" +``` + +### Project Tasks + +```bash +# Setup +python3 agent.py --instruction "Create a Python project structure with src/, tests/, and setup.py" + +# Add feature +python3 agent.py --instruction "Add logging to all functions in src/core/ using Python's logging module" + +# Refactor +python3 agent.py --instruction "Refactor src/utils.py to follow the single responsibility principle" +``` + +--- + +## Session Management + +Each agent run creates a new session with a unique ID: + +```json +{"type": "thread.started", "thread_id": "sess_1706890123456"} +``` + +### Session Lifecycle + +```mermaid +stateDiagram-v2 + [*] --> Initializing: python3 agent.py + Initializing --> Running: thread.started + Running --> Iterating: turn.started + Iterating --> Executing: item.started + Executing --> Iterating: item.completed + Iterating --> Verifying: No tool calls + Verifying --> Iterating: Needs more work + Verifying --> Complete: Verified + Iterating --> Failed: Error + Complete --> [*]: turn.completed + Failed --> [*]: turn.failed +``` + +--- + +## Performance Tips + +### Optimize Token Usage + +```bash +# Set lower cost limit for testing +export LLM_COST_LIMIT="2.0" +``` + +### Monitor Progress + +```bash +# Watch 
+# tool executions in real-time
+python3 agent.py --instruction "..." 2>&1 | grep -E "Executing tool|Iteration"
+```
+
+### Debug Issues
+
+```bash
+# Full verbose output
+python3 agent.py --instruction "..." 2>&1 | tee agent_debug.log
+```
+
+---
+
+## Next Steps
+
+- [Tools Reference](./tools.md) - Available tools and their parameters
+- [Configuration](./configuration.md) - Customize agent behavior
+- [Best Practices](./best-practices.md) - Tips for effective usage