Reasoning ledger and context engine for Claude Code. Thoughtbox lets Claude learn within and across runs and sessions, reducing the human effort spent revisiting the same concepts over and over.

Kastalien-Research/thoughtbox


Thoughtbox

A reasoning ledger for AI agents. Thoughtbox is an MCP server that provides structured reasoning tools, enabling agents to think step-by-step, branch into alternative explorations, revise earlier conclusions, and maintain a persistent record of their cognitive process.

Unlike ephemeral chain-of-thought prompting, Thoughtbox creates a durable reasoning chain — a ledger of thoughts that can be visualized, exported, and analyzed. Each thought is a node in a graph structure supporting forward thinking, backward planning, branching explorations, mid-course revisions, and autonomous critique via MCP sampling.

Local-First: Thoughtbox runs entirely on your machine. All reasoning data is stored locally at ~/.thoughtbox/ by default — nothing is sent to external servers. Your thought processes remain private and under your control.

Client Compatibility

Note: Thoughtbox is currently optimized for Claude Code. Support for additional MCP clients is in progress, but capability support varies widely across the Model Context Protocol ecosystem: server features (prompts, resources, tools), client features (roots, sampling, elicitation), and behaviors such as listChanged notifications. As a result, many clients need custom adaptations. See the gateway tool for one such adaptation.

If you're using a client other than Claude Code and encounter issues, please open an issue describing your client and the problem — this helps us prioritize compatibility work.

Progressive Disclosure

Thoughtbox uses a staged tool disclosure system to guide agents through proper initialization:

| Stage   | Tools Available                          | Trigger                                |
|---------|------------------------------------------|----------------------------------------|
| Stage 0 | init, thoughtbox_gateway                 | Connection start                       |
| Stage 1 | + thoughtbox_cipher, session             | init(start_new) or init(load_context)  |
| Stage 2 | + thoughtbox, notebook                   | thoughtbox_cipher call                 |
| Stage 3 | + mental_models, export_reasoning_chain  | Domain activation                      |

This ensures agents establish proper session context before accessing advanced reasoning tools.
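
The stage table can be read as a cumulative mapping from stage to tool names. The following is a minimal sketch of that reading, not the contents of tool-registry.ts; `STAGE_TOOLS` and `toolsAvailableAt` are hypothetical names, and only the tool names themselves come from the table.

```typescript
// Hypothetical sketch of cumulative, stage-based tool disclosure.
// Tool names are taken from the stage table; the data structure is an assumption.
type Stage = 0 | 1 | 2 | 3;

const STAGE_TOOLS: Record<Stage, readonly string[]> = {
  0: ["init", "thoughtbox_gateway"],
  1: ["thoughtbox_cipher", "session"],
  2: ["thoughtbox", "notebook"],
  3: ["mental_models", "export_reasoning_chain"],
};

// Returns every tool unlocked at or below the given stage.
function toolsAvailableAt(stage: Stage): string[] {
  const available: string[] = [];
  const order: Stage[] = [0, 1, 2, 3];
  for (const s of order) {
    if (s <= stage) available.push(...STAGE_TOOLS[s]);
  }
  return available;
}
```

Under this sketch, `toolsAvailableAt(1)` yields the two Stage 0 tools followed by `thoughtbox_cipher` and `session`.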

Gateway Tool (Always Available)

The thoughtbox_gateway tool is always enabled and provides a routing mechanism for clients that don't refresh tool lists mid-turn (e.g., Claude Code over streaming HTTP). It routes to all handlers internally while enforcing stage requirements:

// Call operations through gateway without waiting for tool list refresh
{ operation: "get_state" }                    // → init handler
{ operation: "start_new", args: {...} }       // → init handler, advances to Stage 1
{ operation: "cipher" }                       // → cipher content, advances to Stage 2
{ operation: "thought", args: {...} }         // → thoughtbox handler (requires Stage 2)

The gateway returns clear error messages when operations are called at the wrong stage.
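
One way to picture the stage enforcement is a small rule table consulted before each call is routed. This is a hedged sketch only: the `Gateway` class, the `RULES` table, and the error wording are hypothetical, and the requirement that `cipher` needs Stage 1 is inferred from the stage table rather than stated for the gateway.

```typescript
// Hypothetical sketch of gateway routing with stage enforcement.
// Operation names come from the gateway example; everything else is assumed.
type Operation = "get_state" | "start_new" | "cipher" | "thought";

interface GatewayResult {
  ok: boolean;
  stage: number; // stage after the call
  message: string;
}

// Minimum stage needed for each operation, and the stage it advances to (if any).
const RULES: Record<Operation, { requires: number; advancesTo?: number }> = {
  get_state: { requires: 0 },
  start_new: { requires: 0, advancesTo: 1 },
  cipher: { requires: 1, advancesTo: 2 },
  thought: { requires: 2 },
};

class Gateway {
  private stage = 0;

  call(operation: Operation): GatewayResult {
    const rule = RULES[operation];
    if (this.stage < rule.requires) {
      // Reject early with a message that names the required stage.
      return {
        ok: false,
        stage: this.stage,
        message: `"${operation}" requires Stage ${rule.requires}; current stage is ${this.stage}`,
      };
    }
    if (rule.advancesTo !== undefined) {
      // Never move backwards if the session is already further along.
      this.stage = Math.max(this.stage, rule.advancesTo);
    }
    return { ok: true, stage: this.stage, message: `routed "${operation}"` };
  }
}
```

In this sketch, calling `thought` on a fresh `Gateway` fails, while the sequence `start_new`, `cipher`, `thought` succeeds.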

Figure: Thoughtbox Observatory UI showing a reasoning session with 14 thoughts and a branch exploration (purple nodes 13-14) forking from thought 5.

Core Concepts

The Reasoning Ledger

Thoughtbox treats reasoning as data, not just process. Every thought is:

  • Numbered: Logical position in the reasoning chain (supports non-sequential addition)
  • Timestamped: When the thought was recorded
  • Linked: Connected to previous thoughts, branch origins, or revised thoughts
  • Persistent: Stored in sessions that survive across conversations
  • Exportable: Full reasoning chains can be exported as JSON or Markdown

This creates an auditable trail of how conclusions were reached — invaluable for debugging agent behavior, understanding decision-making, and improving reasoning strategies.
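
To make "reasoning as data" concrete, here is a minimal sketch of what a ledger entry and a Markdown export could look like. The `LedgerEntry` fields and the `exportMarkdown` helper are illustrative assumptions based on the properties listed above, not Thoughtbox's stored schema or export format.

```typescript
// Hypothetical sketch of a ledger entry and a Markdown export.
// Field names are illustrative and not Thoughtbox's stored schema.
interface LedgerEntry {
  thoughtNumber: number; // logical position in the chain
  timestamp: string; // ISO 8601 time the thought was recorded
  thought: string;
  previous?: number; // link to the preceding thought
  branchFromThought?: number; // link to a branch origin
  revisesThought?: number; // link to a revised thought
}

// Renders entries in logical order, regardless of the order they were added.
function exportMarkdown(entries: LedgerEntry[]): string {
  const sorted = [...entries].sort((a, b) => a.thoughtNumber - b.thoughtNumber);
  const lines = sorted.map((e) => {
    const links: string[] = [];
    if (e.branchFromThought !== undefined) links.push(`branches from ${e.branchFromThought}`);
    if (e.revisesThought !== undefined) links.push(`revises ${e.revisesThought}`);
    const suffix = links.length > 0 ? ` (${links.join(", ")})` : "";
    return `${e.thoughtNumber}. ${e.thought}${suffix}`;
  });
  return lines.join("\n");
}
```

Sorting by `thoughtNumber` at export time is one simple way to support non-sequential addition: entries can be recorded in any order and still read as a chain.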

Thinking Patterns

Thoughtbox supports multiple reasoning strategies out of the box:

| Pattern           | Description                                   | Use Case                                          |
|-------------------|-----------------------------------------------|---------------------------------------------------|
| Forward Thinking  | Sequential 1→2→3→N progression                | Exploration, discovery, open-ended analysis       |
| Backward Thinking | Start at goal (N), work back to start (1)     | Planning, system design, working from known goals |
| Branching         | Fork into parallel explorations (A, B, C...)  | Comparing alternatives, A/B scenarios             |
| Revision          | Update earlier thoughts with new information  | Error correction, refined understanding           |
| Interleaved       | Alternate reasoning with tool actions         | Tool-oriented tasks, adaptive execution           |

See the Patterns Cookbook for comprehensive examples.

Features

1. Init Tool — Session Management (Stage 0)

The entry point for establishing session context.

Operations:

  • get_state — Check current session state
  • list_sessions — Browse available sessions (filter by project, task, aspect)
  • start_new — Begin a new work session with project/task/aspect scoping
  • load_context — Resume an existing session
  • navigate — Move between project/task/aspect levels
  • list_roots — Query MCP roots from client
  • bind_root — Bind a root directory as project scope

2. Thoughtbox Cipher — Deep Thinking Primer (Stage 1)

A priming tool that prepares agents for structured reasoning. Calling this tool unlocks Stage 2 tools.

3. Session Tool — Persistence & Export (Stage 1)

Manage reasoning sessions with full persistence.

Operations:

  • list — List all sessions
  • get — Retrieve session details
  • export — Export session as JSON or Markdown
  • analyze — Get session statistics and insights

4. Thoughtbox Tool — Structured Reasoning (Stage 2)

The core tool for step-by-step thinking with full graph capabilities.

// Simple forward thinking
{ thought: "First, let's identify the problem...", thoughtNumber: 1, totalThoughts: 5, nextThoughtNeeded: true }

// Branching to explore alternatives
{ thought: "Option A: Use PostgreSQL...", thoughtNumber: 3, branchFromThought: 2, branchId: "sql-path", ... }

// Revising an earlier conclusion
{ thought: "Actually, the root cause is...", thoughtNumber: 7, isRevision: true, revisesThought: 3, ... }

// Request autonomous critique via MCP sampling
{ thought: "This approach seems optimal...", thoughtNumber: 5, critique: true, ... }

Parameters:

  • thought (string): The current thinking step
  • thoughtNumber (integer): Logical position in the chain
  • totalThoughts (integer): Estimated total (adjustable)
  • nextThoughtNeeded (boolean): Continue thinking?
  • branchFromThought (integer): Fork point for branching
  • branchId (string): Branch identifier
  • isRevision (boolean): Marks revision of earlier thought
  • revisesThought (integer): Which thought is being revised
  • critique (boolean): Request autonomous LLM critique via MCP sampling API

Autonomous Critique: When critique: true, the server uses the MCP sampling/createMessage API to request an external LLM analysis of the current thought. The critique is returned in the response and persisted with the thought.
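
The parameter list maps naturally onto a typed input object. The sketch below is an assumption-heavy illustration: which fields are required, and the consistency checks in `validateThought`, are guesses based on the examples above rather than the server's actual validation rules.

```typescript
// Hypothetical typing of the thoughtbox tool input.
// Required vs. optional fields are assumed from the examples, not confirmed.
interface ThoughtInput {
  thought: string;
  thoughtNumber: number;
  totalThoughts: number;
  nextThoughtNeeded: boolean;
  branchFromThought?: number;
  branchId?: string;
  isRevision?: boolean;
  revisesThought?: number;
  critique?: boolean;
}

// Returns a list of problems; an empty list means the input passed these checks.
function validateThought(input: ThoughtInput): string[] {
  const errors: string[] = [];
  if (input.thought.trim() === "") errors.push("thought must not be empty");
  if (!Number.isInteger(input.thoughtNumber) || input.thoughtNumber < 1) {
    errors.push("thoughtNumber must be a positive integer");
  }
  if (!Number.isInteger(input.totalThoughts) || input.totalThoughts < 1) {
    errors.push("totalThoughts must be a positive integer");
  }
  // Assumed rule: a fork point is only meaningful with a branch identifier.
  if (input.branchFromThought !== undefined && !input.branchId) {
    errors.push("branchFromThought requires branchId");
  }
  // Assumed rule: a revision must say which thought it revises.
  if (input.isRevision === true && input.revisesThought === undefined) {
    errors.push("isRevision requires revisesThought");
  }
  return errors;
}
```

A client could run a check like this before sending a call, so that malformed branch or revision requests are caught locally.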

5. Observatory — Real-Time Visualization

A built-in web UI for watching reasoning unfold in real-time.

Features:

  • Live Graph: Watch thoughts appear as they're added
  • Snake Layout: Compact left-to-right flow with row wrapping
  • Hierarchical Branches: Branches collapse into clickable stub nodes (A, B, C...)
  • Navigation: Click stubs to explore branches, back button to return
  • Detail Panel: Click any node to view full thought content
  • Multi-Session: Switch between active reasoning sessions

Access: The Observatory is available at http://localhost:1729 when the server is running.

6. Mental Models — Reasoning Frameworks (Stage 3)

15 structured mental models that provide process scaffolds for different problem types.

Available Models:

  • rubber-duck — Explain to clarify thinking
  • five-whys — Root cause analysis
  • pre-mortem — Anticipate failures
  • steelmanning — Strengthen opposing arguments
  • fermi-estimation — Order-of-magnitude reasoning
  • trade-off-matrix — Multi-criteria decisions
  • decomposition — Break down complexity
  • inversion — Solve by avoiding failure
  • And 7 more...

Operations:

  • get_model — Retrieve a specific model's prompt
  • list_models — List all models (filter by tag)
  • list_tags — Available categories (debugging, planning, decision-making, etc.)

7. Notebook — Literate Programming (Stage 2)

Interactive notebooks combining documentation with executable JavaScript/TypeScript.

Features:

  • Isolated execution environments per notebook
  • Full package.json support with dependency installation
  • Sequential Feynman template for in-depth learning workflows
  • Export to .src.md format

Operations: create, add_cell, run_cell, export, list, load, update_cell

Installation

Quick Start

npx -y @kastalien-research/thoughtbox

MCP Client Configuration

Claude Code

Add to your ~/.claude/settings.json or project .claude/settings.json:

{
  "mcpServers": {
    "thoughtbox": {
      "command": "npx",
      "args": ["-y", "@kastalien-research/thoughtbox"]
    }
  }
}

Cline / VS Code

Add to your MCP settings or .vscode/mcp.json:

{
  "servers": {
    "thoughtbox": {
      "command": "npx",
      "args": ["-y", "@kastalien-research/thoughtbox"]
    }
  }
}

Usage Examples

Forward Thinking — Problem Analysis

Thought 1: "Users report slow checkout. Let's analyze..."
Thought 2: "Data shows 45s average, target is 10s..."
Thought 3: "Root causes: 3 API calls, no caching..."
Thought 4: "Options: Redis cache, query optimization, parallel calls..."
Thought 5: "Recommendation: Implement Redis cache for product data"

Backward Thinking — System Design

Thought 8: [GOAL] "System handles 10k req/s with <100ms latency"
Thought 7: "Before that: monitoring and alerting operational"
Thought 6: "Before that: resilience patterns implemented"
Thought 5: "Before that: caching layer with invalidation"
...
Thought 1: [START] "Current state: 1k req/s, 500ms latency"
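
The numbering in this example runs from the goal (N) down to the start (1). As a small illustration, the hypothetical helper `numberBackward` below assigns those numbers to steps listed goal-first; it is not part of Thoughtbox.

```typescript
// Hypothetical helper: number a goal-first list of steps from N down to 1.
interface NumberedThought {
  thoughtNumber: number;
  thought: string;
}

function numberBackward(stepsFromGoal: string[]): NumberedThought[] {
  const total = stepsFromGoal.length;
  // The first step (the goal) gets N; the last step (the start) gets 1.
  return stepsFromGoal.map((thought, index) => ({
    thoughtNumber: total - index,
    thought,
  }));
}
```

Each numbered step could then be submitted with its `thoughtNumber`, which fits the ledger's support for non-sequential addition.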

Branching — Comparing Alternatives

Thought 4: "Need to choose database architecture..."

Branch A (thought 5): branchId="sql-path"
  "PostgreSQL: ACID compliance, mature tooling, relational integrity"

Branch B (thought 5): branchId="nosql-path"
  "MongoDB: Flexible schema, horizontal scaling, document model"

Thought 6: [SYNTHESIS] "Use PostgreSQL for transactions, MongoDB for analytics"
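
The same branching session can be written as a list of tool inputs. This sketch assumes both branches fork from thought 4 (the example implies it but does not state `branchFromThought`), and the `"main"` key and `groupByBranch` helper are hypothetical.

```typescript
// Hypothetical representation of the branching example as tool inputs.
interface BranchThought {
  thought: string;
  thoughtNumber: number;
  branchFromThought?: number;
  branchId?: string;
}

const branchingSession: BranchThought[] = [
  { thought: "Need to choose database architecture...", thoughtNumber: 4 },
  {
    thought: "PostgreSQL: ACID compliance, mature tooling, relational integrity",
    thoughtNumber: 5,
    branchFromThought: 4, // assumed fork point
    branchId: "sql-path",
  },
  {
    thought: "MongoDB: Flexible schema, horizontal scaling, document model",
    thoughtNumber: 5,
    branchFromThought: 4, // assumed fork point
    branchId: "nosql-path",
  },
  { thought: "Use PostgreSQL for transactions, MongoDB for analytics", thoughtNumber: 6 },
];

// Groups thoughts by branch; thoughts without a branchId go under "main".
function groupByBranch(items: BranchThought[]): Map<string, BranchThought[]> {
  const groups = new Map<string, BranchThought[]>();
  for (const item of items) {
    const key = item.branchId ?? "main";
    const group = groups.get(key) ?? [];
    group.push(item);
    groups.set(key, group);
  }
  return groups;
}
```

Two thoughts share `thoughtNumber: 5` here, which is why the branch identifier, not the number alone, is needed to tell them apart.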

Environment Variables

| Variable                | Description                           | Default       |
|-------------------------|---------------------------------------|---------------|
| DISABLE_THOUGHT_LOGGING | Suppress thought logging to stderr    | false         |
| THOUGHTBOX_DATA_DIR     | Base directory for persistent storage | ~/.thoughtbox |
| THOUGHTBOX_PROJECT      | Project scope for session isolation   | _default      |
| THOUGHTBOX_TRANSPORT    | Transport type (stdio or http)        | http          |
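
As a sketch of how these variables and defaults fit together, the hypothetical `resolveConfig` function below applies the documented defaults to an environment map. How Thoughtbox actually parses these values is not documented here, so the boolean parsing and the fallback for unknown transport values are assumptions.

```typescript
// Hypothetical resolution of Thoughtbox settings from environment variables.
// Variable names and defaults come from the table; parsing rules are assumed.
interface ThoughtboxConfig {
  disableThoughtLogging: boolean;
  dataDir: string;
  project: string;
  transport: "stdio" | "http";
}

type Env = Record<string, string | undefined>;

function resolveConfig(env: Env): ThoughtboxConfig {
  // Assumption: anything other than "stdio" falls back to the default, "http".
  const transport = env["THOUGHTBOX_TRANSPORT"] === "stdio" ? "stdio" : "http";
  return {
    // Assumption: only the string "true" (any case) enables suppression.
    disableThoughtLogging: (env["DISABLE_THOUGHT_LOGGING"] ?? "false").toLowerCase() === "true",
    // Note: "~" is kept as written and is not expanded in this sketch.
    dataDir: env["THOUGHTBOX_DATA_DIR"] ?? "~/.thoughtbox",
    project: env["THOUGHTBOX_PROJECT"] ?? "_default",
    transport,
  };
}
```

In Node.js this could be called as `resolveConfig(process.env)`; taking the environment as a parameter keeps the function easy to test.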

Development

# Install dependencies
npm install

# Build
npm run build

# Development with hot reload
npm run dev

# Run the server
npm start

Docker Compose

A docker-compose.yml is included to run the HTTP MCP server and the Observatory UI together.

docker compose up --build

  • HTTP MCP + health: http://localhost:1731/health
  • Observatory UI/WebSocket: http://localhost:1729/
  • OpenTelemetry Collector: ports 4317 (gRPC) and 4318 (HTTP) for Claude Code telemetry
  • Persistent data: stored in named volume thoughtbox-data at /data/thoughtbox (override with env vars like THOUGHTBOX_PROJECT or THOUGHTBOX_DATA_DIR)

Architecture

src/
├── index.ts              # Entry point (stdio/HTTP transport selection)
├── server-factory.ts     # MCP server factory with tool registration
├── tool-registry.ts      # Progressive disclosure (stage-based tool enabling)
├── tool-descriptions.ts  # Stage-specific tool descriptions
├── thought-handler.ts    # Thoughtbox tool logic with critique support
├── gateway/              # Always-on routing tool
│   ├── gateway-handler.ts  # Routes to handlers with stage enforcement
│   └── index.ts          # Module exports
├── init/                 # Init workflow and state management
│   ├── tool-handler.ts   # Init tool operations
│   └── state-manager.ts  # Session state persistence
├── sessions/             # Session tool handler
├── sampling/             # Autonomous critique via MCP sampling
│   └── handler.ts        # SamplingHandler for LLM critique requests
├── persistence/          # Storage layer
│   ├── storage.ts        # InMemoryStorage with LinkedThoughtStore
│   └── filesystem-storage.ts  # FileSystemStorage with atomic writes
├── observatory/          # Real-time visualization
│   ├── ui/               # Self-contained HTML/CSS/JS
│   ├── ws-server.ts      # WebSocket server for live updates
│   └── emitter.ts        # Event emission for thought changes
├── mental-models/        # 15 reasoning frameworks
├── notebook/             # Literate programming engine
└── resources/            # Documentation and patterns cookbook

Storage

Thoughtbox supports two storage backends:

  • InMemoryStorage: Default for development, uses LinkedThoughtStore for O(1) thought lookups
  • FileSystemStorage: Persistent storage with atomic writes and project isolation

Data is stored at ~/.thoughtbox/ by default:

~/.thoughtbox/
├── config.json           # Global configuration
└── projects/
    └── {project}/
        └── sessions/
            └── {date}/
                └── {session-id}/
                    ├── manifest.json
                    └── {thought-number}.json
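
The directory layout above can be expressed as a pair of path builders. This is a sketch under stated assumptions: the helper names are hypothetical, the date is passed in as an opaque string because its format is not specified here, and thought numbers are assumed to be written without zero padding.

```typescript
// Hypothetical path builders for the on-disk layout shown above.
interface ThoughtLocation {
  dataDir: string; // e.g. the value of THOUGHTBOX_DATA_DIR
  project: string; // e.g. the value of THOUGHTBOX_PROJECT
  date: string; // format not specified in this document; treated as opaque
  sessionId: string;
}

// Directory that holds one session's manifest and thought files.
function sessionDir(loc: ThoughtLocation): string {
  return [loc.dataDir, "projects", loc.project, "sessions", loc.date, loc.sessionId].join("/");
}

function manifestPath(loc: ThoughtLocation): string {
  return `${sessionDir(loc)}/manifest.json`;
}

// Assumption: the file name is the plain thought number, without padding.
function thoughtFilePath(loc: ThoughtLocation, thoughtNumber: number): string {
  return `${sessionDir(loc)}/${thoughtNumber}.json`;
}
```

Keeping the project segment near the top of the path is what gives each project its own isolated subtree of sessions.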

Contributing

We welcome contributions! See CONTRIBUTING.md for:

  • Development setup
  • Commit conventions (optimized for thick_read code comprehension)
  • Testing with agentic scripts
  • Pull request process

License

MIT License — free to use, modify, and distribute.
