A reasoning ledger for AI agents. Thoughtbox is an MCP server that provides structured reasoning tools, enabling agents to think step-by-step, branch into alternative explorations, revise earlier conclusions, and maintain a persistent record of their cognitive process.
Unlike ephemeral chain-of-thought prompting, Thoughtbox creates a durable reasoning chain — a ledger of thoughts that can be visualized, exported, and analyzed. Each thought is a node in a graph structure supporting forward thinking, backward planning, branching explorations, mid-course revisions, and autonomous critique via MCP sampling.
Local-First: Thoughtbox runs entirely on your machine. All reasoning data is stored locally at ~/.thoughtbox/ by default — nothing is sent to external servers. Your thought processes remain private and under your control.
Note: Thoughtbox is currently optimized for use with Claude Code. We are actively working on supporting additional MCP clients, but due to the wide variation in capability support across the Model Context Protocol ecosystem — server features (prompts, resources, tools), client features (roots, sampling, elicitation), and behaviors like
listChangednotifications — we're having to implement custom adaptations for many clients. See the gateway tool for one such adaptation.
If you're using a client other than Claude Code and encounter issues, please open an issue describing your client and the problem — this helps us prioritize compatibility work.
Thoughtbox uses a staged tool disclosure system to guide agents through proper initialization:
| Stage | Tools Available | Trigger |
|---|---|---|
| Stage 0 | init, thoughtbox_gateway |
Connection start |
| Stage 1 | + thoughtbox_cipher, session |
init(start_new) or init(load_context) |
| Stage 2 | + thoughtbox, notebook |
thoughtbox_cipher call |
| Stage 3 | + mental_models, export_reasoning_chain |
Domain activation |
This ensures agents establish proper session context before accessing advanced reasoning tools.
The thoughtbox_gateway tool is always enabled and provides a routing mechanism for clients that don't refresh tool lists mid-turn (e.g., Claude Code over streaming HTTP). It routes to all handlers internally while enforcing stage requirements:
// Call operations through gateway without waiting for tool list refresh
{ operation: "get_state" } // → init handler
{ operation: "start_new", args: {...} } // → init handler, advances to Stage 1
{ operation: "cipher" } // → cipher content, advances to Stage 2
{ operation: "thought", args: {...} } // → thoughtbox handler (requires Stage 2)The gateway returns clear error messages when operations are called at the wrong stage.
Observatory UI showing a reasoning session with 14 thoughts and a branch exploration (purple nodes 13-14) forking from thought 5.
Thoughtbox treats reasoning as data, not just process. Every thought is:
- Numbered: Logical position in the reasoning chain (supports non-sequential addition)
- Timestamped: When the thought was recorded
- Linked: Connected to previous thoughts, branch origins, or revised thoughts
- Persistent: Stored in sessions that survive across conversations
- Exportable: Full reasoning chains can be exported as JSON or Markdown
This creates an auditable trail of how conclusions were reached — invaluable for debugging agent behavior, understanding decision-making, and improving reasoning strategies.
Thoughtbox supports multiple reasoning strategies out of the box:
| Pattern | Description | Use Case |
|---|---|---|
| Forward Thinking | Sequential 1→2→3→N progression | Exploration, discovery, open-ended analysis |
| Backward Thinking | Start at goal (N), work back to start (1) | Planning, system design, working from known goals |
| Branching | Fork into parallel explorations (A, B, C...) | Comparing alternatives, A/B scenarios |
| Revision | Update earlier thoughts with new information | Error correction, refined understanding |
| Interleaved | Alternate reasoning with tool actions | Tool-oriented tasks, adaptive execution |
See the Patterns Cookbook for comprehensive examples.
The entry point for establishing session context.
Operations:
get_state— Check current session statelist_sessions— Browse available sessions (filter by project, task, aspect)start_new— Begin a new work session with project/task/aspect scopingload_context— Resume an existing sessionnavigate— Move between project/task/aspect levelslist_roots— Query MCP roots from clientbind_root— Bind a root directory as project scope
A priming tool that prepares agents for structured reasoning. Calling this tool unlocks Stage 2 tools.
Manage reasoning sessions with full persistence.
Operations:
list— List all sessionsget— Retrieve session detailsexport— Export session as JSON or Markdownanalyze— Get session statistics and insights
The core tool for step-by-step thinking with full graph capabilities.
// Simple forward thinking
{ thought: "First, let's identify the problem...", thoughtNumber: 1, totalThoughts: 5, nextThoughtNeeded: true }
// Branching to explore alternatives
{ thought: "Option A: Use PostgreSQL...", thoughtNumber: 3, branchFromThought: 2, branchId: "sql-path", ... }
// Revising an earlier conclusion
{ thought: "Actually, the root cause is...", thoughtNumber: 7, isRevision: true, revisesThought: 3, ... }
// Request autonomous critique via MCP sampling
{ thought: "This approach seems optimal...", thoughtNumber: 5, critique: true, ... }Parameters:
thought(string): The current thinking stepthoughtNumber(integer): Logical position in the chaintotalThoughts(integer): Estimated total (adjustable)nextThoughtNeeded(boolean): Continue thinking?branchFromThought(integer): Fork point for branchingbranchId(string): Branch identifierisRevision(boolean): Marks revision of earlier thoughtrevisesThought(integer): Which thought is being revisedcritique(boolean): Request autonomous LLM critique via MCP sampling API
Autonomous Critique: When critique: true, the server uses the MCP sampling/createMessage API to request an external LLM analysis of the current thought. The critique is returned in the response and persisted with the thought.
A built-in web UI for watching reasoning unfold in real-time.
Features:
- Live Graph: Watch thoughts appear as they're added
- Snake Layout: Compact left-to-right flow with row wrapping
- Hierarchical Branches: Branches collapse into clickable stub nodes (A, B, C...)
- Navigation: Click stubs to explore branches, back button to return
- Detail Panel: Click any node to view full thought content
- Multi-Session: Switch between active reasoning sessions
Access: The Observatory is available at http://localhost:1729 when the server is running.
15 structured mental models that provide process scaffolds for different problem types.
Available Models:
rubber-duck— Explain to clarify thinkingfive-whys— Root cause analysispre-mortem— Anticipate failuressteelmanning— Strengthen opposing argumentsfermi-estimation— Order-of-magnitude reasoningtrade-off-matrix— Multi-criteria decisionsdecomposition— Break down complexityinversion— Solve by avoiding failure- And 7 more...
Operations:
get_model— Retrieve a specific model's promptlist_models— List all models (filter by tag)list_tags— Available categories (debugging, planning, decision-making, etc.)
Interactive notebooks combining documentation with executable JavaScript/TypeScript.
Features:
- Isolated execution environments per notebook
- Full package.json support with dependency installation
- Sequential Feynman template for deep learning workflows
- Export to
.src.mdformat
Operations: create, add_cell, run_cell, export, list, load, update_cell
npx -y @kastalien-research/thoughtboxAdd to your ~/.claude/settings.json or project .claude/settings.json:
{
"mcpServers": {
"thoughtbox": {
"command": "npx",
"args": ["-y", "@kastalien-research/thoughtbox"]
}
}
}Add to your MCP settings or .vscode/mcp.json:
{
"servers": {
"thoughtbox": {
"command": "npx",
"args": ["-y", "@kastalien-research/thoughtbox"]
}
}
}Thought 1: "Users report slow checkout. Let's analyze..."
Thought 2: "Data shows 45s average, target is 10s..."
Thought 3: "Root causes: 3 API calls, no caching..."
Thought 4: "Options: Redis cache, query optimization, parallel calls..."
Thought 5: "Recommendation: Implement Redis cache for product data"
Thought 8: [GOAL] "System handles 10k req/s with <100ms latency"
Thought 7: "Before that: monitoring and alerting operational"
Thought 6: "Before that: resilience patterns implemented"
Thought 5: "Before that: caching layer with invalidation"
...
Thought 1: [START] "Current state: 1k req/s, 500ms latency"
Thought 4: "Need to choose database architecture..."
Branch A (thought 5): branchId="sql-path"
"PostgreSQL: ACID compliance, mature tooling, relational integrity"
Branch B (thought 5): branchId="nosql-path"
"MongoDB: Flexible schema, horizontal scaling, document model"
Thought 6: [SYNTHESIS] "Use PostgreSQL for transactions, MongoDB for analytics"
| Variable | Description | Default |
|---|---|---|
DISABLE_THOUGHT_LOGGING |
Suppress thought logging to stderr | false |
THOUGHTBOX_DATA_DIR |
Base directory for persistent storage | ~/.thoughtbox |
THOUGHTBOX_PROJECT |
Project scope for session isolation | _default |
THOUGHTBOX_TRANSPORT |
Transport type (stdio or http) |
http |
# Install dependencies
npm install
# Build
npm run build
# Development with hot reload
npm run dev
# Run the server
npm startA docker-compose.yml is included to run the HTTP MCP server and the Observatory UI together.
docker compose up --build- HTTP MCP + health: http://localhost:1731/health
- Observatory UI/WebSocket: http://localhost:1729/
- OpenTelemetry Collector: ports 4317 (gRPC) and 4318 (HTTP) for Claude Code telemetry
- Persistent data: stored in named volume
thoughtbox-dataat/data/thoughtbox(override with env vars likeTHOUGHTBOX_PROJECTorTHOUGHTBOX_DATA_DIR).
src/
├── index.ts # Entry point (stdio/HTTP transport selection)
├── server-factory.ts # MCP server factory with tool registration
├── tool-registry.ts # Progressive disclosure (stage-based tool enabling)
├── tool-descriptions.ts # Stage-specific tool descriptions
├── thought-handler.ts # Thoughtbox tool logic with critique support
├── gateway/ # Always-on routing tool
│ ├── gateway-handler.ts # Routes to handlers with stage enforcement
│ └── index.ts # Module exports
├── init/ # Init workflow and state management
│ ├── tool-handler.ts # Init tool operations
│ └── state-manager.ts # Session state persistence
├── sessions/ # Session tool handler
├── sampling/ # Autonomous critique via MCP sampling
│ └── handler.ts # SamplingHandler for LLM critique requests
├── persistence/ # Storage layer
│ ├── storage.ts # InMemoryStorage with LinkedThoughtStore
│ └── filesystem-storage.ts # FileSystemStorage with atomic writes
├── observatory/ # Real-time visualization
│ ├── ui/ # Self-contained HTML/CSS/JS
│ ├── ws-server.ts # WebSocket server for live updates
│ └── emitter.ts # Event emission for thought changes
├── mental-models/ # 15 reasoning frameworks
├── notebook/ # Literate programming engine
└── resources/ # Documentation and patterns cookbook
Thoughtbox supports two storage backends:
- InMemoryStorage: Default for development, uses
LinkedThoughtStorefor O(1) thought lookups - FileSystemStorage: Persistent storage with atomic writes and project isolation
Data is stored at ~/.thoughtbox/ by default:
~/.thoughtbox/
├── config.json # Global configuration
└── projects/
└── {project}/
└── sessions/
└── {date}/
└── {session-id}/
├── manifest.json
└── {thought-number}.json
We welcome contributions! See CONTRIBUTING.md for:
- Development setup
- Commit conventions (optimized for
thick_readcode comprehension) - Testing with agentic scripts
- Pull request process
MIT License — free to use, modify, and distribute.