Merged
25 changes: 25 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,31 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.9.0] - 2026-02-27

### Added

- **Dashboard retrieval analytics**: New analytics section on the Overview page surfaces retrieval feedback data that was previously collected but never visualized. Addresses 4 observability gaps identified in the system audit:
- **Tool call frequency**: Horizontal bar chart showing MCP tool usage breakdown (search, recall, predict, etc.)
- **Retrieval volume over time**: Weekly time series of retrieval activity, reusing the existing `TimeSeries` D3 component
- **Top retrieved chunks**: Table of the 10 most-retrieved chunks with project, preview, token count, and retrieval count — surfaces dominant-chunk problems
- **Chunk size distribution**: Vertical bar chart of chunk token-count buckets (0-200, 201-500, 501-1K, 1K-2K, 2K-5K, 5K+) for validating length penalty tuning
- **Per-project retrieval quality**: Projects page now shows Retrievals and Unique Queries columns alongside existing chunk counts
- **Stat cards**: Total Retrievals, Unique Queries, and Top Tool summary cards
- **`ToolUsageChart` component** (`src/dashboard/client/src/components/stats/ToolUsageChart.tsx`): D3 horizontal bar chart for tool usage data
- **`SizeDistribution` component** (`src/dashboard/client/src/components/stats/SizeDistribution.tsx`): D3 vertical bar chart for chunk size buckets
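The size-distribution buckets listed above can be implemented with a small lookup table. A minimal sketch — the `BUCKETS` table and `bucketFor` helper are illustrative, not the component's actual code:

```typescript
// Illustrative bucketing for the chunk size distribution chart.
// Bucket boundaries mirror the changelog entry; the names are hypothetical.
const BUCKETS = [
  { label: "0-200", max: 200 },
  { label: "201-500", max: 500 },
  { label: "501-1K", max: 1000 },
  { label: "1K-2K", max: 2000 },
  { label: "2K-5K", max: 5000 },
  { label: "5K+", max: Infinity },
];

function bucketFor(tokens: number): string {
  // First bucket whose upper bound contains the token count.
  return BUCKETS.find((b) => tokens <= b.max)!.label;
}
```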

### Changed

- **`GET /api/stats`**: Response now includes an `analytics` object with `toolUsage`, `retrievalTimeSeries`, `topChunks`, `projectRetrievals`, `sizeDistribution`, and `totalRetrievals`. Gracefully returns empty arrays and 0 when no feedback data exists.
- **`GET /api/projects`**: Each project now includes `retrievals` and `uniqueQueries` fields (default 0).
- **SECURITY.md**: Updated supported versions to `>= 0.9.0`.
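For consumers of `GET /api/stats`, the new `analytics` object might be typed roughly like this — field names come from the changelog entry above, but the element shapes are inferred, not taken from the actual response schema:

```typescript
// Hypothetical TypeScript shape for the `analytics` object.
// Field names match the changelog; element shapes are assumptions.
interface StatsAnalytics {
  toolUsage: { tool: string; count: number }[];
  retrievalTimeSeries: { week: string; count: number }[];
  topChunks: { project: string; preview: string; tokens: number; retrievals: number }[];
  projectRetrievals: { project: string; retrievals: number; uniqueQueries: number }[];
  sizeDistribution: { bucket: string; count: number }[];
  totalRetrievals: number;
}

// The documented empty state: arrays empty, counters zero.
const emptyAnalytics: StatsAnalytics = {
  toolUsage: [],
  retrievalTimeSeries: [],
  topChunks: [],
  projectRetrievals: [],
  sizeDistribution: [],
  totalRetrievals: 0,
};
```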

### Tests

- 4 new route tests: empty analytics, populated analytics with feedback data, zero retrieval counts on projects, per-project retrieval counts.
- 2031 total tests passing.

## [0.8.2] - 2026-02-25

### Fixed
4 changes: 2 additions & 2 deletions SECURITY.md
@@ -4,8 +4,8 @@

| Version | Supported |
| ------- | ------------------ |
| >= 0.8.0 | :white_check_mark: |
| < 0.8.0 | :x: |
| >= 0.9.0 | :white_check_mark: |
| < 0.9.0 | :x: |

## Reporting a Vulnerability

18 changes: 18 additions & 0 deletions config.schema.json
@@ -213,6 +213,24 @@
},
"additionalProperties": false
},
"lengthPenalty": {
"type": "object",
"description": "Length penalty settings to favour focused chunks over large keyword-rich ones",
"properties": {
"enabled": {
"type": "boolean",
"default": true,
"description": "Enable logarithmic length penalty for scoring"
},
"referenceTokens": {
"type": "number",
"minimum": 1,
"default": 500,
"description": "Reference token count. Chunks at this size receive no penalty; larger chunks are penalised logarithmically."
}
},
"additionalProperties": false
},
"recency": {
"type": "object",
"description": "Recency boost settings for time-decay scoring",
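A minimal sketch of how such a logarithmic penalty could work. The exact formula is an assumption; only its shape — no penalty at or below `referenceTokens`, logarithmic falloff above it — comes from the schema descriptions:

```typescript
// Hypothetical score multiplier implementing a logarithmic length penalty.
// Chunks at or below the reference size pass through unchanged; larger
// chunks are damped logarithmically.
function lengthPenalty(tokens: number, referenceTokens = 500, enabled = true): number {
  if (!enabled || tokens <= referenceTokens) return 1; // multiplier of 1 = no penalty
  return 1 / (1 + Math.log(tokens / referenceTokens));
}
```

Under this sketch a 5000-token chunk keeps about 30% of its raw score (1 / (1 + ln 10)), while a 500-token chunk is untouched.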
6 changes: 4 additions & 2 deletions docs/guides/dashboard.md
@@ -24,6 +24,7 @@ Collection-wide statistics at a glance:
- Graph connectivity metrics
- Per-project breakdown
- Recent ingestion activity
- **Retrieval analytics** (shown when feedback data exists): tool call frequency chart, retrieval volume over time, chunk size distribution, top retrieved chunks table, and summary stat cards (total retrievals, unique queries, top tool)

### Search

@@ -58,6 +59,7 @@ Per-project views:

- Sessions per project with time ranges
- Chunk distribution across sessions
- Retrieval counts and unique query counts per project
- Project-specific graph statistics

## API Routes
@@ -66,11 +68,11 @@ The dashboard exposes a REST API that powers the UI. These routes can also be used

| Route | Description |
| --------------------------------------- | -------------------------------------------------- |
| `GET /api/stats` | Collection statistics (chunks, edges, clusters) |
| `GET /api/stats` | Collection statistics and retrieval analytics |
| `GET /api/chunks` | List chunks with pagination |
| `GET /api/edges` | List edges with filtering |
| `GET /api/clusters` | List clusters with member counts |
| `GET /api/projects` | List projects with chunk counts |
| `GET /api/projects` | List projects with chunk and retrieval counts |
| `GET /api/graph` | Graph data for visualization (nodes + edges) |
| `GET /api/graph/neighborhood` | Neighborhood subgraph around a specific chunk |
| `GET /api/search?q=<query>` | Search memory with retrieval pipeline |
44 changes: 44 additions & 0 deletions docs/research/experiments/lessons-learned.md
@@ -244,6 +244,49 @@ This separation of concerns led to the current architecture:

Each mechanism does what it's best at. The v0.2 architecture tried to make the graph do semantic ranking via sum-product path weights — conflating structural and semantic concerns.

## Transition Matrices at Query Boundaries (v0.8.1)

### What We Tried

Use cluster-level transition matrices (bigram/trigram) from the causal graph to predict which clusters should be returned at retrieval time. The hypothesis: if session A ended in clusters X, Y and session B started in clusters Y, Z, a transition matrix could learn X→Z and Y→Z patterns useful for prediction.

A preliminary scan over the full graph showed 61x lift (45% bigram accuracy vs 0.74% random), suggesting strong signal. We designed a controlled experiment isolating signal at actual query boundaries:

- **Experiment A**: Cross-session prediction — at each cross-session edge, predict the next session's initial clusters from the previous session's final clusters.
- **Experiment B**: Retrieval feedback chain — predict which clusters will be retrieved next based on recent retrieval history.
- **Baselines**: Random, most-popular, recency (predict same clusters), plus global/within-chain/cross-session bigrams, project-conditioned bigram, and trigram.
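The bigram baselines above can be sketched as a simple count matrix over consecutive cluster visits — a hedged illustration, not the experiment script's actual code:

```typescript
// Illustrative cluster-level bigram transition matrix: counts of
// cluster→cluster transitions, keyed by source cluster id.
function buildBigram(sequences: number[][]): Map<number, Map<number, number>> {
  const matrix = new Map<number, Map<number, number>>();
  for (const seq of sequences) {
    for (let i = 0; i + 1 < seq.length; i++) {
      const [from, to] = [seq[i], seq[i + 1]];
      const row = matrix.get(from) ?? new Map<number, number>();
      row.set(to, (row.get(to) ?? 0) + 1);
      matrix.set(from, row);
    }
  }
  return matrix;
}

// Predict the k most frequent successors of a cluster; empty if unseen.
function predictTopK(
  matrix: Map<number, Map<number, number>>,
  from: number,
  k: number
): number[] {
  const row = matrix.get(from);
  if (!row) return [];
  return [...row.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, k)
    .map(([cluster]) => cluster);
}
```

Restricting the input sequences to cross-session edges only is what exposes the sparsity problem: with almost no transitions to count, the matrix has nothing to rank.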

### What Happened

The preliminary 61x lift was entirely within-session workflow signal:

| Approach | P@5 | Lift@5 |
|----------|-----|--------|
| Random | 3.7% | 1.0x |
| Most popular | 8.4% | 2.3x |
| Recency | 22.1% | 6.0x |
| **Global bigram** | **31.6%** | **8.5x** |
| Within-chain bigram | 31.6% | 8.5x |
| **Cross-session bigram** | **4.2%** | **1.1x** |

The cross-session bigram matrix contained only 3 source clusters and 3 cells — too sparse to learn anything. The global bigram's 8.5x lift was identical to the within-chain bigram, confirming it was entirely driven by within-chain edges (74.7% of forward edges).
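The lift numbers in the table are precision@5 relative to the random baseline. A small sanity-check sketch, assuming hits-over-k precision (the experiment script's exact metric definitions are not shown here):

```typescript
// Precision@k: fraction of the top-k predictions that were actually retrieved.
function precisionAtK(predicted: number[], actual: Set<number>, k: number): number {
  const hits = predicted.slice(0, k).filter((c) => actual.has(c)).length;
  return hits / k;
}

// Lift@k: precision relative to the random baseline's precision.
function liftAtK(p: number, randomP: number): number {
  return p / randomP;
}

// Reproducing the table: 31.6% / 3.7% ≈ 8.5x, and 4.2% / 3.7% ≈ 1.1x.
```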

### Why It Failed

Two compounding problems:

1. **Sparsity at boundaries**: Cross-session edges are only 4.2% of forward edges. The transition matrix has insufficient data to learn meaningful patterns at actual query boundaries.

2. **Recency is tautological**: Recency (6.0x lift) is the strongest viable baseline — but it's useless in practice because if you're querying memory, you already have the recent context. Returning the same clusters is circular.

### The Conclusion

Transition matrices do not provide useful signal at query boundaries. The approach works within sessions (where sequential chunks naturally revisit the same topics) but this is not where retrieval help is needed. At actual query boundaries — where retrieval would add value — the signal is too sparse to learn from.

This confirms the architectural separation: the graph's value is **structural ordering** (chain walking), not **predictive ranking** (transition matrices). Semantic discovery remains the job of vector search and BM25.

Script: `scripts/experiments/transition-boundary-experiment.ts`

## Takeaways

1. **Question assumptions**: Wall-clock time seems natural but is wrong
@@ -254,3 +297,4 @@ Each mechanism does what it's best at. The v0.2 architecture tried to make the graph do semantic ranking via sum-product path weights — conflating structural and semantic concerns.
6. **Measure before theorizing**: Sum-product traversal was theoretically elegant but contributed 2% of results
7. **Separate concerns**: The graph's value is structural ordering, not semantic ranking
8. **Simple beats complex**: 1-to-1 sequential edges outperform m×n all-pairs with sum-product traversal
9. **Distinguish within-session from cross-session signal**: A metric that looks great on the full graph may be entirely driven by trivial within-session patterns that don't help at retrieval time
2 changes: 1 addition & 1 deletion package.json
@@ -1,6 +1,6 @@
{
"name": "causantic",
"version": "0.8.2",
"version": "0.9.0",
"description": "Long-term memory for Claude Code — local-first, graph-augmented, self-benchmarking",
"type": "module",
"private": false,
1 change: 1 addition & 0 deletions scripts/experiments/README.md
@@ -14,3 +14,4 @@ Research experiments and parameter sweeps.
| `sweep-min-weight.ts` | Minimum weight sweep | `npm run min-weight-sweep` |
| `sweep-depth.ts` | Depth sweep | `npm run depth-sweep` |
| `cross-project-experiment.ts` | Cross-project experiment | `npm run cross-project` |
| `transition-boundary-experiment.ts` | Transition matrix at query boundaries (rejected) | `npx tsx scripts/experiments/transition-boundary-experiment.ts` |