GitHub - agentralabs/agentic-vision: Persistent visual memory for AI agents — capture screenshots, embed with CLIP ViT-B/32, compare, recall. MCP server + Rust core library.

Quickstart · Problems Solved · Why · Benchmarks · How It Works · Install · API · Papers

AI agents can't see across sessions.

Your agent takes a screenshot, analyzes it, and forgets. Next session — blank slate. It can't compare what a page looks like now versus yesterday. It can't recall what the error dialog said three conversations ago. It can't search its own visual history.

Text-based memory exists. Visual memory doesn't — until now.

AgenticVision gives AI agents persistent visual memory. Capture images, embed them with CLIP ViT-B/32, store them in a compact binary format, and query them by similarity, time, or description. Every capture is a first-class MCP resource that any LLM can access.

Problems Solved (Read This First)

Problem: agents cannot remember what they saw last session.
Solved: .avis keeps persistent visual history across sessions and model changes.
Problem: visual regressions are noticed late or missed.
Solved: built-in compare and diff workflows surface change quickly.
Problem: screenshots pile up with no searchable structure.
Solved: each capture is embedded, timestamped, and queryable by similarity and metadata.
Problem: image context stays trapped in one tool.
Solved: MCP tools/resources expose visual memory to any compatible client.
Problem: what an agent sees is disconnected from what it remembers.
Solved: memory linking connects visual captures directly to cognitive graph nodes.

cargo install agentic-vision-cli agentic-vision-mcp

CLI + MCP binaries. 21 MCP tools. Persistent .avis files. Works with Claude Desktop, VS Code, Cursor, Windsurf, and any MCP-compatible client.

Benchmarks

Rust core. CLIP ViT-B/32 via ONNX Runtime. Binary .avis format. Real numbers from cargo test --release:

Operation	Time	Notes
Image capture (file → embed → store)	47 ms	CLIP ViT-B/32, 512-dim
Similarity search (top-5)	1-2 ms	Brute-force cosine, f64 precision
Visual diff (pixel-level)	<1 ms	8×8 grid region detection
MCP tool round-trip	7.2 ms	Including process startup (~6.1 ms)
Storage per capture	~4.26 KB	Embedding + JPEG thumbnail
Capacity per GB	~250K	Observations

All benchmarks on Apple M4, macOS 26.2, Rust 1.90.0 --release. ONNX Runtime for CLIP inference. Fallback mode available when ONNX model is not present.
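
The capacity row follows from the per-capture storage figure. The sketch below reproduces that arithmetic, assuming "KB" and "GB" in the table mean KiB and GiB; the function name is illustrative and not part of the crate.

```rust
/// Estimate how many captures fit in `budget_bytes`, given an average
/// per-capture footprint in bytes. Both inputs are supplied by the caller;
/// this is an illustrative helper, not part of the agentic-vision API.
fn captures_per_budget(budget_bytes: u64, bytes_per_capture: f64) -> u64 {
    (budget_bytes as f64 / bytes_per_capture).floor() as u64
}
```

Calling `captures_per_budget(1 << 30, 4.26 * 1024.0)` gives roughly 246K captures, in line with the ~250K figure in the table.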

Why AgenticVision

Agents need visual continuity. A debugging agent should remember what the UI looked like before and after a code change. A monitoring agent should detect visual regressions. A research agent should build a visual knowledge base over time.

Capture once, query forever. Every image is embedded into a 512-dimensional CLIP vector and stored with its JPEG thumbnail, timestamp, and description. Query by cosine similarity, time range, or text search — in milliseconds.
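
As a rough illustration of what a brute-force cosine query over stored embeddings involves, here is a minimal sketch using only the standard library. The function names and the `(id, embedding)` tuple layout are assumptions made for this example, not the crate's actual types.

```rust
/// Cosine similarity between two equal-length vectors, accumulated in f64.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f64 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    let (mut dot, mut norm_a, mut norm_b) = (0.0f64, 0.0f64, 0.0f64);
    for (&x, &y) in a.iter().zip(b.iter()) {
        let (x, y) = (x as f64, y as f64);
        dot += x * y;
        norm_a += x * x;
        norm_b += y * y;
    }
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0; // treat similarity with a zero vector as 0
    }
    dot / (norm_a.sqrt() * norm_b.sqrt())
}

/// Brute-force top-k: score every stored embedding, sort descending, keep k.
/// The (id, embedding) layout is hypothetical.
fn top_k(query: &[f32], store: &[(u64, Vec<f32>)], k: usize) -> Vec<(u64, f64)> {
    let mut scored: Vec<(u64, f64)> = store
        .iter()
        .map(|(id, emb)| (*id, cosine_similarity(query, emb)))
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}
```

A linear scan like this touches every embedding on each query, which is simple and predictable at the store sizes the benchmarks describe; an index would only become worthwhile for much larger stores.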

Binary format, not a database. The .avis file is a single portable binary — 64-byte header, JSON payload, JPEG thumbnails. Copy it, share it, back it up. No server, no database, no dependencies.

Works with every MCP client. AgenticVision-MCP exposes 21 tools, 6 resources, and 4 prompts via the Model Context Protocol. Any LLM that speaks MCP gains visual memory automatically.

Links to AgenticMemory. The vision_link tool connects visual captures to AgenticMemory cognitive graph nodes — bridging what an agent sees with what it knows.

Ghost Writer

New in v0.2.4 -- Auto-syncs visual context to your AI coding tools every 5 seconds.

Client	Config Location	Status
Claude Code	`~/.claude/memory/VISION_CONTEXT.md`	Full support
Cursor	`~/.cursor/memory/agentic-vision.md`	Full support
Windsurf	`~/.windsurf/memory/agentic-vision.md`	Full support
Cody	`~/.sourcegraph/cody/memory/agentic-vision.md`	Full support

Syncs: recent captures, observations, visual tool calls. Zero configuration. Context survives sessions automatically.

MCP Hardening

New in v0.2.5 -- Production-grade stdio transport.

Content-Length framing with 8 MiB limit
JSON-RPC 2.0 validation
Atomic writes (temp + rename + fsync)
No silent fallbacks
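
The atomic-write item can be sketched with the standard library alone. This is a minimal illustration of the temp-file, fsync, rename pattern, not the project's implementation; the `.tmp` suffix and function name are assumptions.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Write `bytes` to `path` by writing a sibling temp file, syncing it to
/// disk, and renaming it over the destination.
fn atomic_write(path: &Path, bytes: &[u8]) -> io::Result<()> {
    // The ".tmp" suffix is an illustrative choice, not the project's naming.
    let mut tmp_name = path.as_os_str().to_owned();
    tmp_name.push(".tmp");
    let tmp_path = PathBuf::from(tmp_name);

    let mut tmp = File::create(&tmp_path)?;
    tmp.write_all(bytes)?;
    tmp.sync_all()?; // flush file contents and metadata before the rename
    drop(tmp);

    // On POSIX filesystems a rename within one directory replaces the
    // destination in a single step, so readers see the old or the new file.
    fs::rename(&tmp_path, path)
}
```

A stricter variant would also sync the parent directory after the rename so the new directory entry itself survives a crash; that step is platform-specific and is left out here.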

How It Works

Capture: vision_capture accepts images from files, base64, screenshots, or the system clipboard. Each image is resized, embedded via CLIP ViT-B/32 into a 512-dimensional vector, compressed to a JPEG thumbnail, and stored in the .avis binary file. Screenshots support optional region capture; clipboard capture reads the current image from the OS clipboard.
Query: vision_query retrieves captures by time range, description, recency, and quality constraints (min_quality, sort_by). Results include capture metadata, quality scores, thumbnails, and similarity scores.
Compare: vision_compare places two captures side-by-side for LLM analysis. vision_diff performs pixel-level differencing with 8×8 grid region detection to identify exactly what changed.
Link: vision_link connects captures to AgenticMemory nodes, bridging visual observations with the agent's cognitive graph. An agent can recall "what did the UI look like when I made that decision?"
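
To make the 8×8 grid region detection in the Compare step concrete, here is a hedged sketch that works on raw grayscale buffers. The function name, the grayscale input, and the mean-absolute-difference threshold are assumptions for illustration; the actual vision_diff logic may differ.

```rust
/// Compare two equally sized grayscale images and return the (row, col)
/// coordinates of 8x8 grid cells whose mean absolute pixel difference
/// exceeds `threshold`. Inputs and threshold semantics are hypothetical.
fn changed_cells(
    a: &[u8],
    b: &[u8],
    width: usize,
    height: usize,
    threshold: f64,
) -> Vec<(usize, usize)> {
    assert_eq!(a.len(), width * height);
    assert_eq!(b.len(), width * height);
    const GRID: usize = 8;
    let mut changed = Vec::new();
    for gy in 0..GRID {
        for gx in 0..GRID {
            // Pixel bounds of this cell; integer division spreads any
            // remainder across the cells.
            let (x0, x1) = (gx * width / GRID, (gx + 1) * width / GRID);
            let (y0, y1) = (gy * height / GRID, (gy + 1) * height / GRID);
            let mut sum = 0u64;
            let mut count = 0u64;
            for y in y0..y1 {
                for x in x0..x1 {
                    let i = y * width + x;
                    sum += a[i].abs_diff(b[i]) as u64;
                    count += 1;
                }
            }
            if count > 0 && sum as f64 / count as f64 > threshold {
                changed.push((gy, gx));
            }
        }
    }
    changed
}
```

Reporting changed cells instead of individual pixels keeps the result small and gives the LLM a coarse "where did it change" map to reason about.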

The .avis binary format uses a 64-byte fixed header (magic 0x41564953, version, counts, timestamps) followed by a JSON payload containing captures with embedded JPEG thumbnails and 512-dim float vectors. Single-file, portable, no external dependencies.
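
The magic value 0x41564953 is the ASCII string "AVIS". A quick sanity check on a candidate file could look like the sketch below. It assumes the magic occupies the first four bytes in big-endian order, which the README does not state, and it deliberately does not guess at the layout of the remaining header fields.

```rust
/// Magic value quoted in the README; its bytes spell "AVIS" in ASCII.
const AVIS_MAGIC: u32 = 0x4156_4953;
/// Fixed header length quoted in the README.
const HEADER_LEN: usize = 64;

/// Return true when `bytes` is at least one header long and begins with the
/// magic. ASSUMPTION: the magic sits in the first four bytes, big-endian.
fn looks_like_avis(bytes: &[u8]) -> bool {
    if bytes.len() < HEADER_LEN {
        return false;
    }
    let magic = u32::from_be_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
    magic == AVIS_MAGIC
}
```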

MCP surface area

21 Tools (core 11 + grounding 3 + workspace 5 + observation 1 + session 1):

Tool	Description
`vision_capture`	Capture and embed an image (file, base64, screenshot, clipboard), with metadata redaction and quality scoring
`vision_compare`	Side-by-side comparison of two captures
`vision_query`	Query captures by time, description, recency
`vision_ocr`	Extract text from a captured image
`vision_similar`	Find visually similar captures (cosine similarity)
`vision_track`	Track visual changes to a target over time
`vision_diff`	Pixel-level diff between two captures
`vision_health`	Quality + staleness + memory-link coverage summary
`vision_link`	Link a capture to an AgenticMemory node
`session_start`	Begin a named observation session
`session_end`	End the current session

6 Resources:

URI	Description
`avis://capture/{id}`	Single capture with metadata and thumbnail
`avis://session/{id}`	All captures in a session
`avis://timeline/{start}/{end}`	Captures within a time range
`avis://similar/{id}`	Visually similar captures
`avis://stats`	Storage statistics and counts
`avis://recent`	Most recent captures

4 Prompts:

Prompt	Description
`observe`	Guided visual observation workflow
`compare`	Structured comparison between captures
`track`	Change tracking over time
`describe`	Detailed image description

Install

One-liner (desktop profile, backwards-compatible):

curl -fsSL https://agentralabs.tech/install/vision | bash

Environment profiles (one command per environment):

# Desktop MCP clients (auto-merge Claude Desktop + Claude Code when detected)
curl -fsSL https://agentralabs.tech/install/vision/desktop | bash

# Terminal-only (no desktop config writes)
curl -fsSL https://agentralabs.tech/install/vision/terminal | bash

# Remote/server hosts (no desktop config writes)
curl -fsSL https://agentralabs.tech/install/vision/server | bash

Channel	Command	Result
GitHub installer (official)	`curl -fsSL https://agentralabs.tech/install/vision \| bash`	Installs release binaries when available, otherwise source fallback; merges MCP config
GitHub installer (desktop profile)	`curl -fsSL https://agentralabs.tech/install/vision/desktop \| bash`	Explicit desktop profile behavior
GitHub installer (terminal profile)	`curl -fsSL https://agentralabs.tech/install/vision/terminal \| bash`	Installs binaries only; no desktop config writes
GitHub installer (server profile)	`curl -fsSL https://agentralabs.tech/install/vision/server \| bash`	Installs binaries only; server-safe behavior
crates.io + Cargo deps (official)	`cargo install agentic-vision-cli agentic-vision-mcp` + `cargo add agentic-vision`	Installs `avis`, MCP server binary, and adds the core library crate to your project
npm (wasm)	`npm install @agenticamem/vision`	WASM-based vision SDK for Node.js and browser

Server auth and artifact sync

For cloud/server runtime:

export AGENTIC_TOKEN="$(openssl rand -hex 32)"

All MCP clients must send Authorization: Bearer <same-token>. If .avis/.amem/.acb files are on another machine, sync them to the server first.
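
For readers wiring up their own client or proxy, the bearer check amounts to parsing the header value and comparing it with the configured token. The following is a minimal sketch under that assumption; the function names are illustrative and this is not the server's actual code.

```rust
/// Extract the token from an `Authorization` header value of the form
/// `Bearer <token>`. Returns None for any other shape.
fn bearer_token(header_value: &str) -> Option<&str> {
    let token = header_value.strip_prefix("Bearer ")?.trim();
    if token.is_empty() {
        None
    } else {
        Some(token)
    }
}

/// Compare two strings without returning early at the first mismatching
/// byte. This is a best-effort measure against timing side channels.
fn constant_time_eq(a: &str, b: &str) -> bool {
    let (a, b) = (a.as_bytes(), b.as_bytes());
    if a.len() != b.len() {
        return false;
    }
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

/// True when the header carries exactly the expected token.
fn is_authorized(header_value: &str, expected_token: &str) -> bool {
    bearer_token(header_value).map_or(false, |t| constant_time_eq(t, expected_token))
}
```

In a real deployment `expected_token` would come from the AGENTIC_TOKEN environment variable, for example via `std::env::var("AGENTIC_TOKEN")`.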

CLI + MCP Server (for Claude Desktop, VS Code, Cursor, Windsurf):

cargo install agentic-vision-cli agentic-vision-mcp

Core library (for Rust projects):

cargo add agentic-vision

Configure Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json):

{
  "mcpServers": {
    "agentic-vision": {
      "command": "agentic-vision-mcp",
      "args": ["--vision", "~/.vision.avis", "serve"]
    }
  }
}

See INSTALL.md for full installation guide, VS Code / Cursor configuration, build from source, and troubleshooting.

Do not use /tmp for vision files — macOS and Linux clear this directory periodically. Use ~/.vision.avis for persistent storage.

Deployment Model

Standalone by default: AgenticVision is independently installable and operable. Integration with AgenticMemory or AgenticCodebase is optional, never required.
Autonomic operations by default: daemon/runtime maintenance uses safe profile-based defaults with cache hygiene, migration safeguards, and health-ledger snapshots.

Area	Default behavior	Controls
Autonomic profile	Conservative local-first posture	`CORTEX_AUTONOMIC_PROFILE=desktop
Cache + registry maintenance	Periodic expiry cleanup and registry GC	`CORTEX_MAINTENANCE_TICK_SECS`, `CORTEX_REGISTRY_GC_EVERY_TICKS`, `CORTEX_REGISTRY_GC_KEEP_DELTAS`
Storage migration	Policy-gated with checkpointed auto-safe path	`CORTEX_STORAGE_MIGRATION_POLICY=auto-safe
Storage budget policy	20-year projection + capture rollup under pressure	`CORTEX_STORAGE_BUDGET_MODE=auto-rollup
Maintenance throttling	SLA-aware under sustained cache pressure	`CORTEX_SLA_MAX_CACHE_ENTRIES_BEFORE_GC_THROTTLE`
Health ledger	Periodic operational snapshots (default: `~/.agentra/health-ledger`)	`CORTEX_HEALTH_LEDGER_DIR`, `AGENTRA_HEALTH_LEDGER_DIR`, `CORTEX_HEALTH_LEDGER_EMIT_SECS`

Quickstart

MCP (Claude Desktop, VS Code, Cursor)

After configuring the MCP server (see Install), ask your agent:

"Take a screenshot and remember it."

The LLM calls vision_capture automatically. Then later:

"What did the screen look like earlier?"

The LLM calls vision_query to retrieve and display past captures.

Rust API

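use agentic_vision::{VisionStore, CaptureSource};

// Wrapped in main so the `?` operator compiles; this assumes the crate's
// error type implements std::error::Error.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut store = VisionStore::open("observations.avis")?;

    // Capture from file
    let id = store.capture(
        CaptureSource::File("screenshot.png"),
        "Homepage after deploy"
    )?;

    // Find the five most similar captures
    let matches = store.similar(id, 5)?;
    for m in matches {
        println!("  {} (similarity: {:.3})", m.description, m.score);
    }

    Ok(())
}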

Common Workflows

Track UI regression -- After a deploy, capture before/after screenshots and compare:

vision_capture  (before deploy screenshot, label: "pre-deploy")
vision_capture  (after deploy screenshot,  label: "post-deploy")
vision_diff     id_a=<before_id> id_b=<after_id>    # Pixel-level region diff

Build visual evidence trail -- During debugging, attach screenshots to memory nodes:

vision_capture  source=screenshot, labels=["bug-123", "dialog-state"]
vision_link     capture_id=<id> memory_node_id=<node> relationship="evidence_for"

Find similar UI states -- When diagnosing a recurring visual bug:

vision_similar  capture_id=<current_issue_id> top_k=5 min_similarity=0.8
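
On the wire, a workflow line like the one above becomes an MCP `tools/call` request. The sketch below builds such a request by hand so it stays dependency-free. The argument names come from the workflow line; treating `capture_id` as a number is an assumption, and the real tool schema may differ.

```rust
/// Build a JSON-RPC 2.0 `tools/call` request for the `vision_similar` tool.
/// ASSUMPTION: `capture_id` is numeric; check the tool's schema before use.
fn vision_similar_request(
    request_id: u64,
    capture_id: u64,
    top_k: u32,
    min_similarity: f64,
) -> String {
    format!(
        r#"{{"jsonrpc":"2.0","id":{},"method":"tools/call","params":{{"name":"vision_similar","arguments":{{"capture_id":{},"top_k":{},"min_similarity":{}}}}}}}"#,
        request_id, capture_id, top_k, min_similarity
    )
}
```

In practice an MCP client library builds this envelope for you; the sketch only shows which fields carry the tool name and its arguments.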

Audit capture quality -- Periodic maintenance to clean up stale or low-quality captures:
```
vision_health   stale_after_hours=168 low_quality_threshold=0.45
```
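
One way to read the two parameters: a capture is flagged when it is older than `stale_after_hours` or scores below `low_quality_threshold`. The sketch below encodes that reading. The struct, its field names, and the "either condition" rule are assumptions for illustration, not the behavior of vision_health itself.

```rust
/// Minimal stand-in for a stored capture; field names are hypothetical.
struct CaptureInfo {
    id: u64,
    age_hours: f64,
    quality: f64, // assumed to lie in 0.0..=1.0
}

/// Return the ids of captures that are stale or low quality under the
/// given thresholds.
fn flag_for_review(
    captures: &[CaptureInfo],
    stale_after_hours: f64,
    low_quality_threshold: f64,
) -> Vec<u64> {
    captures
        .iter()
        .filter(|c| c.age_hours > stale_after_hours || c.quality < low_quality_threshold)
        .map(|c| c.id)
        .collect()
}
```

With `stale_after_hours = 168.0`, anything older than one week is flagged regardless of quality.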

Validation

Suite	Tests	Notes
Rust core (`agentic-vision`)	38	Unit + integration (includes screenshot/clipboard)
Python SDK tests	47	Edge cases, format validation
MCP integration suite	3	Python → Rust stdio transport
Multi-agent suite	3	Shared file, vision-memory linking, rapid handoff
Total	91	All passing

Two research papers:

Repository Structure

This is a Cargo workspace monorepo containing the core library, CLI, MCP server, and FFI bindings.

agentic-vision/
├── Cargo.toml                    # Workspace root
├── crates/
│   ├── agentic-vision/           # Core library (crates.io: agentic-vision v0.2.2)
│   ├── agentic-vision-cli/       # CLI (crates.io: agentic-vision-cli v0.2.2)
│   ├── agentic-vision-mcp/       # MCP server (crates.io: agentic-vision-mcp v0.2.2)
│   └── agentic-vision-ffi/       # FFI bindings (crates.io: agentic-vision-ffi v0.2.2)
├── tests/                        # Integration tests (Python → Rust, multi-agent)
├── models/                       # ONNX model directory (CLIP ViT-B/32)
├── publication/                  # Research papers (I, II)
├── assets/                       # SVG diagrams and visuals
└── docs/                         # Guides and reference

Running Tests

# All workspace tests (unit + integration)
cargo test --workspace

# Core library only
cargo test -p agentic-vision

# MCP server only
cargo test -p agentic-vision-mcp

# Python integration tests
python tests/integration/test_mcp_clients.py
python tests/integration/test_multi_agent.py

MCP Server Quick Start

cargo install agentic-vision-cli agentic-vision-mcp

Configure Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json):

{
  "mcpServers": {
    "agentic-vision": {
      "command": "agentic-vision-mcp",
      "args": ["--vision", "~/.vision.avis", "serve"]
    }
  }
}

Configure VS Code / Cursor (.vscode/settings.json):

{
  "mcp.servers": {
    "agentic-vision": {
      "command": "agentic-vision-mcp",
      "args": ["--vision", "~/.vision.avis", "serve"]
    }
  }
}

agentic-vision-mcp supports both line-delimited JSON-RPC and Content-Length framed MCP stdio messages.
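
For reference, Content-Length framing prefixes each JSON body with a header block terminated by a blank line. The sketch below encodes and decodes one frame and enforces the 8 MiB limit mentioned under MCP Hardening. Function names and error strings are illustrative; this is not the server's parser.

```rust
/// Upper bound on a single frame body, matching the 8 MiB limit in the README.
const MAX_FRAME_BYTES: usize = 8 * 1024 * 1024;

/// Wrap a JSON body in a Content-Length header block.
fn encode_frame(body: &str) -> Vec<u8> {
    format!("Content-Length: {}\r\n\r\n{}", body.len(), body).into_bytes()
}

/// Try to decode one frame from the start of `buf`.
/// Returns Ok(Some((body, bytes_consumed))) on success and Ok(None) when
/// more bytes are needed to complete the frame.
fn decode_frame(buf: &[u8]) -> Result<Option<(&[u8], usize)>, String> {
    let sep: &[u8] = b"\r\n\r\n";
    let Some(header_end) = buf.windows(sep.len()).position(|w| w == sep) else {
        return Ok(None); // header block not complete yet
    };
    let headers = std::str::from_utf8(&buf[..header_end]).map_err(|e| e.to_string())?;
    let len = headers
        .lines()
        .find_map(|line| {
            let (name, value) = line.split_once(':')?;
            name.trim()
                .eq_ignore_ascii_case("Content-Length")
                .then(|| value.trim())
        })
        .ok_or("missing Content-Length header")?
        .parse::<usize>()
        .map_err(|e| e.to_string())?;
    if len > MAX_FRAME_BYTES {
        return Err(format!("frame of {} bytes exceeds the 8 MiB limit", len));
    }
    let start = header_end + sep.len();
    if buf.len() < start + len {
        return Ok(None); // body not fully received yet
    }
    Ok(Some((&buf[start..start + len], start + len)))
}
```

Rejecting oversized frames before reading the body keeps a malformed or hostile client from forcing a large allocation.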

Roadmap: Next — Remote Server Support

The next release is planned to add HTTP/SSE transport for remote deployments. Track progress in #2.

Feature	Status
`--token` bearer auth	Planned
`--multi-tenant` per-user vision files	Planned
`/health` endpoint	Planned
`--tls-cert` / `--tls-key` native HTTPS	Planned
OCR with Tesseract (`--features ocr`)	Planned
Clipboard TIFF fix	Planned
`delete` / `export` / `compact` CLI commands	Planned
Docker image + compose	Planned
Remote deployment docs	Planned

Planned CLI shape (not available in current release):

agentic-vision-mcp serve-http --port 8081 --token "<token>"
agentic-vision-mcp serve-http --multi-tenant --data-dir /data/users --port 8081 --token "<token>"

The .avis File

Your agent's visual memory. Everything it's seen.


Size	~5-8 GB over 20 years
Format	Binary captures with embeddings
Works with	Any vision-capable model

v0.2: Grounding & Workspaces

Grounding: Agent cannot claim "page shows X" without capture evidence.

Workspaces: Compare across sites and time periods.

Contributing

See CONTRIBUTING.md. The fastest ways to help:

Try it and file issues
Add an MCP tool — extend the visual memory surface
Write an example — show a real use case
Improve docs — every clarification helps someone

Privacy and Security

All captures stay local in .avis files -- no telemetry, no cloud sync by default.
Metadata scrubbing removes EXIF and location data from captured images before storage.
Storage budget policy prevents unbounded disk growth with 20-year projection and capture rollup.
Server mode requires an explicit AGENTIC_TOKEN environment variable for bearer auth.
Quality scoring helps identify and prune low-value captures to keep the store lean.

Built by Agentra Labs
