Give your AI assistant a persistent memory and the power to build knowledge graphs.
Archiledger is a specialized Knowledge Graph that serves as a RAG (Retrieval-Augmented Generation) system, equipped with a naive vector search implementation. It is exposed as a Model Context Protocol (MCP) server so that LLM-based assistants can store, connect, and recall information using a graph database. Whether you need a personal memory bank that persists across conversations or want to turn codebases and documents into structured knowledge graphs, Archiledger provides the infrastructure to make your AI truly remember.
⚠️ Disclaimer: This server currently implements no authentication mechanisms. Additionally, it relies on an embedded graph database (or in-memory storage) which is designed and optimized for local development and testing environments only. It is not recommended for production use in its current state.
LLMs are powerful, but they forget everything the moment a conversation ends. This creates frustrating experiences:
- Repeating yourself — Telling your assistant the same preferences, project context, or decisions over and over
- Lost insights — Valuable analysis from one session isn't available in the next
- No connected thinking — Information lives in silos without relationships between concepts
Archiledger solves this by giving your AI a graph-based memory:
| Problem | Archiledger Solution |
|---|---|
| Context resets every conversation | Persistent storage that survives restarts |
| Flat, disconnected notes | Graph structure with entities and relations |
| Manual note-taking | AI automatically stores and retrieves relevant info |
| Hard to explore large codebases | Build navigable knowledge graphs from code |
| Investigation dead ends | Follow relationships to discover connections |
| Keyword search limits | Vector search finds semantically similar concepts |
The graph model is particularly powerful because knowledge isn't flat — concepts relate to each other. When your AI can traverse these connections, it can provide richer context and discover non-obvious relationships.
- Knowledge Graph: Stores entities and relations.
- MCP Tools:
  - Entity Management:
    - `create_entities`: Create new entities in the knowledge graph. Entities are nodes representing things like people, places, concepts, etc.
    - `get_entity`: Get a specific entity by its name. Returns the entity with its type and observations.
    - `get_entities_by_type`: Get all entities of a specific type (e.g., Person, Component, Service).
    - `delete_entities`: Delete entities from the knowledge graph by their names.
  - Relation Management:
    - `create_relations`: Create relations between entities. Relations are edges representing how entities are connected.
    - `get_relations_for_entity`: Get all relations (incoming and outgoing) for a specific entity.
    - `get_relations_by_type`: Get all relations of a specific type (e.g., DEPENDS_ON, USES, CONTAINS).
    - `delete_relations`: Delete relations from the knowledge graph.
  - Graph Exploration:
    - `read_graph`: Read the entire knowledge graph. Returns all entities and relations.
    - `get_related_entities`: Find all entities directly connected to a given entity.
    - `get_entity_types`: List all unique entity types in the graph.
    - `get_relation_types`: List all unique relation types in the graph.
    - `similarity_search`: Find entities based on semantic similarity using embeddings.
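For example, two components and the dependency between them could be stored via `create_entities` and `create_relations`. The argument shapes below (entities with `name`, `entityType`, and `observations`; relations with `from`, `to`, and `relationType`) are an illustrative assumption — check the tool schemas the server advertises for the authoritative format.

```json
{
  "entities": [
    {
      "name": "OrderService",
      "entityType": "component",
      "observations": ["Creates and tracks orders"]
    },
    {
      "name": "PaymentService",
      "entityType": "component",
      "observations": ["Handles card payments", "Located under src/main/java/payments"]
    }
  ]
}
```

```json
{
  "relations": [
    { "from": "OrderService", "to": "PaymentService", "relationType": "DEPENDS_ON" }
  ]
}
```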
⚠️ Important: The application is designed for local development, personal use, and small-to-medium datasets. Review the following limitations before using it in production-like scenarios.
| Limitation | Impact | Notes/Mitigation |
|---|---|---|
| Embedded Neo4j | Single-process database with limited concurrency | Suitable for small datasets (<100k nodes). Use external Neo4j cluster for production workloads. |
| Naive vector search | Linear O(n) similarity matching across all entities | No HNSW or specialized vector index. Performance degrades with dataset size. |
| Memory-bound embeddings | In-memory vector store consumes heap space | Consider external vector DB (Pinecone, Weaviate) for datasets >10k entities. |
| No authentication | All operations are unauthenticated | Intended for local/trusted environments only. |
| Heap-limited operations | Large graph reads (read_graph) may OOM | Increase heap (-Xmx) or use pagination for large datasets. |
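To make the "naive vector search" limitation concrete, the sketch below shows the kind of linear scan it implies: the query embedding is compared against every stored vector with cosine similarity, so cost grows with entity count. This is an illustrative sketch, not the actual `similarity_search` implementation; class and method names are assumptions.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Illustrative sketch of a naive O(n) similarity search (not the project's actual code).
class NaiveSimilaritySearch {

    // Hypothetical in-memory store: entity name -> embedding vector.
    private final Map<String, float[]> embeddings;

    NaiveSimilaritySearch(Map<String, float[]> embeddings) {
        this.embeddings = embeddings;
    }

    // Compares the query against every stored vector; cost grows linearly with entity count.
    List<String> topK(float[] query, int k) {
        return embeddings.entrySet().stream()
                .sorted(Comparator.comparingDouble(
                        (Map.Entry<String, float[]> e) -> -cosine(query, e.getValue())))
                .limit(k)
                .map(Map.Entry::getKey)
                .toList();
    }

    private static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB) + 1e-12);
    }
}
```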
Based on load testing with a 512 MB heap:

| Operation | Performance | Notes |
|---|---|---|
| Entity creation | ~50-100 ops/sec | Using Cypher inserts |
| Relation creation | ~30-60 ops/sec | Depends on graph connectivity |
| Entity lookup by ID | <10ms | Direct index lookup |
| Similarity search | O(n) | Scales linearly with entity count |
💡 Tip: For load testing see LOAD_TESTING.md.
- Domain Layer: Contains the core business logic and entities (`Entity`, `Relation`). It defines the repository interface (`KnowledgeGraphRepository`).
- Application Layer: Orchestrates the domain logic using services (`KnowledgeGraphService`).
- Infrastructure Layer:
  - Persistence:
    - `InMemoryKnowledgeGraphRepository`: In-memory implementation (default).
    - `Neo4jKnowledgeGraphRepositoryAdapter`: Neo4j implementation (activates with the `neo4j` profile).
  - MCP: Acts as the primary adapter, exposing tools via the `McpToolAdapter`.
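As a rough sketch of how these layers fit together (illustrative only; the real method signatures in the repository may differ), the domain layer defines the model and the port, the application layer orchestrates it, and the infrastructure adapters implement the port:

```java
import java.util.List;
import java.util.Optional;

// Illustrative sketch of the hexagonal layering described above; actual signatures may differ.
record Entity(String name, String entityType, List<String> observations) {}
record Relation(String from, String to, String relationType) {}

// Domain layer: the port that both persistence adapters implement.
interface KnowledgeGraphRepository {
    void saveEntities(List<Entity> entities);
    Optional<Entity> findEntityByName(String name);
    void saveRelations(List<Relation> relations);
    List<Relation> findRelationsForEntity(String name);
}

// Application layer: orchestrates domain operations; exposed to MCP clients via the tool adapter.
class KnowledgeGraphService {
    private final KnowledgeGraphRepository repository;

    KnowledgeGraphService(KnowledgeGraphRepository repository) {
        this.repository = repository;
    }

    void createEntities(List<Entity> entities) {
        repository.saveEntities(entities);
    }
}
```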
- Java 21 or higher
- Maven
```bash
mvn clean package
```

The server uses streamable HTTP transport by default on port 8080.

```bash
java -jar mcp/target/archiledger-server-0.0.1-SNAPSHOT.jar
```

To run with embedded Neo4j instead of the default in-memory store, activate the `neo4j` profile. This mode runs a Neo4j server inside the application process.
Transient (Data lost on restart):
```bash
java -Dspring.profiles.active=neo4j -Dspring.neo4j.uri=embedded -jar mcp/target/archiledger-server-0.0.1-SNAPSHOT.jar
```

Persistent (Data saved to file):
Set the `memory.neo4j.data-dir` property to a directory path.

```bash
java -Dspring.profiles.active=neo4j \
  -Dspring.neo4j.uri=embedded \
  -Dmemory.neo4j.data-dir=./neo4j-data \
  -jar mcp/target/archiledger-server-0.0.1-SNAPSHOT.jar
```

💡 Tip: Viewing the Graph with Neo4j Browser
When using embedded Neo4j, you can visualize your graph using Neo4j Browser. The embedded database exposes a Bolt endpoint on a dynamic port:
- Keep the Archiledger server running.
- Check the server logs for the Bolt URI, e.g.: `Driver instance ... created for server uri 'bolt://localhost:35157'`
- Open Neo4j Browser (default: http://localhost:8080) and connect using the Bolt URI from the logs.
- Run Cypher queries like `MATCH (n) RETURN n` to explore your knowledge graph.
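A few more queries can help when exploring. The property name used in the last query (`name`) is an assumption about how the adapter maps entities to nodes; adjust it to match what you actually see in the browser.

```cypher
// Show a sample of nodes and their relationships
MATCH (a)-[r]->(b) RETURN a, r, b LIMIT 50

// Count nodes by label
MATCH (n) RETURN labels(n) AS label, count(*) AS count

// Find nodes by an assumed name property
MATCH (n) WHERE n.name CONTAINS 'Service' RETURN n
```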
The Docker image supports configurable data persistence and Neo4j port configuration.
Transient (Data lost when container stops):
```bash
docker run -p 8080:8080 registry.hub.docker.com/thecookiezen/archiledger:latest
```

Persistent (Data saved to host filesystem):
Mount a local directory to `/data/neo4j` inside the container:

```bash
docker run -p 8080:8080 -v /path/to/local/neo4j-data:/data/neo4j registry.hub.docker.com/thecookiezen/archiledger:latest
```

With Neo4j Bolt port exposed (for Neo4j Browser access): Expose the Bolt port to connect with external tools like Neo4j Browser:
```bash
docker run -p 8080:8080 -p 7687:7687 \
  -v /path/to/local/neo4j-data:/data/neo4j \
  registry.hub.docker.com/thecookiezen/archiledger:latest
```

Custom Bolt port:
Override the default Bolt port (7687) using the NEO4J_BOLT_PORT environment variable:
```bash
docker run -p 8080:8080 -p 17687:17687 \
  -e NEO4J_BOLT_PORT=17687 \
  -v /path/to/local/neo4j-data:/data/neo4j \
  registry.hub.docker.com/thecookiezen/archiledger:latest
```

Custom data directory (Optional):
Override the default data directory path using the NEO4J_DATA_DIR environment variable:
```bash
docker run -p 8080:8080 -p 7687:7687 \
  -e NEO4J_DATA_DIR=/custom/data/path \
  -v /path/to/local/data:/custom/data/path \
  registry.hub.docker.com/thecookiezen/archiledger:latest
```

| Variable | Default | Description |
|---|---|---|
| `NEO4J_DATA_DIR` | `/data/neo4j` | Directory where Neo4j stores its data |
| `NEO4J_BOLT_PORT` | `7687` | Port for Neo4j Bolt connections |
💡 Note: The data directory at `/data/neo4j` (or your custom path) must be writable by the container user (UID 100, `springuser`). If you encounter permission errors, ensure your host directory has appropriate permissions:

```bash
mkdir -p /path/to/local/neo4j-data
chmod 777 /path/to/local/neo4j-data  # or chown to UID 100
```
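If you prefer Compose over raw `docker run` commands, an equivalent setup might look like the following. This is a hedged sketch assembled from the flags above, not a file shipped with the project.

```yaml
# docker-compose.yml sketch based on the docker run examples above (not part of the repo).
services:
  archiledger:
    image: registry.hub.docker.com/thecookiezen/archiledger:latest
    ports:
      - "8080:8080"   # MCP streamable HTTP endpoint
      - "7687:7687"   # Neo4j Bolt, for Neo4j Browser access
    environment:
      NEO4J_BOLT_PORT: "7687"
    volumes:
      - ./neo4j-data:/data/neo4j   # persistent graph data
```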
Configuration is located in `src/main/resources/application.properties`.
```properties
spring.ai.mcp.server.name=archiledger-server
spring.ai.mcp.server.version=1.0.0
spring.ai.mcp.server.protocol=STREAMABLE
server.port=8080
```

Once the server is running, MCP clients can connect via:
- Streamable HTTP Endpoint: `http://localhost:8080/mcp`
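As a quick connectivity check, you can send an MCP `initialize` request to the endpoint. This is a hedged sketch: the exact protocol version string and response framing depend on the MCP SDK version the server uses.

```bash
# Hedged smoke test: send a JSON-RPC initialize request to the streamable HTTP endpoint.
curl -s -X POST http://localhost:8080/mcp \
  -H 'Content-Type: application/json' \
  -H 'Accept: application/json, text/event-stream' \
  -d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2025-03-26","capabilities":{},"clientInfo":{"name":"curl-check","version":"0.0.1"}}}'
```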
This MCP server can be used with LLM-based assistants (like GitHub Copilot, Gemini CLI, or other MCP-compatible clients) for various knowledge management scenarios. Below are two primary use cases with example instructions.
Use the knowledge graph as a persistent memory bank to store and recall information across conversations. The LLM can remember context, preferences, project notes, and important decisions.
```markdown
# Memory Bank Instructions
You have access to a knowledge graph MCP server that serves as your persistent memory. Use it to store and retrieve important information across our conversations.
## Core Behaviors
### Proactive Memory Storage
When the user shares important information, store it automatically:
- **Preferences**: User's coding style, preferred tools, naming conventions
- **Decisions**: Architecture decisions, technology choices, rejected alternatives
- **Context**: Project goals, constraints, team information
- **Tasks**: Ongoing work, blockers, next steps
### Memory Structure
Use these entity types for organization:
- `preference` - User preferences and settings
- `decision` - Important decisions with rationale
- `context` - Project or domain context
- `task` - Work items and their status
- `note` - General notes and observations
- `person` - Team members and stakeholders
### Creating Memories
When storing information:
1. Create an entity with a descriptive name
2. Set the appropriate entityType
3. Add detailed observations (store reasoning, not just facts)
### Recalling Memories
At the start of each conversation:
1. Use `read_graph` to get an overview of stored knowledge
2. Use `similarity_search` to find relevant context for the current task
3. Reference stored decisions and preferences in your responses
### Creating Relations
Link related memories for better context.
#### Relation Types
- `RELATES_TO` - General relationship
- `DEPENDS_ON` - Dependency relationship
- `AFFECTS` - One thing impacts another
- `PART_OF` - Component/container relationship
- `SUPERSEDES` - Replaces previous decision/approach
```

Use the knowledge graph to build a structured representation of a codebase or document corpus. This is valuable for onboarding, architecture documentation, investigation, and understanding complex systems.
```markdown
# Codebase Knowledge Graph Builder
You have access to a knowledge graph MCP server. Use it to create a structured knowledge base of the codebase for architecture documentation, onboarding, and investigation.
## Analysis Workflow
### Phase 1: High-Level Structure
Start by mapping the overall architecture:
1. Identify major modules, packages, or services
2. Create entities for each architectural component
3. Map dependencies between components
### Phase 2: Deep Dive
For each component, analyze and document:
1. Key classes, interfaces, and their responsibilities
2. Important functions and their purposes
3. Data models and their relationships
4. External integrations and APIs
### Phase 3: Cross-Cutting Concerns
Document patterns that span multiple components:
1. Design patterns in use
2. Shared utilities and helpers
3. Configuration and environment handling
4. Error handling strategies
## Entity Types for Code Analysis
Use these entity types:
- `module` - Top-level packages, services, or bounded contexts
- `component` - Major classes, interfaces, or subsystems
- `function` - Important functions or methods
- `model` - Data models, DTOs, entities
- `pattern` - Design patterns in use
- `config` - Configuration classes or files
- `api` - External or internal API endpoints
- `dependency` - External libraries or services
## Creating Code Entities
When analyzing code, create detailed entities.
## Relation Types for Code
Use these relation types:
- `DEPENDS_ON` - Class/module depends on another
- `IMPLEMENTS` - Implements an interface or contract
- `EXTENDS` - Inherits from another class
- `USES` - Utilizes another component
- `CALLS` - Function calls another function
- `CONTAINS` - Package contains class, class contains method
- `PRODUCES` - Creates or emits events/messages
- `CONSUMES` - Handles events/messages
## Querying for Investigation
Use the graph for code investigation:
1. **Find dependencies**: Search for a component and examine its relations
2. **Impact analysis**: Follow `DEPENDS_ON` relations to find affected components
3. **Understand data flow**: Trace `CALLS`, `PRODUCES`, `CONSUMES` relations
4. **Onboarding**: Start with `module` entities, then drill into `component` entities
## Best Practices
1. **Be consistent** with naming (use class names, not descriptions)
2. **Include file paths** in observations for easy navigation
3. **Document "why"** not just "what" - capture design rationale
4. **Update incrementally** - add to the graph as you explore
5. **Link generously** - relations are what make the graph valuable
```

Configure your LLM client to connect to the Archiledger MCP server. Below are examples for common clients.
```json
{
  "mcpServers": {
    "archiledger": {
      "httpUrl": "http://localhost:8080/mcp"
    }
  }
}
```

```json
{
  "servers": {
    "archiledger": {
      "type": "http",
      "url": "http://localhost:8080/mcp"
    }
  }
}
```

```json
{
  "mcpServers": {
    "archiledger": {
      "serverUrl": "http://localhost:8080/mcp"
    }
  }
}
```
- **Persistent Data**: Always mount a volume (`-v`) to preserve your knowledge graph across container restarts.
- **Container Lifecycle**: Run the container separately with `-d` (detached mode).
- **Port Conflicts**: If port 8080 is in use, map to a different host port (e.g., `-p 9090:8080`) and update the URL accordingly.
- **Named Containers**: Use `--name archiledger` to easily manage the container: `docker stop archiledger && docker rm archiledger`
- **Check Container Logs**: Debug connection issues with: `docker logs archiledger`