Automatically generate ArchiMate enterprise architecture models from software repositories.
Deriva analyzes code repositories and transforms them into ArchiMate models that can be opened in the Archi modeling tool.
- Clone a Git repository
- Extract a graph representation into Neo4j:
  - Structural nodes: directories, files (classified by type and subtype)
  - Semantic nodes: TypeDefinitions, Methods, BusinessConcepts, Technologies, etc.
  - Python files use fast AST extraction; other languages use an LLM
- Derive ArchiMate elements using a hybrid approach:
  - Enrich phase: Graph enrichment (PageRank, Louvain communities, k-core), sketched after this list
  - Generate phase: LLM-based element derivation with graph metrics
  - Refine phase: Relationship derivation and quality assurance
- Export to an .archimate XML file
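The Enrich phase leans on standard graph algorithms. A minimal, self-contained sketch of the kinds of metrics involved, run here with networkx on a toy dependency graph (Deriva computes them on the Neo4j graph, so this is purely illustrative):

```python
# Illustrative only: Deriva computes these metrics on the Neo4j graph, not with networkx.
import networkx as nx
from networkx.algorithms.community import louvain_communities

# Toy dependency graph: files as nodes, "imports/uses" edges.
G = nx.DiGraph()
G.add_edges_from([
    ("app.py", "services.py"),
    ("app.py", "models.py"),
    ("services.py", "models.py"),
    ("cli.py", "services.py"),
])

pagerank = nx.pagerank(G)                                      # how central each node is
communities = louvain_communities(G.to_undirected(), seed=42)  # cohesive clusters of nodes
core_numbers = nx.core_number(G.to_undirected())               # k-core: how deeply embedded a node is

print(pagerank, communities, core_numbers, sep="\n")
```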
- Python 3.14+
- Docker (for Neo4j)
- uv (Python package manager)
# Windows (PowerShell)
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
# macOS/Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

git clone https://github.com/StevenBtw/Deriva.git
cd Deriva
# Create environment configuration
cp .env.example .env
# Edit .env with your settings (Neo4j, LLM API keys, etc.)

uv venv --python 3.14

Activate the virtual environment:
# Windows PowerShell
.venv\Scripts\Activate.ps1
# Windows Command Prompt
.venv\Scripts\activate.bat
# macOS/Linux
source .venv/bin/activate

uv sync

cd deriva/adapters/neo4j
docker-compose up -d

Neo4j will be available at:
- Browser UI: http://localhost:7474 (no authentication)
- Bolt Protocol: bolt://localhost:7687
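Once the container is up, you can also check the Bolt endpoint directly from Python. A quick sketch, assuming the neo4j Python driver is available in your environment:

```python
# Quick Bolt connectivity check (assumes the neo4j Python driver is installed).
from neo4j import GraphDatabase

# auth=None matches the default docker-compose, which has authentication disabled.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=None)
driver.verify_connectivity()   # raises an exception if Neo4j is unreachable
print("Neo4j is reachable over Bolt")
driver.close()
```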
Verify Neo4j is running:
docker ps   # Should show deriva_neo4j container

cd ../../..   # Back to Deriva root
uv run marimo edit deriva/app/app.py

The marimo notebook opens in your browser at: http://127.0.0.1:2718
When you first open Deriva, you need to seed the configuration database.
Navigate to Column 2: Manage Extraction → File Type Registry
- Click "Seed from JSON"
- This loads default file type mappings from extraction_config.json
- Categories include: Source, Config, Docs, Test, Build, Asset, Data, Exclude
Navigate to Column 2: Manage Extraction → Extraction Step Configuration
Enable the extraction steps you need:
| Step | Purpose | Recommended |
|---|---|---|
| Repository | Creates root node for the repo | Always |
| Directory | Creates directory structure nodes | Always |
| File | Creates file nodes with classification | Always |
| TypeDefinition | Extracts classes, functions (AST for Python) | Yes |
| Method | Extracts methods from type definitions | Optional |
| Technology | Detects frameworks and libraries | Optional |
| ExternalDependency | Maps external dependencies | Optional |
| Test | Extracts test definitions | Optional |
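For Python sources, the TypeDefinition and Method steps can skip the LLM because the needed information is available from the standard-library ast module. A rough sketch of that idea (not Deriva's actual code):

```python
# Rough sketch of AST-based extraction for Python files (not Deriva's actual code).
import ast

def extract_definitions(source: str) -> list[dict]:
    """Collect classes and functions with their names and line numbers."""
    definitions = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            methods = [n.name for n in node.body
                       if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))]
            definitions.append({"kind": "class", "name": node.name,
                                "line": node.lineno, "methods": methods})
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Methods also appear here because ast.walk visits nested nodes.
            definitions.append({"kind": "function", "name": node.name, "line": node.lineno})
    return definitions

print(extract_definitions("class Invoice:\n    def total(self):\n        return 0\n"))
```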
If using LLM-assisted extraction, configure your provider in .env:
LLM_PROVIDER=mistral # or azure, anthropic
LLM_MISTRAL_API_KEY=your-key-here
LLM_MISTRAL_MODEL=devstral2
LLM_MISTRAL_URL=https://api.mistral.ai/v1/chat/completions
LLM_MISTRAL_STRUCTURED_OUTPUT=true
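At the HTTP level, LLM_MISTRAL_STRUCTURED_OUTPUT=true presumably corresponds to requesting a JSON response_format on the chat-completions call. A rough illustration of that call shape (Deriva's LLM adapter handles this internally; this is not its code):

```python
# Rough illustration of a structured-output chat-completions request
# (Deriva's LLM adapter handles this internally; this is not its code).
import os
import requests

response = requests.post(
    os.environ["LLM_MISTRAL_URL"],
    headers={"Authorization": f"Bearer {os.environ['LLM_MISTRAL_API_KEY']}"},
    json={
        "model": os.environ["LLM_MISTRAL_MODEL"],
        "messages": [{"role": "user", "content": "List the classes in this file as JSON."}],
        "response_format": {"type": "json_object"},  # what the structured-output flag asks for
    },
    timeout=60,
)
print(response.json()["choices"][0]["message"]["content"])
```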
Column 1: Configuration → Repositories
- Enter repository URL (e.g., https://github.com/user/repo.git)
- Optionally specify a target name
- Click "Clone"
Column 0: Run Deriva
- Click "Run Deriva" to run the full pipeline (extraction → derivation)
- Or use individual step buttons: Extraction, Derivation
Results display in a status callout showing nodes/elements created and any errors.
Column 1: Configuration
- Graph Statistics: Node counts by type (Repository, Directory, File, etc.)
- ArchiMate Model: Element and relationship counts by type
Column 1: Configuration → ArchiMate Model
- Set export path (default: workspace/output/model.archimate)
- Click "Export Model"
- Open the file with Archi
Via CLI:
deriva export -o workspace/output/model.archimate

All configuration lives in .env. Key settings:
# Neo4j (default docker-compose has auth disabled)
NEO4J_URI=bolt://localhost:7687
NEO4J_USERNAME=
NEO4J_PASSWORD=
# LLM Provider (mistral, openai, azure, anthropic, ollama, lmstudio)
LLM_MISTRAL_DEVSTRAL_PROVIDER=mistral
LLM_MISTRAL_DEVSTRAL_MODEL=devstral-2512
LLM_MISTRAL_DEVSTRAL_URL=https://api.mistral.ai/v1/chat/completions
LLM_MISTRAL_DEVSTRAL_KEY=your-mistral-api-key
LLM_MISTRAL_DEVSTRAL_STRUCTURED_OUTPUT=true
# Namespaces
NEO4J_GRAPH_NAMESPACE=Graph
ARCHIMATE_NAMESPACE=Model

See .env.example for all available options.
The LLM adapter includes built-in rate limiting to prevent API throttling:
# Requests per minute (0 = use provider default: 60 RPM for cloud, unlimited for local)
LLM_RATE_LIMIT_RPM=0
# Minimum delay between requests in seconds
LLM_RATE_LIMIT_DELAY=0.0
# Max retries on rate limit (429) errors
LLM_RATE_LIMIT_RETRIES=3

Default rate limits by provider:
| Provider | Default RPM |
|---|---|
| OpenAI | 30 |
| Anthropic | 30 |
| Mistral | 24 |
| Ollama | Unlimited |
| LM Studio | Unlimited |
The rate limiter automatically:
- Throttles requests to stay within limits
- Applies exponential backoff on rate limit errors (HTTP 429)
- Handles timeout errors with backoff retries
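A minimal sketch of that throttle-plus-backoff pattern (illustrative, not Deriva's adapter code):

```python
# Minimal throttle + exponential-backoff sketch (not Deriva's adapter code).
import time

def call_with_backoff(send_request, rpm: int = 24, retries: int = 3):
    """Space requests to roughly `rpm` per minute and retry HTTP 429 with exponential backoff."""
    if rpm > 0:
        time.sleep(60.0 / rpm)               # naive spacing to stay within the RPM budget
    for attempt in range(retries + 1):
        response = send_request()            # e.g. lambda: requests.post(url, json=payload)
        if response.status_code != 429:
            return response
        time.sleep(2 ** attempt)             # back off: 1s, 2s, 4s, ...
    raise RuntimeError("rate limit (429) retries exhausted")
```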
If you encounter undefined extensions during extraction:
Via UI (Marimo):
- Navigate to Column 2 → Undefined Extensions
- Add them to the registry:
  - Extension (e.g., .tsx, Dockerfile)
  - Type (source, config, docs, test, build, asset, data, exclude)
  - Subtype (e.g., typescript, docker)
Via CLI:
# List all registered file types
deriva config filetype list
# Add a new file type
deriva config filetype add ".tsx" source typescript
# Delete a file type
deriva config filetype delete ".tsx"
# Show file type statistics by category
deriva config filetype stats

Note: Files with unrecognized extensions are automatically classified as file_type="unknown" with their extension as the subtype. This ensures all files get proper classification even without explicit registry entries.
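Conceptually, classification is a registry lookup with an "unknown" fallback; a small sketch of that behaviour (the registry contents here are illustrative):

```python
# Sketch of registry lookup with an "unknown" fallback (registry contents illustrative).
from pathlib import Path

FILE_TYPE_REGISTRY = {             # extension or filename -> (file_type, subtype)
    ".py": ("source", "python"),
    ".tsx": ("source", "typescript"),
    "Dockerfile": ("build", "docker"),
}

def classify(path: str) -> tuple[str, str]:
    p = Path(path)
    key = p.name if p.name in FILE_TYPE_REGISTRY else p.suffix
    if key in FILE_TYPE_REGISTRY:
        return FILE_TYPE_REGISTRY[key]
    # Unregistered extensions fall back to file_type="unknown",
    # keeping the bare extension as the subtype.
    return ("unknown", p.suffix.lstrip(".") or p.name.lower())

print(classify("src/App.tsx"))   # ('source', 'typescript')
print(classify("notes.xyz"))     # ('unknown', 'xyz')
```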
Deriva uses a versioning system for configurations. When you update a config, a new version is created while preserving previous versions for rollback.
Correct ways to update configs:
- Via UI (Marimo): Navigate to the config section, edit, and click "Save Config"
- Via CLI: Use the config update command
# Update extraction config instruction
deriva config update extraction BusinessConcept \
-i "New instruction text..."
# Update derivation config from file
deriva config update derivation ApplicationComponent \
--instruction-file prompts/app_component.txt
# View all versions
deriva config versions

Do NOT use JSON import/export for config updates. The db_tool import command is only for backup restoration or migration; it overwrites version history. See BENCHMARKS.md for the optimization workflow.
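Conceptually, each update appends a new version row instead of rewriting the old one, which is what makes rollback possible. An illustrative DuckDB sketch of that append-only idea (the real schema in deriva/adapters/database/sql.db may differ):

```python
# Illustrative append-only versioning sketch; the actual Deriva schema may differ.
import duckdb

con = duckdb.connect()   # in-memory here; Deriva stores configs in deriva/adapters/database/sql.db
con.execute("""
    CREATE TABLE config_versions (
        config_type TEXT, element TEXT, version INTEGER, instruction TEXT
    )
""")

def update_config(config_type: str, element: str, instruction: str) -> int:
    latest = con.execute(
        "SELECT COALESCE(MAX(version), 0) FROM config_versions "
        "WHERE config_type = ? AND element = ?",
        [config_type, element],
    ).fetchone()[0]
    # Insert a new row; earlier versions remain available for rollback.
    con.execute("INSERT INTO config_versions VALUES (?, ?, ?, ?)",
                [config_type, element, latest + 1, instruction])
    return latest + 1

print(update_config("extraction", "BusinessConcept", "New instruction text..."))
```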
For LLM-assisted extraction steps:
- Navigate to Column 2 → Extraction Step Configuration
- Expand a node type (e.g., TypeDefinition)
- Edit: Input File Types, Input Graph Elements, Instruction, Example
- Click "Save Config" (this creates a new version)
All prompts follow the Input + Instruction + Example pattern.
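A hedged sketch of how those three parts might compose into a single prompt (the actual template Deriva uses may differ):

```python
# Illustrative composition of the Input + Instruction + Example pattern;
# Deriva's actual prompt template may differ.
def build_prompt(input_files: list[str], instruction: str, example: str) -> str:
    return "\n\n".join([
        "## Input\n" + "\n".join(input_files),
        "## Instruction\n" + instruction,
        "## Example\n" + example,
    ])

print(build_prompt(
    input_files=["src/models.py", "src/services.py"],
    instruction="Identify every class and exported function; return JSON.",
    example='[{"name": "Invoice", "kind": "class"}]',
))
```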
Deriva uses a multi-column marimo notebook layout:
| Column | Purpose |
|---|---|
| 0 | Run Deriva: Pipeline execution buttons, status display |
| 1 | Configuration: Runs, repositories, Neo4j, graph stats, ArchiMate, LLM |
| 2 | Extraction Settings: File type registry, extraction step configuration |
| 3 | Derivation Settings: Element type configuration (13 types across Business/Application/Technology layers), relationship derivation |
The UI is powered by PipelineSession from the services layer, providing a clean separation between presentation and business logic.
- Neo4j Graph Database:
  - Graph namespace: Intermediate representation (Modules, Files, Dependencies)
  - Model namespace: ArchiMate elements and relationships
- DuckDB (deriva/adapters/database/sql.db): File type registry, extraction configs, settings
Column 0: Run Overview
- Clear Graph: Removes all nodes/edges from Graph namespace
- Clear Model: Removes all ArchiMate elements and relationships
Access the Neo4j browser at http://localhost:7474 and run Cypher queries:
// See all repositories
MATCH (r:Graph:Repository) RETURN r
// See files in a repo
MATCH (repo:Graph:Repository)-[:Graph:CONTAINS*]->(f:Graph:File)
WHERE repo.name = 'my-repo'
RETURN f.name, f.file_type
// See type definitions
MATCH (td:Graph:TypeDefinition) RETURN td.name, td.type_category

Deriva includes a full CLI for headless operation and automation:
# Help
deriva --help
# View configuration
deriva config list extraction
deriva config show extraction BusinessConcept
deriva status
# Manage file types
deriva config filetype list
deriva config filetype add ".lock" dependency lock
deriva config filetype stats
# Run pipeline stages
deriva run extraction --repo flask_invoice_generator -v
deriva run derivation -v
deriva run derivation --phase generate -v # Run specific phase (enrich, generate, refine)
deriva run all --repo myrepo
# Export ArchiMate model
deriva export -o workspace/output/model.archimate

CLI Options:
| Option | Description |
|---|---|
| --repo NAME | Process specific repository (default: all) |
| --phase PHASE | Run specific derivation phase: enrich, generate, or refine |
| -v, --verbose | Print detailed progress |
| --no-llm | Skip LLM-based steps (structural extraction only) |
| -o, --output PATH | Output file path for export |
Deriva includes a multi-model benchmarking system for comparing LLM performance across different providers and models. See BENCHMARKS.md for the full guide and optimization_guide.md for detailed case studies.
# List available benchmark models
deriva benchmark models
# Run a benchmark with specific models
deriva benchmark run \
--repos flask_invoice_generator \
--models openai-gptx,ollama-devstral \
-n 3 \
-d "Comparing gptx with devstral" \
-v
# List benchmark sessions
deriva benchmark list
# Analyze a benchmark session
deriva benchmark analyze bench_20260101_150724

Add models to .env using the pattern:
# Azure GPT-4o-mini
LLM_AZURE_GPT4MINI_PROVIDER=azure
LLM_AZURE_GPT4MINI_MODEL=gpt-4
LLM_AZURE_GPT4MINI_URL=https://your-resource.openai.azure.com/...
LLM_AZURE_GPT4MINI_KEY=your-api-key
# Ollama local model
LLM_OLLAMA_LLAMA_PROVIDER=ollama
LLM_OLLAMA_LLAMA_MODEL=devstral
LLM_OLLAMA_LLAMA_URL=http://localhost:11434/api/chat

Benchmark runs are logged in OCEL 2.0 (Object-Centric Event Log) format for process mining analysis:
- Events capture pipeline stages, LLM calls, and results
- Object types: BenchmarkSession, BenchmarkRun, Repository, Model
- Logs are saved to workspace/benchmarks/{session_id}/events.ocel.json
OCEL files can be analyzed with process mining tools like PM4Py, Celonis, or custom analysis scripts.
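For a quick look without a full process-mining stack, the log can also be read as plain JSON. A sketch that tallies events per activity (the field names follow the OCEL 2.0 JSON convention and are assumptions about Deriva's output):

```python
# Tally benchmark events per activity. The "events"/"type" field names follow the
# OCEL 2.0 JSON convention and are assumptions about Deriva's output.
import json
from collections import Counter
from pathlib import Path

log_path = Path("workspace/benchmarks/bench_20260101_150724/events.ocel.json")  # example session
ocel = json.loads(log_path.read_text())

counts = Counter(event["type"] for event in ocel["events"])
for activity, n in counts.most_common():
    print(f"{activity}: {n}")
```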
# Check if running
docker ps
# View logs
cd deriva/adapters/neo4j && docker-compose logs
# Restart
docker-compose restart
# Clear all data (destructive!)
docker-compose down -v

If ports 7687/7474 are in use, edit deriva/adapters/neo4j/docker-compose.yml:
ports:
  - "7688:7687"
  - "7475:7474"

Update .env accordingly:

NEO4J_URI=bolt://localhost:7688

# Check Python version
python --version # Should be 3.14+
# Reinstall dependencies
uv sync --reinstall
# Run without watch mode
uv run marimo edit deriva/app/app.py

For development setup, architecture details, and contribution guidelines, see CONTRIBUTING.md.
This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0).
This means you can freely use, modify, and distribute this software, but if you run a modified version as a network service, you must make the source code available to users of that service.
See LICENSE for the full license text.
- Marimo - Reactive Python notebooks
- Neo4j - Graph database
- ArchiMate - Enterprise architecture standard
- Archi - Open source ArchiMate modeling tool
- Tree-sitter - Multi-language AST parsing
Status: Active Development