Skip to content

SPECTRA - Scalable Platform for Enterprise Content Topology & Resource Analytics

License

Apache-2.0 and 2 other licenses found

Licenses found

Apache-2.0
LICENSE
Apache-2.0
LICENSE-APACHE
MIT
LICENSE-MIT
Notifications You must be signed in to change notification settings

saworbit/spectra

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

37 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Spectra

Scalable Platform for Enterprise Content Topology & Resource Analytics

Rust License Status Rust CI Frontend CI


πŸ“‘ The Mission

Spectra is the intelligence layer of the Orbit/SPACE Ecosystem.

While Orbit handles the movement of data and SPACE handles the environment, Spectra provides the vision. It is a high-performance data cartography agent designed to map the "Dark Matter" of enterprise storage.

It starts as a hyper-fast storage profiler (like WizTree/ncdu) but is architected to evolve into a distributed semantic governance brain for enterprise data management.

⚑ Capabilities

Phase 1: Core Engine

  • Zero-Latency Scanning: Bypasses standard OS bottlenecks using parallelized walkers (jwalk).
  • Data Gravity Mapping: Instantly identifies "Heavy Hitters" (top largest files) and "Data Sprawl" (extension distribution).
  • Edge-First Architecture: Runs locally with a tiny memory footprint (<20MB RAM for 1M files).
  • Dual Mode:
    • Human Mode: Pretty-printed CLI summaries for sysadmins.
    • Agent Mode: JSON streams for ingestion into the Spectra Brain.

Phase 2: The Semantic Bridge βœ… (Implemented)

  • Intelligent Analysis: Move beyond size to understand meaning and risk.
  • Entropy Profiling: Shannon entropy calculation to detect encrypted/compressed files.
  • Risk Scoring: 5-level risk classification for sensitive files (passwords, keys, credentials).
  • Content Classification: Optional AI-powered categorization using rust-bert (legal, financial, technical).
  • Tiered Architecture: Fast heuristics (microseconds) with optional deep analysis (milliseconds).
  • Feature Flags: Keep base binary small; opt-in to heavy AI models.

Phase 3: The Enterprise Mesh βœ… (Implemented)

  • Federation: Hub & Spoke architecture for distributed agent coordination.
  • Spectra Server: Central control plane (Axum + SurrealDB) for telemetry aggregation.
  • ⏳ Time-Travel Analytics (Phase 3.5): πŸ†• NEW!
    • Historical snapshot storage with time-series database
    • Velocity calculation: "How fast is data growing?"
    • Extension attribution: "What caused the spike last Tuesday?"
    • Interactive timeline scrubber in GUI
    • Bytes/second growth metrics with delta analysis
    • Zero file content storage (metadata-only, privacy-preserving)
  • Active Governance: Policy-based file management (Report, Delete, Archive actions).
  • Safety-First: Dry-run mode by default; requires explicit --enforce flag for destructive actions.
  • Beacon Protocol: Agents push snapshots and pull policies via REST API.
  • Local-First: Agents work perfectly offline; federation is optional.

Phase 4: The Lens (Visualization Layer) βœ… (Implemented)

  • Enterprise Dashboard: Real-time statistics visualization for directory scans.
  • Dual-Mode Interface: πŸ†• Tab system for Local Scan + Time-Travel Analytics
  • Data Insights:
    • πŸ“Š Overview Card: Total files, folders, size, and scan duration
    • πŸ“ˆ Top Extensions: Top 5 file types by total size with file counts
    • 🐳 Heavy Hitters: Top 10 largest files with full path display
    • ⏳ Time-Travel Tab: Interactive timeline slider, velocity metrics, and growth attribution
  • Desktop Application: Native cross-platform GUI built with Tauri v2 + React + TypeScript.
  • Tauri Bridge: Strongly-typed interface between React frontend and Rust backend.
  • Dark-Themed Interface: Modern enterprise UI with grid-based card layout.
  • One-Click Launch: Sophisticated scripts for instant startup on Windows and Unix systems.
  • Type Safety: Full TypeScript support with interface contracts for scan results.

πŸš€ Quick Start

Installation

git clone https://github.com/YOUR_USERNAME/spectra.git
cd spectra

Running the GUI Application (Phase 4 - The Lens)

Quick Launch (Recommended):

# Windows
launch-vision.bat

# Unix/Linux/macOS
./launch-vision.sh

Manual Launch:

cd app
npm install
npm run tauri dev

The application will open in a new window with the enterprise dashboard. Enter a directory path and click "Deep Scan" to visualize file statistics. See app/README.md for detailed GUI documentation.

Running the CLI Agent (Headless)

# Build the CLI agent
cargo build --release -p spectra-cli

# Basic scan (Phase 1 - topology only)
./target/release/spectra-cli --path ./

# Scan with semantic analysis (Phase 2 - entropy + risk detection)
./target/release/spectra-cli --path ./ --analyze

# Scan with full AI classification (requires semantic feature)
cargo build --release -p spectra-cli --features semantic
./target/release/spectra-cli --path ./ --semantic

# Output JSON for analysis (Agent Mode)
./target/release/spectra-cli --path ./ --analyze --json > scan_results.json

Running the Server (Phase 3 - Federation)

# Start the Spectra Server (Hub)
run-server.bat          # Windows
# or
cargo run -p spectra-server

# The server listens on http://0.0.0.0:3000

Running Federated Agents (Phase 3)

# Connect agent to server (dry-run governance)
run-agent.bat           # Windows
# or
cargo run -p spectra-cli -- --path ./ --server http://localhost:3000

# Agent with active policy enforcement (⚠️ CAUTION: Can delete files)
cargo run -p spectra-cli -- --path ./ --server http://localhost:3000 --enforce

# Full stack: Analysis + Governance + Federation
cargo run -p spectra-cli -- --path ./ --server http://localhost:3000 --analyze

Time-Travel Analytics Demo (Phase 3.5) πŸ†•

# 1. Start the Spectra Server (required)
cd server
cargo run

# 2. Run the simulation script to generate test data
# Windows PowerShell:
.\test-time-travel.ps1

# Linux/macOS:
chmod +x test-time-travel.sh
./test-time-travel.sh

# 3. Launch the GUI and navigate to the "⏳ Time-Travel Analytics" tab
cd app
npm run dev

# 4. Explore the interactive timeline!
# - Agent ID: agent_sim_01
# - Use sliders to select time ranges
# - View velocity metrics and growth attribution

The simulation script creates 5 snapshots spanning 24 hours with realistic growth patterns:

  • Baseline: 1GB total
  • Log file accumulation: +500MB
  • Video spike: +500MB
  • Total velocity: ~11.5 KB/s average

See docs/TIME_TRAVEL_ANALYTICS.md for the complete guide.

πŸ”§ Scripts & Tools

Spectra includes several convenience scripts to streamline development and deployment workflows.

Quality Assurance

validate-refactor.bat (Windows)

  • Purpose: Pre-alpha comprehensive validation suite for modular architecture
  • What it does:
    • Code formatting check (cargo fmt --check)
    • Clippy linting with strict warnings (-D warnings)
    • Builds all crates (core, CLI, server, Tauri app) in release mode
    • Runs unit tests for each crate independently
    • Runs integration tests (CLI basic scan, CLI analysis scan)
    • Full workspace test suite
  • When to use: Before committing changes, during refactoring, or after dependency updates
  • Usage: validate-refactor.bat
  • Expected output: Color-coded step-by-step validation with βœ“/βœ— indicators

Build & Deployment

build-release.bat (Windows)

  • Purpose: Build production-ready release binaries for all crates
  • What it does: Runs cargo build --release for spectra-core, spectra-cli, spectra-server, and app
  • When to use: When preparing binaries for distribution or benchmarking
  • Usage: build-release.bat
  • Output location: target/release/

GUI Application

launch-vision.bat (Windows) / launch-vision.sh (Unix/Linux/macOS)

  • Purpose: One-click launcher for Spectra Dashboard GUI (Phase 4 - The Lens)
  • What it does:
    • Checks for dependencies (Node.js, Cargo)
    • Navigates to app/ directory
    • Installs npm dependencies if needed
    • Launches Tauri development server (npm run tauri dev)
  • When to use: To visualize directory scans with enterprise statistics dashboard
  • Usage:
    • Windows: launch-vision.bat
    • Unix: ./launch-vision.sh
  • Features: Opens native GUI with file statistics, top extensions, and heavy hitters visualization

Federation & Server

run-server.bat (Windows)

  • Purpose: Start the Spectra Server (Phase 3 - Federation Hub)
  • What it does: Runs cargo run -p spectra-server to start the central control plane
  • When to use: When testing federated agent coordination or time-travel analytics
  • Usage: run-server.bat
  • Endpoint: Listens on http://0.0.0.0:3000
  • API endpoints:
    • POST /api/v1/ingest - Receive agent snapshots
    • GET /api/v1/history/:agent_id - Get available timestamps πŸ†•
    • GET /api/v1/velocity/:agent_id - Calculate data velocity πŸ†•
    • GET /api/v1/policies - Distribute governance policies

run-agent.bat (Windows)

  • Purpose: Run federated CLI agent in dry-run mode
  • What it does: Executes cargo run -p spectra-cli -- --path ./ --server http://localhost:3000
  • When to use: Testing agent federation without policy enforcement
  • Usage: run-agent.bat (ensure server is running first)
  • Safety: Dry-run by default (reports governance actions without executing)
  • For enforcement: Edit script to add --enforce flag (⚠️ CAUTION: Can delete files)

Time-Travel Analytics Testing πŸ†•

test-time-travel.ps1 (Windows PowerShell)

  • Purpose: Simulate time-series data for Time-Travel Analytics testing
  • What it does:
    • Validates server connectivity
    • Injects 5 snapshots spanning 24 hours with realistic growth patterns
    • Verifies history and velocity endpoints
    • Provides detailed output with growth metrics
  • When to use: Testing the Time-Travel Analytics feature without waiting for real data
  • Usage: .\test-time-travel.ps1
  • Requirements: Spectra Server must be running on http://localhost:3000
  • Output: Creates agent agent_sim_01 with 1GB growth over 24 hours

test-time-travel.sh (Linux/macOS Bash)

  • Purpose: Same as PowerShell version but for Unix-like systems
  • What it does: Identical functionality to .ps1 script
  • When to use: Testing Time-Travel Analytics on Linux/macOS
  • Usage: chmod +x test-time-travel.sh && ./test-time-travel.sh
  • Requirements: curl and Spectra Server running
  • Output: Comprehensive test report with velocity calculations

Script Compatibility

Script Windows Unix/Linux macOS
validate-refactor.bat βœ… ❌ ❌
build-release.bat βœ… ❌ ❌
launch-vision.bat βœ… ❌ ❌
launch-vision.sh ❌ βœ… βœ…
run-server.bat βœ… ❌ ❌
run-agent.bat βœ… ❌ ❌
test-time-travel.ps1 βœ… ❌ ❌
test-time-travel.sh ❌ βœ… βœ…

Note: Unix/Linux/macOS users can run equivalent Cargo commands directly. See individual sections above for command equivalents.

πŸ— Architecture

Project Structure

spectra/
β”œβ”€β”€ spectra-core/              # πŸ†• Core scanning library (NEW)
β”‚   β”œβ”€β”€ src/
β”‚   β”‚   └── lib.rs            # Scanner, FileRecord, ScanStats
β”‚   β”œβ”€β”€ Cargo.toml
β”‚   └── README.md              # Usage guide
β”œβ”€β”€ cli/                        # Headless Rust agent (spectra-cli)
β”‚   β”œβ”€β”€ src/
β”‚   β”‚   β”œβ”€β”€ main.rs            # Thin client using spectra-core
β”‚   β”‚   β”œβ”€β”€ analysis/          # Phase 2: Semantic Bridge
β”‚   β”‚   β”‚   β”œβ”€β”€ entropy.rs     # Shannon entropy calculation
β”‚   β”‚   β”‚   β”œβ”€β”€ heuristics.rs  # Risk pattern detection
β”‚   β”‚   β”‚   β”œβ”€β”€ semantic.rs    # AI content classification
β”‚   β”‚   β”‚   └── mod.rs         # Analysis module API
β”‚   β”‚   └── governance/        # Phase 3: Active Governance
β”‚   β”‚       β”œβ”€β”€ engine.rs      # Policy evaluation & execution
β”‚   β”‚       β”œβ”€β”€ tests.rs       # Safety test suite
β”‚   β”‚       └── mod.rs         # Governance module API
β”‚   └── Cargo.toml
β”œβ”€β”€ server/                     # Phase 3: Central Hub (spectra-server)
β”‚   β”œβ”€β”€ src/
β”‚   β”‚   └── main.rs            # Axum API server
β”‚   └── Cargo.toml
β”œβ”€β”€ app/                        # Phase 4: Tauri + React GUI (The Lens)
β”‚   β”œβ”€β”€ src/
β”‚   β”‚   β”œβ”€β”€ types.ts           # TypeScript interfaces for scan results
β”‚   β”‚   β”œβ”€β”€ App.tsx            # Enterprise dashboard implementation
β”‚   β”‚   └── App.css            # Dark-themed enterprise styling
β”‚   β”œβ”€β”€ src-tauri/
β”‚   β”‚   └── src/
β”‚   β”‚       └── lib.rs         # Tauri commands using spectra-core
β”‚   β”œβ”€β”€ launch-spectra-vision.bat   # Windows launcher
β”‚   β”œβ”€β”€ launch-spectra-vision.sh    # Unix launcher
β”‚   β”œβ”€β”€ package.json
β”‚   └── README.md              # GUI documentation
β”œβ”€β”€ docs/                            # Documentation
β”‚   β”œβ”€β”€ ARCHITECTURE.md             # Detailed technical documentation
β”‚   β”œβ”€β”€ PHASE3_GUIDE.md             # Phase 3 quick start guide
β”‚   β”œβ”€β”€ PHASE4_GUIDE.md             # Phase 4 visualization guide
β”‚   β”œβ”€β”€ TIME_TRAVEL_ANALYTICS.md    # πŸ†• Time-Travel Analytics guide (NEW)
β”‚   β”œβ”€β”€ CONTRIBUTING.md             # Contribution guidelines
β”‚   β”œβ”€β”€ DEVELOPMENT.md              # Developer setup guide
β”‚   └── FAQ.md                      # Frequently asked questions
β”œβ”€β”€ Cargo.toml                      # Workspace manifest
β”œβ”€β”€ CHANGELOG.md                    # Version history
β”œβ”€β”€ validate-refactor.bat           # QA validation suite
β”œβ”€β”€ launch-vision.bat               # Launch GUI (Windows)
β”œβ”€β”€ launch-vision.sh                # Launch GUI (Unix)
β”œβ”€β”€ run-server.bat                  # Start Hub server (Windows)
β”œβ”€β”€ run-agent.bat                   # Run federated agent (Windows)
β”œβ”€β”€ build-release.bat               # Build all binaries (Windows)
β”œβ”€β”€ test-time-travel.ps1            # πŸ†• Time-Travel test script (Windows, NEW)
└── test-time-travel.sh             # πŸ†• Time-Travel test script (Unix, NEW)

πŸ—οΈ Modular Architecture (Pre-Alpha)

Spectra recently underwent a major refactoring to establish a clean modular architecture:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ spectra-server  β”‚ ← Federation endpoint (Phase 3)
β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         β”‚
         ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  spectra-cli    β”‚ ← Thin client
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ β€’ Analysis      β”‚ ← Phase 2 (entropy, risk, semantic)
β”‚ β€’ Governance    β”‚ ← Phase 3 (policies)
β”‚ β€’ Federation    β”‚ ← Phase 3 (server comms)
β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         β”‚ uses
         ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ spectra-core    β”‚ ← Shared scanning engine ⭐ NEW
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ β€’ Scanner       β”‚ ← Phase 1 (jwalk, BinaryHeap)
β”‚ β€’ FileRecord    β”‚ ← Simple (path, size)
β”‚ β€’ ScanStats     β”‚ ← Aggregated results
β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         ↑ uses
         β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   app (Tauri)   β”‚ ← GUI application (Phase 4)
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ β€’ scan_directoryβ”‚ ← Statistics dashboard
β”‚ β€’ React + TS    β”‚ ← Strongly-typed frontend
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Key Benefits:

  • πŸ”„ Code reuse between CLI and GUI
  • βœ… Independently testable core library
  • 🎯 Clear separation of concerns
  • πŸš€ No performance regression
  • 🧩 Extensible for future use cases

Status: Pre-Alpha (API unstable)

Philosophy

Spectra adheres to the "Trojan Horse" philosophy:

  • The Hook: Solve the immediate "Disk Full" pain point with superior speed/UI.
  • The Pivot: Use the installed base to index content and map lineage.
  • The Platform: Federate metadata to a central governance plane.

See ARCHITECTURE.md for the deep dive.

🀝 Contributing

We prioritize Performance and Safety. If it slows down the scan, it doesn't get merged.

See CONTRIBUTING.md for protocols.

πŸ“„ License

Spectra is dual-licensed under your choice of:

This dual licensing provides maximum flexibility for both open-source and commercial use. You may choose whichever license best suits your needs.

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you shall be dual licensed as above, without any additional terms or conditions.

See LICENSE for more details on the dual licensing model.


Part of the Orbit/SPACE Data Infrastructure Initiative.

About

SPECTRA - Scalable Platform for Enterprise Content Topology & Resource Analytics

Resources

License

Apache-2.0 and 2 other licenses found

Licenses found

Apache-2.0
LICENSE
Apache-2.0
LICENSE-APACHE
MIT
LICENSE-MIT

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published