Skip to content

♻️ SQLite state foundation with migrations and full metadata parity#231

Open
Robdel12 wants to merge 5 commits intomainfrom
feat/sqlite-state
Open

♻️ SQLite state foundation with migrations and full metadata parity#231
Robdel12 wants to merge 5 commits intomainfrom
feat/sqlite-state

Conversation

@Robdel12
Copy link
Contributor

@Robdel12 Robdel12 commented Feb 19, 2026

Why

This refactor is about making local TDD state durable and predictable as the CLI grows.

The old JSON files worked for early usage, but they were starting to create complexity around consistency, migration, and multi-process behavior. SQLite gives us a real schema, safer state evolution, and cleaner concurrency boundaries.

What changed

  • Switched local TDD/reporter state to a SQLite-backed store with schema migrations
    • src/tdd/state-store/sqlite-store.js
    • migration runner in src/tdd/state-store/migrations.js
  • Introduced explicit state-store modes
    • mode: 'write' for the TDD writer path
    • mode: 'read' for routers/services/commands that should never mutate state
    • write APIs now throw in read mode
  • Wired callsites to match that contract
    • writer: src/server/handlers/tdd-handler.js
    • readers: dashboard/events/health/static-report/context
  • Completed metadata parity on SQLite
    • baseline metadata
    • hotspot metadata
    • region metadata
    • baseline build metadata
  • Finalized migration behavior
    • one-time legacy JSON migration when state.db is created for the first time
    • bootstrap migration runs on CLI startup when needed (legacy files present + no state.db)
    • no repeated migration runs once DB exists
  • Fixed reset semantics
    • baseline reset now clears persisted state and in-memory TDD runtime caches
  • Updated SSE route behavior
    • subscription is read-only
    • snapshots read from a fresh read store to avoid stale/missing-db edge cases
  • Added/updated tests for mode contract, migration bootstrap behavior, and updated read/write callsites

Follow-up fixes from re-review

  • Server registry migration now avoids marking legacy_json_migrated on parse/read failure, so migration can retry on a later invocation.
  • Server registry write() no longer does dead in-transaction deletes after a full table clear.
  • Port allocation now returns null on exhaustion, and tdd start shows a clear actionable error instead of failing later with a bind error.
  • Dashboard and health read routes now guard state store creation so DB open failures are caught and handled predictably.
  • resetBaselines now uses the existing safeDeleteFile helper (and injected unlinkSync) instead of dynamic node:fs imports.
  • Added focused regression tests for all of the above paths.

Migration behavior (final)

  1. On any CLI invocation, we check the current project for legacy state.
  2. If .vizzly/ exists, state.db does not exist, and legacy state files exist, we bootstrap migration once.
  3. After state.db exists, migration does not run again.

This keeps startup safe and predictable, and avoids hidden write behavior in read-only code paths.

Test plan

  • npm run lint
  • node --test tests/tdd/state-store-mode.test.js tests/tdd/metadata/baseline-metadata.test.js tests/tdd/metadata/hotspot-metadata.test.js tests/tdd/metadata/region-metadata.test.js tests/server/routers/events.test.js tests/tdd/tdd-service.integration.test.js
  • node --test tests/server/handlers/tdd-handler.test.js tests/server/http-server.test.js tests/server/routers/dashboard.test.js tests/tdd/tdd-service.test.js tests/utils/context.test.js
  • node --test tests/tdd/server-registry.test.js tests/server/routers/dashboard.test.js tests/server/routers/health.test.js tests/server/handlers/tdd-handler.test.js

Notes:

  • Full npm test in this sandbox still reports existing tests/utils/colors.test.js color assertions (TTY/env behavior). The SQLite/state suites above are green.

@Robdel12 Robdel12 changed the title SQLite state foundation with migrations and full metadata parity ♻️ SQLite state foundation with migrations and full metadata parity Feb 19, 2026
@vizzly-testing

This comment has been minimized.

@claude
Copy link

claude bot commented Feb 19, 2026

PR Review: SQLite state foundation with migrations and full metadata parity

Overall this is a solid, well-scoped migration. The schema versioning story is clean, the file/SQLite backends share a coherent interface, and the dependency injection pattern preserves testability throughout. A few things worth addressing before merge:


Performance: store opened/closed per HTTP request

In dashboard.js and health.js every route handler calls createStateStore(...) / stateStore.close(). Each open runs applySchemaMigrations (queries schema_migrations) plus two legacy-migration guard reads before doing the actual work. Under active TDD polling this adds meaningful overhead.

The events router shows the right pattern — create one store, subscribe, close on disconnect. Dashboard and health should receive the store as a dependency (injected at server startup) the same way the TDD handler does with injectedStateStore.


Per-operation store opening in metadata modules

baseline-metadata.js, hotspot-metadata.js, and region-metadata.js all use:

function withStateStore(workingDir, operation) {
  let store = createStateStore({ workingDir });
  try { return operation(store); }
  finally { store.close(); }
}

Every loadBaselineMetadata / saveBaselineMetadata call opens and closes the DB, triggering migration checks each time. During screenshot processing this is called in a hot path. The store should be injected or cached at the call site rather than created per-operation.


Module-level stateEmitters Map never cleaned up

let stateEmitters = new Map();   // src/tdd/state-store.js

Entries are added by workingDir but never removed. In tests (and hypothetically in long-running daemon mode cycling across projects) this leaks EventEmitter instances. Adding a stateEmitters.delete(workingDir) in close() — or exposing a clearEmitter(workingDir) helper for tests — would fix this.


findAvailablePort fallback silently returns an occupied port

// src/tdd/server-registry.js
for (let i = 0; i < maxAttempts; i++) { ... }
return startPort;   // ← returned even when startPort was occupied

If all 100 candidates are occupied the function returns startPort unconditionally. The server will then fail to bind with a confusing error. Returning null or throwing here would surface the real problem earlier.


Legacy migration marks completion even on partial failure

// src/tdd/state-store.js — maybeMigrateLegacyJson
} catch (error) {
  output.debug?.('state', `legacy report JSON migration skipped: ${error.message}`);
} finally {
  setKey('legacy_json_migrated', '1');    // ← set even on error
}

A transient read error (permissions, disk full, malformed JSON mid-write) silently discards the legacy data permanently. Moving the setKey call to the success path, or only marking it done after a successful replaceReportDataInternal, would avoid silent data loss.


createSqliteStateStore — no error handling around DB open

let db = new DatabaseImpl(resolvedDbPath);

A locked, corrupted, or read-only DB throws synchronously here with no try/catch. Since this now runs inside HTTP request handlers (dashboard, health), an unhandled throw will crash the request or the process. At minimum, let it propagate with a clear message; ideally surface it as a 500 with a log line.


Minor nits

  • buildSummary makes four .filter() passes over the same array; a single reduce would be cleaner for larger comparison sets (not a blocker).
  • write() in ServerRegistry uses delete-all + re-insert inside a transaction rather than upsert. Correct, but more disruptive than needed for the register use-case it's adjacent to.
  • The engines constraint in better-sqlite3 (20.x || 22.x || ...) should be reflected in the root package.json engines field if it isn't already, so consumers get a clear error on unsupported runtimes.

What's working well

  • Migration versioning with schema_migrations + idempotent IF NOT EXISTS DDL is solid.
  • WAL mode, synchronous = NORMAL, busy_timeout = 5000, and foreign_keys = ON are all the right pragmas for this use-case.
  • Transactions wrapping every multi-step mutation — good consistency guarantees.
  • ON DELETE CASCADE on comparison_details provides a safety net even though the code also deletes explicitly.
  • The dual-backend design (SQLite in production, file in tests) keeps the test suite fast without mocking SQLite.
  • close() guards (if (!this.db) return) and silent-swallow in the catch are appropriate for cleanup paths.
  • The subscribe/unsubscribe pattern through the module-level EventEmitter correctly decouples DB lifecycle from SSE connection lifecycle.

@vizzly-testing

This comment has been minimized.

@vizzly-testing

This comment has been minimized.

@vizzly-testing

This comment has been minimized.

Introduce schema-versioned SQLite migrations for reporter state and metadata.\nPort baseline/hotspot/region/baseline-build metadata and server registry off JSON files with one-time legacy imports.\nUpdate routers, context, handlers, and tests to read/write through the DB-backed state store.
Break the monolithic state store into a small facade plus dedicated files for schema migrations, sqlite backend, file backend, events, constants, and shared helpers. This keeps DB evolution isolated and gives us a cleaner foundation for future migrations.
- remove file state-store fallback and backend toggles\n- tighten registry/event/state utilities for cleaner behavior\n- move handler/service coverage to real sqlite-backed tests
Introduce explicit read/write store modes, enforce a single writer path, and add one-time legacy JSON bootstrap on CLI startup.\n\nAlso updates routers/services/tests for read-mode access, resets in-memory TDD runtime state on baseline reset, and adds migration/mode contract coverage.
@Robdel12 Robdel12 force-pushed the feat/sqlite-state branch from 826ba94 to 7c9dfbb Compare March 1, 2026 03:18
@Robdel12 Robdel12 marked this pull request as ready for review March 1, 2026 03:18
@vizzly-testing
Copy link

vizzly-testing bot commented Mar 1, 2026

Vizzly - Visual Test Results

CLI Reporter - Approved

19 comparisons, no changes detected.

View build

CLI TUI - 2 changes need review
Status Count
Changed 2
Auto-approved 3
Changes needing review (2)

vizzly-tdd-help · 1202×578 · 14.7% diff

vizzly-tdd-help

vizzly-help · 1202×1430 · 3.0% diff

vizzly-help

Review changes


feat/sqlite-state · 7c9dfbbe

@Robdel12
Copy link
Contributor Author

Robdel12 commented Mar 1, 2026

@claude rereview this PR please

@claude
Copy link

claude bot commented Mar 1, 2026

Claude finished @Robdel12's task in 3m 33s —— View job


PR Re-Review

  • Gather context
  • Read key changed files
  • Analyze against previous review concerns
  • Check new code for additional issues
  • Post updated review

Good progress since the February review. Two of the six previous issues are cleanly resolved, one is effectively resolved with a smarter approach, and a couple of new issues emerged in the latest commits. Here's the full picture:

Previous Issues: Status

✅ Fixed: stateEmitters Map leak (src/tdd/state-store/events.js)

The unsubscribe function now deletes the emitter from the Map when the listener count drops to zero:

if (emitter.listenerCount('changed') === 0) {
  stateEmitters.delete(workingDir);
}

Clean fix.

✅ Fixed: Legacy migration marks done on partial failure (state store)

The state store no longer uses a KV flag to mark migration complete. Completion is implicit from !dbExistedBeforeOpen — if the DB didn't exist before open, migration runs once; after that it's a no-op because the file exists. Much better approach.

⚠️ Still present: Legacy migration marks done on partial failure (ServerRegistry)

server-registry.js line 212 still sets legacy_json_migrated = '1' in the finally block, regardless of whether the migration succeeded. A transient JSON parse error or filesystem issue would permanently discard the legacy registry data. The state store's approach of using DB existence as the guard would apply equally well here.

Fix this →

⚠️ Partially improved: Per-request store creation in dashboard.js and health.js

Still opens/closes a store on every request, but the overhead is meaningfully reduced: read-mode no longer runs applySchemaMigrations (no CREATE TABLE IF NOT EXISTS queries on open — just busy_timeout). For a localhost polling dashboard this is probably acceptable, but injecting a shared read store at server startup would be the cleaner long-term design.

❌ Still present: findAvailablePort returns startPort on exhaustion (server-registry.js line 476)

for (let i = 0; i < maxAttempts; i++) { ... }
return startPort;  // ← returned even when startPort was occupied

If all 100 candidates are occupied, the function silently returns startPort and the server bind fails with a confusing EADDRINUSE. Should return null or throw. Fix this →%20or%20return%20null%2C%20and%20update%20callers%20to%20handle%20the%20null%20case.%20Branch%3A%20feat%2Fsqlite-state&repo=vizzly-testing/cli)

❌ Still present: No error handling around DB open in read paths

dashboard.js calls createStateStore() before the try block on lines 29, 61, and 97. If the DB file is corrupted or locked, the synchronous new BetterSqlite3(path, { readonly: true }) throws and the exception bubbles up through the HTTP request handler uncaught:

// line 29 — outside try/catch
let stateStore = createStateStore({ workingDir, output, mode: 'read' });
try {
  let data = stateStore.readReportData();
  ...
} finally {
  stateStore.close();
}

Move createStateStore inside the try block (or wrap with a top-level catch that returns a 500). Fix this →


New Issues (in commits since the February review)

resetBaselines bypasses injected unlinkSync dep (tdd-handler.js lines 676, 696)

The handler already receives unlinkSync via dependency injection (deps.unlinkSync = defaultUnlinkSync), and safeDeleteFile correctly uses it. But resetBaselines calls await import('node:fs') inside a loop to get unlinkSync:

// lines 676, 696 — inside resetBaselines loop
const { unlinkSync } = await import('node:fs');
unlinkSync(baselinePath);

This bypasses the injected dep, making the reset path untestable with the same mock infrastructure as the rest of the handler. And the safeDeleteFile helper exists in the same closure for exactly this purpose:

// safeDeleteFile correctly uses injected unlinkSync
const safeDeleteFile = (imagePath, label, name) => {
  ...
  unlinkSync(filePath);  // ← the injected one
};

The resetBaselines deletion loops should use safeDeleteFile (which also adds safePath validation for free), eliminating the dynamic imports entirely. [Fix this →](https://claude.ai/code?q=In%20src%2Fserver%2Fhandlers%2Ftdd-handler.js%20the%20resetBaselines%20function%20(around%20line%20656)%20uses%20'await%20import('node%3Afs')' inside%20loops%20to%20get%20unlinkSync%2C%20bypassing%20the%20injected%20dep%20and%20the%20safeDeleteFile%20helper.%20Replace%20the%20inline%20deletion%20blocks%20in%20resetBaselines%20with%20calls%20to%20safeDeleteFile(comparison.baseline%2C%20'baseline'%2C%20comparison.name)%2C%20safeDeleteFile(comparison.current%2C%20'current'%2C%20comparison.name)%2C%20and%20safeDeleteFile(comparison.diff%2C%20'diff'%2C%20comparison.name).%20Branch%3A%20feat%2Fsqlite-state&repo=vizzly-testing/cli)

Dead code in write() transaction (server-registry.js lines 246–265)

The write() transaction opens with DELETE FROM registry_servers (line 251), clearing all rows. Then for each server it calls removeExisting.run(port, server.directory) (line 265) — but the table is already empty, so this is always a no-op. The removeExisting statement can be removed from the write() transaction entirely. Fix this →


What's working well

  • The read-only DB flag ({ readonly: true, fileMustExist: true }) on the read path is a solid safety boundary — the server can never accidentally mutate state from a read route.
  • shouldRunLegacyMigration = !dbExistedBeforeOpen is an elegant way to avoid the flag-on-failure problem for the state store.
  • WAL + synchronous = NORMAL + busy_timeout = 5000 remains the right set of pragmas.
  • The mode: 'write' / mode: 'read' enforcement with assertWriteMode on every mutating method is solid.
  • The no-op stub returned when the DB doesn't exist yet is a clean way to handle the first-run case without conditional logic scattered across callers.
  • createHotspotCache() / createRegionCache() as explicit cache primitives is a nice pattern that avoids the per-screenshot open overhead on the hot path.
  • The events.js emitter lifecycle is now correct end-to-end.

@Robdel12 Robdel12 changed the base branch from main to beta March 1, 2026 03:21
Fixes server registry migration retry/port exhaustion handling, read-path DB open guards, and resetBaselines dependency-safe file deletion with coverage.
@Robdel12 Robdel12 changed the base branch from beta to main March 1, 2026 22:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant