Add VS Code extension for ARTK test automation #14
This extension provides visual tools for ARTK test automation:
- Installation wizard to replace CLI bootstrap scripts
- ARTK Explorer sidebar with Status, Journeys, and LLKB views
- Status bar showing installation status and LLKB health
- Dashboard webview with overview and quick actions
- Commands for doctor, check, upgrade, and LLKB operations
- File watchers for automatic view updates
- CLI integration layer to wrap @artk/cli commands

The extension activates when it detects an artk-e2e directory or .artk/context.json file in the workspace.

https://claude.ai/code/session_01WjUMcG34n6oe5MULEPECb3
Implements testable CLI commands for journey validation and implementation, replacing pseudocode from prompts with real TypeScript.

Critical fixes (from multi-AI review):
- CRIT-1: Parallel mode now explicitly rejected (not implemented yet)
- CRIT-2: Session state persisted to .artk/session.json after each step
- CRIT-3: Parse real test file names from AutoGen output with fallback

High priority fixes:
- Command timeout (5 min default) prevents hanging forever
- LLKB re-export failures always warn in strict mode
- Windows path handling (case-insensitive comparison)
- Enhanced environment detection (Cursor, JetBrains, WSL, Docker, CI)

Security hardening:
- CommandSpec with shell: false prevents command injection
- Path traversal protection via isPathSafe()
- Zod schema validation with detailed error propagation
- Duplicate journey detection across status folders

New CLI commands:
- artk journey validate <id> - Validate journey for implementation
- artk journey implement <ids> - Generate tests for journey(s)
- artk journey check-llkb - Verify LLKB configuration

Test coverage: 202 tests passing

Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering)
Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
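The path traversal protection listed under security hardening can be illustrated with a minimal sketch. The commit names `isPathSafe()` but does not show its signature, so the two-argument form below (`rootDir`, `candidate`) and the example paths are assumptions, not the project's implementation.

```typescript
import * as path from "node:path";

// Hypothetical signature: the commit only names isPathSafe(), not its parameters.
function isPathSafe(rootDir: string, candidate: string): boolean {
  const root = path.resolve(rootDir);
  // Resolve the candidate against the root so "../" segments are collapsed.
  const target = path.resolve(root, candidate);
  const rel = path.relative(root, target);
  // The target escapes the root when the relative path climbs upward
  // or (on Windows) lands on a different drive and stays absolute.
  const escapes =
    rel === ".." ||
    rel.indexOf(".." + path.sep) === 0 ||
    path.isAbsolute(rel);
  return !escapes;
}

console.log(isPathSafe("/repo", "journeys/JRN-0001.md")); // true
console.log(isPathSafe("/repo", "../secrets.txt"));       // false
```

A check like this is purely lexical. It does not follow symlinks, so a caller that needs that guarantee would have to resolve real paths first.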
- Bootstrap scripts now create both .github/prompts/ and .github/agents/
- Prompts are stub files with agent: property that delegate to full agents
- Agents contain full implementation with handoffs (suggested next actions)
- Upgrade path: detects old full-content prompts, backs up, and migrates
- Added comprehensive test suite (103 tests) for fresh install and upgrades
- Updated documentation to explain two-tier architecture

This enables clickable "Suggested Next Actions" buttons after prompt execution while maintaining the familiar /artk.* slash command invocation pattern.

Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering)
Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
Multi-AI code review identified critical issues and improvements:

CRITICAL FIXES:
- [C1] Update CLI installPrompts() to use two-tier architecture
  - Now creates both .github/prompts/ (stubs) and .github/agents/ (full content)
  - Handles upgrade detection and backup cleanup
- [C2] Update uninstall to remove .github/agents/ along with prompts

HIGH PRIORITY FIXES:
- [H1] Add cross-reference validation in test suite
  - Verifies each stub's agent: property references existing agent file
- [H2] Improve upgrade detection logic
  - Use grep -q for clean boolean check (fixes newline output issue)
  - Check for absence of agent: property instead of # ARTK pattern
- [H3] Capture bootstrap output in tests for debugging

MEDIUM PRIORITY FIXES:
- [M1] Document variant-info.prompt.md as special case in docs
- [M2] Add backup cleanup mechanism (keep only 3 most recent)
- [M3] Normalize case-sensitivity (-cmatch in PowerShell)
- [M4] Add atomic operations with staging directories and rollback

Test Results: 116 passed, 0 failed

Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering)
Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
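The backup cleanup rule in [M2] (keep only the 3 most recent) might be expressed as a small selection function. This is a sketch under assumptions: the `BackupEntry` shape, the `selectBackupsToDelete` name, and the backup names are hypothetical, and the actual bootstrap scripts are shell and PowerShell rather than TypeScript.

```typescript
// Hypothetical record for one backup directory.
interface BackupEntry {
  name: string;
  createdAt: number; // creation time in milliseconds since the epoch
}

const MAX_BACKUPS_TO_KEEP = 3;

// Returns the backups that fall outside the retention window.
function selectBackupsToDelete(
  backups: BackupEntry[],
  keep: number = MAX_BACKUPS_TO_KEEP
): BackupEntry[] {
  // Sort a copy newest-first so the caller's array is not mutated.
  const newestFirst = backups.slice().sort((a, b) => b.createdAt - a.createdAt);
  // Everything after the first `keep` entries is eligible for deletion.
  return newestFirst.slice(keep);
}

const stale = selectBackupsToDelete([
  { name: "backup-a", createdAt: 100 },
  { name: "backup-b", createdAt: 500 },
  { name: "backup-c", createdAt: 300 },
  { name: "backup-d", createdAt: 400 },
  { name: "backup-e", createdAt: 200 },
]);
console.log(stale.map((b) => b.name)); // [ 'backup-e', 'backup-a' ]
```

Separating the selection from the deletion keeps the retention rule testable without touching the filesystem.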
- Add LLKB seed command for pre-seeding universal patterns
- Enhance journey-implement workflow with better decision tree
- Update prompts for discover-foundation and journey-implement
- Add research documents for architecture analysis and implementation plans

Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering)
Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
Security fixes:
- Add HTML escaping to prevent XSS in dashboard webview
- Add Content Security Policy with nonce for inline scripts
- Remove shell: true from spawn to prevent command injection
- Add CLI path validation (must be absolute, exist, be a file)
- Add output size limits to prevent memory exhaustion
- Add SIGKILL fallback for hung processes

UX improvements:
- Add ARIA attributes for accessibility
- Add visual indicators (checkmarks, warnings) for status
- Add text overflow handling for long app names
- Add focus styles for keyboard navigation
- Add user-friendly error messages for common failures

Also adds research document with full critical review findings.

https://claude.ai/code/session_01WjUMcG34n6oe5MULEPECb3
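The first two security fixes (HTML escaping and a nonce-based Content Security Policy) follow a common webview pattern, sketched below. The function names and the HTML layout are assumptions for illustration. In a real extension the `cspSource` argument would come from VS Code's `webview.cspSource` property.

```typescript
import { randomBytes } from "node:crypto";

// Escape the five characters that can break out of HTML text or attributes.
function escapeHtml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// A fresh random nonce per render, so only scripts carrying it may run.
function createNonce(): string {
  return randomBytes(16).toString("base64");
}

// Hypothetical renderer: appName stands in for any workspace-derived string.
function renderDashboard(appName: string, cspSource: string, nonce: string): string {
  const csp = `default-src 'none'; style-src ${cspSource}; script-src 'nonce-${nonce}';`;
  return `<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta http-equiv="Content-Security-Policy" content="${csp}">
</head>
<body>
  <h1>${escapeHtml(appName)}</h1>
  <script nonce="${nonce}">/* inline dashboard script */</script>
</body>
</html>`;
}

console.log(renderDashboard('<img src=x onerror="alert(1)">', "vscode-resource:", createNonce()));
```

Escaping handles untrusted text that reaches the markup, while the CSP acts as a second layer: an injected `<script>` without the nonce would be blocked even if escaping were missed somewhere.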
Add vitest test framework with full test coverage:
- CLI runner: security validation, command injection prevention, path validation
- Dashboard: XSS prevention, HTML escaping, CSP nonce generation
- Tree providers: async I/O, caching, loading states
- Workspace detector: async/sync detection, config parsing
- Init command: wizard flow, error handling, reinstall detection

Also completes P0 security fixes:
- Convert sync I/O to async in detector.ts, JourneysTreeProvider.ts, LLKBTreeProvider.ts
- Add recursion depth limit (MAX_RECURSION_DEPTH=10, MAX_FILES_TO_SCAN=1000)

Total: 141 tests passing

https://claude.ai/code/session_01WjUMcG34n6oe5MULEPECb3
…l fixes

Implements the Hybrid Agentic architecture where CLI provides granular, stateless tools and the orchestrating LLM serves as the "brain".

New CLI Commands:
- analyze: Parse journeys and output structured analysis
- plan: Generate test plans with multi-sample strategy support
- run: Execute tests with Playwright installation check
- refine: Analyze failures and suggest refinements
- status: Show pipeline state for orchestrator
- clean: Clean artifacts and reset pipeline state

Critical Fixes from Code Review:
1. Pipeline State Persistence - All commands update .artk/autogen/pipeline-state.json
2. Circuit Breaker Replay Bug - Now restores from saved state instead of replaying
3. Playwright Installation Check - Validates before test execution
4. Telemetry Wired Up - All commands track start/end/errors
5. Multi-Sample Strategy - plan.ts supports --strategy multi-sample
6. generateCodeFromPlan() Fixtures - Ensures 'page' is always included
7. Input Validation - Validates timeout/retries/strategy arguments

New Modules:
- src/pipeline/state.ts: Pipeline state management
- src/refinement/: Circuit breaker, convergence detection, playwright runner
- src/shared/telemetry.ts: Command telemetry tracking
- src/uncertainty/multi-sampler.ts: Multi-sample code generation
- src/scot/planner.ts: SCoT planning with orchestrator mode

Note: TypeScript strict mode issues in new files require follow-up fix. All 1090 unit tests passing.

Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering)
Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
Dashboard Integration:
- Add LLKB Seed button to seed universal patterns
- Add message handlers for llkbSeed and journeyValidate commands
- Expand LLKB card title for clarity

CLI Runner Extensions:
- Add llkbSeed() function for seeding patterns
- Add journeyValidate() for validating journeys
- Add journeyCheckLlkb() for LLKB readiness check
- Add journeyImplement() for test generation
- Add new types: CLILLKBSeedOptions, CLIJourneyValidateOptions, etc.

Command Registration:
- Register artk.llkb.seed command
- Export runLLKBSeed from llkb.ts

Research:
- Add dashboard integration analysis document

https://claude.ai/code/session_01WjUMcG34n6oe5MULEPECb3
Add null checks and fallbacks for:
- Regex match group access (&& match[1] patterns)
- Array index access (?? fallbacks and ! assertions)
- Optional chaining for potentially undefined values
- Type-safe default values for ConfidenceDimension

Files fixed:
- CLI: analyze.ts, plan.ts, run.ts
- Refinement: convergence-detector.ts, playwright-runner.ts, refinement-loop.ts
- SCoT: parser.ts (line array access guards)
- Uncertainty: confidence-scorer.ts, multi-sampler.ts

TypeScript --noEmit now passes cleanly.

Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering)
Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
…board

- Add llkbStatsJson and journeySummary CLI functions for JSON output
- Update DashboardPanel to fetch and display LLKB statistics inline (lessons count, components count, avg confidence, last updated)
- Add Journey Summary card showing status breakdown by lifecycle state
- Add formatDate helper for human-readable date display
- Add comprehensive tests for new CLI functions and dashboard features

https://claude.ai/code/session_01WjUMcG34n6oe5MULEPECb3
…actions

Tier 2 dashboard features:
- Add SessionState type for tracking implementation progress
- Add readSessionState() to read session.json directly for fast polling
- Add Implementation Progress card with:
  - Real-time progress bar during journey implementation
  - Journey status list (completed/current/failed)
  - Elapsed time display
  - Current step indicator
- Add Journey Quick Actions:
  - "View Journeys" button to open journey files
  - "Validate All" button
  - "Implement Ready (N)" button for quick implementation
- Add polling mechanism (2s interval) during active implementation
- Add formatElapsed helper for time display
- Add comprehensive tests for new features (172 total tests)

https://claude.ai/code/session_01WjUMcG34n6oe5MULEPECb3
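The elapsed time display could be backed by a helper along these lines. The commit names `formatElapsed` but not its input or output format, so the millisecond argument and the `1m 05s` style below are assumptions.

```typescript
// Hypothetical formatter: converts elapsed milliseconds into a short label.
function formatElapsed(elapsedMs: number): string {
  // Clamp negatives (e.g. clock skew) to zero and drop sub-second precision.
  const totalSeconds = Math.max(0, Math.floor(elapsedMs / 1000));
  const hours = Math.floor(totalSeconds / 3600);
  const minutes = Math.floor((totalSeconds % 3600) / 60);
  const seconds = totalSeconds % 60;
  // Two-digit padding keeps the label width stable while it ticks.
  const pad = (n: number): string => (n < 10 ? "0" + n : String(n));

  if (hours > 0) return `${hours}h ${pad(minutes)}m ${pad(seconds)}s`;
  if (minutes > 0) return `${minutes}m ${pad(seconds)}s`;
  return `${seconds}s`;
}

console.log(formatElapsed(65_000));    // 1m 05s
console.log(formatElapsed(3_723_000)); // 1h 02m 03s
```

With a 2s polling interval, the dashboard would recompute this label from the session start time on each poll rather than incrementing a counter, which avoids drift if a poll is delayed.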
Post-review fixes addressing critical issues identified in the AutoGen
Enhancement implementation review. All changes have been tested and pass.
## Changes
1. **LLKB Default to Enabled** (config/schema.ts, config/loader.ts)
- Changed `llkb.enabled` default from `false` to `true`
- LLKB is now always on by default as intended
2. **status.ts Field Name Bug** (cli/status.ts)
- Fixed `circuitBreaker.isOpen` to `circuitBreakerState.isOpen`
- Added null coalescing for safety
3. **Renamed checkLimit() to isUnderBudget()** (shared/cost-tracker.ts)
- Clearer naming: returns true when UNDER budget (can continue)
- Kept checkLimit() as deprecated alias for backward compatibility
4. **Truncation Indicators** (cli/run.ts)
- Added `[STDOUT/STDERR TRUNCATED - N more characters]` indicator
- Prevents silent data loss when output exceeds 10KB
5. **Skip Escalation Loophole** (refinement/refinement-loop.ts)
- Added `consecutiveSkips` counter with MAX_CONSECUTIVE_SKIPS = 3
- Prevents infinite loops when LLM only generates low-confidence fixes
6. **Multi-sampling Clarity** (uncertainty/confidence-scorer.ts)
- Added clear warning when multi-sampling is enabled but only a single code sample is passed
- Documented that calculateConfidenceWithSamples() should be used for agreement scoring
7. **Minimum Test Suite** (tests/cli/, tests/pipeline/)
- Added tests/cli/analyze.test.ts (8 tests)
- Added tests/cli/plan.test.ts (17 tests)
- Added tests/pipeline/state.test.ts (15 tests)
- All CLI commands now have basic test coverage
8. **JSON Reporter for Playwright** (refinement/playwright-runner.ts)
- Added proper Playwright JSON reporter parsing
- Primary: Parse structured JSON output (more reliable)
- Fallback: Regex parsing of stdout/stderr (backward compatible)
- Temp JSON files cleaned up after parsing
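Change 5 (the skip escalation loophole) can be sketched as a counter that resets whenever a fix is actually applied. Only `consecutiveSkips` and `MAX_CONSECUTIVE_SKIPS = 3` come from the commit message. The `ProposedFix` type, the confidence threshold, and the `runRefinement` function are hypothetical stand-ins for the real loop in refinement-loop.ts.

```typescript
const MAX_CONSECUTIVE_SKIPS = 3;

// Hypothetical shape of a fix proposed by the LLM.
interface ProposedFix {
  description: string;
  confidence: number; // assumed to lie between 0 and 1
}

interface RefinementResult {
  outcome: "completed" | "escalated";
  applied: string[];
}

function runRefinement(fixes: ProposedFix[], minConfidence: number): RefinementResult {
  const applied: string[] = [];
  let consecutiveSkips = 0;

  for (const fix of fixes) {
    if (fix.confidence < minConfidence) {
      // Low-confidence fixes are skipped, but only a bounded number in a row.
      consecutiveSkips += 1;
      if (consecutiveSkips >= MAX_CONSECUTIVE_SKIPS) {
        return { outcome: "escalated", applied };
      }
      continue;
    }
    // A usable fix resets the counter, so isolated skips never accumulate.
    consecutiveSkips = 0;
    applied.push(fix.description);
  }
  return { outcome: "completed", applied };
}

const result = runRefinement(
  [
    { description: "fix selector", confidence: 0.9 },
    { description: "guess 1", confidence: 0.1 },
    { description: "guess 2", confidence: 0.2 },
    { description: "guess 3", confidence: 0.1 },
    { description: "never reached", confidence: 0.9 },
  ],
  0.5
);
console.log(result); // { outcome: 'escalated', applied: [ 'fix selector' ] }
```

Without the counter, a model that only ever produces low-confidence fixes would keep the loop skipping indefinitely. The bound turns that case into an explicit escalation.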
## Test Results
- All 52 autogen test files pass (1125 tests)
- All 91 core typescript test files pass (2384 tests)
- Total: 143 test files, 3509 tests passing
Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)
Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
- Add workflow to build, test, and package VSIX on push/PR
- Upload VSIX as downloadable artifact (30-day retention)
- Auto-create GitHub releases on main branch pushes
- Remove icon reference (PNG not available, optional for local install)

To download the VSIX:
1. Go to Actions → Build VS Code Extension
2. Click on a successful run
3. Download "artk-vscode-extension" artifact

https://claude.ai/code/session_01WjUMcG34n6oe5MULEPECb3
- Auto-increment patch version on each push to main
- Support manual version bump selection (patch/minor/major/none)
- Commit version changes back to repo with [skip ci]
- Include version in artifact names for clarity
- Simplified release job using build outputs

https://claude.ai/code/session_01WjUMcG34n6oe5MULEPECb3
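The four bump choices (patch/minor/major/none) map onto a version string as in the sketch below. This only illustrates the semantics: the `bumpVersion` function is hypothetical, and the workflow itself may well rely on `npm version` or a shell step instead.

```typescript
type Bump = "patch" | "minor" | "major" | "none";

// Hypothetical helper; handles plain MAJOR.MINOR.PATCH only (no pre-release tags).
function bumpVersion(version: string, bump: Bump): string {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(version);
  if (!match) {
    throw new Error(`Not a plain MAJOR.MINOR.PATCH version: ${version}`);
  }
  const major = Number(match[1]);
  const minor = Number(match[2]);
  const patch = Number(match[3]);

  switch (bump) {
    case "major":
      return `${major + 1}.0.0`; // lower components reset to zero
    case "minor":
      return `${major}.${minor + 1}.0`;
    case "patch":
      return `${major}.${minor}.${patch + 1}`;
    default:
      return version; // "none" leaves the version untouched
  }
}

console.log(bumpVersion("1.2.3", "patch")); // 1.2.4
console.log(bumpVersion("1.2.3", "minor")); // 1.3.0
```

The `[skip ci]` marker in the commit that writes the new version back matters here: without it, the version commit would trigger the workflow again and bump the patch number in a loop.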
P0 (Critical):
- P0-1: Remove fake 0.7 agreement score in confidence-scorer.ts
- P0-2: Fix LLKB config merge to preserve partial configs in loader.ts
- P0-3: Remove shell:true from all spawn calls (run.ts, runner.ts, playwright-runner.ts)

P1 (High):
- P1-1: Add 14 tests for run.ts CLI command
- P1-2: Add 12 tests for refine.ts CLI command
- P1-3: Add 8 tests for clean.ts CLI command
- P1-4: Rename duplicate PipelineState to OrchestrationPhase in types.ts
- P1-5: Add circuit breaker tracking for LLM exceptions in refinement-loop.ts

P2 (Medium):
- P2-1: Add UTF-8 safe truncation (surrogate pair handling) in run.ts
- P2-2: Use atomic file writes (temp+rename) in pipeline/state.ts

Total: 34 new tests added, 1159 tests passing across 55 files.

Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering)
Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
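P2-1 (surrogate pair handling during truncation) addresses a JavaScript detail: strings are sequences of UTF-16 code units, so cutting at a fixed index can split a character such as an emoji in half. The sketch below combines that guard with the `[... TRUNCATED - N more characters]` indicator from the earlier commit. The `truncateOutput` function and its parameters are hypothetical, and treating the 10KB limit as a count of UTF-16 code units is an assumption.

```typescript
// Assumption: the 10KB limit is counted in UTF-16 code units, not bytes.
const MAX_OUTPUT_CHARS = 10 * 1024;

// Hypothetical helper; `label` would be "STDOUT" or "STDERR".
function truncateOutput(text: string, label: string, limit: number = MAX_OUTPUT_CHARS): string {
  if (text.length <= limit) return text;

  let end = limit;
  const lastUnit = text.charCodeAt(end - 1);
  // A high surrogate (0xD800-0xDBFF) at the cut would be orphaned from its
  // low surrogate, so step back one unit and drop the whole character.
  if (end > 0 && lastUnit >= 0xd800 && lastUnit <= 0xdbff) {
    end -= 1;
  }
  const omitted = text.length - end;
  return `${text.slice(0, end)}\n[${label} TRUNCATED - ${omitted} more characters]`;
}

// "ab" + an emoji (2 code units) + "cd" has length 6; cutting at 3 would split the emoji.
console.log(truncateOutput("ab\u{1F600}cd", "STDOUT", 3));
// ab
// [STDOUT TRUNCATED - 4 more characters]
```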
Fixes critical issues identified in code review:
- Add artk.llkb.seed to package.json commands
- Create artk.journey.validate command handler
- Create artk.journey.implement command handler
- Register journey commands in commands/index.ts

These commands were referenced in the dashboard but never registered, causing features to be unreachable from the UI.

https://claude.ai/code/session_01WjUMcG34n6oe5MULEPECb3
The GitHub Actions workflow requires package-lock.json to exist for npm caching to work. Added an exception to the root .gitignore so that the VS Code extension's lock file can be committed.

https://claude.ai/code/session_01WjUMcG34n6oe5MULEPECb3
Pull request overview
This PR introduces a comprehensive VS Code extension for ARTK (Automatic Regression Testing Kit), providing visual tools and an integrated UI for managing Playwright-based test automation projects. The extension implements a hybrid agentic architecture with pipeline commands (analyze, plan, generate, run, refine) and integrates with the LLKB (Lessons Learned Knowledge Base) system. The implementation includes both direct test generation and a multi-stage pipeline approach for complex test scenarios.
Changes:
- Added complete VS Code extension with activation hooks, command registration, and lifecycle management
- Implemented CLI integration layer for ARTK operations with progress tracking and error management
- Created explorer views (Status, Journeys, LLKB) and dashboard webview panel for UI interaction
- Added pipeline state management with atomic file operations and telemetry tracking
- Implemented LLKB integration with default-enabled configuration and storage layer for refinement lessons
- Updated documentation to reflect two-tier prompt/agent architecture and AutoGen pipeline workflow
- Added GitHub Actions workflow for automated extension building, testing, and releases
Reviewed changes
Copilot reviewed 54 out of 209 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| docs/vscode-settings-management.md | Documents two-tier prompt/agent architecture and variant-info special case |
| core/typescript/autogen/tests/pipeline/state.test.ts | Tests pipeline state management including load/save, transitions, and convergence tracking |
| core/typescript/autogen/tests/integration/llkb-integration.test.ts | Updates LLKB default to enabled (true) across integration tests |
| core/typescript/autogen/tests/cli/run.test.ts | Tests CLI run command with argument parsing, test execution, and error handling |
| core/typescript/autogen/tests/cli/refine.test.ts | Tests CLI refine command for failure analysis and refinement suggestions |
| core/typescript/autogen/tests/cli/plan.test.ts | Tests CLI plan command for test generation planning from analysis |
| core/typescript/autogen/tests/cli/clean.test.ts | Tests CLI clean command for artifact cleanup with keep options |
| core/typescript/autogen/tests/cli/analyze.test.ts | Tests CLI analyze command for journey parsing and analysis output |
| core/typescript/autogen/src/verify/runner.ts | Removes shell:true to prevent command injection vulnerabilities |
| core/typescript/autogen/src/utils/paths.ts | Adds autogen artifact path utilities with atomic file operations |
| core/typescript/autogen/src/uncertainty/types.ts | Defines types for uncertainty quantification confidence scoring |
| core/typescript/autogen/src/uncertainty/index.ts | Exports uncertainty quantification modules for confidence analysis |
| core/typescript/autogen/src/shared/types.ts | Defines shared types for LLM, cost tracking, pipeline orchestration, and LLKB |
| core/typescript/autogen/src/shared/telemetry.ts | Implements telemetry system for tracking performance, costs, and errors |
| core/typescript/autogen/src/shared/llm-response-parser.ts | Implements JSON extraction and validation from LLM responses |
| core/typescript/autogen/src/shared/index.ts | Exports shared infrastructure for enhancement strategies |
| core/typescript/autogen/src/shared/cost-tracker.ts | Implements cost tracking with limits to prevent cost explosion |
| core/typescript/autogen/src/shared/config-validator.ts | Validates enhancement config with LLM availability checks |
| core/typescript/autogen/src/scot/validator.ts | Validates SCoT plans for correctness and completeness |
| core/typescript/autogen/src/scot/types.ts | Defines types for Structured Chain-of-Thought planning |
| core/typescript/autogen/src/scot/prompts.ts | Provides LLM prompt templates for SCoT planning |
| core/typescript/autogen/src/scot/planner.ts | Implements SCoT plan generation with orchestrator mode support |
| core/typescript/autogen/src/scot/index.ts | Exports SCoT planning modules |
| core/typescript/autogen/src/refinement/types.ts | Defines types for self-refinement strategy |
| core/typescript/autogen/src/refinement/llkb-storage.ts | Implements LLKB storage for refinement lessons |
| core/typescript/autogen/src/refinement/llkb-learning.ts | Implements lesson extraction and confidence adjustment |
| core/typescript/autogen/src/refinement/index.ts | Exports self-refinement strategy modules |
| core/typescript/autogen/src/pipeline/state.ts | Implements pipeline state management with atomic writes |
| core/typescript/autogen/src/pipeline/index.ts | Exports pipeline state management |
| core/typescript/autogen/src/index.ts | Adds exports for enhancement strategies and path utilities |
| core/typescript/autogen/src/config/schema.ts | Changes LLKB default from false to true |
| core/typescript/autogen/src/config/loader.ts | Updates LLKB config migration to preserve user settings |
| core/typescript/autogen/src/cli/status.ts | Implements status command showing pipeline state |
| core/typescript/autogen/src/cli/index.ts | Adds pipeline commands to CLI with reorganized help text |
| core/typescript/autogen/src/cli/generate.ts | Adds plan-based generation mode alongside legacy direct generation |
| core/typescript/autogen/src/cli/clean.ts | Implements clean command for artifact cleanup |
| core/typescript/autogen/src/cli/analyze.ts | Implements analyze command for journey analysis |
| CLAUDE.md | Documents two-tier architecture and AutoGen CLI commands |
| .github/workflows/build-vscode-extension.yml | Adds GitHub Actions workflow for extension CI/CD |
    let stderr = '';

    // SECURITY: shell: false (default) prevents command injection via args
    // Node.js v14.18+ handles .cmd/.bat files on Windows automatically
Copilot AI, Feb 2, 2026
The removal of shell: true is a security improvement to prevent command injection. However, the comment on line 259 references Node.js v14.18+ which is outdated given the current date context (February 2026). Node.js 14 reached end-of-life in April 2023. The comment should reference a currently supported Node.js version or state the minimum version required by the project.
Suggested change:

    - // Node.js v14.18+ handles .cmd/.bat files on Windows automatically
    + // Modern supported Node.js versions handle .cmd/.bat files on Windows automatically
    if (fixtureSet.size > 0 && !fixtureSet.has('page')) {
      fixtureSet.add('page');
    }
    const fixtureList = fixtureSet.size > 0 ? Array.from(fixtureSet) : ['page'];
    const fixtures = `{ ${fixtureList.join(', ')} }`;
Copilot AI, Feb 2, 2026
The condition fixtureSet.size > 0 && !fixtureSet.has('page') means that if plan.fixtures is empty, 'page' won't be added automatically. Line 178 sets fixtureList to ['page'] as a fallback, but this logic could be clearer by always ensuring 'page' is present rather than having two different paths that both result in including 'page'.
Suggested change:

    - if (fixtureSet.size > 0 && !fixtureSet.has('page')) {
    -   fixtureSet.add('page');
    - }
    - const fixtureList = fixtureSet.size > 0 ? Array.from(fixtureSet) : ['page'];
    - const fixtures = `{ ${fixtureList.join(', ')} }`;
    + fixtureSet.add('page');
    + const fixtureList = Array.from(fixtureSet);
    + const fixtures = `{ ${fixtureList.join(', ')} }`;
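To check that the suggested simplification preserves behavior, the two variants can be run side by side. The wrapper functions and the `planFixtures` parameter are assumptions added for the comparison. The bodies follow the code quoted in the review.

```typescript
// Original logic, as quoted in the review comment.
function fixturesOriginal(planFixtures: string[]): string {
  const fixtureSet = new Set<string>(planFixtures);
  if (fixtureSet.size > 0 && !fixtureSet.has("page")) {
    fixtureSet.add("page");
  }
  const fixtureList = fixtureSet.size > 0 ? Array.from(fixtureSet) : ["page"];
  return `{ ${fixtureList.join(", ")} }`;
}

// Suggested simplification: Set.add is a no-op when 'page' is already present.
function fixturesSuggested(planFixtures: string[]): string {
  const fixtureSet = new Set<string>(planFixtures);
  fixtureSet.add("page");
  return `{ ${Array.from(fixtureSet).join(", ")} }`;
}

const cases: string[][] = [[], ["auth"], ["page", "auth"]];
for (const input of cases) {
  console.log(fixturesOriginal(input), "|", fixturesSuggested(input));
}
// { page } | { page }
// { auth, page } | { auth, page }
// { page, auth } | { page, auth }
```

Both versions agree on the empty case, the case without `page`, and the case where `page` is already listed, so the change is a readability improvement rather than a behavioral one.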
    // for clarity and to handle edge cases where partial configs are loaded
    if (config.llkb === undefined) {
    // IMPORTANT: We MERGE with defaults, not replace, to preserve user's explicit settings
    if (config.llkb === undefined || config.llkb === null) {
Copilot AI, Feb 2, 2026
The null check on line 309 is redundant with the schema's .default({}) behavior. Zod schemas with .default() will never produce null values - they will either be the parsed object or the default object. This check should only need to test for undefined.
Suggested change:

    - if (config.llkb === undefined || config.llkb === null) {
    + if (config.llkb === undefined) {
Summary
This PR introduces a comprehensive VS Code extension for ARTK (Automatic Regression Testing Kit), providing visual tools and an integrated UI for managing Playwright-based test automation projects.
Key Changes
- Extension Core: Complete VS Code extension implementation with activation hooks, command registration, and lifecycle management
- Installation Wizard: Multi-step guided setup wizard for initializing ARTK in projects with:
- CLI Integration Layer: Robust command execution system for running ARTK CLI commands with:
- Explorer Views: Sidebar UI with three main views:
- Commands: 13 registered commands covering:
- Settings: Configurable extension behavior including:
- Build Configuration: esbuild-based bundling with source maps and tree-shaking for optimized distribution
Implementation Details
Files Added
https://claude.ai/code/session_01WjUMcG34n6oe5MULEPECb3