
feat(mcp): optimize token consumption in MCP responses#128

Open
nioasoft wants to merge 4 commits into AutoForgeAI:master from nioasoft:feat/token-optimization

Conversation


@nioasoft commented Jan 29, 2026

Summary

Reduces token consumption across three main areas:

  • MCP Response Optimization: Add minimal serialization methods and use them in high-volume tools
  • App Spec Summary Tool: New spec_get_summary tool returns condensed project spec (~800 tokens vs ~12,500)
  • Assistant Chat History: Progressive summarization reduces token usage by ~50%

Changes

Phase 1: MCP Response Optimization (Highest Impact)

api/models.py:

  • Added to_minimal_dict() - Returns only essential fields (~80% smaller than to_dict())
  • Added to_cycle_check_dict() - Returns only id and dependencies (~95% smaller)
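
For illustration, a minimal sketch of what the two serializers above might look like on the Feature model (the exact columns kept by to_minimal_dict() are an assumption; the PR diff is authoritative):

```python
from sqlalchemy import JSON, Boolean, Column, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Feature(Base):
    __tablename__ = "features"

    id = Column(Integer, primary_key=True)
    priority = Column(Integer)
    category = Column(String)
    name = Column(String)
    description = Column(String)
    dependencies = Column(JSON)   # list of prerequisite feature ids
    passes = Column(Boolean, default=False)
    in_progress = Column(Boolean, default=False)

    def to_minimal_dict(self) -> dict:
        # Essential fields only (~80% smaller than to_dict());
        # the field selection here is assumed for illustration.
        return {
            "id": self.id,
            "name": self.name,
            "priority": self.priority,
            "category": self.category,
            "passes": self.passes,
            "in_progress": self.in_progress,
        }

    def to_cycle_check_dict(self) -> dict:
        # Only what cycle detection needs: id plus dependency edges (~95% smaller).
        return {"id": self.id, "dependencies": self.dependencies or []}
```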

mcp_server/feature_mcp.py:

  • feature_add_dependency() - Uses to_cycle_check_dict() for cycle checking
  • feature_set_dependencies() - Uses to_cycle_check_dict() for cycle checking
  • feature_get_ready() - Added minimal=True parameter (default)
  • feature_get_blocked() - Added minimal=True parameter (default)
  • feature_get_graph() - Queries only needed columns
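
A sketch of how the minimal flag could thread through one of these tools, assuming the module's existing FastMCP instance (mcp), session factory (SessionLocal), the Feature model above, and its existing to_dict(); the query logic is illustrative, not the PR's exact implementation:

```python
@mcp.tool()
def feature_get_ready(minimal: bool = True) -> list[dict]:
    """Features whose dependencies are all passing, in token-light form by default."""
    session = SessionLocal()
    try:
        passing = {fid for (fid,) in session.query(Feature.id).filter(Feature.passes)}
        candidates = session.query(Feature).filter(~Feature.passes, ~Feature.in_progress)
        ready = [f for f in candidates
                 if all(dep in passing for dep in (f.dependencies or []))]
        # minimal=True (the default) trades per-feature detail for tokens
        return [f.to_minimal_dict() if minimal else f.to_dict() for f in ready]
    finally:
        session.close()
```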

Phase 2: App Spec Summary Tool

  • Added spec_get_summary() MCP tool that returns:
    • project_name
    • overview (truncated to 200 chars)
    • technology_stack
    • ports
    • feature_count
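
A hedged sketch of the tool; the tag names, PROJECT_DIR handling, and XML-style parsing are assumptions (only the returned field set and truncation come from this PR):

```python
import re
from pathlib import Path

PROJECT_DIR = Path(".")  # assumption: resolved from the environment in the real server

@mcp.tool()
def spec_get_summary() -> dict:
    """Condensed project spec (~800 tokens vs ~12,500 for the full file)."""
    spec_text = (PROJECT_DIR / "app_spec.txt").read_text(encoding="utf-8")

    def tag(name: str) -> str:
        # assumes XML-like <name>...</name> sections in app_spec.txt
        m = re.search(rf"<{name}>(.*?)</{name}>", spec_text, re.DOTALL)
        return m.group(1).strip() if m else ""

    return {
        "project_name": tag("project_name"),
        "overview": tag("overview")[:200],  # truncated to 200 chars
        "technology_stack": tag("technology_stack"),
        "ports": tag("ports"),
        "feature_count": len(re.findall(r"<feature>", spec_text)),
    }
```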

Phase 3: Assistant Chat History Optimization

server/services/assistant_chat_session.py:

  • Progressive history summarization:
    • Recent messages (last 5): up to 1500 chars
    • Older messages (6-20): 100-char summaries
    • Messages 21+: omitted
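
The tiering might look roughly like this (the helper name and message shape are invented; the 5/1500 and 20/100 numbers come from this PR):

```python
def build_history(messages: list[dict]) -> list[dict]:
    """Progressively summarize chat history; `messages` is oldest-first,
    each item a {'role': ..., 'content': ...} dict."""
    recent = messages[-5:]      # last 5 messages: kept up to 1500 chars each
    older = messages[-20:-5]    # messages 6-20 (counting back): 100-char summaries
    # anything older than the last 20 messages is omitted entirely
    history = [{"role": m["role"], "content": m["content"][:100]} for m in older]
    history += [{"role": m["role"], "content": m["content"][:1500]} for m in recent]
    return history
```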

Documentation

  • Updated coding prompt to recommend new token-efficient tools

Expected Token Savings

| Source | Savings |
| --- | --- |
| Cycle check operations | ~95% |
| feature_get_ready/blocked | ~80% |
| feature_get_graph | ~80% |
| App spec reading | ~94% |
| Assistant chat history | ~50% |

Test plan

  • All 58 tests pass
  • Ruff linting passes on modified files
  • New minimal serialization methods work correctly

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added a project-spec summary tool for concise project context.
    • Added an option to return minimal feature data to reduce payload size.
  • Behavior Changes

    • Conversation history now uses progressive summarization with a shorter recent window.
    • Testing workflow simplified: release-testing removed; sessions now log results and focus on commit-driven iteration.
  • Performance & Optimization

    • Token-efficient feature queries, optimized graph loading, and improved dependency checks.



coderabbitai bot commented Jan 29, 2026

📝 Walkthrough

Adds token-efficient feature serialization and cycle-check helpers, introduces a new public tool spec_get_summary to condense app_spec.txt, makes MCP feature list APIs optionally minimal, reduces chat history window with progressive summarization, and simplifies testing prompts and client tools by removing explicit release-testing actions.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Prompt Template & Guidance<br>`.claude/templates/coding_prompt.template.md` | Remove instruction to read app_spec.txt; add a spec_get_summary step; add a dedicated TOKEN EFFICIENCY block and update the orientation flow to recommend spec_get_summary for project context. |
| Testing Prompt & Client Tools<br>`.claude/templates/testing_prompt.template.md`, `client.py` | Rework test orchestration to logging-and-commit (remove the explicit release flow); remove mcp__features__feature_release_testing from FEATURE_MCP_TOOLS. |
| Database Model Serialization<br>`api/database.py` | Add Feature.to_minimal_dict() for compact serialization and Feature.to_cycle_check_dict() for cycle-detection payloads. |
| MCP Server Tools & APIs<br>`mcp_server/feature_mcp.py` | Add a public spec_get_summary() tool that reads app_spec.txt and returns condensed JSON; make feature_get_ready and feature_get_blocked optionally return minimal records (minimal: bool = True); optimize the feature_get_graph query to fetch essential columns; use to_cycle_check_dict() for cycle checks in dependency-setting endpoints. |
| Chat History Management<br>`server/services/assistant_chat_session.py` | Reduce the history window from 35 to 20 messages; implement progressive summarization (last 5 kept up to 1500 chars, messages 6–20 truncated to 100-char summaries); update logging and the resume-loading flow. |
| Templates / Manifests<br>`.claude/templates/*`, manifest analyzer files | Small additions and relocations of token-efficiency guidance, plus minor manifest analyzer tweaks. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Assistant as Assistant
    participant MCP_Server as MCP Server
    participant FileSystem as File System
    participant Database as Database

    Assistant->>MCP_Server: call spec_get_summary()
    MCP_Server->>FileSystem: read `app_spec.txt`
    FileSystem-->>MCP_Server: raw spec text
    MCP_Server->>MCP_Server: parse & condense (project_name, overview, stack, ports, feature_count)
    MCP_Server-->>Assistant: return JSON summary

    Assistant->>MCP_Server: call feature_get_ready(minimal=true)
    MCP_Server->>Database: query essential feature columns
    Database-->>MCP_Server: feature rows
    MCP_Server->>MCP_Server: serialize with to_minimal_dict()
    MCP_Server-->>Assistant: minimal feature list

    Note over MCP_Server,Database: Dependency updates use to_cycle_check_dict() for cycle checks
```

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 I nibble big specs into neat bites,

I hop through deps and tame long nights,
Tokens trimmed, summaries in hand,
Leaner trails across the land,
A rabbit’s cheer for tidy plans!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title Check | ✅ Passed | The title 'feat(mcp): optimize token consumption in MCP responses' accurately and concisely summarizes the primary change: reducing token usage across MCP feature responses through serialization optimization, new tools, and history summarization. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 98.18%, which is sufficient. The required threshold is 80.00%. |


- Add to_minimal_dict() and to_cycle_check_dict() to Feature model
- Use minimal serialization for cycle detection (~95% token reduction)
- Add minimal parameter to feature_get_ready/blocked (default True)
- Optimize feature_get_graph to query only needed columns
- Add spec_get_summary MCP tool (~800 tokens vs 12,500 full)
- Implement progressive history summarization in assistant chat
- Update coding prompt to recommend new token-efficient tools

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@nioasoft force-pushed the feat/token-optimization branch from 97c9739 to 32dfc85 on January 29, 2026 10:37

@coderabbitai bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
server/services/expand_chat_session.py (2)

42-46: Duplicate constant: EXPAND_FEATURE_TOOLS is unused.

Two identical constants are defined:

  • EXPAND_FEATURE_TOOLS (lines 42-46)
  • FEATURE_MCP_TOOLS (lines 64-69)

Only FEATURE_MCP_TOOLS is actually used (lines 180, 241). The EXPAND_FEATURE_TOOLS constant is dead code and should be removed.

🧹 Remove unused constant
-# Feature MCP tools needed for expand session
-EXPAND_FEATURE_TOOLS = [
-    "mcp__features__feature_create",
-    "mcp__features__feature_create_bulk",
-    "mcp__features__feature_get_stats",
-]
-
-

Also applies to: 64-69


219-229: Dead code: mcp_servers dict is defined but never used.

The mcp_servers dictionary defined at lines 219-229 is never referenced. The MCP configuration is now written to a file (lines 189-206) and the file path is passed to the SDK at line 243 (mcp_servers=str(mcp_config_file)).

This appears to be leftover code from before the refactor to file-based MCP configuration.

🧹 Remove dead code
         # Determine model from environment or use default
         # This allows using alternative APIs (e.g., GLM via z.ai) that may not support Claude model names
         model = os.getenv("ANTHROPIC_DEFAULT_OPUS_MODEL", "claude-opus-4-5-20251101")
 
-        # Build MCP servers config for feature creation
-        mcp_servers = {
-            "features": {
-                "command": sys.executable,
-                "args": ["-m", "mcp_server.feature_mcp"],
-                "env": {
-                    "PROJECT_DIR": str(self.project_dir.resolve()),
-                    "PYTHONPATH": str(ROOT_DIR.resolve()),
-                },
-            },
-        }
-
         # Create Claude SDK client
server/websocket.py (1)

172-208: Cleanup scheduling can be skipped when updates are emitted.
process_line returns early on agent start/complete and most state-change updates, so the _should_cleanup() block is often bypassed. Under steady output, stale agents may never be purged. Consider scheduling cleanup before any early return.

🛠️ Suggested fix to always schedule cleanup
@@
-        # Check for orchestrator status messages first
+        # Periodic cleanup of stale agents (every 5 minutes)
+        if self._should_cleanup():
+            asyncio.create_task(self.cleanup_stale_agents())
+
+        # Check for orchestrator status messages first
@@
-        # Periodic cleanup of stale agents (every 5 minutes)
-        if self._should_cleanup():
-            # Schedule cleanup without blocking
-            asyncio.create_task(self.cleanup_stale_agents())
-
         return None
server/main.py (1)

29-100: Add SlowAPIMiddleware to enforce the global rate limit.

The limiter instance and exception handler are configured, but without SlowAPIMiddleware the default_limits=["200/minute"] will not be applied. SlowAPI enforces default limits only through middleware; without it, routes are not rate-limited unless explicitly decorated with @limiter.limit().

Add the middleware
-from slowapi import Limiter, _rate_limit_exceeded_handler
+from slowapi import Limiter, _rate_limit_exceeded_handler
+from slowapi.middleware import SlowAPIMiddleware
 from slowapi.errors import RateLimitExceeded
@@
 app.state.limiter = limiter
 app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)
+app.add_middleware(SlowAPIMiddleware)
mcp_server/feature_mcp.py (1)

1-34: Implement missing tools required by coding guidelines.

The Feature MCP server is missing three required tools from the coding guidelines: feature_get_next, feature_claim_next, and feature_get_for_regression. While feature_claim_and_get provides similar functionality to feature_claim_next, the other two tools have no alternatives in the current implementation.
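
For context, a minimal sketch of what one of the missing tools could look like, reusing the module's FastMCP instance and session factory; the selection policy (priority, then id) is an assumption, not a spec:

```python
@mcp.tool()
def feature_get_next() -> dict | None:
    """Return the highest-priority pending feature, or None when none remain."""
    session = SessionLocal()
    try:
        feature = (
            session.query(Feature)
            .filter(~Feature.passes, ~Feature.in_progress)
            .order_by(Feature.priority, Feature.id)
            .first()
        )
        return feature.to_minimal_dict() if feature else None
    finally:
        session.close()
```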

🤖 Fix all issues with AI agents
In @.claude/templates/coding_prompt.template.md:
- Around line 310-320: The template incorrectly references the tool name
feature_get_summary (which doesn't exist) while the correct tool introduced in
this PR is spec_get_summary; update the TOKEN EFFICIENCY section to replace the
feature_get_summary mention with spec_get_summary (and verify nearby mentions of
feature_get_by_id remain unchanged), ensuring all mentions consistently use the
exact tool symbol spec_get_summary so coding agents aren't confused.

In `@api/feature_repository.py`:
- Around line 37-77: The retry logic in _commit_with_retry currently calls
session.rollback(), which discards pending changes so retries commit nothing
(breaking methods like mark_in_progress); fix by changing _commit_with_retry to
accept an apply_changes callable (e.g., def _commit_with_retry(session: Session,
apply_changes: Callable[[Session], None], max_retries: int =
MAX_COMMIT_RETRIES)) and invoke apply_changes(session) before each commit
attempt (including retries) so mutations are re-applied after rollback;
alternatively, if you prefer minimal change, call session.flush() before commit
to surface lock errors early and avoid discarding pending changes, but the
recommended approach is the apply_changes pattern so callers like
mark_in_progress re-apply their status mutation on each retry.

In `@mcp_server/feature_mcp.py`:
- Around line 507-542: feature_release_testing currently ignores the tested_ok
parameter and only clears Feature.in_progress; update it to record the test
outcome by calling feature_mark_passing(feature_id) when tested_ok is True and
feature_mark_failing(feature_id) when False (or directly set feature.passes
accordingly before commit), then clear feature.in_progress and commit; ensure
you import/locate and use the existing feature_mark_passing and
feature_mark_failing helpers (or set Feature.passes) and keep the same error
handling/rollback/session.close behavior.
🧹 Nitpick comments (17)
pyproject.toml (1)

25-27: Consider narrowing the DeprecationWarning filter.

Blanket ignore::DeprecationWarning suppresses all deprecation warnings, which could hide important notices from dependencies or Python itself. Consider scoping to specific modules or warning messages:

filterwarnings = [
    "ignore::DeprecationWarning:pytest_asyncio.*:",
    "ignore::pytest.PytestReturnNotNoneWarning",
]
requirements.txt (1)

1-15: Minor inconsistency in version specifier styles.

Most dependencies now use ~= (compatible release) but claude-agent-sdk (line 2) and apscheduler (line 11) still use >=,< bounds. This is functionally fine but inconsistent. Consider aligning if feasible:

-claude-agent-sdk>=0.1.0,<0.2.0
+claude-agent-sdk~=0.1.0
-apscheduler>=3.10.0,<4.0.0
+apscheduler~=3.10
api/config.py (1)

129-157: Consider thread safety for the singleton pattern.

The lazy singleton pattern works well for typical use cases where config is loaded once at startup. However, if get_config() is called concurrently from multiple threads during initial load, or if reload_config() is called during active request handling, there's a potential race condition.

For a more robust implementation, consider using a lock:

🔧 Optional: Thread-safe singleton
+import threading
+
 # Global config instance (lazy loaded)
 _config: Optional[AutocoderConfig] = None
+_config_lock = threading.Lock()
 
 
 def get_config() -> AutocoderConfig:
     global _config
-    if _config is None:
-        _config = AutocoderConfig()
+    if _config is None:
+        with _config_lock:
+            if _config is None:  # Double-check after acquiring lock
+                _config = AutocoderConfig()
     return _config
parallel_orchestrator.py (1)

137-139: Type annotation could be more precise.

The type dict[int, tuple[int, subprocess.Popen] | None] indicates values can be None (for placeholders), but the comment explains this well. Consider using a more explicit type alias for clarity.

💡 Optional: Type alias for clarity
+# Type for testing agent entries: (feature_id, process) or None for placeholder
+TestingAgentEntry = tuple[int, subprocess.Popen] | None

 # Testing agents: agent_id (pid) -> (feature_id, process)
 # Using pid as key allows multiple agents to test the same feature
-self.running_testing_agents: dict[int, tuple[int, subprocess.Popen] | None] = {}
+self.running_testing_agents: dict[int, TestingAgentEntry] = {}
server/routers/devserver.py (1)

51-79: Minor inconsistency in validation approach across routers.

This router uses validate_project_name (which raises HTTPException), while terminal.py and assistant_chat.py use is_valid_project_name (which returns a boolean). Both approaches work correctly, but the inconsistency may cause confusion when maintaining the codebase.

Since devserver.py is REST-only (no WebSocket endpoints with custom close codes), using the exception-raising validator is the cleaner choice here.

server/routers/projects.py (1)

422-427: Consider extracting the repeated sys.path manipulation.

The pattern of adding the root to sys.path (lines 423-425) is duplicated across multiple functions in this file (_init_imports, _get_registry_functions). Consider extracting this to a helper or using a single initialization point.

That said, this follows the existing pattern in the file, so it's acceptable for consistency.

api/feature_repository.py (1)

291-311: Missing priority-based sorting for ready features.

The get_ready_features method returns features without any ordering. Compare with api/dependency_resolver.py (lines 391-420), which sorts ready features by scheduling score, priority, and ID.

Without sorting, callers may receive features in non-deterministic order, which could affect which features get worked on first.

♻️ Proposed fix: Add priority-based sorting
     def get_ready_features(self) -> list[Feature]:
         """Get features that are ready to implement.
         ...
         """
         passing_ids = self.get_passing_ids()
         candidates = self.get_pending()

         ready = []
         for f in candidates:
             deps = f.dependencies or []
             if all(dep_id in passing_ids for dep_id in deps):
                 ready.append(f)

+        # Sort by priority (lower = higher priority), then by ID for stability
+        ready.sort(key=lambda f: (f.priority, f.id))
+
         return ready
tests/test_async_examples.py (1)

261-263: Consider making the expected feature count less brittle.

The assertion assert results[0] == 5 assumes populated_db always has exactly 5 features. If the fixture changes, this test will fail with a non-obvious error.

♻️ Proposed improvement
     # All should return the same count
     assert all(r == results[0] for r in results)
-    assert results[0] == 5  # populated_db has 5 features
+    # populated_db fixture creates 5 features; verify count is positive and consistent
+    assert results[0] > 0, "Expected populated_db to have features"

Or keep the explicit count but add the fixture's expected count as a constant that can be shared with the fixture definition.

api/migrations.py (2)

22-32: Add type annotation for engine parameter.

All migration functions lack type hints for the engine parameter. Consider adding type annotations for better IDE support and static analysis.

✨ Suggested type annotation
+from sqlalchemy.engine import Engine
+
-def migrate_add_in_progress_column(engine) -> None:
+def migrate_add_in_progress_column(engine: Engine) -> None:

Apply similarly to all other migration functions.


176-186: SQL construction is safe here but consider parameterization for future-proofing.

The f-string SQL construction on Line 185 is currently safe since col_name and col_type come from a hardcoded list (Lines 176-181). However, if this pattern is extended in the future with dynamic values, it could become a risk.

tests/conftest.py (3)

72-87: Redundant create_database call in db_session fixture.

The temp_db fixture already calls create_database(project_dir) on Line 65. Calling it again on Line 80 is redundant and could mask issues if the two calls return different session factories.

♻️ Suggested fix: reuse the database from temp_db
 @pytest.fixture
 def db_session(temp_db: Path):
     """Get a database session for testing.

     Provides a session that is automatically rolled back after each test.
     """
     from api.database import create_database

-    _, SessionLocal = create_database(temp_db)
+    # create_database is idempotent and returns existing engine/session
+    _, SessionLocal = create_database(temp_db)
     session = SessionLocal()

Alternatively, consider having temp_db yield both the path and session factory to avoid the second call.


95-110: Async fixture performs only synchronous operations.

The async_temp_db fixture is declared as async but contains no await expressions. The comment on Line 107 acknowledges this. Consider making it a regular sync fixture unless there's a specific reason for async (e.g., compatibility with async test collection).


216-244: populated_db fixture should use db_session fixture to avoid duplication.

This fixture duplicates the session management pattern from db_session. Consider depending on db_session instead to reduce code duplication and ensure consistent cleanup.

♻️ Suggested refactor
 @pytest.fixture
-def populated_db(temp_db: Path, sample_feature_data: dict) -> Path:
+def populated_db(temp_db: Path, db_session, sample_feature_data: dict) -> Path:
     """Create a database populated with sample features.

     Returns the project directory path.
     """
     from api.database import Feature, create_database

-    _, SessionLocal = create_database(temp_db)
-    session = SessionLocal()
-
-    try:
-        # Add sample features
-        for i in range(5):
-            feature = Feature(
-                priority=i + 1,
-                category=f"category_{i % 2}",
-                name=f"Feature {i + 1}",
-                description=f"Description for feature {i + 1}",
-                steps=[f"Step {j}" for j in range(3)],
-                passes=i < 2,  # First 2 features are passing
-                in_progress=i == 2,  # Third feature is in progress
-            )
-            session.add(feature)
-
-        session.commit()
-    finally:
-        session.close()
+    # Add sample features
+    for i in range(5):
+        feature = Feature(
+            priority=i + 1,
+            category=f"category_{i % 2}",
+            name=f"Feature {i + 1}",
+            description=f"Description for feature {i + 1}",
+            steps=[f"Step {j}" for j in range(3)],
+            passes=i < 2,
+            in_progress=i == 2,
+        )
+        db_session.add(feature)
+
+    db_session.commit()

     return temp_db
mcp_server/feature_mcp.py (3)

46-50: Duplicate _utc_now() function - import from api.models instead.

This helper is also defined in api/models.py (Lines 27-30). Duplicating it creates maintenance burden and potential for divergence.

♻️ Suggested fix
-def _utc_now() -> datetime:
-    """Return current UTC time."""
-    return datetime.now(timezone.utc)
-
 from mcp.server.fastmcp import FastMCP
 from pydantic import BaseModel, Field
+from api.models import _utc_now

Note: You may need to rename the function in api/models.py to utc_now (without underscore) if it should be part of the public API.


1189-1201: Multiple COUNT queries could be consolidated.

The feature_get_attempts function makes 4 separate queries for statistics. Consider combining into a single aggregate query similar to feature_get_stats (Lines 162-166).

♻️ Suggested optimization
+        from sqlalchemy import case, func
+
         # Calculate statistics
-        total_attempts = session.query(FeatureAttempt).filter(
-            FeatureAttempt.feature_id == feature_id
-        ).count()
-
-        success_count = session.query(FeatureAttempt).filter(
-            FeatureAttempt.feature_id == feature_id,
-            FeatureAttempt.outcome == "success"
-        ).count()
-
-        failure_count = session.query(FeatureAttempt).filter(
-            FeatureAttempt.feature_id == feature_id,
-            FeatureAttempt.outcome == "failure"
-        ).count()
+        stats = session.query(
+            func.count(FeatureAttempt.id).label('total'),
+            func.sum(case((FeatureAttempt.outcome == 'success', 1), else_=0)).label('success'),
+            func.sum(case((FeatureAttempt.outcome == 'failure', 1), else_=0)).label('failure')
+        ).filter(FeatureAttempt.feature_id == feature_id).first()
+
+        total_attempts = stats.total or 0
+        success_count = int(stats.success or 0)
+        failure_count = int(stats.failure or 0)

1407-1497: spec_get_summary provides good token optimization but regex parsing may be fragile.

The function effectively reduces token consumption (~94% per PR objectives). However, the regex-based parsing assumes specific XML-like tag formats. Consider adding a note about expected format or providing fallback for malformed specs.

Minor: The function imports re inside the function body (Line 1424). Moving it to module level would be more conventional.
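
One possible fallback shape for malformed or untagged specs (entirely an assumption, sketched here to make the nitpick concrete):

```python
def summarize_untagged(spec_text: str) -> dict:
    """Degrade gracefully when the XML-like tags are absent:
    treat the first non-empty line as the title, the rest as overview."""
    lines = [ln.strip() for ln in spec_text.splitlines() if ln.strip()]
    return {
        "project_name": lines[0] if lines else "",
        "overview": " ".join(lines[1:])[:200],
        "technology_stack": "",
        "ports": "",
        "feature_count": 0,
    }
```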

api/models.py (1)

21-21: Use updated import path for declarative_base.

sqlalchemy.ext.declarative.declarative_base is deprecated since SQLAlchemy 1.4. Use the new import path for forward compatibility.

♻️ Suggested fix
-from sqlalchemy.ext.declarative import declarative_base
+from sqlalchemy.orm import declarative_base

Comment on lines 310 to 411
## TOKEN EFFICIENCY

To maximize context window usage:

- **Don't read files unnecessarily** - Feature details from `feature_get_by_id` contain everything you need
- **Be concise** - Short, focused responses save tokens for actual work
- **Use `feature_get_summary`** for status checks (lighter than `feature_get_by_id`)
- **Use `spec_get_summary`** for project context (~800 tokens vs 12,500 for full app_spec.txt)
- **Avoid re-reading large files** - Read once, remember the content

---

⚠️ Potential issue | 🟡 Minor

Inconsistent tool name reference.

Line 316 references feature_get_summary but the actual tool introduced in this PR is spec_get_summary (as correctly shown on line 317 and line 276). This may confuse coding agents.

📝 Proposed fix
 - **Don't read files unnecessarily** - Feature details from `feature_get_by_id` contain everything you need
 - **Be concise** - Short, focused responses save tokens for actual work
-- **Use `feature_get_summary`** for status checks (lighter than `feature_get_by_id`)
+- **Use `feature_get_stats`** for status checks (lighter than `feature_get_by_id`)
 - **Use `spec_get_summary`** for project context (~800 tokens vs 12,500 for full app_spec.txt)

Comment on lines 37 to 77
```python
def _commit_with_retry(session: Session, max_retries: int = MAX_COMMIT_RETRIES) -> None:
    """
    Commit a session with retry logic for transient errors.

    Handles SQLITE_BUSY, SQLITE_LOCKED, and similar transient errors
    with exponential backoff.

    Args:
        session: SQLAlchemy session to commit
        max_retries: Maximum number of retry attempts

    Raises:
        OperationalError: If commit fails after all retries
    """
    delay_ms = INITIAL_RETRY_DELAY_MS
    last_error = None

    for attempt in range(max_retries + 1):
        try:
            session.commit()
            return
        except OperationalError as e:
            error_msg = str(e).lower()
            # Retry on lock/busy errors
            if "locked" in error_msg or "busy" in error_msg:
                last_error = e
                if attempt < max_retries:
                    logger.warning(
                        f"Database commit failed (attempt {attempt + 1}/{max_retries + 1}), "
                        f"retrying in {delay_ms}ms: {e}"
                    )
                    time.sleep(delay_ms / 1000)
                    delay_ms *= 2  # Exponential backoff
                    session.rollback()  # Reset session state before retry
                    continue
            raise

    # If we get here, all retries failed
    if last_error:
        logger.error(f"Database commit failed after {max_retries + 1} attempts")
        raise last_error
```

⚠️ Potential issue | 🟠 Major

Retry logic may not preserve pending changes after rollback.

After session.rollback() on line 70, any pending changes to objects in the session are discarded. When the loop retries session.commit(), there's nothing to commit because the changes were rolled back.

For the status update methods (e.g., mark_in_progress), this means a retry after rollback will commit an empty transaction, silently failing to persist the intended change.

🐛 Proposed fix: Re-apply changes before retry

One approach is to have the caller handle retries by passing a callable that applies the changes:

-def _commit_with_retry(session: Session, max_retries: int = MAX_COMMIT_RETRIES) -> None:
+def _commit_with_retry(
+    session: Session,
+    apply_changes: callable = None,
+    max_retries: int = MAX_COMMIT_RETRIES
+) -> None:
     """
     Commit a session with retry logic for transient errors.
-
-    Handles SQLITE_BUSY, SQLITE_LOCKED, and similar transient errors
-    with exponential backoff.
+    
+    Args:
+        session: SQLAlchemy session to commit
+        apply_changes: Optional callable to re-apply changes after rollback
+        max_retries: Maximum number of retry attempts
     ...
     """
     delay_ms = INITIAL_RETRY_DELAY_MS
     last_error = None

     for attempt in range(max_retries + 1):
         try:
+            if attempt > 0 and apply_changes:
+                apply_changes()
             session.commit()
             return
         except OperationalError as e:
             error_msg = str(e).lower()
             if "locked" in error_msg or "busy" in error_msg:
                 last_error = e
                 if attempt < max_retries:
                     ...
                     session.rollback()
                     continue
             raise

Alternatively, consider using session.flush() to detect lock errors before commit, or restructure to re-query and re-apply the mutation on each retry.


nioasoft and others added 2 commits January 29, 2026 13:07
- Use feature_get_stats for progress checks
- Use feature_get_summary for single feature status

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The testing architecture was simplified - testing agents now work concurrently without claim/release coordination.

- Remove feature_release_testing from allowed tools in client.py
- Update testing_prompt.template.md to remove claim/release workflow

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

@coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @.claude/templates/testing_prompt.template.md:
- Around line 86-91: The prompt's "end your session" is ambiguous; update the
template around the echo step (the echo "[Testing] Feature #{id} verified -
still passing" >> claude-progress.txt line and the surrounding text) to include
an explicit termination action the agent should perform, e.g. call the
orchestrator completion API/tool, run an exit command (explicitly state "exit 0"
or equivalent), or note that the session will auto-terminate after logging —
reference the echo line and the phrase "end your session" so implementers
replace the vague wording with one concrete mechanism (tool call name or exact
exit instruction).
🧹 Nitpick comments (1)
.claude/templates/testing_prompt.template.md (1)

186-186: Consider clarifying "one iteration" language.

The phrase "You have one iteration" (line 186) is somewhat ambiguous. Combined with line 171 ("Test ONE feature thoroughly"), the intent appears to be limiting scope to a single feature per session. However, "iteration" could be misinterpreted as:

  • Only one attempt to fix a regression (even if the first fix fails)
  • No retries if browser automation encounters transient failures
  • One verification pass only

Consider more explicit phrasing such as:

  • "Your session is scoped to ONE feature. Complete all verification and any necessary fixes for that feature."
  • "Focus on testing ONE feature thoroughly. You may iterate on fixes for that feature until it passes."

- Add explicit note that session auto-terminates after logging
- Clarify that 'one iteration' means one feature per session, not one fix attempt

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
rudiheydra added a commit to rudiheydra/AutoBuildr that referenced this pull request Feb 2, 2026
…orgeAI#128)

Added TestHarnessKernelBudgetEnforcement class with 8 tests proving all
7 verification steps for HarnessKernel.execute() budget enforcement:

1. execute() called with compiled AgentSpec via --spec path
2. turns_used incremented after each turn, matches actual count
3. max_turns=2 stops execution after exactly 2 turns
4. Budget exhaustion sets status='timeout' (not 'failed')
5. 'timeout' event recorded in agent_events with payload
6. Acceptance validators run after budget exhaustion (graceful termination)
7. tokens_in and tokens_out tracked and stored on AgentRun
8. Token tracking persists even on timeout

All 54 tests pass with no regressions.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>