
feat(mcp): optimize token consumption in MCP responses#128

Open
nioasoft wants to merge 4 commits into AutoForgeAI:master from nioasoft:feat/token-optimization

Conversation


@nioasoft commented Jan 29, 2026

Summary

Reduces token consumption across three main areas:

  • MCP Response Optimization: Add minimal serialization methods and use them in high-volume tools
  • App Spec Summary Tool: New spec_get_summary tool returns condensed project spec (~800 tokens vs ~12,500)
  • Assistant Chat History: Progressive summarization reduces token usage by ~50%

Changes

Phase 1: MCP Response Optimization (Highest Impact)

api/models.py:

  • Added to_minimal_dict() - Returns only essential fields (~80% smaller than to_dict())
  • Added to_cycle_check_dict() - Returns only id and dependencies (~95% smaller)
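
For illustration, a minimal sketch of what the two serializers above might look like on the Feature model (the exact columns kept by to_minimal_dict() are an assumption; the PR diff is authoritative):

```python
from sqlalchemy import JSON, Boolean, Column, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Feature(Base):
    __tablename__ = "features"

    id = Column(Integer, primary_key=True)
    priority = Column(Integer)
    category = Column(String)
    name = Column(String)
    description = Column(String)
    dependencies = Column(JSON)   # list of prerequisite feature ids
    passes = Column(Boolean, default=False)
    in_progress = Column(Boolean, default=False)

    def to_minimal_dict(self) -> dict:
        # Essential fields only (~80% smaller than to_dict());
        # the field selection here is assumed for illustration.
        return {
            "id": self.id,
            "name": self.name,
            "priority": self.priority,
            "category": self.category,
            "passes": self.passes,
            "in_progress": self.in_progress,
        }

    def to_cycle_check_dict(self) -> dict:
        # Only what cycle detection needs: id plus dependency edges (~95% smaller).
        return {"id": self.id, "dependencies": self.dependencies or []}
```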

mcp_server/feature_mcp.py:

  • feature_add_dependency() - Uses to_cycle_check_dict() for cycle checking
  • feature_set_dependencies() - Uses to_cycle_check_dict() for cycle checking
  • feature_get_ready() - Added minimal=True parameter (default)
  • feature_get_blocked() - Added minimal=True parameter (default)
  • feature_get_graph() - Queries only needed columns
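
A sketch of how the minimal flag could thread through one of these tools, assuming the module's existing FastMCP instance (mcp), session factory (SessionLocal), the Feature model above, and its existing to_dict(); the query logic is illustrative, not the PR's exact implementation:

```python
@mcp.tool()
def feature_get_ready(minimal: bool = True) -> list[dict]:
    """Features whose dependencies are all passing, in token-light form by default."""
    session = SessionLocal()
    try:
        passing = {fid for (fid,) in session.query(Feature.id).filter(Feature.passes)}
        candidates = session.query(Feature).filter(~Feature.passes, ~Feature.in_progress)
        ready = [f for f in candidates
                 if all(dep in passing for dep in (f.dependencies or []))]
        # minimal=True (the default) trades per-feature detail for tokens
        return [f.to_minimal_dict() if minimal else f.to_dict() for f in ready]
    finally:
        session.close()
```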

Phase 2: App Spec Summary Tool

  • Added spec_get_summary() MCP tool that returns:
    • project_name
    • overview (truncated to 200 chars)
    • technology_stack
    • ports
    • feature_count
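
A hedged sketch of the tool; the tag names, PROJECT_DIR handling, and XML-style parsing are assumptions (only the returned field set and truncation come from this PR):

```python
import re
from pathlib import Path

PROJECT_DIR = Path(".")  # assumption: resolved from the environment in the real server

@mcp.tool()
def spec_get_summary() -> dict:
    """Condensed project spec (~800 tokens vs ~12,500 for the full file)."""
    spec_text = (PROJECT_DIR / "app_spec.txt").read_text(encoding="utf-8")

    def tag(name: str) -> str:
        # assumes XML-like <name>...</name> sections in app_spec.txt
        m = re.search(rf"<{name}>(.*?)</{name}>", spec_text, re.DOTALL)
        return m.group(1).strip() if m else ""

    return {
        "project_name": tag("project_name"),
        "overview": tag("overview")[:200],  # truncated to 200 chars
        "technology_stack": tag("technology_stack"),
        "ports": tag("ports"),
        "feature_count": len(re.findall(r"<feature>", spec_text)),
    }
```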

Phase 3: Assistant Chat History Optimization

server/services/assistant_chat_session.py:

  • Progressive history summarization:
    • Recent messages (last 5): up to 1500 chars
    • Older messages (6-20): 100-char summaries
    • Messages 21+: omitted
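
The tiering might look roughly like this (the helper name and message shape are invented; the 5/1500 and 20/100 numbers come from this PR):

```python
def build_history(messages: list[dict]) -> list[dict]:
    """Progressively summarize chat history; `messages` is oldest-first,
    each item a {'role': ..., 'content': ...} dict."""
    recent = messages[-5:]      # last 5 messages: kept up to 1500 chars each
    older = messages[-20:-5]    # messages 6-20 (counting back): 100-char summaries
    # anything older than the last 20 messages is omitted entirely
    history = [{"role": m["role"], "content": m["content"][:100]} for m in older]
    history += [{"role": m["role"], "content": m["content"][:1500]} for m in recent]
    return history
```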

Documentation

  • Updated coding prompt to recommend new token-efficient tools

Expected Token Savings

| Source | Savings |
| --- | --- |
| Cycle check operations | ~95% |
| feature_get_ready/blocked | ~80% |
| feature_get_graph | ~80% |
| App spec reading | ~94% |
| Assistant chat history | ~50% |

Test plan

  • All 58 tests pass
  • Ruff linting passes on modified files
  • New minimal serialization methods work correctly

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added a project-spec summary tool for concise project context.
    • Added an option to return minimal feature data to reduce payload size.
  • Behavior Changes

    • Conversation history now uses progressive summarization with a shorter recent window.
    • Testing workflow simplified: release-testing removed; sessions now log results and focus on commit-driven iteration.
  • Performance & Optimization

    • Token-efficient feature queries, optimized graph loading, and improved dependency checks.



coderabbitai bot commented Jan 29, 2026

📝 Walkthrough

Adds token-efficient feature serialization and cycle-check helpers, introduces a new public tool spec_get_summary to condense app_spec.txt, makes MCP feature list APIs optionally minimal, reduces chat history window with progressive summarization, and simplifies testing prompts and client tools by removing explicit release-testing actions.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Prompt Template & Guidance<br>`.claude/templates/coding_prompt.template.md` | Remove instruction to read app_spec.txt; add a spec_get_summary step; add a dedicated TOKEN EFFICIENCY block and update the orientation flow to recommend spec_get_summary for project context. |
| Testing Prompt & Client Tools<br>`.claude/templates/testing_prompt.template.md`, `client.py` | Rework test orchestration to logging-and-commit (remove the explicit release flow); remove mcp__features__feature_release_testing from FEATURE_MCP_TOOLS. |
| Database Model Serialization<br>`api/database.py` | Add Feature.to_minimal_dict() for compact serialization and Feature.to_cycle_check_dict() for cycle-detection payloads. |
| MCP Server Tools & APIs<br>`mcp_server/feature_mcp.py` | Add a public spec_get_summary() tool that reads app_spec.txt and returns condensed JSON; make feature_get_ready and feature_get_blocked optionally return minimal records (minimal: bool = True); optimize the feature_get_graph query to fetch essential columns; use to_cycle_check_dict() for cycle checks in dependency-setting endpoints. |
| Chat History Management<br>`server/services/assistant_chat_session.py` | Reduce the history window from 35 to 20 messages; implement progressive summarization (last 5 kept up to 1500 chars, messages 6–20 truncated to 100-char summaries); update logging and the resume-loading flow. |
| Templates / Manifests<br>`.claude/templates/*`, manifest analyzer files | Small additions and relocations of token-efficiency guidance, plus minor manifest analyzer tweaks. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Assistant as Assistant
    participant MCP_Server as MCP Server
    participant FileSystem as File System
    participant Database as Database

    Assistant->>MCP_Server: call spec_get_summary()
    MCP_Server->>FileSystem: read `app_spec.txt`
    FileSystem-->>MCP_Server: raw spec text
    MCP_Server->>MCP_Server: parse & condense (project_name, overview, stack, ports, feature_count)
    MCP_Server-->>Assistant: return JSON summary

    Assistant->>MCP_Server: call feature_get_ready(minimal=true)
    MCP_Server->>Database: query essential feature columns
    Database-->>MCP_Server: feature rows
    MCP_Server->>MCP_Server: serialize with to_minimal_dict()
    MCP_Server-->>Assistant: minimal feature list

    Note over MCP_Server,Database: Dependency updates use to_cycle_check_dict() for cycle checks
```

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 I nibble big specs into neat bites,

I hop through deps and tame long nights,
Tokens trimmed, summaries in hand,
Leaner trails across the land,
A rabbit’s cheer for tidy plans!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title Check | ✅ Passed | The title 'feat(mcp): optimize token consumption in MCP responses' accurately and concisely summarizes the primary change: reducing token usage across MCP feature responses through serialization optimization, new tools, and history summarization. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 98.18%, which is sufficient. The required threshold is 80.00%. |


- Add to_minimal_dict() and to_cycle_check_dict() to Feature model
- Use minimal serialization for cycle detection (~95% token reduction)
- Add minimal parameter to feature_get_ready/blocked (default True)
- Optimize feature_get_graph to query only needed columns
- Add spec_get_summary MCP tool (~800 tokens vs 12,500 full)
- Implement progressive history summarization in assistant chat
- Update coding prompt to recommend new token-efficient tools

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@nioasoft force-pushed the feat/token-optimization branch from 97c9739 to 32dfc85 on January 29, 2026 10:37

@coderabbitai bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
server/services/expand_chat_session.py (2)

42-46: Duplicate constant: EXPAND_FEATURE_TOOLS is unused.

Two identical constants are defined:

  • EXPAND_FEATURE_TOOLS (lines 42-46)
  • FEATURE_MCP_TOOLS (lines 64-69)

Only FEATURE_MCP_TOOLS is actually used (lines 180, 241). The EXPAND_FEATURE_TOOLS constant is dead code and should be removed.

🧹 Remove unused constant
-# Feature MCP tools needed for expand session
-EXPAND_FEATURE_TOOLS = [
-    "mcp__features__feature_create",
-    "mcp__features__feature_create_bulk",
-    "mcp__features__feature_get_stats",
-]
-
-

Also applies to: 64-69


219-229: Dead code: mcp_servers dict is defined but never used.

The mcp_servers dictionary defined at lines 219-229 is never referenced. The MCP configuration is now written to a file (lines 189-206) and the file path is passed to the SDK at line 243 (mcp_servers=str(mcp_config_file)).

This appears to be leftover code from before the refactor to file-based MCP configuration.

🧹 Remove dead code
         # Determine model from environment or use default
         # This allows using alternative APIs (e.g., GLM via z.ai) that may not support Claude model names
         model = os.getenv("ANTHROPIC_DEFAULT_OPUS_MODEL", "claude-opus-4-5-20251101")
 
-        # Build MCP servers config for feature creation
-        mcp_servers = {
-            "features": {
-                "command": sys.executable,
-                "args": ["-m", "mcp_server.feature_mcp"],
-                "env": {
-                    "PROJECT_DIR": str(self.project_dir.resolve()),
-                    "PYTHONPATH": str(ROOT_DIR.resolve()),
-                },
-            },
-        }
-
         # Create Claude SDK client
server/websocket.py (1)

172-208: Cleanup scheduling can be skipped when updates are emitted.
process_line returns early on agent start/complete and most state-change updates, so the _should_cleanup() block is often bypassed. Under steady output, stale agents may never be purged. Consider scheduling cleanup before any early return.

🛠️ Suggested fix to always schedule cleanup
@@
-        # Check for orchestrator status messages first
+        # Periodic cleanup of stale agents (every 5 minutes)
+        if self._should_cleanup():
+            asyncio.create_task(self.cleanup_stale_agents())
+
+        # Check for orchestrator status messages first
@@
-        # Periodic cleanup of stale agents (every 5 minutes)
-        if self._should_cleanup():
-            # Schedule cleanup without blocking
-            asyncio.create_task(self.cleanup_stale_agents())
-
         return None
server/main.py (1)

29-100: Add SlowAPIMiddleware to enforce the global rate limit.

The limiter instance and exception handler are configured, but without SlowAPIMiddleware the default_limits=["200/minute"] will not be applied. SlowAPI enforces default limits only through middleware; without it, routes are not rate-limited unless explicitly decorated with @limiter.limit().

Add the middleware
-from slowapi import Limiter, _rate_limit_exceeded_handler
+from slowapi import Limiter, _rate_limit_exceeded_handler
+from slowapi.middleware import SlowAPIMiddleware
 from slowapi.errors import RateLimitExceeded
@@
 app.state.limiter = limiter
 app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)
+app.add_middleware(SlowAPIMiddleware)
mcp_server/feature_mcp.py (1)

1-34: Implement missing tools required by coding guidelines.

The Feature MCP server is missing three required tools from the coding guidelines: feature_get_next, feature_claim_next, and feature_get_for_regression. While feature_claim_and_get provides similar functionality to feature_claim_next, the other two tools have no alternatives in the current implementation.
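
For context, a minimal sketch of what one of the missing tools could look like, reusing the module's FastMCP instance and session factory; the selection policy (priority, then id) is an assumption, not a spec:

```python
@mcp.tool()
def feature_get_next() -> dict | None:
    """Return the highest-priority pending feature, or None when none remain."""
    session = SessionLocal()
    try:
        feature = (
            session.query(Feature)
            .filter(~Feature.passes, ~Feature.in_progress)
            .order_by(Feature.priority, Feature.id)
            .first()
        )
        return feature.to_minimal_dict() if feature else None
    finally:
        session.close()
```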

🤖 Fix all issues with AI agents
In @.claude/templates/coding_prompt.template.md:
- Around line 310-320: The template incorrectly references the tool name
feature_get_summary (which doesn't exist) while the correct tool introduced in
this PR is spec_get_summary; update the TOKEN EFFICIENCY section to replace the
feature_get_summary mention with spec_get_summary (and verify nearby mentions of
feature_get_by_id remain unchanged), ensuring all mentions consistently use the
exact tool symbol spec_get_summary so coding agents aren't confused.

In `@api/feature_repository.py`:
- Around line 37-77: The retry logic in _commit_with_retry currently calls
session.rollback(), which discards pending changes so retries commit nothing
(breaking methods like mark_in_progress); fix by changing _commit_with_retry to
accept an apply_changes callable (e.g., def _commit_with_retry(session: Session,
apply_changes: Callable[[Session], None], max_retries: int =
MAX_COMMIT_RETRIES)) and invoke apply_changes(session) before each commit
attempt (including retries) so mutations are re-applied after rollback;
alternatively, if you prefer minimal change, call session.flush() before commit
to surface lock errors early and avoid discarding pending changes, but the
recommended approach is the apply_changes pattern so callers like
mark_in_progress re-apply their status mutation on each retry.

In `@mcp_server/feature_mcp.py`:
- Around line 507-542: feature_release_testing currently ignores the tested_ok
parameter and only clears Feature.in_progress; update it to record the test
outcome by calling feature_mark_passing(feature_id) when tested_ok is True and
feature_mark_failing(feature_id) when False (or directly set feature.passes
accordingly before commit), then clear feature.in_progress and commit; ensure
you import/locate and use the existing feature_mark_passing and
feature_mark_failing helpers (or set Feature.passes) and keep the same error
handling/rollback/session.close behavior.
🧹 Nitpick comments (17)
pyproject.toml (1)

25-27: Consider narrowing the DeprecationWarning filter.

Blanket ignore::DeprecationWarning suppresses all deprecation warnings, which could hide important notices from dependencies or Python itself. Consider scoping to specific modules or warning messages:

filterwarnings = [
    "ignore::DeprecationWarning:pytest_asyncio.*:",
    "ignore::pytest.PytestReturnNotNoneWarning",
]
requirements.txt (1)

1-15: Minor inconsistency in version specifier styles.

Most dependencies now use ~= (compatible release) but claude-agent-sdk (line 2) and apscheduler (line 11) still use >=,< bounds. This is functionally fine but inconsistent. Consider aligning if feasible:

-claude-agent-sdk>=0.1.0,<0.2.0
+claude-agent-sdk~=0.1.0
-apscheduler>=3.10.0,<4.0.0
+apscheduler~=3.10
api/config.py (1)

129-157: Consider thread safety for the singleton pattern.

The lazy singleton pattern works well for typical use cases where config is loaded once at startup. However, if get_config() is called concurrently from multiple threads during initial load, or if reload_config() is called during active request handling, there's a potential race condition.

For a more robust implementation, consider using a lock:

🔧 Optional: Thread-safe singleton
+import threading
+
 # Global config instance (lazy loaded)
 _config: Optional[AutocoderConfig] = None
+_config_lock = threading.Lock()
 
 
 def get_config() -> AutocoderConfig:
     global _config
-    if _config is None:
-        _config = AutocoderConfig()
+    if _config is None:
+        with _config_lock:
+            if _config is None:  # Double-check after acquiring lock
+                _config = AutocoderConfig()
     return _config
parallel_orchestrator.py (1)

137-139: Type annotation could be more precise.

The type dict[int, tuple[int, subprocess.Popen] | None] indicates values can be None (for placeholders), but the comment explains this well. Consider using a more explicit type alias for clarity.

💡 Optional: Type alias for clarity
+# Type for testing agent entries: (feature_id, process) or None for placeholder
+TestingAgentEntry = tuple[int, subprocess.Popen] | None

 # Testing agents: agent_id (pid) -> (feature_id, process)
 # Using pid as key allows multiple agents to test the same feature
-self.running_testing_agents: dict[int, tuple[int, subprocess.Popen] | None] = {}
+self.running_testing_agents: dict[int, TestingAgentEntry] = {}
server/routers/devserver.py (1)

51-79: Minor inconsistency in validation approach across routers.

This router uses validate_project_name (which raises HTTPException), while terminal.py and assistant_chat.py use is_valid_project_name (which returns a boolean). Both approaches work correctly, but the inconsistency may cause confusion when maintaining the codebase.

Since devserver.py is REST-only (no WebSocket endpoints with custom close codes), using the exception-raising validator is the cleaner choice here.

server/routers/projects.py (1)

422-427: Consider extracting the repeated sys.path manipulation.

The pattern of adding the root to sys.path (lines 423-425) is duplicated across multiple functions in this file (_init_imports, _get_registry_functions). Consider extracting this to a helper or using a single initialization point.

That said, this follows the existing pattern in the file, so it's acceptable for consistency.

api/feature_repository.py (1)

291-311: Missing priority-based sorting for ready features.

The get_ready_features method returns features without any ordering. Compare with api/dependency_resolver.py (lines 391-420), which sorts ready features by scheduling score, priority, and ID.

Without sorting, callers may receive features in non-deterministic order, which could affect which features get worked on first.

♻️ Proposed fix: Add priority-based sorting
     def get_ready_features(self) -> list[Feature]:
         """Get features that are ready to implement.
         ...
         """
         passing_ids = self.get_passing_ids()
         candidates = self.get_pending()

         ready = []
         for f in candidates:
             deps = f.dependencies or []
             if all(dep_id in passing_ids for dep_id in deps):
                 ready.append(f)

+        # Sort by priority (lower = higher priority), then by ID for stability
+        ready.sort(key=lambda f: (f.priority, f.id))
+
         return ready
tests/test_async_examples.py (1)

261-263: Consider making the expected feature count less brittle.

The assertion assert results[0] == 5 assumes populated_db always has exactly 5 features. If the fixture changes, this test will fail with a non-obvious error.

♻️ Proposed improvement
     # All should return the same count
     assert all(r == results[0] for r in results)
-    assert results[0] == 5  # populated_db has 5 features
+    # populated_db fixture creates 5 features; verify count is positive and consistent
+    assert results[0] > 0, "Expected populated_db to have features"

Or keep the explicit count but add the fixture's expected count as a constant that can be shared with the fixture definition.

api/migrations.py (2)

22-32: Add type annotation for engine parameter.

All migration functions lack type hints for the engine parameter. Consider adding type annotations for better IDE support and static analysis.

✨ Suggested type annotation
+from sqlalchemy.engine import Engine
+
-def migrate_add_in_progress_column(engine) -> None:
+def migrate_add_in_progress_column(engine: Engine) -> None:

Apply similarly to all other migration functions.


176-186: SQL construction is safe here but consider parameterization for future-proofing.

The f-string SQL construction on Line 185 is currently safe since col_name and col_type come from a hardcoded list (Lines 176-181). However, if this pattern is extended in the future with dynamic values, it could become a risk.

tests/conftest.py (3)

72-87: Redundant create_database call in db_session fixture.

The temp_db fixture already calls create_database(project_dir) on Line 65. Calling it again on Line 80 is redundant and could mask issues if the two calls return different session factories.

♻️ Suggested fix: reuse the database from temp_db
 @pytest.fixture
 def db_session(temp_db: Path):
     """Get a database session for testing.

     Provides a session that is automatically rolled back after each test.
     """
     from api.database import create_database

-    _, SessionLocal = create_database(temp_db)
+    # create_database is idempotent and returns existing engine/session
+    _, SessionLocal = create_database(temp_db)
     session = SessionLocal()

Alternatively, consider having temp_db yield both the path and session factory to avoid the second call.


95-110: Async fixture performs only synchronous operations.

The async_temp_db fixture is declared as async but contains no await expressions. The comment on Line 107 acknowledges this. Consider making it a regular sync fixture unless there's a specific reason for async (e.g., compatibility with async test collection).


216-244: populated_db fixture should use db_session fixture to avoid duplication.

This fixture duplicates the session management pattern from db_session. Consider depending on db_session instead to reduce code duplication and ensure consistent cleanup.

♻️ Suggested refactor
 @pytest.fixture
-def populated_db(temp_db: Path, sample_feature_data: dict) -> Path:
+def populated_db(temp_db: Path, db_session, sample_feature_data: dict) -> Path:
     """Create a database populated with sample features.

     Returns the project directory path.
     """
     from api.database import Feature, create_database

-    _, SessionLocal = create_database(temp_db)
-    session = SessionLocal()
-
-    try:
-        # Add sample features
-        for i in range(5):
-            feature = Feature(
-                priority=i + 1,
-                category=f"category_{i % 2}",
-                name=f"Feature {i + 1}",
-                description=f"Description for feature {i + 1}",
-                steps=[f"Step {j}" for j in range(3)],
-                passes=i < 2,  # First 2 features are passing
-                in_progress=i == 2,  # Third feature is in progress
-            )
-            session.add(feature)
-
-        session.commit()
-    finally:
-        session.close()
+    # Add sample features
+    for i in range(5):
+        feature = Feature(
+            priority=i + 1,
+            category=f"category_{i % 2}",
+            name=f"Feature {i + 1}",
+            description=f"Description for feature {i + 1}",
+            steps=[f"Step {j}" for j in range(3)],
+            passes=i < 2,
+            in_progress=i == 2,
+        )
+        db_session.add(feature)
+
+    db_session.commit()

     return temp_db
mcp_server/feature_mcp.py (3)

46-50: Duplicate _utc_now() function - import from api.models instead.

This helper is also defined in api/models.py (Lines 27-30). Duplicating it creates maintenance burden and potential for divergence.

♻️ Suggested fix
-def _utc_now() -> datetime:
-    """Return current UTC time."""
-    return datetime.now(timezone.utc)
-
 from mcp.server.fastmcp import FastMCP
 from pydantic import BaseModel, Field
+from api.models import _utc_now

Note: You may need to rename the function in api/models.py to utc_now (without underscore) if it should be part of the public API.


1189-1201: Multiple COUNT queries could be consolidated.

The feature_get_attempts function makes 4 separate queries for statistics. Consider combining into a single aggregate query similar to feature_get_stats (Lines 162-166).

♻️ Suggested optimization
+        from sqlalchemy import case, func
+
         # Calculate statistics
-        total_attempts = session.query(FeatureAttempt).filter(
-            FeatureAttempt.feature_id == feature_id
-        ).count()
-
-        success_count = session.query(FeatureAttempt).filter(
-            FeatureAttempt.feature_id == feature_id,
-            FeatureAttempt.outcome == "success"
-        ).count()
-
-        failure_count = session.query(FeatureAttempt).filter(
-            FeatureAttempt.feature_id == feature_id,
-            FeatureAttempt.outcome == "failure"
-        ).count()
+        stats = session.query(
+            func.count(FeatureAttempt.id).label('total'),
+            func.sum(case((FeatureAttempt.outcome == 'success', 1), else_=0)).label('success'),
+            func.sum(case((FeatureAttempt.outcome == 'failure', 1), else_=0)).label('failure')
+        ).filter(FeatureAttempt.feature_id == feature_id).first()
+
+        total_attempts = stats.total or 0
+        success_count = int(stats.success or 0)
+        failure_count = int(stats.failure or 0)

1407-1497: spec_get_summary provides good token optimization but regex parsing may be fragile.

The function effectively reduces token consumption (~94% per PR objectives). However, the regex-based parsing assumes specific XML-like tag formats. Consider adding a note about expected format or providing fallback for malformed specs.

Minor: The function imports re inside the function body (Line 1424). Moving it to module level would be more conventional.
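
One possible fallback shape for malformed or untagged specs (entirely an assumption, sketched here to make the nitpick concrete):

```python
def summarize_untagged(spec_text: str) -> dict:
    """Degrade gracefully when the XML-like tags are absent:
    treat the first non-empty line as the title, the rest as overview."""
    lines = [ln.strip() for ln in spec_text.splitlines() if ln.strip()]
    return {
        "project_name": lines[0] if lines else "",
        "overview": " ".join(lines[1:])[:200],
        "technology_stack": "",
        "ports": "",
        "feature_count": 0,
    }
```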

api/models.py (1)

21-21: Use updated import path for declarative_base.

sqlalchemy.ext.declarative.declarative_base is deprecated since SQLAlchemy 1.4. Use the new import path for forward compatibility.

♻️ Suggested fix
-from sqlalchemy.ext.declarative import declarative_base
+from sqlalchemy.orm import declarative_base

Comment on lines 310 to 411
## TOKEN EFFICIENCY

To maximize context window usage:

- **Don't read files unnecessarily** - Feature details from `feature_get_by_id` contain everything you need
- **Be concise** - Short, focused responses save tokens for actual work
- **Use `feature_get_summary`** for status checks (lighter than `feature_get_by_id`)
- **Use `spec_get_summary`** for project context (~800 tokens vs 12,500 for full app_spec.txt)
- **Avoid re-reading large files** - Read once, remember the content

---

⚠️ Potential issue | 🟡 Minor

Inconsistent tool name reference.

Line 316 references feature_get_summary but the actual tool introduced in this PR is spec_get_summary (as correctly shown on line 317 and line 276). This may confuse coding agents.

📝 Proposed fix
 - **Don't read files unnecessarily** - Feature details from `feature_get_by_id` contain everything you need
 - **Be concise** - Short, focused responses save tokens for actual work
-- **Use `feature_get_summary`** for status checks (lighter than `feature_get_by_id`)
+- **Use `feature_get_stats`** for status checks (lighter than `feature_get_by_id`)
 - **Use `spec_get_summary`** for project context (~800 tokens vs 12,500 for full app_spec.txt)

Comment on lines 37 to 77
```python
def _commit_with_retry(session: Session, max_retries: int = MAX_COMMIT_RETRIES) -> None:
    """
    Commit a session with retry logic for transient errors.

    Handles SQLITE_BUSY, SQLITE_LOCKED, and similar transient errors
    with exponential backoff.

    Args:
        session: SQLAlchemy session to commit
        max_retries: Maximum number of retry attempts

    Raises:
        OperationalError: If commit fails after all retries
    """
    delay_ms = INITIAL_RETRY_DELAY_MS
    last_error = None

    for attempt in range(max_retries + 1):
        try:
            session.commit()
            return
        except OperationalError as e:
            error_msg = str(e).lower()
            # Retry on lock/busy errors
            if "locked" in error_msg or "busy" in error_msg:
                last_error = e
                if attempt < max_retries:
                    logger.warning(
                        f"Database commit failed (attempt {attempt + 1}/{max_retries + 1}), "
                        f"retrying in {delay_ms}ms: {e}"
                    )
                    time.sleep(delay_ms / 1000)
                    delay_ms *= 2  # Exponential backoff
                    session.rollback()  # Reset session state before retry
                    continue
            raise

    # If we get here, all retries failed
    if last_error:
        logger.error(f"Database commit failed after {max_retries + 1} attempts")
        raise last_error
```

⚠️ Potential issue | 🟠 Major

Retry logic may not preserve pending changes after rollback.

After session.rollback() on line 70, any pending changes to objects in the session are discarded. When the loop retries session.commit(), there's nothing to commit because the changes were rolled back.

For the status update methods (e.g., mark_in_progress), this means a retry after rollback will commit an empty transaction, silently failing to persist the intended change.

🐛 Proposed fix: Re-apply changes before retry

One approach is to have the caller handle retries by passing a callable that applies the changes:

-def _commit_with_retry(session: Session, max_retries: int = MAX_COMMIT_RETRIES) -> None:
+def _commit_with_retry(
+    session: Session,
+    apply_changes: callable = None,
+    max_retries: int = MAX_COMMIT_RETRIES
+) -> None:
     """
     Commit a session with retry logic for transient errors.
-
-    Handles SQLITE_BUSY, SQLITE_LOCKED, and similar transient errors
-    with exponential backoff.
+    
+    Args:
+        session: SQLAlchemy session to commit
+        apply_changes: Optional callable to re-apply changes after rollback
+        max_retries: Maximum number of retry attempts
     ...
     """
     delay_ms = INITIAL_RETRY_DELAY_MS
     last_error = None

     for attempt in range(max_retries + 1):
         try:
+            if attempt > 0 and apply_changes:
+                apply_changes()
             session.commit()
             return
         except OperationalError as e:
             error_msg = str(e).lower()
             if "locked" in error_msg or "busy" in error_msg:
                 last_error = e
                 if attempt < max_retries:
                     ...
                     session.rollback()
                     continue
             raise

Alternatively, consider using session.flush() to detect lock errors before commit, or restructure to re-query and re-apply the mutation on each retry.


nioasoft and others added 2 commits January 29, 2026 13:07
- Use feature_get_stats for progress checks
- Use feature_get_summary for single feature status

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The testing architecture was simplified - testing agents now work concurrently without claim/release coordination.

- Remove feature_release_testing from allowed tools in client.py
- Update testing_prompt.template.md to remove claim/release workflow

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

@coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @.claude/templates/testing_prompt.template.md:
- Around line 86-91: The prompt's "end your session" is ambiguous; update the
template around the echo step (the echo "[Testing] Feature #{id} verified -
still passing" >> claude-progress.txt line and the surrounding text) to include
an explicit termination action the agent should perform, e.g. call the
orchestrator completion API/tool, run an exit command (explicitly state "exit 0"
or equivalent), or note that the session will auto-terminate after logging —
reference the echo line and the phrase "end your session" so implementers
replace the vague wording with one concrete mechanism (tool call name or exact
exit instruction).
🧹 Nitpick comments (1)
.claude/templates/testing_prompt.template.md (1)

186-186: Consider clarifying "one iteration" language.

The phrase "You have one iteration" (line 186) is somewhat ambiguous. Combined with line 171 ("Test ONE feature thoroughly"), the intent appears to be limiting scope to a single feature per session. However, "iteration" could be misinterpreted as:

  • Only one attempt to fix a regression (even if the first fix fails)
  • No retries if browser automation encounters transient failures
  • One verification pass only

Consider more explicit phrasing such as:

  • "Your session is scoped to ONE feature. Complete all verification and any necessary fixes for that feature."
  • "Focus on testing ONE feature thoroughly. You may iterate on fixes for that feature until it passes."

- Add explicit note that session auto-terminates after logging
- Clarify that 'one iteration' means one feature per session, not one fix attempt

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
rudiheydra added a commit to rudiheydra/AutoBuildr that referenced this pull request Feb 2, 2026
…orgeAI#128)

Added TestHarnessKernelBudgetEnforcement class with 8 tests proving all
7 verification steps for HarnessKernel.execute() budget enforcement:

1. execute() called with compiled AgentSpec via --spec path
2. turns_used incremented after each turn, matches actual count
3. max_turns=2 stops execution after exactly 2 turns
4. Budget exhaustion sets status='timeout' (not 'failed')
5. 'timeout' event recorded in agent_events with payload
6. Acceptance validators run after budget exhaustion (graceful termination)
7. tokens_in and tokens_out tracked and stored on AgentRun
8. Token tracking persists even on timeout

All 54 tests pass with no regressions.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>