18 changes: 16 additions & 2 deletions .claude/templates/coding_prompt.template.md
@@ -31,8 +31,7 @@ Then use MCP tools to check feature status:
Use the feature_get_stats tool
```

Understanding the `app_spec.txt` is critical - it contains the full requirements
for the application you're building.
**NOTE:** Do NOT read `app_spec.txt` directly (12,500+ tokens). If you need project context, use `spec_get_summary` tool (~800 tokens) which returns project name, tech stack, ports, and overview.

### STEP 2: START SERVERS (IF NOT RUNNING)

@@ -363,6 +362,9 @@ feature_skip with feature_id={id}

# 7. Clear in-progress status (when abandoning a feature)
feature_clear_in_progress with feature_id={id}

# 8. Get condensed project spec (~800 tokens vs 12,500 full)
spec_get_summary
```

### RULES:
Expand Down Expand Up @@ -396,6 +398,18 @@ This allows you to fully test email-dependent flows without needing external ema

---

## TOKEN EFFICIENCY

To maximize context window usage:

- **Don't read files unnecessarily** - Feature details from `feature_get_by_id` contain everything you need
- **Be concise** - Short, focused responses save tokens for actual work
- **Use `feature_get_stats`** for progress checks, `feature_get_summary` for single feature status
- **Use `spec_get_summary`** for project context (~800 tokens vs 12,500 for full app_spec.txt)
- **Avoid re-reading large files** - Read once, remember the content

---

**Remember:** One feature per session. Zero console errors. All data from real database. Leave codebase clean before ending session.

---
32 changes: 8 additions & 24 deletions .claude/templates/testing_prompt.template.md
@@ -48,9 +48,7 @@ Your feature has been pre-assigned by the orchestrator. Use `feature_get_by_id`
Use the feature_get_by_id tool with feature_id={your_assigned_id}
```

The orchestrator has already claimed this feature for testing (set `testing_in_progress=true`).

**CRITICAL:** You MUST call `feature_release_testing` when done, regardless of pass/fail.
The orchestrator has assigned this feature for you to test.

### STEP 4: VERIFY THE FEATURE

@@ -85,18 +83,17 @@ Use browser automation tools:

#### If the feature PASSES:

The feature still works correctly. Release the claim and end your session:

```
# Release the testing claim (tested_ok=true)
Use the feature_release_testing tool with feature_id={id} and tested_ok=true
The feature still works correctly. Log the result:

```bash
# Log the successful verification
echo "[Testing] Feature #{id} verified - still passing" >> claude-progress.txt
```

**DO NOT** call feature_mark_passing again - it's already passing.

**Session will auto-terminate** after you complete the logging step. No explicit exit action needed.

#### If the feature FAILS (regression found):

A regression has been introduced. You MUST fix it:
@@ -125,13 +122,7 @@ A regression has been introduced. You MUST fix it:
Use the feature_mark_passing tool with feature_id={id}
```

6. **Release the testing claim:**
```
Use the feature_release_testing tool with feature_id={id} and tested_ok=false
```
Note: tested_ok=false because we found a regression (even though we fixed it).

7. **Commit the fix:**
6. **Commit the fix:**
```bash
git add .
git commit -m "Fix regression in [feature name]
@@ -156,7 +147,6 @@ echo "[Testing] Session complete - verified/fixed feature #{id}" >> claude-progr
### Feature Management
- `feature_get_stats` - Get progress overview (passing/in_progress/total counts)
- `feature_get_by_id` - Get your assigned feature details
- `feature_release_testing` - **REQUIRED** - Release claim after testing (pass tested_ok=true/false)
- `feature_mark_failing` - Mark a feature as failing (when you find a regression)
- `feature_mark_passing` - Mark a feature as passing (after fixing a regression)

@@ -188,20 +178,14 @@ All interaction tools have **built-in auto-wait** - no manual timeouts needed.
- Visual appearance correct
- API calls succeed

**CRITICAL - Always release your claim:**
- Call `feature_release_testing` when done, whether pass or fail
- Pass `tested_ok=true` if the feature passed
- Pass `tested_ok=false` if you found a regression

**If you find a regression:**
1. Mark the feature as failing immediately
2. Fix the issue
3. Verify the fix with browser automation
4. Mark as passing only after thorough verification
5. Release the testing claim with `tested_ok=false`
6. Commit the fix
5. Commit the fix

**You have one iteration.** Focus on testing ONE feature thoroughly.
**Your session is scoped to ONE feature.** Complete all verification and any necessary fixes for that feature. You may iterate on fixes until it passes.

---

26 changes: 26 additions & 0 deletions api/database.py
@@ -82,6 +82,32 @@ def get_dependencies_safe(self) -> list[int]:
return [d for d in self.dependencies if isinstance(d, int)]
return []

def to_minimal_dict(self) -> dict:
"""Return minimal feature info for token-efficient responses.

Use this instead of to_dict() when you only need status/dependency info,
not the full description and steps. Reduces response size by ~80%.
"""
return {
"id": self.id,
"name": self.name,
"priority": self.priority,
"passes": self.passes if self.passes is not None else False,
"in_progress": self.in_progress if self.in_progress is not None else False,
"dependencies": self.dependencies if self.dependencies else [],
}

def to_cycle_check_dict(self) -> dict:
"""Return only fields needed for cycle detection.

Use this for circular dependency validation - drastically reduces
token usage compared to to_dict() (~95% reduction).
"""
return {
"id": self.id,
"dependencies": self.dependencies if self.dependencies else [],
}
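As an illustration of the payload savings these serializers aim for, here is a standalone sketch using a plain stand-in class (a hypothetical `FakeFeature`, not the real SQLAlchemy model) with the same three serialization methods:

```python
import json

class FakeFeature:
    """Illustrative stand-in for the SQLAlchemy Feature model (fields assumed)."""

    def __init__(self, id, name, priority, passes, in_progress, dependencies, description):
        self.id = id
        self.name = name
        self.priority = priority
        self.passes = passes
        self.in_progress = in_progress
        self.dependencies = dependencies
        self.description = description  # heavy field omitted by the minimal serializers

    def to_dict(self):
        # Full serialization, including the large description field
        return {
            "id": self.id, "name": self.name, "priority": self.priority,
            "passes": self.passes, "in_progress": self.in_progress,
            "dependencies": self.dependencies, "description": self.description,
        }

    def to_minimal_dict(self):
        # Status/dependency info only, mirroring the diff above
        return {
            "id": self.id, "name": self.name, "priority": self.priority,
            "passes": self.passes if self.passes is not None else False,
            "in_progress": self.in_progress if self.in_progress is not None else False,
            "dependencies": self.dependencies if self.dependencies else [],
        }

    def to_cycle_check_dict(self):
        # Only what cycle detection needs
        return {"id": self.id, "dependencies": self.dependencies if self.dependencies else []}

f = FakeFeature(1, "Login", 5, None, None, [2, 3], "A long description " * 50)
full = len(json.dumps(f.to_dict()))
minimal = len(json.dumps(f.to_minimal_dict()))
cycle = len(json.dumps(f.to_cycle_check_dict()))
# The cycle-check payload is a small fraction of the full serialization
print(cycle, minimal, full)
```

The exact percentages in the docstrings depend on how large `description` and `steps` are in practice; the ordering `cycle < minimal < full` is what the methods guarantee.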


class Schedule(Base):
"""Time-based schedule for automated agent start/stop."""
1 change: 0 additions & 1 deletion client.py
@@ -189,7 +189,6 @@ def get_extra_read_paths() -> list[Path]:
"mcp__features__feature_create_bulk",
"mcp__features__feature_create",
"mcp__features__feature_clear_in_progress",
"mcp__features__feature_release_testing", # Release testing claim
# Dependency management
"mcp__features__feature_add_dependency",
"mcp__features__feature_remove_dependency",
126 changes: 119 additions & 7 deletions mcp_server/feature_mcp.py
@@ -686,7 +686,8 @@ def feature_add_dependency(
# Security: Circular dependency check
# would_create_circular_dependency(features, source_id, target_id)
# source_id = feature gaining the dependency, target_id = feature being depended upon
all_features = [f.to_dict() for f in session.query(Feature).all()]
# Use to_cycle_check_dict() for minimal token usage (~95% reduction)
all_features = [f.to_cycle_check_dict() for f in session.query(Feature).all()]
if would_create_circular_dependency(all_features, feature_id, dependency_id):
return json.dumps({"error": "Cannot add: would create circular dependency"})
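`would_create_circular_dependency` is only referenced in this hunk; its implementation lives elsewhere in the codebase. A plausible sketch of such a check over the `to_cycle_check_dict()` shape - a DFS that simulates the proposed edge - might look like the following (the function body here is an assumption, not taken from the source):

```python
def would_create_circular_dependency(features, source_id, target_id):
    """Return True if adding target_id as a dependency of source_id creates a cycle.

    `features` is a list of dicts with "id" and "dependencies" keys,
    as produced by to_cycle_check_dict().
    """
    deps = {f["id"]: list(f["dependencies"]) for f in features}
    # Simulate the proposed edge before searching
    deps.setdefault(source_id, []).append(target_id)

    # DFS from target_id: any path back to source_id closes a cycle
    stack, seen = [target_id], set()
    while stack:
        node = stack.pop()
        if node == source_id:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(deps.get(node, []))
    return False
```

Because the check only reads `id` and `dependencies`, passing `to_cycle_check_dict()` output (as the diff does) is sufficient regardless of the exact traversal used.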

@@ -749,7 +750,8 @@ def feature_remove_dependency(

@mcp.tool()
def feature_get_ready(
limit: Annotated[int, Field(default=10, ge=1, le=50, description="Max features to return")] = 10
limit: Annotated[int, Field(default=10, ge=1, le=50, description="Max features to return")] = 10,
minimal: Annotated[bool, Field(default=True, description="Return minimal fields (id, name, priority, status, deps) to reduce tokens")] = True
) -> str:
"""Get all features ready to start (dependencies satisfied, not in progress).

@@ -758,6 +760,7 @@ def feature_get_ready(

Args:
limit: Maximum number of features to return (1-50, default 10)
minimal: If True (default), return only essential fields. Set False for full details.

Returns:
JSON with: features (list), count (int), total_ready (int)
@@ -774,7 +777,8 @@
continue
deps = f.dependencies or []
if all(dep_id in passing_ids for dep_id in deps):
ready.append(f.to_dict())
# Use minimal or full serialization based on parameter
ready.append(f.to_minimal_dict() if minimal else f.to_dict())

# Sort by scheduling score (higher = first), then priority, then id
scores = compute_scheduling_scores(all_dicts)
@@ -791,7 +795,8 @@

@mcp.tool()
def feature_get_blocked(
limit: Annotated[int, Field(default=20, ge=1, le=100, description="Max features to return")] = 20
limit: Annotated[int, Field(default=20, ge=1, le=100, description="Max features to return")] = 20,
minimal: Annotated[bool, Field(default=True, description="Return minimal fields (id, name, priority, status, deps) to reduce tokens")] = True
) -> str:
"""Get features that are blocked by unmet dependencies.

@@ -800,6 +805,7 @@

Args:
limit: Maximum number of features to return (1-100, default 20)
minimal: If True (default), return only essential fields. Set False for full details.

Returns:
JSON with: features (list with blocked_by field), count (int), total_blocked (int)
@@ -816,8 +822,10 @@
deps = f.dependencies or []
blocking = [d for d in deps if d not in passing_ids]
if blocking:
# Use minimal or full serialization based on parameter
base_dict = f.to_minimal_dict() if minimal else f.to_dict()
blocked.append({
**f.to_dict(),
**base_dict,
"blocked_by": blocking
})

@@ -842,7 +850,17 @@ def feature_get_graph() -> str:
"""
session = get_session()
try:
all_features = session.query(Feature).all()
# Optimized: Query only columns needed for graph visualization
# Avoids loading description, steps, timestamps, last_error
all_features = session.query(
Feature.id,
Feature.name,
Feature.category,
Feature.priority,
Feature.passes,
Feature.in_progress,
Feature.dependencies
).all()
passing_ids = {f.id for f in all_features if f.passes}

nodes = []
@@ -922,7 +940,8 @@ def feature_set_dependencies(
return json.dumps({"error": f"Dependencies not found: {missing}"})

# Check for circular dependencies
all_features = [f.to_dict() for f in session.query(Feature).all()]
# Use to_cycle_check_dict() for minimal token usage (~95% reduction)
all_features = [f.to_cycle_check_dict() for f in session.query(Feature).all()]
# Temporarily update the feature's dependencies for cycle check
test_features = []
for f in all_features:
@@ -952,5 +971,98 @@
session.close()


@mcp.tool()
def spec_get_summary() -> str:
"""Get condensed project specification summary (~800 tokens vs ~12,500 full).

Returns only essential project info:
- project_name: Name of the project
- overview: First 200 chars of project overview
- technology_stack: Tech stack summary
- ports: Development server ports
- feature_count: Target number of features

Use this instead of reading the full app_spec.txt to save tokens.
For full details, read prompts/app_spec.txt directly.

Returns:
JSON with condensed project spec, or error if not found.
"""
import re

spec_path = PROJECT_DIR / "prompts" / "app_spec.txt"
if not spec_path.exists():
return json.dumps({"error": "No app_spec.txt found in prompts directory"})

try:
content = spec_path.read_text(encoding="utf-8")
except Exception as e:
return json.dumps({"error": f"Failed to read app_spec.txt: {str(e)}"})

result: dict = {}

# Extract project_name (look for <project_name> tag or "Project:" header)
project_name_match = re.search(r"<project_name>\s*(.+?)\s*</project_name>", content, re.IGNORECASE)
if project_name_match:
result["project_name"] = project_name_match.group(1).strip()
else:
# Try alternative formats
alt_match = re.search(r"(?:Project|Name):\s*(.+?)(?:\n|$)", content, re.IGNORECASE)
result["project_name"] = alt_match.group(1).strip() if alt_match else "Unknown"

# Extract overview (first 200 chars)
overview_match = re.search(r"<overview>\s*(.+?)\s*</overview>", content, re.DOTALL | re.IGNORECASE)
if overview_match:
overview = overview_match.group(1).strip()
result["overview"] = overview[:200] + ("..." if len(overview) > 200 else "")
else:
# Try alternative formats
alt_match = re.search(r"(?:Overview|Description):\s*(.+?)(?:\n\n|$)", content, re.DOTALL | re.IGNORECASE)
if alt_match:
overview = alt_match.group(1).strip()
result["overview"] = overview[:200] + ("..." if len(overview) > 200 else "")
else:
result["overview"] = None

# Extract technology_stack
tech_match = re.search(r"<technology_stack>\s*(.+?)\s*</technology_stack>", content, re.DOTALL | re.IGNORECASE)
if tech_match:
# Parse tech stack lines into a list
tech_text = tech_match.group(1).strip()
tech_items = [line.strip().lstrip("- ") for line in tech_text.split("\n") if line.strip() and not line.strip().startswith("#")]
result["technology_stack"] = tech_items[:10] # Cap at 10 items
else:
result["technology_stack"] = None

# Extract ports
ports_match = re.search(r"<ports>\s*(.+?)\s*</ports>", content, re.DOTALL | re.IGNORECASE)
if ports_match:
ports_text = ports_match.group(1).strip()
ports = {}
for line in ports_text.split("\n"):
if ":" in line:
key, val = line.split(":", 1)
key = key.strip().lstrip("- ")
val = val.strip()
# Try to extract port number
port_num = re.search(r"\d+", val)
if port_num:
ports[key] = int(port_num.group())
result["ports"] = ports if ports else None
else:
result["ports"] = None

# Extract feature_count
feature_count_match = re.search(r"<feature_count>\s*(\d+)\s*</feature_count>", content, re.IGNORECASE)
if feature_count_match:
result["feature_count"] = int(feature_count_match.group(1))
else:
# Try alternative formats
alt_match = re.search(r"feature[_\s]*count[:\s]*(\d+)", content, re.IGNORECASE)
result["feature_count"] = int(alt_match.group(1)) if alt_match else None

return json.dumps(result)
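To see the extraction logic in isolation, here is a self-contained sketch applying the same regex patterns to a hypothetical miniature spec (the sample content and the subset of fields shown are invented for illustration):

```python
import json
import re

# Hypothetical miniature app_spec.txt content
sample = """
<project_name>TaskFlow</project_name>
<overview>A kanban-style task manager.</overview>
<ports>
- frontend: 5173
- backend: 8000
</ports>
<feature_count>40</feature_count>
"""

summary = {}

# Tag-based extraction, mirroring spec_get_summary's primary patterns
m = re.search(r"<project_name>\s*(.+?)\s*</project_name>", sample, re.IGNORECASE)
summary["project_name"] = m.group(1).strip() if m else "Unknown"

m = re.search(r"<ports>\s*(.+?)\s*</ports>", sample, re.DOTALL | re.IGNORECASE)
ports = {}
if m:
    for line in m.group(1).strip().split("\n"):
        if ":" in line:
            key, val = line.split(":", 1)
            num = re.search(r"\d+", val)
            if num:
                # lstrip("- ") strips the character set {'-', ' '} from the left
                ports[key.strip().lstrip("- ")] = int(num.group())
summary["ports"] = ports if ports else None

print(json.dumps(summary))
```

Note that `lstrip("- ")` removes any leading run of `-` and space characters, not the literal string `"- "`; that is the behavior the tool's port parsing relies on.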


if __name__ == "__main__":
mcp.run()
27 changes: 19 additions & 8 deletions server/services/assistant_chat_session.py
Original file line number Diff line number Diff line change
@@ -347,22 +347,33 @@ async def send_message(self, user_message: str) -> AsyncGenerator[dict, None]:
history = get_messages(self.project_dir, self.conversation_id)
# Exclude the message we just added (last one)
history = history[:-1] if history else []
# Cap history to last 35 messages to prevent context overload
history = history[-35:] if len(history) > 35 else history
# Cap history to last 20 messages to prevent context overload
history = history[-20:] if len(history) > 20 else history
if history:
# Format history as context for Claude
# Progressive summarization for token efficiency:
# - Recent messages (last 5): up to 1500 chars each
# - Older messages (6-20): 100-char summaries
# This reduces token usage by ~50% compared to uniform truncation
history_lines = ["[Previous conversation history for context:]"]
for msg in history:
num_messages = len(history)
for i, msg in enumerate(history):
role = "User" if msg["role"] == "user" else "Assistant"
content = msg["content"]
# Truncate very long messages
if len(content) > 500:
content = content[:500] + "..."
# Calculate position from end (0 = most recent)
position_from_end = num_messages - 1 - i
if position_from_end < 5:
# Recent messages (last 5): allow up to 1500 chars
if len(content) > 1500:
content = content[:1500] + "..."
else:
# Older messages (6-20): 100-char summaries only
if len(content) > 100:
content = content[:100] + "..."
history_lines.append(f"{role}: {content}")
history_lines.append("[End of history. Continue the conversation:]")
history_lines.append(f"User: {user_message}")
message_to_send = "\n".join(history_lines)
logger.info(f"Loaded {len(history)} messages from conversation history")
logger.info(f"Loaded {len(history)} messages from conversation history (progressive summarization)")

try:
async for chunk in self._query_claude(message_to_send):
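The progressive-summarization loop above could also be factored into a standalone helper. The sketch below (a hypothetical `summarize_history`, not part of this diff) mirrors the same recent-vs-older thresholds:

```python
def summarize_history(history, recent=5, recent_chars=1500, older_chars=100):
    """Progressively truncate chat history: generous limits for recent
    messages, terse summaries for older ones.

    `history` is a list of {"role": ..., "content": ...} dicts, oldest first.
    Returns formatted "Role: content" lines.
    """
    lines = []
    n = len(history)
    for i, msg in enumerate(history):
        role = "User" if msg["role"] == "user" else "Assistant"
        # Position 0 = most recent message
        position_from_end = n - 1 - i
        limit = recent_chars if position_from_end < recent else older_chars
        content = msg["content"]
        if len(content) > limit:
            content = content[:limit] + "..."
        lines.append(f"{role}: {content}")
    return lines
```

Keeping the thresholds as parameters would make the 20-message cap and the 1500/100-char limits tunable without touching the session code.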