
feat: refactor pr comments coderabbit style #57

Merged
dkargatzis merged 7 commits into main from ux/refactor-pr-comments-coderabbit-style
Feb 27, 2026
Conversation


dkargatzis commented Feb 27, 2026

Closes #56

Summary by CodeRabbit

  • Bug Fixes

    • Unified governance header ("🛡️ Watchflow Governance Checks"), added status/counts and collapsible severity sections; updated trailing guidance with a highlighted tip ("💡").
  • New Features

    • Violations and recommendation details rendered in expandable/collapsible blocks; added explicit Info severity and fallback-to-Low handling in summaries.
    • Acknowledgment flows now include collapsible violations when applicable.
  • Tests

    • Updated expectations for new formatting, empty-case behavior (now returns empty), and added severity coverage.

dkargatzis self-assigned this Feb 27, 2026

watchflow bot commented Feb 27, 2026

🚨 Watchflow Rule Violations Detected

🟡 Medium Severity

Checks PR description (body) and title for a linked issue reference (e.g. #123, Fixes #123, Closes #456). Use when the rule requires issue refs in either field.

PR does not reference a linked issue (e.g. #123 or closes #123 in body/title)

Rule validation failed with severity: medium
How to fix: Add an issue reference in the PR title or description (e.g. Fixes #123).

Validates if the PR description meets minimum length requirements

PR description is empty

Rule validation failed with severity: medium
How to fix: Add a description with at least 50 characters.


This comment was automatically generated by Watchflow.
To configure rules, edit the .watchflow/rules.yaml file in this repository.


coderabbitai bot commented Feb 27, 2026

Warning

Rate limit exceeded

@dkargatzis has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 7 minutes and 52 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 6086aa2 and 33fa29c.

📒 Files selected for processing (2)
  • src/presentation/github_formatter.py
  • tests/unit/presentation/test_github_formatter.py
📝 Walkthrough

Walkthrough

Reformats GitHub outputs to a governance-style header and uses <details>/<summary> collapsible sections for violations; adds _build_collapsible_violations_text in src/presentation/github_formatter.py; updates related tests and wraps rule descriptions in collapsible blocks in src/api/recommendations.py.

Changes

Cohort / File(s) Summary
Violation Formatter
src/presentation/github_formatter.py
Adds private helper _build_collapsible_violations_text(violations: list[Violation]) -> str; replaces per-severity plain headers with a governance status line and <details>/<summary> collapsible severity groups; updates format_check_run_output, format_violations_comment, and format_acknowledgment_check_run outputs and footer hint.
Tests
tests/unit/presentation/test_github_formatter.py
Updates assertions for new "🛡️ Watchflow Governance Checks" header, status/count line, <summary>-based severity groupings, new acknowledgment hint, empty-case returning empty string, and adds tests for fallback and INFO severity handling.
Recommendations PR Body
src/api/recommendations.py
Wraps each rule description in <details><summary>...</summary> blocks and adjusts spacing/rationale placement when generating PR body content.
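The collapsible formatting described in the walkthrough can be sketched roughly as follows. This is a minimal illustration, not the repository's `_build_collapsible_violations_text` itself; the dict-based violation shape, header text, and emoji mapping are assumptions based on the summary above:

```python
# Sketch: group violations by severity and render each group as a
# collapsible <details>/<summary> block, returning "" for the empty case.
SEVERITY_EMOJI = {"critical": "🔴", "high": "🟠", "medium": "🟡", "low": "🟢", "info": "⚪"}
SEVERITY_ORDER = ["critical", "high", "medium", "low", "info"]


def build_collapsible_violations_text(violations: list[dict]) -> str:
    if not violations:
        return ""  # empty case returns an empty string, per the updated tests
    groups: dict[str, list[dict]] = {}
    for v in violations:
        sev = v.get("severity", "info")
        if sev not in SEVERITY_ORDER:
            sev = "info"  # keep unknown severities visible instead of dropping them
        groups.setdefault(sev, []).append(v)

    lines = [
        "## 🛡️ Watchflow Governance Checks",
        f"**Status:** {len(violations)} violation(s) found",
        "",
    ]
    for sev in SEVERITY_ORDER:
        if sev not in groups:
            continue
        lines.append("<details>")
        lines.append(
            f"<summary><b>{SEVERITY_EMOJI[sev]} {sev.title()} Severity "
            f"({len(groups[sev])})</b></summary>"
        )
        lines.append("")
        for v in groups[sev]:
            lines.append(f"- {v['message']}")
        lines.append("")
        lines.append("</details>")
    return "\n".join(lines)
```

Each `<details>` block is closed before the next severity opens, so the output stays well-formed HTML regardless of how many groups are present.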

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Poem

🐰 I fold the warnings soft and light,
🛡️ Governance tucked away from sight,
📦 Click to open, see what's wrong,
✨ Ack or fix — the rabbit's song,
🐇 Tests hop in — the format's right.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning — Docstring coverage is 63.64%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
✅ Passed checks (4 passed)
  • Description Check ✅ Passed — Check skipped; CodeRabbit's high-level summary is enabled.
  • Title check ✅ Passed — The title 'refactor pr comments coderabbit style' directly and clearly summarizes the main change: refactoring PR comment formatting to match CodeRabbit's style with collapsible sections.
  • Linked Issues check ✅ Passed — The PR successfully implements all primary coding objectives from issue #56: collapsible sections with <details>/<summary> tags, violations grouped by severity, an actionable footer with acknowledgment commands, and updated tests.
  • Out of Scope Changes check ✅ Passed — All changes are directly scoped to the linked objectives: github_formatter.py implements collapsible formatting, test_github_formatter.py validates the new format, and recommendations.py applies consistent collapsible styling to PR bodies.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



Comment @coderabbitai help to get the list of available commands and usage tips.


codecov-commenter commented Feb 27, 2026

⚠️ Please install the Codecov GitHub app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 98.07692% with 1 line in your changes missing coverage. Please review.

❌ Your project status has failed because the head coverage (68.8%) is below the target coverage (80.0%). You can increase the head coverage or adjust the target coverage.
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

@@          Coverage Diff          @@
##            main     #57   +/-   ##
=====================================
  Coverage   68.7%   68.8%           
=====================================
  Files        154     154           
  Lines       9712    9725   +13     
=====================================
+ Hits        6681    6693   +12     
- Misses      3031    3032    +1     

Continue to review full report in Codecov by Sentry.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 69627bd...33fa29c. Read the comment docs.



coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
src/presentation/github_formatter.py (1)

108-119: Extract shared severity rendering/grouping to prevent formatter drift.

The <details> section rendering is duplicated across check-run and PR comment formatting, which makes future style updates easy to miss in one path.

Also applies to: 179-190

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/presentation/github_formatter.py` around lines 108 - 119, Extract the
duplicated "<details>" severity grouping/rendering into a single helper function
(e.g., render_severity_details(severity, violations, emoji, count) or
render_severity_block(severity, severity_groups[severity], emoji, count)) and
replace the duplicated blocks in both the check-run and PR comment formatting
with calls to that helper; the helper should build the same string currently
produced (including summary, per-violation title using
violation.rule_description or 'Unknown Rule', message, optional "How to fix:"
line, and closing </details>), be used by the places that currently append to
the local text variable, and preserve ordering and formatting so both code paths
share the same implementation.
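The extraction the reviewer suggests might look like the sketch below. All names here (`render_severity_block`, the dict fields, the emoji map) are hypothetical stand-ins for the module's real types; the point is that both formatter entry points call one shared renderer:

```python
# Sketch: one helper builds a severity <details> block so the check-run
# and PR-comment paths cannot drift apart stylistically.
EMOJI = {"critical": "🔴", "high": "🟠", "medium": "🟡", "low": "🟢", "info": "⚪"}


def render_severity_block(severity: str, emoji: str, violations: list[dict]) -> str:
    parts = [
        "<details>",
        f"<summary><b>{emoji} {severity.title()} Severity ({len(violations)})</b></summary>",
        "",
    ]
    for v in violations:
        title = v.get("rule_description") or "Unknown Rule"  # fallback per the review
        parts.append(f"- **{title}**: {v['message']}")
        if v.get("how_to_fix"):
            parts.append(f"  How to fix: {v['how_to_fix']}")
    parts += ["", "</details>"]
    return "\n".join(parts)


def format_check_run_output(groups: dict[str, list[dict]]) -> str:
    # Check-run path: delegates rendering to the shared helper.
    return "\n\n".join(render_severity_block(s, EMOJI.get(s, "⚪"), vs) for s, vs in groups.items())


def format_violations_comment(groups: dict[str, list[dict]]) -> str:
    # PR-comment path: no duplicated <details> construction.
    return "\n\n".join(render_severity_block(s, EMOJI.get(s, "⚪"), vs) for s, vs in groups.items())
```

A future style change (different summary markup, new footer hint) then touches only `render_severity_block`.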
tests/unit/presentation/test_github_formatter.py (1)

19-33: Add a regression test for Severity.INFO section rendering.

Current assertions cover critical/high only; add an INFO case to ensure totals and displayed sections stay consistent when low-priority violations are present.

As per coding guidelines, "Write regression tests for every bug fix; keep CI coverage thresholds green."

Also applies to: 50-50

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/presentation/test_github_formatter.py` around lines 19 - 33, Add a
regression assertion to the existing test that verifies INFO-severity violations
are rendered and counted: update the test that calls format_violations_comment
(e.g., test_format_violations_comment) to include at least one violation with
severity INFO and assert the comment contains an INFO section header and that
the total Status line still reflects the correct violation count; reference the
format_violations_comment function and the test name when locating where to add
the new assertions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/presentation/github_formatter.py`:
- Around line 106-110: The PR comment currently omits or mislabels violations
with Severity.INFO or other unexpected severities; update the grouping and
labeling code (the severity_groups construction and the block that builds the
details using severity, emoji and count) so that all severities present in the
violations collection—including Severity.INFO and any unknown strings—are
rendered consistently in the comment instead of being dropped or remapped to
"low"; ensure the same mapping logic is used in both the PR-comment path and the
check-run path (where remapping to low currently happens), and adjust the
summary/total calculations so the total reported (the earlier total) equals the
sum of the shown bucket counts.

---

Nitpick comments:
In `@src/presentation/github_formatter.py`:
- Around line 108-119: Extract the duplicated "<details>" severity
grouping/rendering into a single helper function (e.g.,
render_severity_details(severity, violations, emoji, count) or
render_severity_block(severity, severity_groups[severity], emoji, count)) and
replace the duplicated blocks in both the check-run and PR comment formatting
with calls to that helper; the helper should build the same string currently
produced (including summary, per-violation title using
violation.rule_description or 'Unknown Rule', message, optional "How to fix:"
line, and closing </details>), be used by the places that currently append to
the local text variable, and preserve ordering and formatting so both code paths
share the same implementation.

In `@tests/unit/presentation/test_github_formatter.py`:
- Around line 19-33: Add a regression assertion to the existing test that
verifies INFO-severity violations are rendered and counted: update the test that
calls format_violations_comment (e.g., test_format_violations_comment) to
include at least one violation with severity INFO and assert the comment
contains an INFO section header and that the total Status line still reflects
the correct violation count; reference the format_violations_comment function
and the test name when locating where to add the new assertions.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 69627bd and c0a156e.

📒 Files selected for processing (2)
  • src/presentation/github_formatter.py
  • tests/unit/presentation/test_github_formatter.py
📜 Review details
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/guidelines.mdc)

**/*.py: Use modern typing only: dict[str, Any], list[str], str | None (no Dict, List, Optional)
GitHub/HTTP/DB calls must be async def; avoid blocking calls (time.sleep, sync HTTP) in async paths
All agent outputs and external payloads must use validated BaseModel from Pydantic
Use dataclasses for internal immutable state where appropriate
Use structured logging at boundaries with fields: operation, subject_ids, decision, latency_ms
Implement Agent pattern: single-responsibility agents with typed inputs/outputs
Use Decorator pattern for retries, metrics, caching as cross-cutting concerns
Agent outputs must include: decision, confidence (0..1), short reasoning, recommendations, strategy_used
Implement confidence policy: reject or route to human-in-the-loop when confidence < 0.5
Use minimal, step-driven prompts; provide Chain-of-Thought only for complexity > 0.7 or ambiguity > 0.6
Strip secrets/PII from agent prompts; scope tools; keep raw reasoning out of logs (store summaries only)
Cache idempotent lookups; lazy-import heavy dependencies; bound fan-out with asyncio.Semaphore
Avoid redundant LLM calls; memoize per event when safe
Use domain errors (e.g., AgentError) with error_type, message, context, timestamp, retry_count
Use exponential backoff for transient failures; circuit-break noisy integrations when needed
Fail closed for risky decisions; provide actionable remediation in error paths
Validate all external inputs; verify webhook signatures
Implement prompt-injection hardening; sanitize repository content passed to LLMs
Performance targets: Static validation ~<100ms typical, hybrid decisions sub-second when cache warm, budget LLM paths thoughtfully
Reject old typing syntax (Dict, List, Optional) in code review
Reject blocking calls in async code; reject bare except: clauses; reject swallowed errors
Reject LLM calls for trivial/deterministic checks
Reject unvalidated agent outputs and missing confidenc...

Files:

  • src/presentation/github_formatter.py
  • tests/unit/presentation/test_github_formatter.py
tests/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/guidelines.mdc)

tests/**/*.py: Write unit tests for deterministic rule evaluation (pass/warn/block), model validation, and error paths
Write integration tests for webhook parsing, idempotency, multi-agent coordination, and state persistence
Use pytest.mark.asyncio for async tests; avoid live network calls; freeze time and seed randomness
Write regression tests for every bug fix; keep CI coverage thresholds green

Files:

  • tests/unit/presentation/test_github_formatter.py
🧬 Code graph analysis (1)
tests/unit/presentation/test_github_formatter.py (1)
src/presentation/github_formatter.py (1)
  • format_violations_comment (157-194)


watchflow bot commented Feb 27, 2026

🚨 Watchflow Rule Violations Detected

🟡 Medium Severity

Checks PR description (body) and title for a linked issue reference (e.g. #123, Fixes #123, Closes #456). Use when the rule requires issue refs in either field.

PR does not reference a linked issue (e.g. #123 or closes #123 in body/title)

Rule validation failed with severity: medium
How to fix: Add an issue reference in the PR title or description (e.g. Fixes #123).


This comment was automatically generated by Watchflow.
To configure rules, edit the .watchflow/rules.yaml file in this repository.


coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
src/presentation/github_formatter.py (1)

32-42: ⚠️ Potential issue | 🟠 Major

Do not remap info/unknown severities to low.

Line 32 excludes info, and Line 40-Line 42 force unknown severities into low, which mislabels output and can skew displayed buckets.

Proposed fix
-    severity_order = ["critical", "high", "medium", "low"]
+    severity_order = ["critical", "high", "medium", "low", "info"]
@@
-        else:
-            if "low" not in severity_groups:
-                severity_groups["low"] = []
-            severity_groups["low"].append(violation)
+        else:
+            severity_groups.setdefault("info", []).append(violation)

Also apply the same mapping strategy in format_check_run_output (Line 116-Line 127) so summary counts and details stay consistent.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/presentation/github_formatter.py` around lines 32 - 42, The code
currently drops or remaps "info"/unknown severities into "low"; update grouping
logic so severity_order includes "info" (e.g., severity_order =
["critical","high","medium","low","info"]) and change the loop that builds
severity_groups (and the equivalent logic in format_check_run_output) to
preserve any severity string not already in severity_groups by creating a new
key for that exact severity instead of forcing it into "low"; use
violation.severity.value or str(...) to derive the key (as in sev) and append to
severity_groups[sev] if missing create severity_groups[sev] = [] first so
counts/details remain accurate.
🧹 Nitpick comments (1)
tests/unit/presentation/test_github_formatter.py (1)

10-29: Add an INFO severity regression assertion in this suite.

This updated test validates critical/high rendering, but it should also cover Severity.INFO (and ideally unknown fallback) to lock in severity bucket behavior.

As per coding guidelines "tests/**/*.py: Write regression tests for every bug fix; keep CI coverage thresholds green".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/presentation/test_github_formatter.py` around lines 10 - 29,
Update the test_format_violations_comment_groups_by_severity test to include a
Violation with Severity.INFO (and optionally one with an unknown/None severity)
when building the violations list, then adjust assertions to expect the INFO
bucket rendering (e.g., "<summary><b>🔵 Info Severity (1)</b></summary>" or
whatever the formatter uses) and updated counts/status text returned by
format_violations_comment; reference the test function name and
format_violations_comment and Severity.INFO to locate where to add the new
Violation and corresponding assertions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/api/recommendations.py`:
- Line 355: The HTML string interpolation inserts unescaped dynamic content
(description and possibly severity) into a <summary> tag; escape these values
before rendering to prevent malformed HTML or injection. Update the code that
constructs the summary (the f-string containing "<summary><b>{severity}
Severity:</b> {description}</summary>") to apply an HTML-escaping function
(e.g., html.escape or markupsafe.escape) to description (and severity if
variable) at the point of formatting, and add the corresponding import at the
top of the module so the rendered summary contains safe, escaped text.

---

Duplicate comments:
In `@src/presentation/github_formatter.py`:
- Around line 32-42: The code currently drops or remaps "info"/unknown
severities into "low"; update grouping logic so severity_order includes "info"
(e.g., severity_order = ["critical","high","medium","low","info"]) and change
the loop that builds severity_groups (and the equivalent logic in
format_check_run_output) to preserve any severity string not already in
severity_groups by creating a new key for that exact severity instead of forcing
it into "low"; use violation.severity.value or str(...) to derive the key (as in
sev) and append to severity_groups[sev] if missing create severity_groups[sev] =
[] first so counts/details remain accurate.

---

Nitpick comments:
In `@tests/unit/presentation/test_github_formatter.py`:
- Around line 10-29: Update the
test_format_violations_comment_groups_by_severity test to include a Violation
with Severity.INFO (and optionally one with an unknown/None severity) when
building the violations list, then adjust assertions to expect the INFO bucket
rendering (e.g., "<summary><b>🔵 Info Severity (1)</b></summary>" or whatever
the formatter uses) and updated counts/status text returned by
format_violations_comment; reference the test function name and
format_violations_comment and Severity.INFO to locate where to add the new
Violation and corresponding assertions.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c0a156e and 7bd737b.

📒 Files selected for processing (3)
  • src/api/recommendations.py
  • src/presentation/github_formatter.py
  • tests/unit/presentation/test_github_formatter.py
📜 Review details
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py


Files:

  • src/presentation/github_formatter.py
  • tests/unit/presentation/test_github_formatter.py
  • src/api/recommendations.py
tests/**/*.py


Files:

  • tests/unit/presentation/test_github_formatter.py
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
Repo: warestack/watchflow PR: 0
File: .cursor/rules/guidelines.mdc:0-0
Timestamp: 2026-01-31T19:35:22.504Z
Learning: Keep PRs small (~<300 LOC where possible); split refactors into separate PRs
📚 Learning: 2026-01-31T19:35:22.504Z
Learnt from: CR
Repo: warestack/watchflow PR: 0
File: .cursor/rules/guidelines.mdc:0-0
Timestamp: 2026-01-31T19:35:22.504Z
Learning: Include in PR body: rationale, behavior change, risk/rollback, and tests/docs updates

Applied to files:

  • src/api/recommendations.py
🧬 Code graph analysis (2)
src/presentation/github_formatter.py (1)
src/core/models.py (1)
  • Violation (20-31)
tests/unit/presentation/test_github_formatter.py (1)
src/presentation/github_formatter.py (1)
  • format_violations_comment (178-188)
🔇 Additional comments (1)
src/presentation/github_formatter.py (1)

180-186: Nice cleanup on empty-state and governance framing.

The early return on Line 180-Line 181 plus the concise status/help messaging make the comment output clearer.

[
f"### {description} - {severity}",
"<details>",
f"<summary><b>{severity} Severity:</b> {description}</summary>",

⚠️ Potential issue | 🟡 Minor

Escape dynamic <summary> content before rendering.

Line 355 inserts description directly into HTML. Special characters (or </summary>) can break the collapsible block rendering.

Proposed fix
+import html
@@
-        body_lines.extend(
+        safe_severity = html.escape(severity)
+        safe_description = html.escape(description)
+        body_lines.extend(
             [
                 "<details>",
-                f"<summary><b>{severity} Severity:</b> {description}</summary>",
+                f"<summary><b>{safe_severity} Severity:</b> {safe_description}</summary>",
                 "",
             ]
         )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
f"<summary><b>{severity} Severity:</b> {description}</summary>",
import html
safe_severity = html.escape(severity)
safe_description = html.escape(description)
body_lines.extend(
[
"<details>",
f"<summary><b>{safe_severity} Severity:</b> {safe_description}</summary>",
"",
]
)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/api/recommendations.py` at line 355, The HTML string interpolation
inserts unescaped dynamic content (description and possibly severity) into a
<summary> tag; escape these values before rendering to prevent malformed HTML or
injection. Update the code that constructs the summary (the f-string containing
"<summary><b>{severity} Severity:</b> {description}</summary>") to apply an
HTML-escaping function (e.g., html.escape or markupsafe.escape) to description
(and severity if variable) at the point of formatting, and add the corresponding
import at the top of the module so the rendered summary contains safe, escaped
text.
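A quick standard-library illustration of why the escaping matters: an unescaped closing tag inside the interpolated text terminates the `<summary>` element early, while `html.escape` neutralizes it. The description string here is invented for demonstration:

```python
import html

# Hypothetical user-controlled rule description containing markup.
description = 'Checks for "<script>" tags </summary> and more'

raw = f"<summary><b>Medium Severity:</b> {description}</summary>"
safe = f"<summary><b>Medium Severity:</b> {html.escape(description)}</summary>"

# raw contains a premature </summary> that closes the block early;
# html.escape rewrites it as &lt;/summary&gt; so the markup stays intact.
assert "</summary> and more" in raw
assert "&lt;/summary&gt;" in safe
```

By default `html.escape` also escapes quotes, which keeps the text safe even if it is ever interpolated into an HTML attribute.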

dkargatzis added the enhancement label Feb 27, 2026

coderabbitai bot left a comment


♻️ Duplicate comments (1)
src/presentation/github_formatter.py (1)

39-49: ⚠️ Potential issue | 🟠 Major

Severity bucketing still misclassifies info as low.

Severity.INFO is a valid model value, but at Line 39 and Line 133 the order omits info, and the fallback path at Line 47 and Line 141 remaps to low. This causes mislabeled output in both the PR comment and check-run summary.

Suggested fix
-    severity_order = ["critical", "high", "medium", "low"]
+    severity_order = ["critical", "high", "medium", "low", "info"]
@@
-        else:
-            if "low" not in severity_groups:
-                severity_groups["low"] = []
-            severity_groups["low"].append(violation)
+        else:
+            severity_groups.setdefault("info", []).append(violation)
@@
-    severity_order = ["critical", "high", "medium", "low"]
+    severity_order = ["critical", "high", "medium", "low", "info"]
@@
-        else:
-            # Fallback for unexpected severities
-            if "low" not in severity_groups:
-                severity_groups["low"] = []
-            severity_groups["low"].append(violation)
+        else:
+            severity_groups.setdefault("info", []).append(violation)

Also applies to: 133-145

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/presentation/github_formatter.py` around lines 39 - 49, The severity
bucketing logic misclassifies Severity.INFO because the severity_order list and
severity_groups initialization (severity_order, severity_groups) omit "info" and
the fallback branch (the sev handling and else that appends to "low") remaps
unknown severities to "low"; update both occurrences (the severity_order/
severity_groups setup and the loop that computes sev and handles the else
fallback) to include "info" in the ordered list and to append unknown/mapped
"info" severities to "info" (not "low") so that Severity.INFO is correctly
bucketed in both the PR comment and check-run summary.
🧹 Nitpick comments (1)
tests/unit/presentation/test_github_formatter.py (1)

36-46: Add a regression test for valid Severity.INFO rendering.

This test covers unknown-severity fallback, but not the normal INFO path from the enum. Add one explicit INFO test so severity grouping regressions are caught without model bypass hacks.

Suggested test addition
+def test_build_collapsible_violations_text_info_severity_bucket():
+    v = Violation(rule_description="Info Rule", severity=Severity.INFO, message="Informational message")
+    comment = format_violations_comment([v])
+    assert "<summary><b>⚪ Info Severity (1)</b></summary>" in comment
+    assert "Informational message" in comment

As per coding guidelines: "Write regression tests for every bug fix; keep CI coverage thresholds green".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/presentation/test_github_formatter.py` around lines 36 - 46, Add a
new unit test that verifies normal INFO-severity rendering: create a Violation
instance with severity=Severity.INFO and a unique message, call
format_violations_comment on a list containing it, and assert the returned
comment contains the INFO severity summary (e.g. the Info bucket label) and the
message; implement this as a new test function (e.g.
test_build_collapsible_violations_text_info) using the Violation class,
Severity.INFO, and format_violations_comment so regressions in severity grouping
are caught without bypassing Pydantic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@src/presentation/github_formatter.py`:
- Around line 39-49: The severity bucketing logic misclassifies Severity.INFO
because the severity_order list and severity_groups initialization
(severity_order, severity_groups) omit "info" and the fallback branch (the sev
handling and else that appends to "low") remaps unknown severities to "low";
update both occurrences (the severity_order/ severity_groups setup and the loop
that computes sev and handles the else fallback) to include "info" in the
ordered list and to append unknown/mapped "info" severities to "info" (not
"low") so that Severity.INFO is correctly bucketed in both the PR comment and
check-run summary.

---

Nitpick comments:
In `@tests/unit/presentation/test_github_formatter.py`:
- Around line 36-46: Add a new unit test that verifies normal INFO-severity
rendering: create a Violation instance with severity=Severity.INFO and a unique
message, call format_violations_comment on a list containing it, and assert the
returned comment contains the INFO severity summary (e.g. the Info bucket label)
and the message; implement this as a new test function (e.g.
test_build_collapsible_violations_text_info) using the Violation class,
Severity.INFO, and format_violations_comment so regressions in severity grouping
are caught without bypassing Pydantic.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7bd737b and 2945872.

📒 Files selected for processing (2)
  • src/presentation/github_formatter.py
  • tests/unit/presentation/test_github_formatter.py
📜 Review details
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/guidelines.mdc)

**/*.py: Use modern typing only: dict[str, Any], list[str], str | None (no Dict, List, Optional)
GitHub/HTTP/DB calls must be async def; avoid blocking calls (time.sleep, sync HTTP) in async paths
All agent outputs and external payloads must use validated BaseModel from Pydantic
Use dataclasses for internal immutable state where appropriate
Use structured logging at boundaries with fields: operation, subject_ids, decision, latency_ms
Implement Agent pattern: single-responsibility agents with typed inputs/outputs
Use Decorator pattern for retries, metrics, caching as cross-cutting concerns
Agent outputs must include: decision, confidence (0..1), short reasoning, recommendations, strategy_used
Implement confidence policy: reject or route to human-in-the-loop when confidence < 0.5
Use minimal, step-driven prompts; provide Chain-of-Thought only for complexity > 0.7 or ambiguity > 0.6
Strip secrets/PII from agent prompts; scope tools; keep raw reasoning out of logs (store summaries only)
Cache idempotent lookups; lazy-import heavy dependencies; bound fan-out with asyncio.Semaphore
Avoid redundant LLM calls; memoize per event when safe
Use domain errors (e.g., AgentError) with error_type, message, context, timestamp, retry_count
Use exponential backoff for transient failures; circuit-break noisy integrations when needed
Fail closed for risky decisions; provide actionable remediation in error paths
Validate all external inputs; verify webhook signatures
Implement prompt-injection hardening; sanitize repository content passed to LLMs
Performance targets: Static validation ~<100ms typical, hybrid decisions sub-second when cache warm, budget LLM paths thoughtfully
Reject old typing syntax (Dict, List, Optional) in code review
Reject blocking calls in async code; reject bare except: clauses; reject swallowed errors
Reject LLM calls for trivial/deterministic checks
Reject unvalidated agent outputs and missing confidenc...

Files:

  • src/presentation/github_formatter.py
  • tests/unit/presentation/test_github_formatter.py
tests/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/guidelines.mdc)

tests/**/*.py: Write unit tests for deterministic rule evaluation (pass/warn/block), model validation, and error paths
Write integration tests for webhook parsing, idempotency, multi-agent coordination, and state persistence
Use pytest.mark.asyncio for async tests; avoid live network calls; freeze time and seed randomness
Write regression tests for every bug fix; keep CI coverage thresholds green

Files:

  • tests/unit/presentation/test_github_formatter.py
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
Repo: warestack/watchflow PR: 0
File: .cursor/rules/guidelines.mdc:0-0
Timestamp: 2026-01-31T19:35:22.504Z
Learning: Include in PR body: rationale, behavior change, risk/rollback, and tests/docs updates
📚 Learning: 2026-01-31T19:35:22.504Z
Learnt from: CR
Repo: warestack/watchflow PR: 0
File: .cursor/rules/guidelines.mdc:0-0
Timestamp: 2026-01-31T19:35:22.504Z
Learning: Applies to tests/**/*.py : Write unit tests for deterministic rule evaluation (pass/warn/block), model validation, and error paths

Applied to files:

  • tests/unit/presentation/test_github_formatter.py
🧬 Code graph analysis (2)
src/presentation/github_formatter.py (1)
src/core/models.py (1)
  • Violation (20-31)
tests/unit/presentation/test_github_formatter.py (2)
src/presentation/github_formatter.py (1)
  • format_violations_comment (195-213)
src/core/models.py (2)
  • Violation (20-31)
  • Severity (8-17)
🔇 Additional comments (2)
src/presentation/github_formatter.py (1)

205-212: Nice improvement: no-op on clean runs and clear action footer.

Returning an empty string for no violations and keeping the compact ack/help footer matches the “comment only when action is required” objective well.

tests/unit/presentation/test_github_formatter.py (1)

19-29: Test expectations are aligned with the new collapsible comment format.

The updated header, <summary> assertions, and ack hint checks are solid and match the presentation refactor.

Also applies to: 33-33

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (2)
src/presentation/github_formatter.py (2)

46-49: Dead code: conditional check is always false.

severity_groups is initialized on line 40 with "low" as a key, so the condition "low" not in severity_groups is always False. The if block is unreachable.

Simplify the fallback logic
         else:
-            if "low" not in severity_groups:
-                severity_groups["low"] = []
             severity_groups["low"].append(violation)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/presentation/github_formatter.py` around lines 46 - 49, The fallback
branch contains an unreachable check because severity_groups is prepopulated
with "low"; remove the redundant if "low" not in severity_groups check and
directly append the violation to severity_groups["low"] in the same block
(update the code that manipulates severity_groups and the list appended with
violation).

140-144: Same dead-code pattern as in the helper function.

The check `if "low" not in severity_groups:` is always false here as well.

Simplify
         else:
-            # Fallback for unexpected severities
-            if "low" not in severity_groups:
-                severity_groups["low"] = []
+            # Fallback for unexpected severities to low
             severity_groups["low"].append(violation)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/presentation/github_formatter.py` around lines 140 - 144, The fallback
block contains a redundant check—`if "low" not in severity_groups:`—which can
never be true here; remove that dead conditional and simply ensure the "low"
bucket exists and append the violation. Replace the two-step conditional+append
with a single operation that guarantees the "low" key (for example, using
dict.setdefault or creating the key if missing) and then append the violation to
severity_groups["low"]; update the same pattern in the helper function if
present.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/presentation/github_formatter.py`:
- Around line 46-49: The fallback branch contains an unreachable check because
severity_groups is prepopulated with "low"; remove the redundant if "low" not in
severity_groups check and directly append the violation to
severity_groups["low"] in the same block (update the code that manipulates
severity_groups and the list appended with violation).
- Around line 140-144: The fallback block contains a redundant check—`if "low"
not in severity_groups:`—which can never be true here; remove that dead
conditional and simply ensure the "low" bucket exists and append the violation.
Replace the two-step conditional+append with a single operation that guarantees
the "low" key (for example, using dict.setdefault or creating the key if
missing) and then append the violation to severity_groups["low"]; update the
same pattern in the helper function if present.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2945872 and 6086aa2.

📒 Files selected for processing (2)
  • src/presentation/github_formatter.py
  • tests/unit/presentation/test_github_formatter.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/unit/presentation/test_github_formatter.py
📜 Review details
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/guidelines.mdc)


Files:

  • src/presentation/github_formatter.py
🧬 Code graph analysis (1)
src/presentation/github_formatter.py (1)
src/core/models.py (1)
  • Violation (20-31)
🔇 Additional comments (2)
src/presentation/github_formatter.py (2)

195-213: LGTM!

The refactored function correctly suppresses comments when no violations exist (returning empty string), uses the new collapsible helper for consistent formatting, and includes the actionable footer per PR objectives.


292-294: LGTM!

Consistent use of the collapsible violations helper for remaining violations in the acknowledgment check run output.

@dkargatzis dkargatzis force-pushed the ux/refactor-pr-comments-coderabbit-style branch from 6086aa2 to 33fa29c on February 27, 2026 at 18:36
@dkargatzis dkargatzis merged commit 4f3650c into main Feb 27, 2026
4 checks passed

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ux: Refactor PR comments to match CodeRabbit-style collapsible and actionable format

2 participants