
feat: filter irrelevant GitHub events before rule evaluation #50

Merged
dkargatzis merged 4 commits into warestack:main from naaa760:feat/add-eve-rul-git
Feb 26, 2026

Conversation


@naaa760 naaa760 commented Feb 17, 2026

feat: add event filtering to prevent rule evaluation on irrelevant GitHub events

Implement intelligent event filtering to skip rule engine evaluation on
irrelevant GitHub events. This prevents unnecessary processing and user
confusion from comments on closed/merged PRs or branch deletion events.

Changes:

  • Add centralized event filter utility (src/core/utils/event_filter.py)
  • Filter events in dispatcher before handler dispatch
  • Skip branch deletion events (deleted: true, null SHA)
  • Skip closed/merged/draft PRs and non-relevant PR actions
  • Skip archived repositories
  • Add PR state validation in pull_request processor
  • Add push event validation in push processor
  • Add comprehensive unit and integration tests

fix: #25

Summary by CodeRabbit

  • New Features
    • Webhook event filtering added to skip processing for archived repositories, closed/merged/draft pull requests, and deleted or empty pushes.
    • Dispatcher now pre-filters incoming events and returns a filtered status when applicable.
    • Pull request and push handlers short-circuit early for filtered/invalid states to avoid unnecessary processing.

@naaa760 naaa760 requested a review from dkargatzis as a code owner February 17, 2026 17:53

coderabbitai bot commented Feb 17, 2026

📝 Walkthrough


This PR introduces centralized GitHub webhook event filtering by creating a new event_filter module that validates archived repositories, PR states (closed, merged, draft), and push conditions (deletion, empty SHA). The dispatcher applies global pre-filtering before enqueueing events, while individual processors maintain early-exit checks. The handler action-filtering logic is removed as filtering now occurs upstream.

Changes

Cohort / File(s) Summary
Event Filtering Infrastructure
src/core/utils/event_filter.py
New module adding FilterResult dataclass and should_process_event() with helpers (_apply_filters, _filter_pull_request, _filter_push, _is_repo_archived). Implements checks for archived repos, PR actions/state/merged/draft, and push deletion/after-SHA validation. Logs filtered events.
Webhook Dispatcher & Handlers
src/webhooks/dispatcher.py, src/webhooks/handlers/pull_request.py
Dispatcher now calls should_process_event() and short-circuits filtered events (returns filtered status). Pull request handler early action-filtering removed to rely on centralized filter.
Event Processors
src/event_processors/pull_request/processor.py, src/event_processors/push.py
Processors add early-exit guards: PR processor skips closed/merged/draft PRs and returns successful skipped result; push processor skips deleted branches or NULL/empty after SHA and returns successful skipped result.
Unit Tests
tests/unit/core/test_event_filter.py, tests/unit/event_processors/test_pull_request_processor.py, tests/unit/event_processors/test_push_processor.py
New unit tests validate filter logic (PR actions/states/merged/draft, push deletions/empty SHA, archived repos) and processor skip paths for closed PRs and deleted branches.
Integration Tests
tests/integration/webhooks/test_webhook_flow.py
Fixture updates to include PR state/merged/draft and push deleted/after/commits. New test test_filtered_event_not_dispatched ensures filtered events (e.g., deleted push) are not dispatched to handlers.
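
The new test_filtered_event_not_dispatched check can be sketched in miniature like this; the dispatcher here is a hypothetical stand-in, not the project's WebhookDispatcher, and the real fixtures in tests/integration/webhooks/test_webhook_flow.py will differ:

```python
from typing import Any, Callable

NULL_SHA = "0" * 40  # all-zero SHA marking a branch deletion


def make_dispatcher(handler: Callable[[dict[str, Any]], None]) -> Callable[[str, dict[str, Any]], str]:
    """Build a toy dispatcher that drops deleted-branch pushes before dispatch."""

    def dispatch(event_type: str, payload: dict[str, Any]) -> str:
        if event_type == "push" and (payload.get("deleted") or payload.get("after") in (None, "", NULL_SHA)):
            return "filtered"
        handler(payload)
        return "dispatched"

    return dispatch


def test_filtered_event_not_dispatched() -> None:
    calls: list[dict[str, Any]] = []
    dispatch = make_dispatcher(calls.append)

    status = dispatch("push", {"deleted": True, "after": NULL_SHA, "commits": []})

    assert status == "filtered"
    assert len(calls) == 0  # the handler was never invoked
```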

Sequence Diagram

sequenceDiagram
    participant Client as Client
    participant Dispatcher as Dispatcher
    participant EventFilter as EventFilter
    participant Queue as Queue
    participant Processor as Processor

    Client->>Dispatcher: POST webhook event
    Dispatcher->>EventFilter: should_process_event(event)
    alt Event is filtered
        EventFilter-->>Dispatcher: FilterResult(should_process=false, reason)
        Dispatcher-->>Client: return filtered status
    else Event passes filter
        EventFilter-->>Dispatcher: FilterResult(should_process=true)
        Dispatcher->>Queue: enqueue event
        Queue->>Processor: deliver event
        Processor->>Processor: enrich & evaluate rules
        Processor-->>Client: return processing result
    end
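
On the processor side, the defense-in-depth guard in the flow above amounts to an early return of a successful, empty result. A sketch, with the ProcessingResult fields guessed from the snippets quoted in this review rather than copied from src/event_processors/base.py:

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ProcessingResult:
    """Simplified stand-in for the project's ProcessingResult."""

    success: bool
    violations: list[str]
    api_calls_made: int
    skipped: bool = False


def process_pull_request(payload: dict[str, Any]) -> ProcessingResult:
    """Early-exit guard: skip closed/merged/draft PRs before any API calls."""
    pr = payload.get("pull_request", {})
    if pr.get("state") != "open" or pr.get("merged") or pr.get("draft"):
        return ProcessingResult(success=True, violations=[], api_calls_made=0, skipped=True)
    # ... enrichment and rule evaluation would follow here ...
    return ProcessingResult(success=True, violations=[], api_calls_made=1)
```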

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

Poem

🐰 I nibble through the webhook hay,

Archived, draft, and deleted kept away.
Branches folded, noisy traffic eased,
Only tidy events hop forth—pleased. ✨

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

  • Title check ⚠️ Warning: The PR title 'fix : Team Usage underreporting by using aggregated endpoint' does not relate to the changeset, which focuses on event filtering to prevent rule evaluation on irrelevant GitHub events. Resolution: update the PR title to accurately reflect the main change, such as 'fix: add event filtering to prevent rule evaluation on irrelevant GitHub events'.
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 10.53%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
  • Description Check ✅ Passed: Check skipped - CodeRabbit’s high-level summary is enabled.
  • Linked Issues check ✅ Passed: All objectives from issue #25 are met: event filtering in a centralized utility, dispatcher pre-filtering, PR state validation, push event filtering, repository archival checks, and comprehensive tests added.
  • Out of Scope Changes check ✅ Passed: All changes are directly related to the event filtering implementation and validation required by issue #25; no out-of-scope modifications detected.



codecov-commenter commented Feb 17, 2026

⚠️ Please install the Codecov GitHub app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 99.34641% with 1 line in your changes missing coverage. Please review.

❌ Your project status has failed because the head coverage (67.5%) is below the target coverage (80.0%). You can increase the head coverage or adjust the target coverage.
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

@@           Coverage Diff           @@
##            main     #50     +/-   ##
=======================================
+ Coverage   67.0%   67.5%   +0.5%     
=======================================
  Files        151     153      +2     
  Lines       9384    9533    +149     
=======================================
+ Hits        6290    6443    +153     
+ Misses      3094    3090      -4     

Continue to review full report in Codecov by Sentry.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8302582...e496259. Read the comment docs.



@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (6)
src/event_processors/push.py (2)

40-47: Good early exit, but the zero-SHA string is hardcoded (duplicated).

Import NULL_SHA from src.core.utils.event_filter instead of repeating the magic string. This was already flagged in the event filter module review.

♻️ Suggested change
+from src.core.utils.event_filter import NULL_SHA
+
 ...
 
-        if payload.get("deleted") or not payload.get("after") or payload.get("after") == "0000000000000000000000000000000000000000":
+        if payload.get("deleted") or not payload.get("after") or payload.get("after") == NULL_SHA:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/event_processors/push.py` around lines 40 - 47, Replace the hardcoded
zero-SHA string in the early-exit check inside push.py with the shared constant
NULL_SHA from src.core.utils.event_filter: import NULL_SHA at top, then use
NULL_SHA in the condition that currently compares payload.get("after") ==
"0000000000000000000000000000000000000000". Keep the same
logger.info("push_skipped_deleted_or_empty") and return of ProcessingResult
unchanged.

117-119: Unreachable null-SHA branch — dead code after the early exit at line 40.

The early return on line 40 already catches payload.get("after") == "000…0" and not payload.get("after"), so by the time execution reaches line 117, sha (which is payload.get("after")) can never be falsy or equal to the null SHA. This check is now dead code. Consider removing it or, if you want to keep it for safety, add a comment explaining it's a defensive guard.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/event_processors/push.py` around lines 117 - 119, The check of sha =
payload.get("after") followed by if not sha or sha == "0000...0" in push.py is
dead code because an earlier guard already returns when payload.get("after") is
falsy or the null-SHA; remove the redundant conditional and its logger.warning
to clean up the code, or if you prefer to keep it as a defensive guard, replace
the conditional with a single-line comment noting it's defensive and leave the
code path unreachable in normal flow; reference the variable sha and
payload.get("after") when making the change so you update the correct block.
src/core/utils/event_filter.py (3)

50-58: Add full type annotations to dict parameters.

The payload parameter uses bare dict without type arguments across all internal functions. Per coding guidelines, use dict[str, Any].

♻️ Suggested change
-def _apply_filters(event_type: EventType, payload: dict) -> FilterResult:
+def _apply_filters(event_type: EventType, payload: dict[str, Any]) -> FilterResult:

Apply the same to _filter_pull_request, _filter_push, and _is_repo_archived.

As per coding guidelines: **/*.py: Use modern typing only: dict[str, Any], list[str], str | None (no Dict, List, Optional)

Also applies to: 61-77, 80-88, 91-93

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/core/utils/event_filter.py` around lines 50 - 58, Update all bare dict
parameter annotations to use dict[str, Any] and add the necessary import of Any:
change the signature of _apply_filters(event_type: EventType, payload: dict) ->
FilterResult to payload: dict[str, Any], and do the same for
_filter_pull_request, _filter_push, and _is_repo_archived (and any other
internal helpers referenced in the diff). Ensure you import Any from typing at
the top of the module and keep the return types unchanged (e.g., FilterResult)
so type checking follows the project's modern typing conventions.

21-26: Consider making FilterResult frozen for immutability.

Since FilterResult instances are never mutated after creation, frozen=True would enforce that invariant and align with the coding guidelines' preference for immutable internal state via dataclasses.

♻️ Suggested change
-@dataclass
+@dataclass(frozen=True)
 class FilterResult:
     """Result of event filter check."""
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/core/utils/event_filter.py` around lines 21 - 26, FilterResult is mutable
but never changed after creation; make it immutable by adding frozen=True to the
dataclass decorator (i.e., change `@dataclass` to `@dataclass(frozen=True)`) so
instances of FilterResult are read-only and enforce the intended invariant.

16-16: NULL_SHA is duplicated as hardcoded strings in src/event_processors/push.py (lines 40 and 118).

The push processor inline-checks the same zero SHA string instead of importing this constant. Consider importing NULL_SHA from src/core/utils/event_filter in push.py to maintain a single source of truth.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/core/utils/event_filter.py` at line 16, Replace the duplicated hardcoded
zero SHA strings in the push processor with the single source constant: import
NULL_SHA from src.core.utils.event_filter into the push module and use NULL_SHA
wherever the literal "0000000000000000000000000000000000000000" is currently
compared (both occurrences in the push processor logic); update any conditional
checks or assignments in functions within src/event_processors/push.py that
reference the literal so they reference NULL_SHA instead.
src/event_processors/pull_request/processor.py (1)

1-1: Standard logging is used instead of structlog with structured fields.

The coding guidelines specify structured logging at boundaries with fields like operation, subject_ids, decision, latency_ms. This file uses the standard logging module throughout (pre-existing), but the new log at line 52-57 would benefit from structured logging, especially since the filter module (event_filter.py) already uses structlog. This is a pre-existing pattern in the file, so flagging as optional.

Also applies to: 14-14

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/event_processors/pull_request/processor.py` at line 1, This file imports
the stdlib logging and adds a plain log where structured logging is required;
change to structlog by importing structlog and acquiring a logger via
structlog.get_logger(), then replace the plain log call (the new log around
lines 52-57 in this module) with a structured log call that includes the fields
operation, subject_ids, decision, and latency_ms (match the field names used in
event_filter.py) — e.g., logger.info(...) or logger.bind(...).info(...) with
those fields and a clear message; leave other pre-existing stdlib logging
untouched if not needed but prefer structlog at this boundary.
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8302582 and 99989b5.

📒 Files selected for processing (9)
  • src/core/utils/event_filter.py
  • src/event_processors/pull_request/processor.py
  • src/event_processors/push.py
  • src/webhooks/dispatcher.py
  • src/webhooks/handlers/pull_request.py
  • tests/integration/webhooks/test_webhook_flow.py
  • tests/unit/core/test_event_filter.py
  • tests/unit/event_processors/test_pull_request_processor.py
  • tests/unit/event_processors/test_push_processor.py
💤 Files with no reviewable changes (1)
  • src/webhooks/handlers/pull_request.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/guidelines.mdc)

**/*.py: Use modern typing only: dict[str, Any], list[str], str | None (no Dict, List, Optional)
GitHub/HTTP/DB calls must be async def; avoid blocking calls (time.sleep, sync HTTP) in async paths
All agent outputs and external payloads must use validated BaseModel from Pydantic
Use dataclasses for internal immutable state where appropriate
Use structured logging at boundaries with fields: operation, subject_ids, decision, latency_ms
Implement Agent pattern: single-responsibility agents with typed inputs/outputs
Use Decorator pattern for retries, metrics, caching as cross-cutting concerns
Agent outputs must include: decision, confidence (0..1), short reasoning, recommendations, strategy_used
Implement confidence policy: reject or route to human-in-the-loop when confidence < 0.5
Use minimal, step-driven prompts; provide Chain-of-Thought only for complexity > 0.7 or ambiguity > 0.6
Strip secrets/PII from agent prompts; scope tools; keep raw reasoning out of logs (store summaries only)
Cache idempotent lookups; lazy-import heavy dependencies; bound fan-out with asyncio.Semaphore
Avoid redundant LLM calls; memoize per event when safe
Use domain errors (e.g., AgentError) with error_type, message, context, timestamp, retry_count
Use exponential backoff for transient failures; circuit-break noisy integrations when needed
Fail closed for risky decisions; provide actionable remediation in error paths
Validate all external inputs; verify webhook signatures
Implement prompt-injection hardening; sanitize repository content passed to LLMs
Performance targets: Static validation ~<100ms typical, hybrid decisions sub-second when cache warm, budget LLM paths thoughtfully
Reject old typing syntax (Dict, List, Optional) in code review
Reject blocking calls in async code; reject bare except: clauses; reject swallowed errors
Reject LLM calls for trivial/deterministic checks
Reject unvalidated agent outputs and missing confidenc...

Files:

  • src/core/utils/event_filter.py
  • tests/unit/event_processors/test_pull_request_processor.py
  • tests/integration/webhooks/test_webhook_flow.py
  • src/event_processors/push.py
  • src/webhooks/dispatcher.py
  • tests/unit/core/test_event_filter.py
  • tests/unit/event_processors/test_push_processor.py
  • src/event_processors/pull_request/processor.py
tests/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/guidelines.mdc)

tests/**/*.py: Write unit tests for deterministic rule evaluation (pass/warn/block), model validation, and error paths
Write integration tests for webhook parsing, idempotency, multi-agent coordination, and state persistence
Use pytest.mark.asyncio for async tests; avoid live network calls; freeze time and seed randomness
Write regression tests for every bug fix; keep CI coverage thresholds green

Files:

  • tests/unit/event_processors/test_pull_request_processor.py
  • tests/integration/webhooks/test_webhook_flow.py
  • tests/unit/core/test_event_filter.py
  • tests/unit/event_processors/test_push_processor.py
🧠 Learnings (1)
📚 Learning: 2026-01-31T19:35:22.504Z
Learnt from: CR
Repo: warestack/watchflow PR: 0
File: .cursor/rules/guidelines.mdc:0-0
Timestamp: 2026-01-31T19:35:22.504Z
Learning: Applies to tests/**/*.py : Write integration tests for webhook parsing, idempotency, multi-agent coordination, and state persistence

Applied to files:

  • tests/integration/webhooks/test_webhook_flow.py
🧬 Code graph analysis (6)
src/core/utils/event_filter.py (1)
src/core/models.py (2)
  • EventType (91-105)
  • WebhookEvent (116-144)
tests/unit/event_processors/test_pull_request_processor.py (4)
tests/unit/event_processors/test_push_processor.py (2)
  • processor (29-38)
  • task (42-52)
src/tasks/task_queue.py (2)
  • Task (19-47)
  • installation_id (41-47)
tests/unit/event_processors/pull_request/test_enricher.py (1)
  • enricher (14-15)
src/event_processors/pull_request/enricher.py (1)
  • enrich_event_data (39-90)
tests/integration/webhooks/test_webhook_flow.py (2)
src/webhooks/dispatcher.py (2)
  • WebhookDispatcher (13-57)
  • register_handler (25-28)
src/tasks/task_queue.py (2)
  • TaskQueue (64-233)
  • start_workers (209-215)
src/event_processors/push.py (1)
src/event_processors/base.py (1)
  • ProcessingResult (16-23)
src/webhooks/dispatcher.py (2)
src/core/utils/event_filter.py (1)
  • should_process_event (29-47)
src/webhooks/handlers/issue_comment.py (1)
  • event_type (18-19)
tests/unit/core/test_event_filter.py (2)
src/core/models.py (2)
  • EventType (91-105)
  • WebhookEvent (116-144)
src/core/utils/event_filter.py (2)
  • FilterResult (22-26)
  • should_process_event (29-47)
🪛 GitHub Actions: Run pre-commit hooks
src/event_processors/push.py

[error] 1-1: Ruff detected formatting issues; changes were applied by the hook during the run.

tests/unit/core/test_event_filter.py

[error] 1-1: Ruff detected formatting changes in test file; 2 files reformatted by ruff-format.

🔇 Additional comments (9)
src/core/utils/event_filter.py (1)

29-47: LGTM on the public API and logging.

The should_process_event function provides clean structured logging with event_type, repo, and reason fields, and the overall dispatch to internal filters is well organized.

tests/unit/event_processors/test_pull_request_processor.py (1)

59-78: LGTM — good coverage for the closed-PR skip path.

The test correctly validates the early-exit behavior: no enrichment or agent calls for closed/merged PRs, and the result is successful with no violations.

tests/unit/event_processors/test_push_processor.py (1)

89-106: LGTM — good coverage for the deleted-branch skip path.

Correctly validates that the push processor's early exit skips engine execution and check run creation for deletion events.

src/webhooks/dispatcher.py (1)

45-47: LGTM — clean pre-filter integration.

The filter is correctly placed after handler lookup (avoiding unnecessary filtering for unhandled events) and before enqueue. The return format is consistent with other dispatch outcomes.

src/event_processors/pull_request/processor.py (1)

51-63: LGTM — defense-in-depth guard for invalid PR states.

Good secondary check in case the dispatcher's pre-filter is bypassed (e.g., direct processor invocation). The early return avoids unnecessary API calls and enrichment.

tests/integration/webhooks/test_webhook_flow.py (2)

235-276: LGTM — solid integration test for filtered event suppression.

Good end-to-end validation that a deleted-branch push event is filtered at the dispatcher level and never reaches the handler. The payload is realistic and the assertion on call_count == 0 is precise.


38-57: Good payload updates to align with filter requirements.

The addition of state, merged, and draft fields to the PR payload ensures these tests exercise the real filter path rather than accidentally getting filtered out.

tests/unit/core/test_event_filter.py (2)

1-8: Unused pytest import.

pytest is imported but not used in any test (no parametrize, fixtures, marks, or raises). However, per review guidelines, I won't flag unused imports unless indicated by static analysis. The pipeline failure mentions Ruff formatting issues which may address this.


11-134: Good test coverage across all filter paths.

The tests thoroughly cover PR actions (opened/synchronize/reopened pass; closed filtered), PR states (merged, draft), push scenarios (valid, deleted, null SHA, empty after), archived repos, and passthrough for other event types. The _make_event helper keeps tests concise.

One minor gap: there's no test for a push event where the after key is entirely absent from the payload (as opposed to empty string). This is a subtle edge case that's handled by the code (payload.get("after") returns None, and not None is True), but an explicit test would document the behavior.
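
The edge case described above can be demonstrated standalone; this is plain dict behavior, not the repo's test helper:

```python
# "after" absent entirely vs present but empty: dict.get returns None for a
# missing key, and both None and "" are falsy, so a guard written as
# `not payload.get("after")` filters both shapes identically.
payload_missing: dict[str, object] = {"deleted": False}
payload_empty: dict[str, object] = {"deleted": False, "after": ""}

assert payload_missing.get("after") is None
assert not payload_missing.get("after")
assert not payload_empty.get("after")
```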



@dkargatzis dkargatzis left a comment


The code changes look very useful for optimizing event handling! However, the PR title is misleading and there are some opportunities to reduce duplication and improve typing.

return result


def _apply_filters(event_type: EventType, payload: dict) -> FilterResult:


Typing: Please update dict to dict[str, Any] for all payload parameters (here and in helper functions) to align with project standards.

return ProcessingResult(
success=True,
violations=[],
api_calls_made=0,


DRY / Constant: Please import NULL_SHA from src.core.utils.event_filter instead of hardcoding the "0000..." string here. This ensures consistency if we ever need to update how null SHAs are handled.

PR_ACTIONS_PROCESS = frozenset({"opened", "synchronize", "reopened"})


@dataclass


Suggestion: Make FilterResult immutable by adding frozen=True to the @dataclass decorator. This prevents accidental modification of the filter decision downstream.

@naaa760 naaa760 changed the title fix : Team Usage underreporting by using aggregated endpoint feat: filter irrelevant gitHub events before rule evaluation Feb 23, 2026

naaa760 commented Feb 23, 2026

@dkargatzis
sir all feedback addressed.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (3)
src/event_processors/push.py (1)

119-120: Hardcoded null SHA still present despite importing NULL_SHA

Line 7 imports NULL_SHA and the new early-exit guard (line 41) uses it, but this pre-existing check at line 119 still embeds the literal 40-zero string. This was the root of the past review comment and should be updated to complete the fix.

♻️ Use the constant
-        if not sha or sha == "0000000000000000000000000000000000000000":
+        if not sha or sha == NULL_SHA:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/event_processors/push.py` around lines 119 - 120, Replace the hardcoded
40-zero SHA literal in the early-exit guard with the imported NULL_SHA constant:
in src/event_processors/push.py update the conditional that currently reads if
not sha or sha == "0000000000000000000000000000000000000000": to use NULL_SHA
(i.e., if not sha or sha == NULL_SHA) and keep the existing logger.warning call
(logger.warning("No valid commit SHA found, skipping check run")) unchanged;
this ensures consistency with the earlier guard that already references
NULL_SHA.
src/core/utils/event_filter.py (2)

72-73: merged check is unreachable dead code

GitHub's API guarantees that a merged PR has state == "closed" and merged == True. Any PR that reaches line 72 has already passed the state != "open" guard on line 69, so merged can never be True at that point. The check is a no-op and adds confusion about invariants.

♻️ Remove unreachable check
     if state != "open":
         return FilterResult(should_process=False, reason=f"PR state '{state}' not open")
 
-    if pr.get("merged"):
-        return FilterResult(should_process=False, reason="PR already merged")
-
     if pr.get("draft"):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/core/utils/event_filter.py` around lines 72 - 73, The pr.get("merged")
branch in event_filter.py is unreachable because any merged PR will already
satisfy state == "closed" and has passed the earlier state != "open" guard, so
remove the if pr.get("merged") -> return FilterResult(...) block; update any
comments/tests that reference this redundant merged check and keep the existing
state-based filtering logic (the FilterResult usage and surrounding function in
event_filter.py should remain unchanged).

44-44: Unnecessary hasattr guard — EventType is always an Enum

event_type is typed as EventType, which extends both str and Enum, so .value always exists. The ternary is dead defensive code.

♻️ Simplify
-            event_type=event_type.value if hasattr(event_type, "value") else str(event_type),
+            event_type=event_type.value,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/core/utils/event_filter.py` at line 44, The ternary using
hasattr(event_type, "value") is unnecessary because EventType is an Enum with a
.value; replace the expression event_type=event_type.value if
hasattr(event_type, "value") else str(event_type) with a direct use of
event_type.value (i.e., always use event_type.value) in the code that constructs
the event (referencing the event_type variable / EventType enum in
event_filter.py).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/core/utils/event_filter.py`:
- Around line 42-47: The structured log call for "event_filtered" is missing
required fields; update the logger.info invocation (the one using event_type,
event.repo_full_name and result.reason in event_filter.py) to include operation
(e.g., "event_filter" or event_type), subject_ids (pull from event.subject_ids
or event.get_subject_ids()), decision (use result.decision or
result.allowed/blocked), and latency_ms (capture a start timestamp before
filtering and compute int((time.time() - start_ts) * 1000) after; import time if
needed); keep the existing repo and reason fields and add these four keys to the
structured payload.

In `@src/event_processors/push.py`:
- Around line 41-48: The info log emitted when skipping deleted/empty pushes is
unstructured; update the logger.info call inside the push skip branch (the block
guarded by payload.get("deleted") or not payload.get("after") or
payload.get("after") == NULL_SHA) to emit structured fields including operation
(e.g., "push.process"), subject_ids (derive from payload — e.g.,
payload.get("repository", {}).get("id") or payload.get("ref")/commit id),
decision ("skipped_deleted_or_empty"), and latency_ms (use the computed
int((time.time() - start_time) * 1000) value already used for
ProcessingResult.processing_time_ms). Keep the ProcessingResult return as-is but
ensure the log fields match those names exactly so callers can correlate logs
and the returned processing_time_ms.

---

Nitpick comments:
In `@src/core/utils/event_filter.py`:
- Around line 72-73: The pr.get("merged") branch in event_filter.py is
unreachable because any merged PR will already satisfy state == "closed" and has
passed the earlier state != "open" guard, so remove the if pr.get("merged") ->
return FilterResult(...) block; update any comments/tests that reference this
redundant merged check and keep the existing state-based filtering logic (the
FilterResult usage and surrounding function in event_filter.py should remain
unchanged).
- Line 44: The ternary using hasattr(event_type, "value") is unnecessary because
EventType is an Enum with a .value; replace the expression
event_type=event_type.value if hasattr(event_type, "value") else str(event_type)
with a direct use of event_type.value (i.e., always use event_type.value) in the
code that constructs the event (referencing the event_type variable / EventType
enum in event_filter.py).

In `@src/event_processors/push.py`:
- Around line 119-120: Replace the hardcoded 40-zero SHA literal in the
early-exit guard with the imported NULL_SHA constant: in
src/event_processors/push.py update the conditional that currently reads if not
sha or sha == "0000000000000000000000000000000000000000": to use NULL_SHA (i.e.,
if not sha or sha == NULL_SHA) and keep the existing logger.warning call
(logger.warning("No valid commit SHA found, skipping check run")) unchanged;
this ensures consistency with the earlier guard that already references
NULL_SHA.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 99989b5 and e496259.

📒 Files selected for processing (3)
  • src/core/utils/event_filter.py
  • src/event_processors/push.py
  • tests/unit/core/test_event_filter.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/unit/core/test_event_filter.py
📜 Review details
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/guidelines.mdc)

**/*.py: Use modern typing only: dict[str, Any], list[str], str | None (no Dict, List, Optional)
GitHub/HTTP/DB calls must be async def; avoid blocking calls (time.sleep, sync HTTP) in async paths
All agent outputs and external payloads must use validated BaseModel from Pydantic
Use dataclasses for internal immutable state where appropriate
Use structured logging at boundaries with fields: operation, subject_ids, decision, latency_ms
Implement Agent pattern: single-responsibility agents with typed inputs/outputs
Use Decorator pattern for retries, metrics, caching as cross-cutting concerns
Agent outputs must include: decision, confidence (0..1), short reasoning, recommendations, strategy_used
Implement confidence policy: reject or route to human-in-the-loop when confidence < 0.5
Use minimal, step-driven prompts; provide Chain-of-Thought only for complexity > 0.7 or ambiguity > 0.6
Strip secrets/PII from agent prompts; scope tools; keep raw reasoning out of logs (store summaries only)
Cache idempotent lookups; lazy-import heavy dependencies; bound fan-out with asyncio.Semaphore
Avoid redundant LLM calls; memoize per event when safe
Use domain errors (e.g., AgentError) with error_type, message, context, timestamp, retry_count
Use exponential backoff for transient failures; circuit-break noisy integrations when needed
Fail closed for risky decisions; provide actionable remediation in error paths
Validate all external inputs; verify webhook signatures
Implement prompt-injection hardening; sanitize repository content passed to LLMs
Performance targets: Static validation ~<100ms typical, hybrid decisions sub-second when cache warm, budget LLM paths thoughtfully
Reject old typing syntax (Dict, List, Optional) in code review
Reject blocking calls in async code; reject bare except: clauses; reject swallowed errors
Reject LLM calls for trivial/deterministic checks
Reject unvalidated agent outputs and missing confidenc...

Files:

  • src/core/utils/event_filter.py
  • src/event_processors/push.py
🧠 Learnings (2)
📚 Learning: 2026-01-31T19:35:22.504Z
Learnt from: CR
Repo: warestack/watchflow PR: 0
File: .cursor/rules/guidelines.mdc:0-0
Timestamp: 2026-01-31T19:35:22.504Z
Learning: Applies to **/*.py : Use modern typing only: `dict[str, Any]`, `list[str]`, `str | None` (no `Dict`, `List`, `Optional`)

Applied to files:

  • src/core/utils/event_filter.py
📚 Learning: 2026-01-31T19:35:22.504Z
Learnt from: CR
Repo: warestack/watchflow PR: 0
File: .cursor/rules/guidelines.mdc:0-0
Timestamp: 2026-01-31T19:35:22.504Z
Learning: Applies to **/*.py : Reject old typing syntax (`Dict`, `List`, `Optional`) in code review

Applied to files:

  • src/core/utils/event_filter.py
🧬 Code graph analysis (1)
src/core/utils/event_filter.py (1)
src/core/models.py (2)
  • EventType (91-105)
  • WebhookEvent (116-144)

Comment on lines +42 to +47
logger.info(
"event_filtered",
event_type=event_type.value if hasattr(event_type, "value") else str(event_type),
repo=event.repo_full_name,
reason=result.reason,
)

⚠️ Potential issue | 🟡 Minor

Structured log at filter boundary is missing required fields per guidelines

The coding guidelines require structured log entries at boundaries to carry operation, subject_ids, decision, and latency_ms. The current log omits all four.

♻️ Proposed fix
-        logger.info(
-            "event_filtered",
-            event_type=event_type.value if hasattr(event_type, "value") else str(event_type),
-            repo=event.repo_full_name,
-            reason=result.reason,
-        )
+        logger.info(
+            "event_filtered",
+            operation="should_process_event",
+            subject_ids={"repo": event.repo_full_name, "delivery_id": event.delivery_id},
+            decision="skip",
+            event_type=event_type.value,
+            reason=result.reason,
+        )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/core/utils/event_filter.py` around lines 42 - 47, The structured log call
for "event_filtered" is missing required fields; update the logger.info
invocation (the one using event_type, event.repo_full_name and result.reason in
event_filter.py) to include operation (e.g., "event_filter" or event_type),
subject_ids (pull from event.subject_ids or event.get_subject_ids()), decision
(use result.decision or result.allowed/blocked), and latency_ms (capture a start
timestamp before filtering and compute int((time.time() - start_ts) * 1000)
after; import time if needed); keep the existing repo and reason fields and add
these four keys to the structured payload.
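
The proposed fix can be checked by assembling the payload separately from the logger call. The helper below is hypothetical and avoids a structlog dependency; it only demonstrates that the four guideline fields (`operation`, `subject_ids`, `decision`, `latency_ms`) sit alongside the existing `event_type` and `reason` keys.

```python
import time


def build_filter_log_fields(
    event_type_value: str,
    repo: str,
    delivery_id: str,
    reason: str,
    start_ts: float,
) -> dict[str, object]:
    """Assemble the structured payload the guidelines require at this boundary."""
    return {
        "operation": "should_process_event",
        "subject_ids": {"repo": repo, "delivery_id": delivery_id},
        "decision": "skip",
        "event_type": event_type_value,
        "reason": reason,
        # latency_ms is derived from a timestamp captured before filtering ran.
        "latency_ms": int((time.time() - start_ts) * 1000),
    }


fields = build_filter_log_fields(
    "pull_request", "warestack/watchflow", "abc-123",
    "PR state 'closed' not open", time.time(),
)
assert {"operation", "subject_ids", "decision", "latency_ms"} <= fields.keys()
```

In the real module these fields would be passed as keyword arguments to `logger.info("event_filtered", **fields)`, matching the diff above.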

Comment on lines +41 to +48
if payload.get("deleted") or not payload.get("after") or payload.get("after") == NULL_SHA:
logger.info("push_skipped_deleted_or_empty")
return ProcessingResult(
success=True,
violations=[],
api_calls_made=0,
processing_time_ms=int((time.time() - start_time) * 1000),
)
⚠️ Potential issue | 🟡 Minor

Unstructured log for the new skip path; missing required guideline fields

logger.info("push_skipped_deleted_or_empty") is a bare string with no structured fields. Guidelines require operation, subject_ids, decision, and latency_ms at processing boundaries.

♻️ Add structured fields
-            logger.info("push_skipped_deleted_or_empty")
+            logger.info(
+                "push_skipped_deleted_or_empty",
+                operation="process_push",
+                subject_ids={"repo": task.repo_full_name, "ref": ref},
+                decision="skip",
+                latency_ms=int((time.time() - start_time) * 1000),
+            )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/event_processors/push.py` around lines 41 - 48, The info log emitted when
skipping deleted/empty pushes is unstructured; update the logger.info call
inside the push skip branch (the block guarded by payload.get("deleted") or not
payload.get("after") or payload.get("after") == NULL_SHA) to emit structured
fields including operation (e.g., "push.process"), subject_ids (derive from
payload — e.g., payload.get("repository", {}).get("id") or
payload.get("ref")/commit id), decision ("skipped_deleted_or_empty"), and
latency_ms (use the computed int((time.time() - start_time) * 1000) value
already used for ProcessingResult.processing_time_ms). Keep the ProcessingResult
return as-is but ensure the log fields match those names exactly so callers can
correlate logs and the returned processing_time_ms.

@dkargatzis dkargatzis merged commit 267e19f into warestack:main Feb 26, 2026
3 checks passed
dkargatzis added a commit that referenced this pull request Feb 26, 2026
- Add centralized event_filter.py module for filtering irrelevant events
  - Filter archived repositories, closed/merged/draft PRs, deleted branches
  - Add FilterResult dataclass (frozen=True for immutability)
  - Add NULL_SHA constant for consistent null SHA handling
- Migrate push.py and pull_request.py from stdlib logging to structlog
- Add event filtering at dispatcher level before handler dispatch
- Add delivery_id parameter to WebhookEvent for better tracing
- Add comprehensive unit tests for event filtering logic

Closes #50

Signed-off-by: Dimitris Kargatzis <dkargatzis@gmail.com>

Development

Successfully merging this pull request may close these issues.

fix: add event filtering to prevent rule evaluation on irrelevant GitHub events

3 participants