Description
Recent testing confirmed that crawlers and bots are generating events (pageviews and content impressions), which are currently included in our dashboards. This inflates metrics and impacts data accuracy.
We need to introduce a consistent mechanism to identify and filter human interaction across our analytics pipeline.
Scope of this task:
- Add a new is_human_interaction column to the relevant MV or RMV tables.
- Update MV/RMV logic to populate this column using a unified detection approach.
- Update Cubes to expose is_human_interaction as a dimension.
- Ensure UI dashboards only display metrics where is_human_interaction = true.
- Define and execute a safe refresh/backfill plan for existing data.
- Implement changes incrementally:
  - Phase 1: Pageviews
  - Phase 2: Conversions
  - Phase 3: Engagement
- Reuse shared logic (functions, views, or reusable SQL patterns) so that bot detection rules are defined in a single place and not duplicated across MVs/RMVs.
The solution must avoid regressions and maintain compatibility with current dashboards and replicated/non-replicated environments.
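To illustrate the "single place" requirement for bot detection rules, here is a minimal Python sketch of one shared classifier. It is only an illustration: the real rules would live in the shared SQL function, view, or expression, and the pattern list, the `BOT_UA_PATTERN` name, and the choice to treat a missing user agent as non-human are all assumptions, not the agreed detection approach.

```python
import re

# Hypothetical signature list, for illustration only. The real rules would
# come from the unified detection approach agreed on for the MVs/RMVs.
BOT_UA_PATTERN = re.compile(
    r"bot|crawler|spider|slurp|headless|curl|wget|python-requests",
    re.IGNORECASE,
)


def is_human_interaction(user_agent):
    """Return True when an event looks human under the shared rules."""
    # Assumption: events with no user agent are classified as non-human.
    if not user_agent or not user_agent.strip():
        return False
    # Any match against the shared signature list marks the event as a bot.
    return BOT_UA_PATTERN.search(user_agent) is None


if __name__ == "__main__":
    samples = [
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0.0.0 Safari/537.36",
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
        "",
    ]
    for ua in samples:
        print(repr(ua), is_human_interaction(ua))
```

Because every caller goes through `is_human_interaction`, a rule change is made once and applies to pageviews, conversions, and engagement alike, which is the property the shared SQL definition should have.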
Acceptance Criteria
- A new is_human_interaction column is added to the relevant MV/RMV tables.
- Where feasible, a single reusable logic definition (e.g., SQL function, view, or shared expression) is created for bot/human classification.
- Where feasible, all updated MVs/RMVs reference the shared classification logic (no duplicated logic across tables).
- Cubes are updated to include is_human_interaction as a dimension.
- A documented migration plan exists to refresh/backfill MV/RMV data safely.
- MV/RMV refresh is executed without data loss.
- No regression is introduced in existing dashboards.
- Changes are implemented incrementally (Pageviews → Conversions → Engagement).
- Performance impact is validated and remains stable or improved.
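One possible way to check the "without data loss" criterion after a refresh is to compare per-day event totals from before the backfill with the sum of human and non-human counts afterwards. The sketch below assumes the counts have already been exported from the MV/RMV tables; the function name `find_backfill_mismatches` and both input shapes are hypothetical.

```python
def find_backfill_mismatches(before, after):
    """Return the sorted list of days whose totals changed during backfill.

    before: {day: total_event_count} captured prior to the refresh.
    after:  {(day, is_human_interaction): event_count} read after the refresh.
    """
    # Collapse the human / non-human split back into one total per day.
    totals_after = {}
    for (day, _is_human), count in after.items():
        totals_after[day] = totals_after.get(day, 0) + count

    # A day is a mismatch if it lost rows, gained rows, or disappeared.
    all_days = set(before) | set(totals_after)
    return sorted(
        day for day in all_days
        if before.get(day, 0) != totals_after.get(day, 0)
    )


if __name__ == "__main__":
    before = {"2024-01-01": 100, "2024-01-02": 50}
    after = {
        ("2024-01-01", True): 80,
        ("2024-01-01", False): 20,
        ("2024-01-02", True): 45,
    }
    print(find_backfill_mismatches(before, after))
```

An empty result means every day kept its total row count and only the classification was added; any listed day would need investigation before dashboards switch to filtering on `is_human_interaction`.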
Priority
High
Additional Context
https://gist.github.com/erickgonzalez/0dd76b1c0c37112834978857d0c45db2
Status
Next Sprint