
Introduce is_human_interaction Dimension Across MVs/RMVs and Dashboards (Phased Rollout) #34698

@erickgonzalez

Description


Recent testing confirmed that crawlers and bots generate events (pageviews and content impressions) that are currently counted in our dashboards. This inflates metrics and reduces data accuracy.

We need a consistent mechanism to distinguish human interactions from bot traffic across our analytics pipeline, so that bot-generated events can be filtered out.

Scope of this task:

  • Add a new is_human_interaction column to the relevant MV or RMV tables.
  • Update MV/RMV logic to populate this column using a unified detection approach.
  • Update Cubes to expose is_human_interaction as a dimension.
  • Ensure UI dashboards display only metrics where is_human_interaction = true.
  • Define and execute a safe refresh/backfill plan for existing data.
  • Implement changes incrementally:
    • Phase 1: Pageviews
    • Phase 2: Conversions
    • Phase 3: Engagement
  • Reuse shared logic (functions, views, or reusable SQL patterns) so that bot detection rules are defined in a single place and not duplicated across MVs/RMVs.
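A minimal Python sketch of the "define the rules once" idea, for illustration only: in practice the rule would live in the analytics database (e.g. a shared SQL function, view, or expression, as the scope above suggests) and every MV/RMV would reference it. The `BOT_PATTERN` list, the `Event` shape, and the `classify_human`, `with_human_flag`, and `dashboard_count` names are hypothetical assumptions, not taken from the issue or the linked gist.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical bot signatures, defined ONCE and shared by every phase
# (pageviews, conversions, engagement). Placeholders only: a real rule set
# would be broader and maintained in a single shared SQL definition.
BOT_PATTERN = re.compile(
    r"bot|crawl|spider|slurp|headlesschrome|curl|wget|python-requests",
    re.IGNORECASE,
)


@dataclass(frozen=True)
class Event:
    event_type: str  # e.g. "pageview", "conversion", "engagement"
    user_agent: Optional[str]


def classify_human(user_agent: Optional[str]) -> bool:
    """The single shared rule: True when the event looks like a human."""
    if not user_agent:
        # Assumption: events without a user agent are treated as non-human.
        return False
    return BOT_PATTERN.search(user_agent) is None


def with_human_flag(events: List[Event]) -> List[Dict[str, object]]:
    """Populate is_human_interaction for every row, whatever the phase."""
    return [
        {"event_type": e.event_type, "is_human_interaction": classify_human(e.user_agent)}
        for e in events
    ]


def dashboard_count(rows: List[Dict[str, object]], event_type: str) -> int:
    """Dashboards count only rows where is_human_interaction = true."""
    return sum(
        1 for r in rows if r["event_type"] == event_type and r["is_human_interaction"]
    )


events = [
    Event("pageview", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0 Safari/537.36"),
    Event("pageview", "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"),
    Event("conversion", "Mozilla/5.0 (Macintosh; Intel Mac OS X 14_0) Safari/605.1.15"),
    Event("engagement", None),
]
rows = with_human_flag(events)
print(dashboard_count(rows, "pageview"))  # 1 (the Googlebot pageview is excluded)
```

Because every phase calls the same `classify_human`, tightening a rule changes pageviews, conversions, and engagement together. Substring rules such as `bot` can also misfire on legitimate user agents, which is one more reason to keep them in a single place where they are easy to review.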

The solution must avoid regressions and remain compatible with current dashboards and with both replicated and non-replicated environments.

Acceptance Criteria

  • A new is_human_interaction column is added to the relevant MV/RMV tables.
  • If possible, a single reusable logic definition (e.g., SQL function, view, or shared expression) is created for bot/human classification.
  • If possible, all updated MVs/RMVs reference the shared classification logic (no duplicated logic across tables).
  • Cubes are updated to include is_human_interaction as a dimension.
  • A documented migration plan exists to refresh/backfill MV/RMV data safely.
  • MV/RMV refresh is executed without data loss.
  • No regression is introduced in existing dashboards.
  • Changes are implemented incrementally (Pageviews → Conversions → Engagement).
  • Performance impact is measured and validated; query performance remains stable or improves.
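One possible shape for the refresh/backfill step, as a hedged Python sketch: rebuild the data partition by partition into a shadow copy, check each rebuilt partition against its source (same events, and an explicit boolean flag on every row), and only then swap it in. The in-memory dictionaries standing in for tables, the partition keys, and the `backfill`, `validate_partition`, and `add_flag` names are all hypothetical; a real plan would rely on the database's own partitioning and table-swap mechanisms.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Table = Dict[str, List[Row]]  # hypothetical: partition key (e.g. "2024-05") -> rows


class BackfillError(RuntimeError):
    """Raised when a rebuilt partition does not match its source."""


def validate_partition(source: List[Row], rebuilt: List[Row]) -> None:
    # No data loss: exactly the same events, no more and no fewer.
    if sorted(r["event_id"] for r in source) != sorted(r["event_id"] for r in rebuilt):
        raise BackfillError("event ids differ between source and rebuilt rows")
    # Every rebuilt row must carry an explicit boolean flag (no NULLs left behind).
    if any(not isinstance(r.get("is_human_interaction"), bool) for r in rebuilt):
        raise BackfillError("is_human_interaction missing or not boolean")


def backfill(source: Table, rebuild: Callable[[List[Row]], List[Row]]) -> Table:
    """Rebuild partitions one at a time into a shadow copy.

    The live table is never modified; the caller swaps the shadow copy in
    only after every partition has been validated.
    """
    shadow: Table = {}
    for partition in sorted(source):
        rebuilt = rebuild(source[partition])
        try:
            validate_partition(source[partition], rebuilt)
        except BackfillError as exc:
            raise BackfillError(f"partition {partition}: {exc}") from exc
        shadow[partition] = rebuilt
    return shadow


# Usage with a toy rebuild step (a stand-in for the shared classifier).
def add_flag(rows: List[Row]) -> List[Row]:
    return [{**r, "is_human_interaction": "bot" not in str(r["ua"]).lower()} for r in rows]


live: Table = {
    "2024-05": [{"event_id": 1, "ua": "Mozilla/5.0"}, {"event_id": 2, "ua": "Googlebot/2.1"}],
    "2024-06": [{"event_id": 3, "ua": "Mozilla/5.0"}],
}
shadow = backfill(live, add_flag)
```

Working one partition at a time keeps each step small and retryable, and because validation fails before anything is swapped, a bad rebuild leaves the live data and the existing dashboards untouched.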

Priority

High

Additional Context

https://gist.github.com/erickgonzalez/0dd76b1c0c37112834978857d0c45db2

Metadata


Assignees: No one assigned
Status: Next Sprint
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
