snowplow_tests_view_in_session_values scans all the data every time the test is run #56

Closed
juhani-sc opened this issue Jun 13, 2024 · 1 comment
Labels
status:has_pr A PR exists for this issue. type:enhancement New features or improvements to existing features.

Comments


juhani-sc commented Jun 13, 2024

Is your feature request related to a problem? Please describe.

The test snowplow_tests_view_in_session_values doesn't apply any filters, so it scans all of the data it references every time it runs. As that data grows, the test takes longer and longer to execute and needs more compute resources.

Describe the solution you'd like

One simple solution would be to limit the test to the past n days of data, or to some other fixed time window:

with prep as (
  select
    session_identifier,
    count(distinct views_in_session) as dist_pvis_values,
    count(*) - count(distinct view_in_session_index) as all_minus_dist_pvisi,
    count(*) - count(distinct view_id) as all_minus_dist_pvids

  from {{ ref('snowplow_unified_views') }}
  -- as a simple example: only check the last 2 days
  -- (date arithmetic syntax varies by warehouse)
  where date(dvce_created_tstamp) >= current_date() - 2
  group by 1
)

select
  session_identifier

from prep

where dist_pvis_values != 1
or all_minus_dist_pvisi != 0
or all_minus_dist_pvids != 0

Describe alternatives you've considered

Haven't thought of other solutions.

Additional context

Not relevant.

Are you interested in contributing towards this feature?

Sure. If we go with the solution I suggested, it's a relatively minor fix.

@juhani-sc juhani-sc added the type:enhancement New features or improvements to existing features. label Jun 13, 2024
@github-actions github-actions bot added the status:needs_triage Needs maintainer triage. label Jun 13, 2024
@rlh1994 rlh1994 added status:has_pr A PR exists for this issue. and removed status:needs_triage Needs maintainer triage. labels Jun 24, 2024
rlh1994 (Contributor) commented Jun 24, 2024

Thanks @juhani-sc. We've agreed to switch this test to run on the page views this run table instead, which should greatly reduce the scan volume while still catching any issues that could make it into the table.
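Under this approach the three checks themselves presumably stay the same and only the source table shrinks to the rows processed in the current run. The sketch below illustrates that idea using Python's built-in sqlite3 module. The table name `views_this_run` and its four columns are hypothetical stand-ins, not the package's actual model, and the sample rows are made up to trigger each check.

```python
import sqlite3

# The same three checks as the test query in the issue. The table name is
# interpolated because SQL identifiers cannot be bound as parameters, so
# only pass trusted table names here.
CHECK_SQL = """
with prep as (
  select
    session_identifier,
    count(distinct views_in_session) as dist_pvis_values,
    count(*) - count(distinct view_in_session_index) as all_minus_dist_pvisi,
    count(*) - count(distinct view_id) as all_minus_dist_pvids
  from {table}
  group by session_identifier
)
select session_identifier
from prep
where dist_pvis_values != 1
   or all_minus_dist_pvisi != 0
   or all_minus_dist_pvids != 0
order by session_identifier
"""


def failing_sessions(conn, table):
    """Return the session identifiers that violate any of the three checks."""
    rows = conn.execute(CHECK_SQL.format(table=table)).fetchall()
    return [row[0] for row in rows]


def build_demo_db():
    """Create an in-memory stand-in for a hypothetical 'views this run' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        create table views_this_run (
          session_identifier text,
          view_id text,
          views_in_session integer,
          view_in_session_index integer
        )
        """
    )
    conn.executemany(
        "insert into views_this_run values (?, ?, ?, ?)",
        [
            ("s1", "v1", 2, 1),  # consistent session: passes all checks
            ("s1", "v2", 2, 2),
            ("s2", "v3", 2, 1),  # duplicate view_id: fails the third check
            ("s2", "v3", 2, 2),
            ("s3", "v4", 2, 1),  # views_in_session differs across rows
            ("s3", "v5", 3, 2),
        ],
    )
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    print(failing_sessions(conn, "views_this_run"))  # ['s2', 's3']
```

Because the query only reads the rows in the table it is given, its cost scales with the size of one run rather than with the full history of the derived table.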

jedichien pushed a commit to viki-org/dbt-snowplow-unified that referenced this issue Jul 2, 2024