Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Customisable table references for reports #8

Open
kingo55 opened this issue Jul 16, 2020 · 2 comments
Open

Customisable table references for reports #8

kingo55 opened this issue Jul 16, 2020 · 2 comments

Comments

@kingo55
Copy link
Member

kingo55 commented Jul 16, 2020

I think we can make reports more customisable by changing the way we reference tables in our knits / Mojito.

Currently tables are referenced based on the client ID and subject types defined in wave_params. This makes for a tidy wave_params object:

wave_params <- list(
  client_id="client",
  wave_id="w143",
  start_date="2020-05-25 11:33:00",
  stop_date="2020-06-13 14:42:22",
  time_grain="days",
  subject="usercookie",
  recipes=c("Control","Treatment")
)

This yields tables like so:

  • mojito.exposures_usercookie
  • mojito.segments_usercookie
  • mojito.client_conversions_usercookie

There are some issues with this though:

  1. It is quite rigid because users aren't able to deviate their schema naming or table naming conventions to fit their data warehouses.
  2. It's also inefficient in Redshift where goals that would normally be defined once-off, inside a report, need to be committed to our datamodelling steps (users can't just define a custom goal table for a goal e.g. (SELECT domain_userid as subject, 'conversion' as goal, 10.00 as revenue, derived_tstamp as conversion_time FROM client.events WHERE event_name = 'custom_schema')).
  3. Another inefficiency is requiring users to specify a client ID. Not all users will be multi-tenanted and the additional column uses extra space in the DWH, whilst not strictly needed.

Whilst slightly uglier, I think we can make Mojito easier-to-adopt through customisable table references, like so:

wave_params <- list(
  wave_id="w143",
  start_date="2020-05-25 11:33:00",
  stop_date="2020-06-13 14:42:22",
  time_grain="days",
  tables=list(
    exposure="mojito.exposures_usercookie",
    goal="mojito.client_conversions_usercookie",
    segment="mojito.segments_usercookie",
    failure="mojito.recipe_errors_2"),
  recipes=c("Control","Treatment")
)

Thoughts @dapperdrop ?

@dapperdrop
Copy link
Member

@kingo55

I think this is fine. It should be minimal impact for us as well because the table references won't change around often.

With the example you gave, how would we adopt it for our use case (multi-tenanted DWH)?

@kingo55
Copy link
Member Author

kingo55 commented Jul 17, 2020

@dapperdrop - we can create a multi-tenanted exposures table, like we currently do. Where it has the following columns:

client_id, subject, wave_id, recipe, exposure_time

And derive views from that, like:

(SELECT subject, wave_id, recipe, exposure_time FROM all_exposures WHERE client_id = 'client')

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants