Customisable table references for reports #8

kingo55 · 2020-07-16T07:26:45Z

I think we can make reports more customisable by changing the way we reference tables in our knits / Mojito.

Currently tables are referenced based on the client ID and subject types defined in wave_params. This makes for a tidy wave_params object:

wave_params <- list(
  client_id="client",
  wave_id="w143",
  start_date="2020-05-25 11:33:00",
  stop_date="2020-06-13 14:42:22",
  time_grain="days",
  subject="usercookie",
  recipes=c("Control","Treatment")
)

This yields tables like so:

mojito.exposures_usercookie
mojito.segments_usercookie
mojito.client_conversions_usercookie

There are some issues with this though:

It is quite rigid because users aren't able to deviate their schema naming or table naming conventions to fit their data warehouses.
It's also inefficient in Redshift where goals that would normally be defined once-off, inside a report, need to be committed to our datamodelling steps (users can't just define a custom goal table for a goal e.g. (SELECT domain_userid as subject, 'conversion' as goal, 10.00 as revenue, derived_tstamp as conversion_time FROM client.events WHERE event_name = 'custom_schema')).
Another inefficiency is requiring users to specify a client ID. Not all users will be multi-tenanted and the additional column uses extra space in the DWH, whilst not strictly needed.

Whilst slightly uglier, I think we can make Mojito easier-to-adopt through customisable table references, like so:

wave_params <- list(
  wave_id="w143",
  start_date="2020-05-25 11:33:00",
  stop_date="2020-06-13 14:42:22",
  time_grain="days",
  tables=list(
    exposure="mojito.exposures_usercookie",
    goal="mojito.client_conversions_usercookie",
    segment="mojito.segments_usercookie",
    failure="mojito.recipe_errors_2"),
  recipes=c("Control","Treatment")
)

Thoughts @dapperdrop ?

The text was updated successfully, but these errors were encountered:

dapperdrop · 2020-07-16T22:08:33Z

@kingo55

I think this is fine. It should be minimal impact for us as well because the table references won't change around often.

With the example you gave, how would we adopt it for our use case (multi-tenanted DWH)?

kingo55 · 2020-07-17T06:08:14Z

@dapperdrop - we can create a multi-tenanted exposures table, like we currently do. Where it has the following columns:

client_id, subject, wave_id, recipe, exposure_time

And derive views from that, like:

(SELECT subject, wave_id, recipe, exposure_time FROM all_exposures WHERE client_id = 'client')

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Customisable table references for reports #8

Customisable table references for reports #8

kingo55 commented Jul 16, 2020

dapperdrop commented Jul 16, 2020

kingo55 commented Jul 17, 2020

Customisable table references for reports #8

Customisable table references for reports #8

Comments

kingo55 commented Jul 16, 2020

dapperdrop commented Jul 16, 2020

kingo55 commented Jul 17, 2020