Data warehouse support for experiments MVP #26332

Closed
8 tasks done
danielbachhuber opened this issue Nov 21, 2024 · 9 comments
Labels
enhancement New feature or request feature/experimentation Feature Tag: Experimentation

Comments

@danielbachhuber

danielbachhuber commented Nov 21, 2024

Done is:

  • Customers should be able to pick a data warehouse table when selecting a goal for an experiment.
  • The data warehouse table should be lazily joined with the events table.
    • TBD whether customers need to set up this join themselves, or whether we should do it for them.
    • We should probably validate that the lazy join to the events table actually exists.
  • When picking a data warehouse table for an experiment, it's possible to use a HogQL expression for the field.
  • We've verified that the expected results are calculated for 'Total count', 'Property value (sum)', and the other data warehouse metric aggregations. Any aggregations that don't work are hidden.
  • Experiments work as expected for Trends queries.
  • Funnel support is TBD; we may hide funnels for now.
  • It should be possible to replicate experiment results in an insight, and experiments should link to their corresponding insights.
  • Documentation is written or updated.
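To make the 'Total count' and 'Property value (sum)' checks concrete, here is a minimal Python sketch of what each aggregation should produce per variant once data warehouse rows have been attributed to experiment variants. The row shape (dicts with a `variant` key and a hypothetical `amount` column) is an assumption for illustration only; the real calculation happens in the generated HogQL/ClickHouse query.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Optional


def aggregate_by_variant(
    rows: Iterable[Mapping[str, object]], value_key: Optional[str] = None
) -> Dict[str, float]:
    """'Total count' when value_key is None, else 'Property value (sum)' of value_key."""
    totals: Dict[str, float] = defaultdict(float)
    for row in rows:
        variant = row.get("variant")
        if variant is None:
            # Rows that never matched an exposure don't count toward any variant.
            continue
        if value_key is None:
            totals[variant] += 1
        else:
            value = row.get(value_key)
            if value is not None:
                # Missing values are skipped rather than treated as zero.
                totals[variant] += float(value)
    return dict(totals)


rows = [
    {"variant": "control", "amount": 10},
    {"variant": "test", "amount": 5},
    {"variant": "test", "amount": 7},
    {"variant": None, "amount": 100},  # hypothetical row with no exposure
]
print(aggregate_by_variant(rows))            # {'control': 1.0, 'test': 2.0}
print(aggregate_by_variant(rows, "amount"))  # {'control': 10.0, 'test': 12.0}
```

Comparing numbers like these against the experiment results is one way to spot an aggregation that should be hidden.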
@danielbachhuber

danielbachhuber commented Nov 26, 2024

Next up:

  • Update data warehouse lazy joins to support the ASOF LEFT JOIN used by experiments
  • Verify that the expected results are calculated for 'Property value', etc.
  • Explore funnel support (potentially hide funnels)
  • Draft documentation
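An ASOF LEFT JOIN attributes each data warehouse row to the most recent exposure for the same distinct_id at or before the row's timestamp, and keeps rows that have no such exposure. In ClickHouse the closest-match condition is written alongside the equality, e.g. `ON row.distinct_id = exposure.distinct_id AND row.timestamp >= exposure.timestamp`. The Python sketch below only illustrates those semantics under assumed tuple shapes; it is not how PostHog's lazy join is implemented, and here unmatched rows simply get `None` as their variant.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Row = Tuple[str, int, object]    # (distinct_id, timestamp, payload)
Exposure = Tuple[str, int, str]  # (distinct_id, timestamp, variant)


def asof_left_join(
    rows: List[Row], exposures: List[Exposure]
) -> List[Tuple[str, int, object, Optional[str]]]:
    # Group exposures per distinct_id and sort each group by timestamp.
    by_id: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for distinct_id, ts, variant in exposures:
        by_id[distinct_id].append((ts, variant))
    timestamps: Dict[str, List[int]] = {}
    for distinct_id, items in by_id.items():
        items.sort(key=lambda item: item[0])
        timestamps[distinct_id] = [ts for ts, _ in items]

    joined = []
    for distinct_id, ts, payload in rows:
        variant: Optional[str] = None  # LEFT JOIN: unmatched rows are kept
        if distinct_id in timestamps:
            # Index of the last exposure whose timestamp is <= the row's timestamp.
            i = bisect.bisect_right(timestamps[distinct_id], ts) - 1
            if i >= 0:
                variant = by_id[distinct_id][i][1]
        joined.append((distinct_id, ts, payload, variant))
    return joined


exposures = [("a", 10, "control"), ("a", 20, "test"), ("b", 5, "test")]
rows = [("a", 15, "r1"), ("a", 25, "r2"), ("a", 5, "r3"), ("c", 30, "r4")]
for joined_row in asof_left_join(rows, exposures):
    print(joined_row)
# ('a', 15, 'r1', 'control')
# ('a', 25, 'r2', 'test')
# ('a', 5, 'r3', None)
# ('c', 30, 'r4', None)
```

Note that a row exposed and converted at the same timestamp still matches, because the condition is `>=` rather than `>`.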

@danielbachhuber

Some things I didn't do in #26446 that could be improved:

  • Validate the modal fields (show an error if the checkbox is checked but no Source Timestamp Key is specified)
  • Lock the "Joining Table Key" to "distinct_id" when the checkbox is checked
  • Warn in the experiment goal picker when the data warehouse table doesn't have a join
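The first two improvements can be sketched as plain Python. The field names (`use_for_experiments` for the checkbox, `joining_table_key`, `source_timestamp_key`) are hypothetical stand-ins for the modal's fields; the real modal is a frontend form and its names may differ.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional


@dataclass(frozen=True)
class JoinFormValues:
    # Hypothetical field names mirroring the join modal.
    use_for_experiments: bool
    joining_table_key: str
    source_timestamp_key: Optional[str] = None


def lock_joining_key(values: JoinFormValues) -> JoinFormValues:
    """When the experiments checkbox is checked, force the joining key to distinct_id."""
    if values.use_for_experiments:
        return replace(values, joining_table_key="distinct_id")
    return values


def validate_join_form(values: JoinFormValues) -> Dict[str, str]:
    """Return field -> error message; an empty dict means the form is valid."""
    errors: Dict[str, str] = {}
    if values.use_for_experiments and not (values.source_timestamp_key or "").strip():
        errors["source_timestamp_key"] = (
            "Source Timestamp Key is required when the join is used for experiments."
        )
    return errors


form = JoinFormValues(use_for_experiments=True, joining_table_key="email")
print(validate_join_form(form))
print(lock_joining_key(form).joining_table_key)  # distinct_id
```

Locking the key in the UI, rather than only validating it, avoids an error state the user would otherwise have to fix by hand.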

@danielbachhuber

Some query notes:

@danielbachhuber

Suggestion from @ivanagas:

Another idea: add an "experiment joins" section here: posthog-git-experiments-data-warehouse-mvp-post-hog.vercel.app/docs/data-warehouse/join

@danielbachhuber

Issue: "Unable to resolve field: 'person'" when filtering out internal and test users

[Screenshot, 2024-12-10]

@danielbachhuber

> Issue: "Unable to resolve field: 'person'" when filtering out internal and test users

I suspect this is because the query tries to resolve person as a column on the data warehouse table, when it actually lives at event.person.
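To illustrate the suspected cause, here is a hypothetical Python sketch of field resolution from a data warehouse table: a chain such as `person.properties.email` is first looked up among the table's own columns, and it only resolves if it is routed through the lazily joined events table. The `events` join field name and this resolver are assumptions for illustration; this is not HogQL's actual resolver.

```python
from typing import AbstractSet, List, Sequence


def resolve_field_chain(
    chain: Sequence[str],
    table_columns: AbstractSet[str],
    events_join_field: str = "events",  # hypothetical name of the lazy join to events
) -> List[str]:
    """Hypothetical resolver for a field chain referenced from a data warehouse table."""
    if not chain:
        raise ValueError("Empty field chain")
    head = chain[0]
    if head in table_columns:
        # The column exists on the data warehouse table itself.
        return list(chain)
    if head == "person":
        # person lives on events, so route the lookup through the events join.
        return [events_join_field, *chain]
    # Same shape as the error seen when filtering out internal and test users.
    raise ValueError(f"Unable to resolve field: {head!r}")


cols = {"distinct_id", "amount", "timestamp"}
print(resolve_field_chain(["person", "properties", "email"], cols))
# ['events', 'person', 'properties', 'email']
```

In this sketch, dropping the `person` branch reproduces the reported error, which is consistent with the hypothesis above.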

@danielbachhuber

danielbachhuber commented Dec 12, 2024

ClickHouse query for finding the ASOF queries in Metabase (EU):

```sql
SELECT
    event_time,
    query_duration_ms,
    query,
    log_comment,
    read_rows,
    formatReadableSize(read_bytes) AS read_size,
    result_rows,
    columns,
    query_id,
    exception,
    exception_code
FROM clusterAllReplicas(posthog, system.query_log)
WHERE
    query NOT LIKE '%query_log%'
    AND query LIKE '%ASOF%'
ORDER BY event_time DESC
LIMIT 1000
```

@danielbachhuber

danielbachhuber commented Dec 16, 2024

Potential joins:

  • events: usage.userid -> events.properties.$user_id
  • persons: usage.userid -> person_distinct_ids.distinct_id
  • events: payments.distinct_id -> events.distinct_id
  • persons: payments.distinct_id -> person_distinct_ids.distinct_id
  • events: subscriptions.customer_id -> customers.id - customers.email -> persons.email - persons.id -> person_distinct_ids.distinct_id -> events.distinct_id
  • persons: subscriptions.customer_id -> customers.id - customers.email -> persons.email
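One way to reason about these candidates is as a directed graph of column hops, where each `->` is a join and each ` - ` is a move between columns of the same row. The minimal Python sketch below transcribes the hops listed above and uses breadth-first search to find the shortest chain from a data warehouse column to `events.distinct_id`. The graph encoding is an assumption for illustration, not how data warehouse joins are configured.

```python
from collections import deque
from typing import Dict, List, Optional, Sequence, Tuple

# Hops transcribed from the list above (reading customer.email as customers.email).
JOIN_HOPS: List[Tuple[str, str]] = [
    ("usage.userid", "events.properties.$user_id"),
    ("usage.userid", "person_distinct_ids.distinct_id"),
    ("payments.distinct_id", "events.distinct_id"),
    ("payments.distinct_id", "person_distinct_ids.distinct_id"),
    ("subscriptions.customer_id", "customers.id"),
    ("customers.id", "customers.email"),  # same-row hop
    ("customers.email", "persons.email"),
    ("persons.email", "persons.id"),  # same-row hop
    ("persons.id", "person_distinct_ids.distinct_id"),
    ("person_distinct_ids.distinct_id", "events.distinct_id"),
]


def build_graph(hops: Sequence[Tuple[str, str]]) -> Dict[str, List[str]]:
    graph: Dict[str, List[str]] = {}
    for src, dst in hops:
        graph.setdefault(src, []).append(dst)
    return graph


def find_join_path(graph: Dict[str, List[str]], start: str, target: str) -> Optional[List[str]]:
    """Shortest hop chain from start to target via BFS, or None if unreachable."""
    previous: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk back through predecessors to rebuild the path.
            path: List[str] = []
            current: Optional[str] = node
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for neighbor in graph.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


graph = build_graph(JOIN_HOPS)
print(find_join_path(graph, "usage.userid", "events.distinct_id"))
# ['usage.userid', 'person_distinct_ids.distinct_id', 'events.distinct_id']
```

Longer chains, like the one starting at subscriptions.customer_id, show how many joins a single experiment metric could depend on.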
