Discussion about using current example dataset to generate cohort query #84

Zrealshadow · 2022-08-14T08:45:35Z

We want to generate cohort queries from the sogamo dataset for the cohortQueryProcessing unit test.
A simple analysis of the data revealed the following problems:

The sogamo dataset contains only 4 players across all of its 10k items, so the cohort query in the old version of the code is not representative and does not work well as a unit test. According to the CoHANA paper, the raw data is larger than the sample we currently have. I recommend using the raw data to generate the test cohort queries.

The tpch dataset has the same problem: there is only 1 user in the entire dataset, so every order in it belongs to that same user.
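
For anyone who wants to repeat this check, here is a minimal sketch of the kind of count described above. It assumes the data is a headerless CSV with the user id in the first column. That layout, the `user_event_counts` helper, and the sample rows are all hypothetical, so adjust them to the real files.

```python
import csv
import io
from collections import Counter


def user_event_counts(csv_file, user_column=0):
    """Count rows per user id in a headerless CSV file-like object.

    user_column is an assumption: point it at whichever column
    holds the user/player id in the real dataset.
    """
    reader = csv.reader(csv_file)
    return Counter(row[user_column] for row in reader if row)


# Made-up in-memory rows standing in for a real data file.
sample = io.StringIO(
    "u1,launch,2013-05-20\n"
    "u1,shop,2013-05-21\n"
    "u2,launch,2013-05-20\n"
)

counts = user_event_counts(sample)
print(len(counts), "distinct users over", sum(counts.values()), "rows")
```

A dataset with very few distinct users relative to its row count, as in both cases above, can only produce a handful of cohorts, which limits what a unit test built on it can cover.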
