Fix grouping by lineage. #87
base: master
Probably we also could remove this And change here to |
To fix the issue of all lineages being reported under "other", we need to update the `get_major_lineage_prevalence` function to consider `min_date` and `max_date`, if available.
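One way to picture "consider `min_date` and `max_date`, if available" is an optional date filter applied before prevalence is computed. The sketch below is only an illustration, not the project's code: the helper name `filter_by_date_range`, the `date` column, and the sample data are all assumptions.

```python
import pandas as pd


def filter_by_date_range(df, min_date=None, max_date=None, date_col="date"):
    """Keep rows whose date lies in [min_date, max_date].

    Hypothetical helper: either bound may be None, in which case
    that side of the range is left open.
    """
    dates = pd.to_datetime(df[date_col])
    # Start with every row selected, then narrow by each bound that is given.
    mask = pd.Series(True, index=df.index)
    if min_date is not None:
        mask &= dates >= pd.Timestamp(min_date)
    if max_date is not None:
        mask &= dates <= pd.Timestamp(max_date)
    return df[mask]


# Made-up sample data for illustration only.
sample = pd.DataFrame({
    "date": ["2022-01-01", "2022-02-01", "2022-03-01"],
    "lineage": ["B.1.1.7", "BA.2", "BA.2"],
    "lineage_count": [3, 5, 2],
})

in_range = filter_by_date_range(sample, min_date="2022-01-15", max_date="2022-02-15")
unbounded = filter_by_date_range(sample)
```

With both bounds omitted the dataframe is returned unfiltered, which matches the "if available" wording.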
The `get_major_lineage_prevalence` function was updated to consider `min_date` and `max_date`, if available.
Does this make sense, given that the Elasticsearch query already considers `min_date` and `max_date`?
Context
Getting HTTP 500 error in queries as:
Solution
The `lineage` column was missing from the dataframe after grouping by `lineage`. The solution is to add the `lineage` column back to the dataframe after the group-by. The change in `group_keys` behavior between pandas 1.4 (1) and 1.5 (2) was also taken into account.
Suggestion
Each transformation in the dataframe could be a testable function.
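As a rough sketch of that suggestion, the step that restores the `lineage` column after grouping could be isolated in a small function with its own test. The function name `total_count_by_lineage`, the `lineage_count` column, and the sample values are assumptions for illustration, not the project's actual code.

```python
import pandas as pd


def total_count_by_lineage(df):
    """Sum counts per lineage and return `lineage` as a regular column.

    Hypothetical transformation: a plain groupby moves the grouping key
    into the index of the result, so the column has to be restored.
    """
    totals = df.groupby("lineage")["lineage_count"].sum()
    # reset_index() turns the "lineage" index back into a column.
    return totals.reset_index()


def test_total_count_by_lineage_keeps_lineage_column():
    # Made-up sample data for illustration only.
    df = pd.DataFrame({
        "lineage": ["B.1.1.7", "B.1.1.7", "BA.2", "BA.2", "BA.2"],
        "lineage_count": [3, 5, 2, 4, 1],
    })
    result = total_count_by_lineage(df)
    assert list(result.columns) == ["lineage", "lineage_count"]
    totals = dict(zip(result["lineage"], result["lineage_count"]))
    assert totals == {"B.1.1.7": 8, "BA.2": 7}


test_total_count_by_lineage_keeps_lineage_column()
```

Using `reset_index()` on an aggregated result avoids relying on how `groupby(...).apply(...)` treats `group_keys`, which is the behavior that differs between the pandas versions referenced here.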
References
(1) https://pandas.pydata.org/pandas-docs/version/1.4/reference/api/pandas.DataFrame.groupby.html
(2) https://pandas.pydata.org/pandas-docs/version/1.5/reference/api/pandas.DataFrame.groupby.html