grst commented Feb 22, 2024 • edited by ilan-gold
- Add minimal test dataset
- Reorganize code
- new base class for generic tests, linear model base is subclass of it
- new interface for simple tests
- (draft) implementation of wilcoxon test
- Add tests
@Zethson, @ilan-gold Here are my suggestions on how to restructure the API. LMK what you think!
I don't know how much I can contribute implementation-wise from next week on (starting to work regularly again), but I'll always be around for discussion and code review.
Sorry for taking so long!
Absolutely think that this is going into a better direction. I was internally debating whether we should offer any common interface at all given the discrepancies of the methods.
Probably yes still..
Co-authored-by: Lukas Heumos <[email protected]>
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #33 +/- ##
==========================================
- Coverage 55.68% 52.94% -2.75%
==========================================
Files 6 12 +6
Lines 352 442 +90
==========================================
+ Hits 196 234 +38
- Misses 156 208 +52
@Zethson Are you saying you want to keep
This is also what I was trying to say above @ilan-gold. I also considered only offering the classes as the single API option which would make our life a bit easier and it might be a good idea to only have one way to do things. I'm like 50/50 but if you lean more towards that, I'd totally support it
I removed the wrapper and added a "correctness" test where we draw from two very different negative binomial distributions and then test them for different groups. This seems to be failing for reasons I don't understand...maybe a bug but the fact that only I will try to keep digging a bit more
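For readers following along, a minimal sketch of what such a "correctness" check could look like. It is not the test from this PR: the distribution parameters are arbitrary, and SciPy's rank-sum test stands in for the package's own model classes.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_obs, n_genes = 100, 5  # arbitrary sizes, chosen for illustration

# Two clearly different negative binomial distributions.
# With p=0.5 the mean equals n, so the groups have means of about 5 and 50.
group_a = rng.negative_binomial(n=5, p=0.5, size=(n_obs, n_genes))
group_b = rng.negative_binomial(n=50, p=0.5, size=(n_obs, n_genes))

# One two-sided rank-sum test per simulated gene.
pvals = np.array(
    [
        stats.mannwhitneyu(group_a[:, g], group_b[:, g], alternative="two-sided").pvalue
        for g in range(n_genes)
    ]
)

# A working test should flag every gene as strongly different.
all_significant = bool((pvals < 1e-6).all())
```

If a method under test fails a check of this kind while a simple rank-sum test passes, that would point at the method or its wrapper rather than at the simulated data.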
Oh well, apparently there are other issues
obs_df = obs_df.sort_values(paired_by)
for group_to_compare in groups_to_compare:
    comparison_idx = np.where(obs_df[column] == group_to_compare)[0]
    if baseline is None:
        baseline_idx = np.where(obs_df[column] != group_to_compare)[0]
    else:
        baseline_idx = np.where(obs_df[column] == baseline)[0]
    res_dfs.append(
        model._compare_single_group(baseline_idx, comparison_idx).assign(
            comparison=f"{group_to_compare}_vs_{baseline if baseline is not None else 'rest'}"
        )
    )
@ilan-gold, I'm not entirely sure this works for the paired test. The indices generated here are valid for the sorted df, but the _compare_single_group function only has the unsorted AnnData -- so I don't think the indices are referring to the correct observations. Or am I missing something?
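To illustrate the concern, here is a minimal sketch with a toy `obs` table. The column names `condition` and `patient` and the cell labels are made up for illustration. Positional indices computed on the sorted frame select different rows when applied to the unsorted data. Mapping labels back to positions in the original order is one possible fix, not necessarily the one this PR should adopt.

```python
import numpy as np
import pandas as pd

# Toy stand-in for adata.obs; all names here are hypothetical.
obs_df = pd.DataFrame(
    {
        "condition": ["A", "B", "A", "B"],
        "patient": ["p2", "p1", "p1", "p2"],
    },
    index=["cell0", "cell1", "cell2", "cell3"],
)

# Sort by the pairing variable, as in the hunk above (stable sort for determinism).
sorted_df = obs_df.sort_values("patient", kind="stable")

# Positions of group "A" within the *sorted* frame.
pos_in_sorted = np.where(sorted_df["condition"] == "A")[0]

# Applying those positions to the *unsorted* data picks up a "B" cell.
wrong_conditions = obs_df["condition"].iloc[pos_in_sorted].tolist()

# Possible fix: go through the labels to recover positions in the original order.
labels = sorted_df.index[pos_in_sorted]
pos_in_original = obs_df.index.get_indexer(labels)
right_conditions = obs_df["condition"].iloc[pos_in_original].tolist()
```

The recovered positions keep the order of the sorted frame, so the i-th observation of each group still belongs to the same `patient`, which is what a paired test needs.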
if fit_kwargs is None:
    fit_kwargs = {}
if paired_by is not None:
    warnings.warn("Cannot use `paired_by` with linear tests. Ignoring parameter", UserWarning, stacklevel=2)
You can, just use f"~{column} + {paired_by}" as the formula.
Do we want to allow this? Wouldn't this be testing groups of 2 against each other?
A paired t-test is a special case of a linear model with groups of two and f"~{column} + {paired_by}".
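The equivalence can be checked numerically. The sketch below uses simulated data and a hand-built design matrix for a formula of the form `~ group + pair`. The variable names are illustrative and nothing here comes from the PR's code. It fits ordinary least squares with NumPy and compares the t-statistic of the group coefficient against `scipy.stats.ttest_rel`.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_pairs = 8

# Simulated paired measurements: a shared per-pair effect plus a group shift.
pair_effect = rng.normal(0, 2, size=n_pairs)
before = pair_effect + rng.normal(0, 1, size=n_pairs)
after = pair_effect + 0.5 + rng.normal(0, 1, size=n_pairs)

# Long format: one row per observation.
y = np.concatenate([before, after])
group = np.concatenate([np.zeros(n_pairs), np.ones(n_pairs)])
pair = np.tile(np.arange(n_pairs), 2)

# Design matrix: intercept, group indicator, dummy columns for pairs 1..n_pairs-1.
pair_dummies = (pair[:, None] == np.arange(1, n_pairs)[None, :]).astype(float)
X = np.column_stack([np.ones_like(y), group, pair_dummies])

# Ordinary least squares and the t-statistic of the group coefficient.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
dof = len(y) - X.shape[1]  # 2 * n_pairs - (n_pairs + 1) = n_pairs - 1
sigma2 = resid @ resid / dof
cov = sigma2 * np.linalg.inv(X.T @ X)
t_lm = beta[1] / np.sqrt(cov[1, 1])
p_lm = 2 * stats.t.sf(abs(t_lm), dof)

# Reference: the classical paired t-test on the same data.
t_paired, p_paired = stats.ttest_rel(after, before)
```

Both routes give the same statistic and p-value up to floating-point error, because the pair dummies absorb the between-pair variation and leave `n_pairs - 1` residual degrees of freedom.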
@Zethson @grst Outstanding issues:
Maybe a paired coding session to walk through it tomorrow morning @Zethson?
Maybe even just merge this and follow up on the fixes in separate PRs? Easier to review smaller units of change and this PR was mostly about the restructuring. If PyDESeq2 is broken, it's probably not the fault of this PR.
@grst 100% agreed. I do think that whatever is broken here is probably not a bug from us (although I have been known to be wrong in the past!). I've already reported to the PyDESeq2 people