Refactor API / Wilcox test #33

grst · 2024-02-22T10:08:53Z

Add minimal test dataset
Reorganize code
new base class for generic tests, linear model base is subclass of it
new interface for simple tests
(draft) implementation of wilcoxon test
Add tests

grst

@Zethson, @ilan-gold Here are my suggestions on how to restructure the API. LMK what you think!

I don't know how much I can contribute implementation-wise from next week on (starting to work regularly again), but I'll always be around for discussion and code review.

src/multi_condition_comparisions/methods/_base.py

src/multi_condition_comparisions/methods/_edger.py

src/multi_condition_comparisions/methods/_simple_tests.py

tests/conftest.py

Zethson

Sorry for taking so long!

I absolutely think that this is going in a better direction. I was internally debating whether we should offer any common interface at all, given the discrepancies between the methods.
Probably yes, still.

src/multi_condition_comparisions/methods/_base.py

tests/conftest.py

Co-authored-by: Lukas Heumos <[email protected]>

codecov-commenter · 2024-03-10T19:53:31Z

Codecov Report

Attention: Patch coverage is 74.91409%, with 73 lines in your changes missing coverage. Please review.

Project coverage is 52.94%. Comparing base (13a8f0e) to head (91582e9).
Report is 1 commit behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #33      +/-   ##
==========================================
- Coverage   55.68%   52.94%   -2.75%     
==========================================
  Files           6       12       +6     
  Lines         352      442      +90     
==========================================
+ Hits          196      234      +38     
- Misses        156      208      +52

Files	Coverage Δ
src/multi_condition_comparisions/__init__.py	`100.00% <100.00%> (ø)`
src/multi_condition_comparisions/_util.py	`100.00% <100.00%> (ø)`
...c/multi_condition_comparisions/methods/__init__.py	`100.00% <100.00%> (ø)`
...lti_condition_comparisions/methods/_statsmodels.py	`100.00% <100.00%> (ø)`
src/multi_condition_comparisions/pl/__init__.py	`100.00% <100.00%> (ø)`
src/multi_condition_comparisions/tl/__init__.py	`100.00% <100.00%> (ø)`
src/multi_condition_comparisions/tl/de.py	`100.00% <100.00%> (+11.29%)`	⬆️
.../multi_condition_comparisions/methods/_pydeseq2.py	`96.42% <96.42%> (ø)`
src/multi_condition_comparisions/methods/_edger.py	`85.93% <85.93%> (ø)`
src/multi_condition_comparisions/methods/_base.py	`84.21% <84.21%> (ø)`
... and 1 more

ilan-gold · 2024-03-11T10:36:08Z

@Zethson Are you saying you want to keep run_de and just use that instead of compare_groups as the "public-facing API"? I would lean more towards the class-based implementation as the main public-facing API. After all, that's what scanpy is moving towards as well.

Zethson · 2024-03-11T10:52:33Z

This is also what I was trying to say above, @ilan-gold. I also considered offering only the classes as the single API option, which would make our life a bit easier, and it might be a good idea to have only one way to do things. I'm like 50/50, but if you lean more towards that, I'd totally support it.

ilan-gold · 2024-03-11T14:35:45Z

I removed the wrapper and added a "correctness" test where we draw from two very different negative binomial distributions and then test them for different groups. This seems to be failing for reasons I don't understand...maybe a bug but the fact that only PyDESeq2 seems to be failing indicates this might not be the case...

I will try to keep digging a bit more

ilan-gold · 2024-03-11T14:46:08Z

Oh well, apparently there are other issues

grst · 2024-03-11T14:42:43Z

src/multi_condition_comparisions/methods/_simple_tests.py

+            obs_df = obs_df.sort_values(paired_by)
+        for group_to_compare in groups_to_compare:
+            comparison_idx = np.where(obs_df[column] == group_to_compare)[0]
+            if baseline is None:
+                baseline_idx = np.where(obs_df[column] != group_to_compare)[0]
+            else:
+                baseline_idx = np.where(obs_df[column] == baseline)[0]
+            res_dfs.append(
+                model._compare_single_group(baseline_idx, comparison_idx).assign(
+                    comparison=f"{group_to_compare}_vs_{baseline if baseline is not None else 'rest'}"
+                )
+            )


@ilan-gold, I'm not entirely sure this works for the paired test. The indices generated here are valid for the sorted df, but the _compare_single_group function only has the unsorted AnnData, so I don't think the indices refer to the correct observations. Or am I missing something?
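A minimal sketch of the concern above, using plain NumPy arrays as stand-ins for the unsorted AnnData and the sorted `obs_df` (the toy arrays and variable names are assumptions for illustration, not code from this PR). Positions computed on a sorted copy only point at the right observations once they are mapped back through the sort order.

```python
import numpy as np

# Toy stand-ins (hypothetical): one entry per observation, in the ORIGINAL order.
group = np.array(["A", "B", "A", "B", "A", "B"])
pair_id = np.array([2, 0, 1, 2, 0, 1])
values = np.array([10.0, 20.0, 30.0, 40.0, 50.0, 60.0])  # plays the role of the unsorted data

# Sorting by the pairing variable gives a permutation of the original positions.
order = np.argsort(pair_id, kind="stable")
sorted_group = group[order]

# Positions of group "A" within the SORTED view.
idx_in_sorted = np.where(sorted_group == "A")[0]

# Using those positions directly on the unsorted data picks the wrong rows ...
wrong_groups = group[idx_in_sorted]

# ... while mapping them back through `order` recovers the intended rows,
# now ordered by pair.
idx_in_original = order[idx_in_sorted]
right_groups = group[idx_in_original]
right_values = values[idx_in_original]

print(wrong_groups)   # contains a "B" observation
print(right_groups)   # only "A" observations
```

An alternative with the same effect would be to sort the data object itself together with the metadata, so that both share one ordering.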

grst · 2024-03-11T14:47:54Z

src/multi_condition_comparisions/methods/_base.py

+        if fit_kwargs is None:
+            fit_kwargs = {}
+        if paired_by is not None:
+            warnings.warn("Cannot use `paired_by` with linear tests. Ignoring parameter.", UserWarning, stacklevel=2)


You can, just use f"~{column} + {paired_by}" as the formula.

Do we want to allow this? Wouldn't this be testing groups of 2 against each other?

A paired t-test is a special case of a linear model with groups of two and f"~{column} + {paired_by}".
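To illustrate the equivalence stated above, here is a small sketch that compares a hand-computed paired t-statistic with the t-statistic of the group coefficient in an ordinary least squares fit of the design `~ group + pair`. It uses plain NumPy rather than the formula machinery in this package; the simulated data and all variable names are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pairs = 8

# Simulated paired data (hypothetical): a shared per-pair effect plus noise.
pair_effect = rng.normal(0.0, 2.0, size=n_pairs)
y_a = pair_effect + rng.normal(0.0, 1.0, size=n_pairs)
y_b = pair_effect + 0.5 + rng.normal(0.0, 1.0, size=n_pairs)

# Paired t-test statistic: one-sample t-test on the within-pair differences.
d = y_b - y_a
t_paired = d.mean() / (d.std(ddof=1) / np.sqrt(n_pairs))

# Linear model y ~ group + pair, with treatment (dummy) coding.
y = np.concatenate([y_a, y_b])
group = np.concatenate([np.zeros(n_pairs), np.ones(n_pairs)])  # 0 = A, 1 = B
pair = np.tile(np.arange(n_pairs), 2)
pair_dummies = (pair[:, None] == np.arange(1, n_pairs)[None, :]).astype(float)
X = np.column_stack([np.ones(2 * n_pairs), group, pair_dummies])

# Ordinary least squares fit and the t-statistic of the group coefficient.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
dof = X.shape[0] - X.shape[1]  # 2n - (n + 1) = n - 1, as in the paired t-test
sigma2 = resid @ resid / dof
cov = sigma2 * np.linalg.inv(X.T @ X)
t_lm = beta[1] / np.sqrt(cov[1, 1])

print(t_paired, t_lm)  # the two statistics agree up to floating-point error
```

Both statistics have `n_pairs - 1` degrees of freedom, so the resulting p-values coincide as well.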

ilan-gold · 2024-03-11T16:46:55Z

@Zethson @grst Outstanding issues:

Paired EdgeR correctness testing is not working. Here's the commit with the fixture: bc6d8ae. What's failing is a false positive (i.e., a low p-value) for a gene whose expression in the two groups, A and B, comes from identical mixtures of two negative binomial distributions. The observations are then paired in order down the dataframe.
Both PyDESeq2 methods fail correctness testing, and the resulting dataframe looks completely wrong (reversed, i.e., the gene that should have the lower p-value has the higher one). I have no idea why this is happening. I have looked at this a bunch, and the fact that even the unpaired test is not passing (unlike for all other methods) makes me think something else is going on here.

Maybe a paired coding session to walk through it tomorrow morning @Zethson?
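For readers without the fixture at hand, the kind of data described above could be simulated roughly as follows. This is only a sketch: the distribution parameters, group sizes, and variable names are assumptions, and the actual fixture lives in commit bc6d8ae.

```python
import numpy as np

rng = np.random.default_rng(0)
n_per_group = 100

# Gene 0 (should be called differentially expressed): the two groups are drawn
# from very different negative binomial distributions (means 5 and 50 here).
de_a = rng.negative_binomial(n=5, p=0.5, size=n_per_group)
de_b = rng.negative_binomial(n=50, p=0.5, size=n_per_group)


def draw_mixture(size):
    """Draw from a 50/50 mixture of the two negative binomial distributions."""
    low = rng.negative_binomial(n=5, p=0.5, size=size)
    high = rng.negative_binomial(n=50, p=0.5, size=size)
    return np.where(rng.random(size) < 0.5, low, high)


# Gene 1 (should NOT be called): both groups come from the same mixture.
null_a = draw_mixture(n_per_group)
null_b = draw_mixture(n_per_group)

# Observations x genes count matrix plus the metadata columns.
counts = np.column_stack(
    [np.concatenate([de_a, de_b]), np.concatenate([null_a, null_b])]
)
group = np.repeat(["A", "B"], n_per_group)
pairing = np.tile(np.arange(n_per_group), 2)  # pair i = i-th row of A with i-th row of B
```

A correctness test would then expect a low p-value for gene 0 and a high one for gene 1, for both the unpaired and the paired variant of each method.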

grst · 2024-03-12T18:04:29Z

Maybe even just merge this and follow up on the fixes in separate PRs? It's easier to review smaller units of change, and this PR was mostly about the restructuring. If PyDESeq2 is broken, it's probably not the fault of this PR.

ilan-gold · 2024-03-12T18:13:53Z

@grst 100% agreed. I do think that whatever is broken here is probably not a bug from us (although I have been known to be wrong in the past!). I've already reported it to the PyDESeq2 people.

grst and others added 14 commits February 22, 2024 11:03

Add minimal test dataset

f233f35

Remove unused pp

ebc5933

Stub refactor

9d6ec1e

Move code around

9ad8d89

Fix pre-commit

6ee138c

Move check counts function to utils

2091af3

Update API for compare_groups

ef166ad

Merge remote-tracking branch 'origin/main' into grst/refactor

be7713e

[pre-commit.ci] auto fixes from pre-commit.com hooks

a6798d9

for more information, see https://pre-commit.ci

Rename util function

284af4f

Update API for simple tests

5b20d75

Update base class

26ae4dc

(Somewhat) implement wilcoxon test

be083b4

Cleanup

dc930d7

grst marked this pull request as ready for review February 23, 2024 08:38

grst requested review from Zethson and ilan-gold and removed request for Zethson February 23, 2024 08:39

grst commented Feb 23, 2024

grst changed the title ~~Refactor API~~ Refactor API / Wilcox test Feb 23, 2024

grst mentioned this pull request Feb 27, 2024

Solve model.cond with custom materializer #36

Merged

11 tasks

(chore): get tests running

c133879

Zethson reviewed Mar 8, 2024

src/multi_condition_comparisions/methods/_base.py (outdated)

src/multi_condition_comparisions/methods/_base.py (outdated)

Zethson reviewed Mar 8, 2024

tests/conftest.py (outdated)

grst and others added 2 commits March 10, 2024 20:49

Update tests/conftest.py

24e4f51

Co-authored-by: Lukas Heumos <[email protected]>

Update src/multi_condition_comparisions/methods/_base.py

91582e9

Co-authored-by: Lukas Heumos <[email protected]>

ilan-gold added 3 commits March 11, 2024 12:03

(refactor): finish API for wilcoxon test

c90d93c

(chore): remove wrapper

8661907

(feat): unified method

cd7404c

grst commented Mar 11, 2024

ilan-gold added 6 commits March 11, 2024 15:59

(fix): spelling

70597b8

(fix): paired_by needs to be in sorted object

a1b30bd

(feat): use paired_by

76c50c1

(fix): make tests deterministic

b11d524

(feat): add pairing to adata fixture

bc6d8ae

(feat): paired testing

fe6bd3e

grst merged commit 6e06241 into main Mar 13, 2024
2 of 5 checks passed

grst deleted the grst/refactor branch March 13, 2024 06:42

grst mentioned this pull request Mar 13, 2024

(feat): wilcoxon test #31

Closed
