Skip to content
This repository has been archived by the owner on May 27, 2024. It is now read-only.

Refactor API / Wilcox test #33

Merged
merged 26 commits into from
Mar 13, 2024
Merged

Refactor API / Wilcox test #33

merged 26 commits into from
Mar 13, 2024

Conversation

grst
Copy link
Contributor

@grst grst commented Feb 22, 2024

  • Add minimal test dataset
  • Reorganize code
  • new base class for generic tests, linear model base is subclass of it
  • new interface for simple tests
  • (draft) implementation of wilcoxon test
  • Add tests

@grst grst marked this pull request as ready for review February 23, 2024 08:38
@grst grst requested review from Zethson and ilan-gold and removed request for Zethson February 23, 2024 08:39
Copy link
Contributor Author

@grst grst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Zethson, @ilan-gold Here are my suggestions on how to restructure the API. LMK what you think!

I don't know how much I can contribute implementation-wise from next week on (starting to work regularly again), but I'll always be around for discussion and code review.

src/multi_condition_comparisions/methods/_base.py Outdated Show resolved Hide resolved
src/multi_condition_comparisions/methods/_simple_tests.py Outdated Show resolved Hide resolved
tests/conftest.py Outdated Show resolved Hide resolved
@grst grst changed the title Refactor API Refactor API / Wilcox test Feb 23, 2024
@grst grst mentioned this pull request Feb 27, 2024
11 tasks
Copy link
Member

@Zethson Zethson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for taking so long!

Absolutely think that this is going into a better direction. I was internally debating whether we should offer any common interface at all given the discrepancies of the methods.
Probably yes still..

src/multi_condition_comparisions/methods/_base.py Outdated Show resolved Hide resolved
src/multi_condition_comparisions/methods/_base.py Outdated Show resolved Hide resolved
tests/conftest.py Outdated Show resolved Hide resolved
@codecov-commenter
Copy link

codecov-commenter commented Mar 10, 2024

Codecov Report

Attention: Patch coverage is 74.91409% with 73 lines in your changes are missing coverage. Please review.

Project coverage is 52.94%. Comparing base (13a8f0e) to head (91582e9).
Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #33      +/-   ##
==========================================
- Coverage   55.68%   52.94%   -2.75%     
==========================================
  Files           6       12       +6     
  Lines         352      442      +90     
==========================================
+ Hits          196      234      +38     
- Misses        156      208      +52     
Files Coverage Δ
src/multi_condition_comparisions/__init__.py 100.00% <100.00%> (ø)
src/multi_condition_comparisions/_util.py 100.00% <100.00%> (ø)
...c/multi_condition_comparisions/methods/__init__.py 100.00% <100.00%> (ø)
...lti_condition_comparisions/methods/_statsmodels.py 100.00% <100.00%> (ø)
src/multi_condition_comparisions/pl/__init__.py 100.00% <100.00%> (ø)
src/multi_condition_comparisions/tl/__init__.py 100.00% <100.00%> (ø)
src/multi_condition_comparisions/tl/de.py 100.00% <100.00%> (+11.29%) ⬆️
.../multi_condition_comparisions/methods/_pydeseq2.py 96.42% <96.42%> (ø)
src/multi_condition_comparisions/methods/_edger.py 85.93% <85.93%> (ø)
src/multi_condition_comparisions/methods/_base.py 84.21% <84.21%> (ø)
... and 1 more

@ilan-gold
Copy link
Collaborator

@Zethson Are you saying you want to keep run_de and just use that instead of compare_groups as the "public facing API?" I would lean more towards the class-based implementation as the main public-facing API - after all, that's what scanpy is moving towards as well.

@Zethson
Copy link
Member

Zethson commented Mar 11, 2024

This is also what I was trying to say above @ilan-gold . I also considered only offering the classes as the single API option which would make our life a bit easier and it might be a good idea to only have one way to do things. I'm like 50/50 but if you lean more towards that, I'd totally support it

@ilan-gold
Copy link
Collaborator

ilan-gold commented Mar 11, 2024

I removed the wrapper and added a "correctness" test where we draw from two very different negative binomial distributions and then test them for different groups. This seems to be failing for reasons I don't understand...maybe a bug but the fact that only PyDESeq2 seems to be failing indicates this might not be the case...

I will try to keep digging a bit more

@ilan-gold
Copy link
Collaborator

Oh well, apparently there are other issues

Comment on lines 65 to 76
obs_df = obs_df.sort_values(paired_by)
for group_to_compare in groups_to_compare:
comparison_idx = np.where(obs_df[column] == group_to_compare)[0]
if baseline is None:
baseline_idx = np.where(obs_df[column] != group_to_compare)[0]
else:
baseline_idx = np.where(obs_df[column] == baseline)[0]
res_dfs.append(
model._compare_single_group(baseline_idx, comparison_idx).assign(
comparison=f"{group_to_compare}_vs_{baseline if baseline is not None else 'rest'}"
)
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ilan-gold, I'm not entirly sure this works for the paired test. The indices generated here are valid for the sorted df, but the _compare_single_group function only has the unsorted AnnData -- so I don't think the indices are referring to the correct observations. Or am I missing something?

if fit_kwargs is None:
fit_kwargs = {}
if paired_by is not None:
warnings.warn("Cannot use `paired_by` with linear tests. Ignoring paramere", UserWarning, stacklevel=2)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can, just use f"~{column} + {paired_by}" as the formula.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to allow this? Wouldn't this be testing groups of 2 against each other?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A paired t-test is a special case of a linear model with groups of two and f"~{column} + {paired_by}".

@ilan-gold
Copy link
Collaborator

ilan-gold commented Mar 11, 2024

@Zethson @grst Outstanding issues:

  1. Paired EdgeR correctness testing is not working. Here's the commit with the fixture: bc6d8ae. What's failing is a false positive (i.e., low p-value) result of a gene whose expression across the two groups, A and B, comes from identical mixtures of two negative binomial distributions. These are then paired in order down the dataframe.
  2. Both pydeseq2 methods fail correctness testing, and the resulting dataframe looks completely wrong (reversed i.e., gene that should have lower p-value has higher). I have no idea why this is happening. I have looked at this a bunch and the fact that the unpaired test is not even passing (unlike all other methods) makes me think something else is going on here.

Maybe a paired coding session to walk through it tomorrow morning @Zethson?

@grst
Copy link
Contributor Author

grst commented Mar 12, 2024

maybe even just merge this and follow up on the fixes in separate PRs? Easier to review smaller units of change and this PR was mostly about the restructuring. If PyDESeq2 is broken, it's probably not the fault of this PR.

@ilan-gold
Copy link
Collaborator

@grst 100% agreed. I do think that whatever is broken here is probably not a bug from us (although I have been known to be wrong in the past!). I've already reported to the PyDESeq2 people

@grst grst merged commit 6e06241 into main Mar 13, 2024
2 of 5 checks passed
@grst grst deleted the grst/refactor branch March 13, 2024 06:42
@grst grst mentioned this pull request Mar 13, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants