Solve model.cond with custom materializer #36

grst · 2024-02-26T08:19:44Z

Trying to solve #15 using a custom formulaic materializer.

codecov-commenter · 2024-04-02T18:36:13Z

Codecov Report

Attention: Patch coverage is 94.00%, with 6 lines in your changes missing coverage. Please review.

Project coverage is 70.66%. Comparing base (7610d50) to head (9d8add0).

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #36      +/-   ##
==========================================
+ Coverage   67.67%   70.66%   +2.98%     
==========================================
  Files          12       14       +2     
  Lines         495      559      +64     
==========================================
+ Hits          335      395      +60     
- Misses        160      164       +4

| Files | Coverage Δ | |
|---|---|---|
| src/multi_condition_comparisions/_util/__init__.py | `100.00% <100.00%> (ø)` | |
| src/multi_condition_comparisions/_util/checks.py | `100.00% <ø> (ø)` | |
| ...lti_condition_comparisions/methods/_statsmodels.py | `100.00% <ø> (ø)` | |
| ...rc/multi_condition_comparisions/_util/formulaic.py | `98.38% <98.38%> (ø)` | |
| src/multi_condition_comparisions/methods/_edger.py | `87.09% <75.00%> (-0.79%)` | ⬇️ |
| .../multi_condition_comparisions/methods/_pydeseq2.py | `93.33% <80.00%> (-1.79%)` | ⬇️ |
| src/multi_condition_comparisions/methods/_base.py | `86.00% <88.88%> (-2.00%)` | ⬇️ |

review-notebook-app · 2024-04-02T19:58:27Z

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB


grst · 2024-04-12T10:38:25Z

Finally made it! As usual with these things it was more complicated than I anticipated, but it seems to work quite robustly now.

High-level description:

- Formulaic uses a "Materializer" class that converts the model specification into a design matrix.
- Writing a custom materializer that extends the base class provided by formulaic is possible and supported (the mechanism is intended for generating something other than a pandas data frame).
- I use a custom materializer to hook into the materializer functions and extract the factor metadata that we later need for building contrast vectors with model.cond.
- model.cond uses this information to check categories and to fill in default values. There might be situations in which the default value can't be inferred; in that case we raise an error.
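The last point (checking categories, filling in defaults, raising when a default cannot be inferred) can be illustrated with a minimal sketch. It is not the code from this PR: `FactorInfo` and `resolve_conditions` are hypothetical names, and the metadata is assumed to be reduced to a tuple of categories plus an optional default per variable.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class FactorInfo:
    """Hypothetical, simplified stand-in for the recorded factor metadata."""

    categories: Tuple[str, ...]  # all categories of the variable
    default: Optional[str]  # default category, or None if it cannot be inferred


def resolve_conditions(factors: Dict[str, FactorInfo], **given: str) -> Dict[str, str]:
    """Check user-supplied categories and fill in defaults for the rest."""
    unknown = sorted(set(given) - set(factors))
    if unknown:
        raise ValueError(f"Variables not in the model: {unknown}")

    resolved: Dict[str, str] = {}
    for name, info in factors.items():
        if name in given:
            # The user specified this variable: the category must exist.
            if given[name] not in info.categories:
                raise ValueError(f"{given[name]!r} is not a category of {name!r}")
            resolved[name] = given[name]
        elif info.default is not None:
            # Not specified, but a default is known: fill it in.
            resolved[name] = info.default
        else:
            # Not specified and no default can be inferred: refuse to guess.
            raise ValueError(f"Cannot infer a default for {name!r}; pass it explicitly")
    return resolved
```

With a `treatment` variable whose default is known and a `batch` variable whose default is not, `resolve_conditions(factors, treatment="drug")` raises, while also passing `batch="a"` succeeds.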

@const-ae, it would be great if you could have a look, too. In particular, could you check whether the test cases are correct and whether there is any additional (more complex?) model that you think should be tested?

Zethson

Thank you so much @grst! This is a loooooot of work (although I have the suspicion that you enjoyed it :) )

I went through the PR but quickly noticed that it's a bit over my head at the moment. I'd need more headspace to give this a review that would deserve the name "review". I might give this another pass later, but I'm sure that the other reviews will have more useful comments.

src/multi_condition_comparisions/_util/formulaic.py

src/multi_condition_comparisions/methods/_base.py

src/multi_condition_comparisions/_util/formulaic.py

ilan-gold

I'm having trouble following exactly what is going on here or what is being solved... I will need to look at this again. I've never done much with formulaic, so I'm still a little bit dazzled...

src/multi_condition_comparisions/_util/__init__.py

ilan-gold · 2024-04-16T09:30:33Z

src/multi_condition_comparisions/_util/formulaic.py

+    factor_storage: dict[str, list[FactorMetadata]] = defaultdict(list)
+    variable_to_factors: dict[str, set[str]] = defaultdict(set)
+
+    class CustomPandasMaterializer(PandasMaterializer):


Why do we need the class declaration inside of another class? This makes reasoning about what goes on a bit challenging

Because each class needs to be tied to one specific factor_storage object. The class gets instantiated by formulaic, so we can only pass a class rather than an instance.

Each class is used only for one formulaic formula and stores the formula-specific metadata in the factor_storage object that is tied to it.
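The pattern described in this reply can be sketched in plain Python. This is only an illustration under stated assumptions: `BaseMaterializer`, `make_materializer_class`, and `framework_build` are hypothetical stand-ins and do not reflect formulaic's real API. The point is that a framework which instantiates the class itself cannot be handed per-call state, so the state is bound to a freshly created class through a closure.

```python
from collections import defaultdict


class BaseMaterializer:
    """Hypothetical stand-in for a framework base class (not formulaic's API)."""

    def encode(self, factor_name, values):
        # Pretend "encoding" just means collecting the sorted unique levels.
        return sorted(set(values))


def make_materializer_class():
    """Create a new subclass together with the storage object it writes to."""
    factor_storage = defaultdict(list)

    class RecordingMaterializer(BaseMaterializer):
        def encode(self, factor_name, values):
            levels = super().encode(factor_name, values)
            # The closure ties this class to exactly one storage object.
            factor_storage[factor_name].append(levels)
            return levels

    return RecordingMaterializer, factor_storage


def framework_build(materializer_cls, data):
    """Mimics a framework that instantiates the class on its own."""
    materializer = materializer_cls()
    return {name: materializer.encode(name, values) for name, values in data.items()}
```

Calling `make_materializer_class()` once per formula yields independent classes, so metadata recorded for one formula never leaks into the storage of another.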

src/multi_condition_comparisions/methods/_base.py

grst · 2024-04-16T09:57:16Z

I can understand that it all looks a bit daunting. I suggest starting with def cond(); this is the actual user-facing function for generating contrast vectors. All the other fuss is about finding the default categories of each categorical variable specified in the model.
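As a rough illustration of what a contrast vector is, here is a toy sketch for a single categorical variable. It assumes treatment (dummy) coding with an intercept and assumes that a contrast is formed as the difference of two design-matrix rows; neither assumption is stated in this thread, and `cond_row` and `contrast` are hypothetical helpers, not the PR's implementation.

```python
from typing import List


def cond_row(levels: List[str], level: str) -> List[int]:
    """Design-matrix row for one category under treatment coding.

    levels[0] is taken as the base (default) category; the row consists of
    the intercept followed by one indicator per non-base category.
    """
    if level not in levels:
        raise ValueError(f"{level!r} is not one of {levels}")
    return [1] + [int(level == other) for other in levels[1:]]


def contrast(row_a: List[int], row_b: List[int]) -> List[int]:
    """Contrast vector comparing condition a against condition b."""
    return [a - b for a, b in zip(row_a, row_b)]


levels = ["ctrl", "drugA", "drugB"]
drug_a_vs_ctrl = contrast(cond_row(levels, "drugA"), cond_row(levels, "ctrl"))
drug_a_vs_drug_b = contrast(cond_row(levels, "drugA"), cond_row(levels, "drugB"))
```

Here `drug_a_vs_ctrl` is `[0, 1, 0]` and `drug_a_vs_drug_b` is `[0, 1, -1]`: the intercept cancels and only the coefficients that differ between the two conditions remain.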


grst added 7 commits February 26, 2024 09:18

Stub custom materializer

9860b69

Stub materializer factory

91f8b1c

Setup basic factor metadata registry

937e303

Record all required attributes

36f07db

Implement variable2term

56e0b05

Reimplement model.cond using factor_metadata_storage

fdcded0

Merge remote-tracking branch 'origin/main' into custom-materializer

15d882c

grst mentioned this pull request Mar 22, 2024

Add meta.yaml for pyLemur scverse/ecosystem-packages#156

Merged

11 tasks

grst mentioned this pull request Apr 1, 2024

T-test / Improve simple tests #38

Merged

5 tasks

grst and others added 6 commits April 2, 2024 14:05

Merge remote-tracking branch 'origin/main' into custom-materializer

52956ec

Cleanup after merge

8ddd5bb

stub test cases

ee79d70

WIP: deal with custom encoder classes

402cff4

WIP stub testcase

438a469

Add testcases for custom materializer

f6be15d

grst added 3 commits April 2, 2024 21:07

Update docstring for linear model base

784266b

WIP reimplement model.cond

3a1ef9a

Stub testcase for model.cond

90e8247

pre-commit-ci bot and others added 10 commits April 2, 2024 19:58

[pre-commit.ci] auto fixes from pre-commit.com hooks

068c7cc

for more information, see https://pre-commit.ci

Fix that contrasts couldn't be build from model spec

46f1420

Fix edgeR type hints

c7d3290

Fix pydeseq2 function signatures

36ed509

Fix edgeR tests

9d8add0

Fix model.cond term iteration

28faf64

Get rid of class variable stuff

822c623

Account for multiple factors being generated

ea027a2

only fail cond if variable is ambiguous

cab9750

Update comments

4039e23

grst added 9 commits April 9, 2024 17:30

Add test for resolve ambiguous

ef99eb4

Add more test cases and fix others

e59e760

Use mapping variable -> factor instead of variable -> term

43f77e1

Fix remaining model.cond testcases

54a0d0b

Fix formulaic testcase

86f114a

Reset example notebook

2c8df26

Restet conftest

99992c1

Refactor

6104af0

Add formulaic glossary

e180e7a

grst marked this pull request as ready for review April 12, 2024 10:33

grst requested review from Zethson, const-ae and ilan-gold April 12, 2024 10:33

Zethson reviewed Apr 12, 2024


Zethson added 2 commits April 12, 2024 16:06

Fix typo

c2d7687

Fix typo

66f9919

grst mentioned this pull request Apr 15, 2024

Ready to be used? #42

Closed

ilan-gold reviewed Apr 16, 2024


Update src/multi_condition_comparisions/methods/_base.py

4cc220d

Co-authored-by: Ilan Gold

Zethson merged commit 8c4ac7a into main May 18, 2024
5 checks passed

grst mentioned this pull request May 27, 2024

Improve how model.cond works #15

Closed

2 tasks
