Improve how model.cond works #15

grst · 2023-12-01T09:30:08Z

Description of feature

The implementation for finding the baseline level for each variable is currently hacky. We need to reach out to the formulaic devs for input.
The function should also print out the actual contrast used, for diagnostics.

const-ae · 2023-12-04T09:37:52Z

Hi Gregor,

I was looking into getting the default values again and was wondering if the following works or if there was a reason why we dismissed it?

import formulaic
import pandas as pd

data = pd.DataFrame({
    "celltype": ['A', 'A', 'B', 'C', 'C', 'C'],
    "condition": ['a', 'b', 'a', 'b', 'a', 'b'],
    "cont": [1, 2, 3, 4, 5, 6],
})
# Categories recorded by the encoder for a plain string column
formulaic.model_matrix('~ condition', data).model_spec.encoder_state['condition'][1]['categories']
# Re-encode with an explicit category order ("b" first) and look again.
# Note: pd.Categorical rejects `categories` combined with `dtype`, so only `categories` is passed.
data["condition"] = pd.Categorical(data["condition"], categories=["b", "a"])
formulaic.model_matrix('~ condition', data).model_spec.encoder_state['condition'][1]['categories']

grst · 2023-12-04T20:28:39Z

The reason was that if you use the syntax described here, it breaks:

>>> formulaic.model_matrix('~ C(condition, contr.treatment(base="b"))', data).model_spec.encoder_state['C(condition, contr.treatment(base="b"))'][1]['categories']
['a', 'b']

Of course, this is also something we could decide not to support initially, but it would be nice to have a robust way to achieve this.

const-ae · 2023-12-05T09:31:34Z

Ah yes. But I think it is fine to use 'a' as the reference level nonetheless. I think the reference level should be based on whatever the data contains and not on which additional modifications the formula applies.

I think a helpful analogy is to consider a continuous covariate and formulaic.model_matrix('~ I(cont + 10)', data). The formula shifts all values up by 10, but that doesn't mean that we would move the continuous intercept to -10. Applying the same logic to the categorical covariates means that I think it's fine to just look at the categories.
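The proposal above (take the reference level from the data itself, ignoring what the formula does) can be sketched without formulaic at all. The helper name `baseline_levels` is hypothetical, and the sketch assumes that non-categorical string columns are ordered by their sorted unique values; neither is confirmed as pertpy's or formulaic's behavior.

```python
import pandas as pd


def baseline_levels(data: pd.DataFrame) -> dict:
    """Hypothetical helper: map each categorical-like column to its first level."""
    levels = {}
    for name, col in data.items():
        if isinstance(col.dtype, pd.CategoricalDtype):
            # Explicit categorical: the first declared category is the baseline.
            levels[name] = col.cat.categories[0]
        elif not pd.api.types.is_numeric_dtype(col):
            # Assumption: plain string columns use sorted unique values.
            levels[name] = sorted(col.dropna().unique())[0]
    return levels


data = pd.DataFrame({
    "celltype": ["A", "A", "B", "C", "C", "C"],
    "condition": pd.Categorical(["a", "b", "a", "b", "a", "b"], categories=["b", "a"]),
    "cont": [1, 2, 3, 4, 5, 6],
})

print(baseline_levels(data))  # {'celltype': 'A', 'condition': 'b'}
```

Numeric columns such as `cont` are skipped, which matches the analogy: a continuous covariate has no reference level to look up.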

grst · 2023-12-06T06:58:34Z

But it does change the design matrix accordingly?

grst · 2023-12-10T19:37:36Z

Also adding here that I think this function should print out the actual contrast vector used (maybe omit zeros).
It's already a dataframe, so something like

contrasts.T.loc[lambda x: x[0] != 0]

could work nicely.
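As a small illustration of that one-liner, the sketch below applies it to a made-up contrast. It assumes `contrasts` is a single-row DataFrame whose row label is `0` and whose columns are design-matrix column names; the column names and values here are invented for the example.

```python
import pandas as pd

# Hypothetical single-row contrast vector, one entry per design-matrix column.
contrasts = pd.DataFrame([{
    "Intercept": 0.0,
    "condition[T.b]": 1.0,
    "celltype[T.B]": 0.0,
    "celltype[T.C]": 0.0,
}])

# Transpose so each coefficient is a row, then keep only the nonzero entries.
nonzero = contrasts.T.loc[lambda x: x[0] != 0]
print(nonzero)
```

Only the `condition[T.b]` row survives the filter, which keeps the diagnostic output short for models with many coefficients.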

grst · 2024-05-27T07:16:46Z

Closed by scverse/pertpy#36
Follow up in https://github.com/theislab/pertpy/issues/612

grst added the enhancement (New feature or request) label Dec 1, 2023

grst assigned grst and const-ae Dec 1, 2023

grst mentioned this issue Dec 14, 2023: "Is there a way to get the baseline value for categorical variables?" matthewwardrop/formulaic#169 (Closed)

grst mentioned this issue Feb 26, 2024: "Solve model.cond with custom materializer" #36 (Merged, 11 tasks)

grst mentioned this issue Nov 25, 2024: "Add diagnostic output for model.cond" scverse/formulaic-contrasts#1 (Open)

grst closed this as completed May 27, 2024
