Improve how model.cond works #15
Comments
Hi Gregor, I was looking into getting the default values again and was wondering whether the following works, or whether there was a reason we dismissed it?

    import formulaic
    import pandas as pd
    import numpy as np

    data = pd.DataFrame({"celltype": ['A', 'A', 'B', 'C', 'C', 'C'],
                         "condition": ['a', 'b', 'a', 'b', 'a', 'b'],
                         "cont": [1, 2, 3, 4, 5, 6]})

    # Categories as recorded by formulaic for a plain string column
    formulaic.model_matrix('~ condition', data).model_spec.encoder_state['condition'][1]['categories']

    # Same lookup after fixing the category order explicitly
    data["condition"] = pd.Categorical(data["condition"], categories = ["b", "a"], dtype = "category")
    formulaic.model_matrix('~ condition', data).model_spec.encoder_state['condition'][1]['categories']
The reason was that if you use the syntax described here, it breaks:

    >>> formulaic.model_matrix('~ C(condition, contr.treatment(base="b"))', data).model_spec.encoder_state['C(condition, contr.treatment(base="b"))'][1]['categories']
    ['a', 'b']

Of course, this is also something we could decide not to support initially, but it would be nice to have a robust way to achieve this.
Ah yes. But I think it is fine to use 'a' as the reference level nonetheless. The reference level should be based on whatever the data contains, not on any additional modifications the formula applies. I think a helpful analogy is to consider a continuous covariate and [...]
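To make the "based on the data" idea concrete, here is a minimal sketch using pandas only. The helper name `default_reference_level` is hypothetical and not part of pertpy or formulaic. It assumes the reference level is the first declared category of a categorical column, or the first value in sorted order otherwise, and it ignores anything a formula might specify.

```python
import pandas as pd


def default_reference_level(column: pd.Series):
    """Hypothetical helper: pick a reference level from the data alone."""
    if isinstance(column.dtype, pd.CategoricalDtype):
        # Assumption: the first declared category acts as the reference level.
        return column.cat.categories[0]
    # Assumption: non-categorical columns fall back to sorted order.
    return sorted(column.dropna().unique())[0]


data = pd.DataFrame({"condition": ["a", "b", "a", "b", "a", "b"]})
ref_plain = default_reference_level(data["condition"])

# Reordering the categories in the data changes the result.
data["condition"] = pd.Categorical(data["condition"], categories=["b", "a"])
ref_reordered = default_reference_level(data["condition"])

print(ref_plain, ref_reordered)
```

With this approach a `C(condition, contr.treatment(base="b"))` term in the formula would not affect the reported default, which is the behaviour argued for above.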
Also adding here that I think this function should print out the actual contrast vector used (maybe omitting zeros). `contrasts.T.loc[lambda x: x[0] != 0]` could work nicely.
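A rough sketch of what that filter does, assuming `contrasts` is a one-row DataFrame whose columns are the design-matrix coefficient names. The wrapper `nonzero_contrast` and the example coefficient names are made up for illustration.

```python
import pandas as pd


def nonzero_contrast(contrast, coefficient_names):
    """Hypothetical helper: show only the non-zero entries of a contrast vector."""
    # One row (index 0), one column per coefficient.
    contrasts = pd.DataFrame([contrast], columns=coefficient_names)
    # After transposing, column 0 holds the contrast weights; keep non-zero rows.
    return contrasts.T.loc[lambda x: x[0] != 0]


names = ["Intercept", "condition[T.b]", "cont"]  # illustrative names only
shown = nonzero_contrast([0, 1, 0], names)
print(shown)
```

Only the `condition[T.b]` row remains, so the printed output stays short even for design matrices with many coefficients.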
Closed by scverse/pertpy#36