Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Add commutative to Operator #36

Open
tomicapretto opened this issue Sep 3, 2021 · 1 comment
Open

Proposal: Add commutative to Operator #36

tomicapretto opened this issue Sep 3, 2021 · 1 comment
Assignees
Milestone

Comments

@tomicapretto
Copy link

#34 Adds support for random effects via the pipe operator. While going through the example in the PR I noticed that operations that are written as x2|m were then printed as m:x2 because formulaic sorts the factors within a term. Apart from the : that should be a | (this is not a major issue), order here does matter, so printing x2|m is not the same as printing m|x2.

This issue suggests that the Operator class could have a commutative property that indicates whether you can change the order of the operators or not.

But maybe there's a better approach? Let's have this space for discussion.

@matthewwardrop matthewwardrop self-assigned this Oct 3, 2021
@matthewwardrop
Copy link
Owner

Hi @tomicapretto! I thought about this a bit over the weekend, and here is my proposal. Let me know what you think.

Rather than adding a commutative option to Operator objects, I propose to instead allow Term instances to override how resulting columns are named. This would allow you to subclass Term to say RandomEffectTerm. Random effect operators would then generated RandomEffectTerm instances instead of Term instances, but honour the same API. The result might look a bit like this (untested code):

data = pandas.DataFrame({'a': [1,2,3], 'b': ['a','b','c']})

class RandomEffectTerm(Term):
    def __init__(group_factors, group):
        self.group_factors = set(group_factors)
        self.group = group

    @property
    def factors(self):
         return self.group_factors | {self.group}

    def get_column_name(self, features):
         # features would look something like:
         # {'a': 'a', 'b': 'b[T.a]'}
         return ":".join(sorted(v for k, v in features.values if k != self.group.expr)) + "|" + features[self.group.expr]

model_matrix("(a|b)") -> columns of {"a|b[T.a]", "a|b[T.b]", ...}

If this looks good to you, I'll code it up when I get a chance. The last piece would then be to formalise how to surface the different segments of the model matrix (or perhaps how to store multiple of them), in order to get the "fixed effect" and "random effect" parts... but we can deal with that as a separate thread.

@matthewwardrop matthewwardrop added this to the 0.3.x milestone Oct 17, 2021
@matthewwardrop matthewwardrop modified the milestones: 0.3.0, 0.3.x Mar 14, 2022
@matthewwardrop matthewwardrop modified the milestones: 0.3.x, 0.4.x Jun 20, 2022
@matthewwardrop matthewwardrop modified the milestones: 0.4.x, 1.1.0 Dec 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants