Add required variables to the `Formula` class #179

timpiperseek · 2024-03-07T04:22:44Z

I would like to be able to do something like the following. Apologies, I am struggling to articulate exactly what I want, but effectively it is this.

Say I have the following formula.
apps ~ prior_apps + I(prior_apps^2) + factor + I(prior_apps:factor)

I am wondering if it is possible to extract the rhs terms from the formula. By terms I mean ['prior_apps', 'factor'].

I have tried doing the following.

import formulaic.parser
formula_str = "apps ~ prior_apps + I(prior_apps^2) + factor + I(prior_apps:factor)"
formula_parser = formulaic.parser.DefaultFormulaParser()
tokens = list(formula_parser.get_tokens(formula_str))

but that gets me the individual parts of the string and not the terms.

I feel like it should be possible?

matthewwardrop · 2024-03-08T04:45:36Z

Hi @timpiperseek,

Does something like the following work?

from formulaic import Formula
f = Formula("apps ~ prior_apps + I(prior_apps**2) + factor + prior_apps:factor")
set(
    factor
    for term in f.rhs
    for factor in term.factors
)
# This would output all the factors: {1, I(prior_apps ** 2), factor, prior_apps}

(Note that interaction terms should not be enclosed in "I(...)", since that is a Python function call).
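To illustrate the Python-syntax point, here is a minimal sketch using only the standard library `ast` module. It does not show how formulaic itself handles `I(...)`; the helper name `is_python_expression` is hypothetical. It checks whether a string parses as a single Python expression, and also shows that `^` in Python is bitwise XOR rather than exponentiation, which is presumably why the example above uses `**`.

```python
import ast


def is_python_expression(source: str) -> bool:
    """Return True if `source` parses as a single Python expression."""
    try:
        ast.parse(source, mode="eval")
    except SyntaxError:
        return False
    return True


# A power expression inside a call is ordinary Python.
print(is_python_expression("I(prior_apps ** 2)"))    # True

# "a:b" is not a valid call argument in Python, so this fails to parse.
print(is_python_expression("I(prior_apps:factor)"))  # False

# "^" parses, but as bitwise XOR, not as a power operator.
xor_op = ast.parse("prior_apps ^ 2", mode="eval").body.op
print(type(xor_op).__name__)                         # BitXor
```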

If you need to, you could parse the AST represented by the non-lookup factors (e.g. I(prior_apps ** 2)) to extract the variables used; prior_apps here.

If you are actually just looking for the terms, you can do: list(f.rhs) == [1, prior_apps, I(prior_apps ** 2), factor, prior_apps:factor].

Does that help?

timpiperseek · 2024-03-08T07:04:10Z

Yeah, that is really close to what I am after.

What do you mean by the following?

"If you need to, you could parse the AST represented by the non-lookup factors (e.g. I(prior_apps ** 2)) to extract the variables used; prior_apps here."

Ideally it would also identify that prior_apps**2 is the same underlying metric as prior_apps.

matthewwardrop · 2024-03-09T00:45:57Z

Ah... Using some internal utility functions you can do:

from formulaic import Formula
from formulaic.utils.variables import get_expression_variables
f = Formula("apps ~ prior_apps + I(prior_apps**2) + factor + prior_apps:factor")
set(
    variable
    for term in f.rhs
    for factor in term.factors
    for variable in get_expression_variables(factor.expr, {})
    if "value" in variable.roles
)
# Outputs: {'factor', 'prior_apps'}

Note that get_expression_variables parses the AST associated with the python expression, which is used internally to keep track of which variables have been used when generating the model matrix.
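To give a rough idea of what "parsing the AST" means here, the following is a simplified sketch built on the standard library `ast` module. It is not formulaic's implementation and has no notion of variable roles; `expression_variables` is a hypothetical name. It collects every name used in an expression, skipping names that are only used as the function being called (such as `I`).

```python
import ast
from typing import Set


def expression_variables(expr: str) -> Set[str]:
    """Return the names read in a Python expression, excluding called function names."""
    tree = ast.parse(expr, mode="eval")
    # Identify the Name nodes that sit in the "function" position of a call.
    called = {
        id(node.func)
        for node in ast.walk(tree)
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
    }
    return {
        node.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Name) and id(node) not in called
    }


# Factor expressions written out by hand for illustration.
factors = ["prior_apps", "I(prior_apps ** 2)", "factor"]
variables = set().union(*(expression_variables(f) for f in factors))
print(sorted(variables))  # ['factor', 'prior_apps']
```

Unlike the snippet above, this sketch works on plain strings, so it will not handle formula-specific syntax or distinguish data columns from other names in scope.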

timpiperseek · 2024-03-10T02:25:28Z

Oh that is absolutely awesome, thank you.

matthewwardrop · 2024-03-11T16:07:41Z

I'll consider adding this directly to the formula class as something like .required_variables.

mayer79 · 2024-06-23T09:32:17Z

This would indeed be very handy, thx.

matthewwardrop closed this as completed Mar 11, 2024

matthewwardrop self-assigned this Mar 11, 2024

matthewwardrop added the enhancement New feature or request label Mar 11, 2024

matthewwardrop changed the title ~~is it possible to extract terms without providing a dataframe~~ Add required variables to the Formula class Mar 11, 2024

matthewwardrop reopened this Mar 11, 2024
