[date features]: dayofweek_cat - day of week as a one hot encoding feature #315

Open
nelsoncardenas opened this issue Feb 20, 2024 · 4 comments

@nelsoncardenas
nelsoncardenas commented Feb 20, 2024

Description

While searching the documentation for how dayofweek is used in the date_features argument, I noticed that dayofweek is treated as an ordinal feature. However, for models such as linear regression, representing it as a one-hot encoded feature could be more effective.

Here are some suggestions I've considered:

  • Introducing dayofweek_cat as a recognized string for creating this feature.
  • Compiling a list of all acceptable strings for the date_features parameter in the documentation, or linking to the pandas documentation that contains that list.
  • Including an example in the Custom Date Features section on how to implement this effectively. The current documentation mentions the function is_monday. This might lead a user to wonder whether it's possible to create a function that returns a DataFrame with multiple columns (for instance, is_monday, is_tuesday, ..., is_sunday). However, I verified that this is not possible :(

Use case

A user wants to:

  • Use one hot encoding easily.
  • Read the list of available date attributes.
  • Implement custom functions that return more than one new column.
  • Understand the limits of the Custom Date Features.

My test

import pandas as pd
from mlforecast import MLForecast
from mlforecast.utils import generate_daily_series


def dayofweek_cat(dates):
    num_to_text = {
        0: "monday",
        1: "tuesday",
        2: "wednesday",
        3: "thursday",
        4: "friday",
        5: "saturday",
        6: "sunday",
    }
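    # One column per day of week present in `dates` (0 = Monday, ..., 6 = Sunday)
    df_dayofweek_cat = pd.get_dummies(dates.dayofweek).astype("uint8")
    # Rename the numeric columns to is_monday, is_tuesday, ...
    df_dayofweek_cat.columns = [f"is_{num_to_text[col]}" for col in df_dayofweek_cat.columns]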
    return df_dayofweek_cat


series = generate_daily_series(1, min_length=6, max_length=6)
print(f"output dayofweek_cat function: {dayofweek_cat(series['ds'].dt).columns}")

fcst = MLForecast([], freq="D", date_features=["dayofweek", "dayofyear", dayofweek_cat])
fcst.preprocess(series)
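
Since a custom date feature apparently has to return a single column, one possible workaround is to build seven single-column functions, one per weekday. The sketch below is only an illustration: make_is_day, DAY_NAMES and weekday_features are hypothetical names, and it assumes that mlforecast names each custom feature after the function's __name__.

```python
import pandas as pd

# pandas convention: dayofweek 0 is Monday, 6 is Sunday
DAY_NAMES = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]


def make_is_day(day_number, day_name):
    """Build a single-column date feature that flags one weekday (hypothetical helper)."""

    def is_day(dates):
        # 1 where the date falls on the requested weekday, 0 elsewhere
        return (dates.dayofweek == day_number).astype("uint8")

    # Assumption: the feature column is named after the function's __name__
    is_day.__name__ = f"is_{day_name}"
    return is_day


weekday_features = [make_is_day(i, name) for i, name in enumerate(DAY_NAMES)]

# Quick check on one full week; 2024-02-19 is a Monday
dates = pd.date_range("2024-02-19", periods=7, freq="D")
for feature in weekday_features:
    print(feature.__name__, list(feature(dates)))
```

If the naming assumption holds, the list could be passed as date_features=weekday_features (optionally dropping one day to avoid collinearity in a linear model).
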
@jmoralez
Member

Hey @nelsoncardenas, thanks for using mlforecast and for the detailed report. I think the easiest way to achieve this is with a scikit-learn pipeline. Here's an example:

import pandas as pd
from mlforecast import MLForecast
from mlforecast.utils import generate_daily_series
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

series = generate_daily_series(1, min_length=7, max_length=7)
model = make_pipeline(
    ColumnTransformer(
        [('encoder', OneHotEncoder(drop='first'), ['dayofweek'])],
        remainder='passthrough'
    ),
    LinearRegression()
)
fcst = MLForecast(models={'lr': model}, freq="D", date_features=["dayofweek"])
fcst.fit(series)
print(fcst.models_['lr'].named_steps['linearregression'].n_features_in_)  # 6

The available attributes are:

  • pandas: the ones listed under the "Attributes" section here.
  • polars: most of the ones here.

If you have time and would like to do it, we'd appreciate a PR that explicitly lists the supported ones.
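
As a rough illustration of the pandas side, the sketch below looks up a few attribute names on a DatetimeIndex. It assumes that a string passed in date_features is resolved as an attribute of the dates; the names chosen here are standard pandas DatetimeIndex attributes, not an official list of what mlforecast supports.

```python
import pandas as pd

# Three consecutive days starting on Monday 2024-02-19
dates = pd.date_range("2024-02-19", periods=3, freq="D")

# A few pandas DatetimeIndex attributes (illustrative, not exhaustive)
names = ["dayofweek", "dayofyear", "month", "quarter", "is_month_end"]

# Assumption: each string is looked up as an attribute of the dates
features = {name: list(getattr(dates, name)) for name in names}

for name, values in features.items():
    print(name, values)
```
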

@nelsoncardenas
Author

Thank you @jmoralez, I'd like to help with that PR.

What would be the suggested steps?

@jmoralez
Member

jmoralez commented Feb 20, 2024

I think you could add two lists (one for pandas and one for polars) in the nbs/core.ipynb notebook. We have this file with some contributing guidelines, but the first step should be to fork this repository and work on your fork instead (I'll fix that soon). Let me know if you have any questions.

@nelsoncardenas
Author

@jmoralez Thank you. During the week I will dedicate some free time to it.
