Group by calculation inside the pipe #897

MislavSag · 2023-02-08T21:42:55Z

Hi,

I am currently trying to build an mlr3 pipeline (graph) for predicting financial outcomes (financial time series).

In the preprocessing step, I often need to apply a function on a group-by basis. More concretely, I need to apply a function by month.

I have already opened an issue with an example, winsorization by groups: mlr-org/mlr3pipelines#583
In that example, I want to winsorize the data for every month (or every quarter). It doesn't make much sense to winsorize the data across the time dimension, so I need a month column (or quarter column). But the month column is not a feature, and it is not a target. I can set the role of that column to group at the beginning, but how should I use it then? I can get the group column if I use .train_task in a preprocessing pipe, but I actually need the .train_dt method.

The problem is more general, because instead of winsorization I could apply scaling by group or any other function.
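To make the group-wise operation concrete, here is a minimal sketch outside of any pipeline, using data.table. The `winsorize` helper, the column names `month` and `ret`, and the toy values are all hypothetical and chosen only for illustration; the quantile cut-offs are arbitrary.

```r
library(data.table)

# Hypothetical helper: clip x to its own lower and upper quantiles.
winsorize = function(x, probs = c(0.01, 0.99)) {
  q = quantile(x, probs = probs, na.rm = TRUE, names = FALSE)
  pmin(pmax(x, q[1]), q[2])
}

# Hypothetical toy data: two months, one extreme value in each.
dt = data.table(
  month = rep(c("2023-01", "2023-02"), each = 5),
  ret   = c(0.01, 0.02, -0.01, 0.00, 5.00, 0.03, -0.02, 0.01, 0.02, -4.00)
)

# Winsorize within each month instead of across the whole time dimension.
dt[, ret_w := winsorize(ret, probs = c(0.1, 0.9)), by = month]
```

Swapping `winsorize` for any other function of a single column (for example a z-score) gives the "scaling by group" variant; the open question in this issue is how to get the grouping column to such a step inside a PipeOp.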

I kindly ask for your recommendation: what is the best way to implement the Pipe described above?

The solutions I have thought about:

1. Set the month (or, more generally, date) column to group. Then, if group is set, apply the function (say, scaling) on a group-by basis.
2. Use the month (or date) column as a feature, but exclude this column from other preprocessing operations (for example, we don't want to scale dates).
3. Set the row ids to the date and use that for grouping.
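The first option can be sketched at the task level as follows. This is only an illustration under assumptions, not a finished PipeOp: the toy data and the column names `month`, `x`, and `y` are hypothetical, and the scaling is done by hand on the task's data rather than inside `.train_task` or `.train_dt`.

```r
library(mlr3)
library(data.table)

# Hypothetical toy data: `month` is needed for grouping but should not be a feature.
dt = data.table(
  month = rep(c(202301L, 202302L), each = 4),
  x     = c(1, 2, 3, 4, 10, 20, 30, 40),
  y     = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8)
)
task = as_task_regr(dt, target = "y")

# Give `month` the "group" role; this removes it from the feature set.
task$set_col_roles("month", roles = "group")

# task$groups maps row ids to groups; join it to the features by row id.
feats = data.table(row_id = task$row_ids, task$data(cols = task$feature_names))
feats = merge(feats, task$groups, by = "row_id")

# Scale the feature within each group.
feats[, x_scaled := (x - mean(x)) / sd(x), by = group]
```

One caveat with this option: the group role is also used by mlr3 during resampling to keep observations of the same group together, so assigning it to a month column changes how train/test splits are formed, which may or may not be desired for time series.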

EDIT:

Maybe I can put the question more generally: what approach do you recommend if we want to use some columns in preprocessing, but don't want to use them as features or give them other column roles?

I am aware of the mlr3temporal package, which inherits from the Task class to create a new TaskForecast class. Maybe I should use this task in my case? And what if I had id and date columns: should I create my own task (TaskPanel, for example) by inheriting from Task?
