Recently, I have been trying to build an mlr3 pipeline (graph) for predicting financial outcomes (financial time series).
In the preprocessing step, I often need to apply a function on a per-group basis. More concretely, I need to apply a function by month.
I have already opened an issue with an example, winsorization by groups: mlr-org/mlr3pipelines#583
In that example, I want to winsorize the data for every month (or every quarter). It doesn't make much sense to winsorize the data across the time dimension, so I need a month column (or quarter column). But the month column is neither a feature nor a target. I can set the role of that column to group at the beginning, but how should I use it then? I can get the group column if I use .train_task in a preprocessing PipeOp, but I actually need the .train_dt method.
The problem is more general, because instead of winsorization I could use scaling by group or any other function.
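To make the group-wise operation concrete outside of mlr3, here is a minimal data.table sketch. The `winsorize` helper, its 1%/99% cut-offs, and the toy columns `month` and `ret` are assumptions made for illustration; they are not taken from the linked issue.

```r
library(data.table)

# Hypothetical helper: clip x to its empirical [1%, 99%] quantiles.
winsorize <- function(x, probs = c(0.01, 0.99)) {
  q <- quantile(x, probs = probs, na.rm = TRUE, names = FALSE)
  pmin(pmax(x, q[1]), q[2])
}

# Toy data: a calm month followed by a volatile one.
set.seed(1)
dt <- data.table(
  month = rep(c(1L, 2L), each = 100),
  ret   = c(rnorm(100, sd = 0.01), rnorm(100, sd = 0.05))
)

# Clip within each month, so the volatile month does not set the bounds
# for the calm one.
dt[, ret_w := winsorize(ret), by = month]

# Any other per-group function has the same shape, e.g. scaling by month.
dt[, ret_z := (ret - mean(ret)) / sd(ret), by = month]
```

The open question is how to get this `by = month` step into a PipeOp when `month` is not a feature.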
I kindly ask for your recommendation: what is the best way to implement the above PipeOp?
The solutions I have thought about:
1. Set the month (or, more generally, date) column to the group role. Then, if a group is set, apply the function (say, scaling) on a per-group basis.
2. Use the month (or date) column as a feature, but exclude it from other preprocessing operations (for example, we don't want to scale dates).
3. Set the row ids to the date and use that for grouping.
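The first option could look roughly like the sketch below, which overrides `.train_task` and `.predict_task` instead of `.train_dt` so that `task$groups` is available. `PipeOpWinsorizeByGroup`, `winsorize`, and the private helper `.by_group` are hypothetical names, not part of mlr3pipelines. The op is assumed to be stateless, meaning the quantiles are recomputed per group on whatever task it receives.

```r
library(mlr3)
library(mlr3pipelines)
library(data.table)
library(R6)

# Hypothetical helper: clip x to its empirical [1%, 99%] quantiles.
winsorize <- function(x, probs = c(0.01, 0.99)) {
  q <- quantile(x, probs = probs, na.rm = TRUE, names = FALSE)
  pmin(pmax(x, q[1]), q[2])
}

# Hypothetical PipeOp (not shipped with mlr3pipelines).
PipeOpWinsorizeByGroup <- R6Class("PipeOpWinsorizeByGroup",
  inherit = PipeOpTaskPreproc,
  public = list(
    initialize = function(id = "winsorize_by_group", param_vals = list()) {
      super$initialize(id = id, param_vals = param_vals, feature_types = "numeric")
    }
  ),
  private = list(
    .train_task = function(task) {
      self$state <- list()  # assumption: nothing is learned during training
      private$.by_group(task)
    },
    .predict_task = function(task) private$.by_group(task),
    .by_group = function(task) {
      groups <- task$groups  # NULL unless a column has the "group" role
      cols <- task$feature_types[type == "numeric"]$id
      if (is.null(groups) || !length(cols)) return(task)
      dt <- copy(task$data(cols = cols))  # rows follow task$row_ids
      dt[, grp__ := groups$group[match(task$row_ids, groups$row_id)]]
      dt[, (cols) := lapply(.SD, winsorize), by = grp__, .SDcols = cols]
      dt[, grp__ := NULL]
      # Replace the original columns with the winsorized ones.
      task$select(setdiff(task$feature_names, cols))$cbind(dt)
    }
  )
)

# Toy usage: month gets the group role and stops being a feature.
set.seed(1)
dat <- data.table(
  month = rep(c(1L, 2L), each = 100),
  x = c(rnorm(100, sd = 0.01), rnorm(100, sd = 0.05)),
  y = rnorm(200)
)
task <- as_task_regr(dat, target = "y", id = "toy")
task$set_col_roles("month", roles = "group")

op <- PipeOpWinsorizeByGroup$new()
out <- op$train(list(task))[[1]]
```

One side effect to keep in mind: the group role also makes mlr3 resamplings keep all rows of a group together, which may or may not be wanted for monthly data.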
EDIT:
Maybe I can put the question more generally: what approach do you recommend if we want to use some columns in preprocessing, but we don't want to use them as features or give them other column roles?
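One possible arrangement, sketched under assumptions, is to keep the helper column as a feature while preprocessing runs, skip it in other steps via `affect_columns`, and drop it with `po("select")` before the learner. The toy data and the name `not_month` are illustrative only; the group-wise step itself is left as a placeholder comment.

```r
library(mlr3)
library(mlr3pipelines)
library(data.table)

set.seed(1)
dat <- data.table(
  month = rep(c(1L, 2L), each = 50),
  x = rnorm(100),
  y = rnorm(100)
)
task <- as_task_regr(dat, target = "y", id = "toy")

# Selector matching every feature except the helper column.
not_month <- selector_invert(selector_name("month"))

graph <- po("scale", affect_columns = not_month) %>>%  # month is left unscaled
  # (a group-wise PipeOp that reads month inside .train_dt would go here)
  po("select", selector = not_month)                   # month is dropped before the learner

out <- graph$train(task)[[1]]
out$feature_names
```

With this layout the column stays visible to `.train_dt` of any PipeOp placed before the final `select` step, at the cost of having to exclude it explicitly everywhere else.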
I am aware of the mlr3temporal package, which inherits from the Task class and creates a new TaskForecast class. Maybe I should use this task in my case? And if I had id and date columns, should I create my own task (TaskPanel, for example) by inheriting from Task?