Apply WindowStatisticsTransform to regressors #474

Merged
merged 5 commits into from
Sep 12, 2024
Changes from 2 commits
2 changes: 1 addition & 1 deletion CHANGELOG.md
@@ -24,7 +24,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Changed
-
-
-
- Update docstrings of `WindowStatisticsTransform`'s parents ([#469](https://github.com/etna-team/etna/pull/474))
Collaborator

We should make this entry more specific: "Add docstring warning about handling non-regressors (including target) to children of `WindowStatisticsTransform`".

-
-
-
80 changes: 73 additions & 7 deletions etna/transforms/math/statistics.py
@@ -1,3 +1,4 @@
+import warnings
 from abc import ABC
 from abc import abstractmethod
 from typing import Dict
@@ -59,6 +60,10 @@ def __init__(
     def fit(self, ts: TSDataset) -> "WindowStatisticsTransform":
         """Fit the transform."""
         self.in_column_regressor = self.in_column in ts.regressors
+        if not self.in_column_regressor:
Collaborator

I'm not sure we should emit this warning.
A perfectly valid approach is to apply MeanTransform to the target and then apply LagTransform to the result. It would be strange to force someone to change the order when everything works fine.

However, the new warning could draw users' attention to the issue. The failing tests also bother me: some of them could be genuinely problematic and should probably be fixed.

I suggest the following steps:

  • Keep this warning for now (we will remove it a little later, see the second step).
  • Fix only the tests where we create a pipeline without a lag applied to the window transform.
  • Ask for re-review (after that we will probably remove this warning).

Collaborator

OK, we can remove this warning and make the other suggested changes.
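
The ordering point raised in this thread (window statistic first, lag second) can be illustrated with a minimal pure-Python sketch. The helpers `lag` and `rolling_mean` are hypothetical stand-ins written for this example, not etna's `LagTransform` or `MeanTransform`, and the sketch assumes a window that includes the current value and yields `None` when any value in the window is missing.

```python
from typing import List, Optional

Series = List[Optional[float]]


def lag(values: Series, k: int) -> Series:
    """Shift the series forward by k steps; the first k slots are unknown."""
    return [None] * k + values[: len(values) - k]


def rolling_mean(values: Series, window: int) -> Series:
    """Mean over the last `window` values (current one included).

    Returns None wherever the window is incomplete or contains a missing value.
    """
    out: Series = []
    for t in range(len(values)):
        chunk = values[max(0, t - window + 1) : t + 1]
        if len(chunk) < window or any(v is None for v in chunk):
            out.append(None)
        else:
            out.append(sum(chunk) / window)
    return out


target: Series = [3.0, 5.0, 4.0, 6.0, 8.0, 7.0, 9.0]

# Order 1: statistic on the raw target, then lag the result.
mean_then_lag = lag(rolling_mean(target, window=3), k=2)
# Order 2: lag the target first, then compute the statistic.
lag_then_mean = rolling_mean(lag(target, k=2), window=3)

print(mean_then_lag == lag_then_mean)  # True
```

Under these assumptions both orders produce the same values, which is why forcing one order over the other is debatable; in the library the two orders may still differ in whether the intermediate column is treated as a regressor.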

+            warnings.warn(
+                f"{self.in_column} column is not a regressor. The output column will not also be a regressor."
+            )
         super().fit(ts)
         return self
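
For readers who want to try the check from this hunk in isolation, here is a small self-contained sketch. `WindowStatSketch` is a hypothetical stand-in that takes a plain set of regressor names instead of a `TSDataset`; it only mirrors the membership test and the `warnings.warn` call, and the message wording is this example's own.

```python
import warnings
from typing import Set


class WindowStatSketch:
    """Hypothetical stand-in for a window transform; not the etna class."""

    def __init__(self, in_column: str) -> None:
        self.in_column = in_column
        self.in_column_regressor = False

    def fit(self, regressors: Set[str]) -> "WindowStatSketch":
        # Same membership test as in the diff, against a plain set of names.
        self.in_column_regressor = self.in_column in regressors
        if not self.in_column_regressor:
            warnings.warn(
                f"{self.in_column} column is not a regressor. "
                "The output column will not be a regressor either."
            )
        return self


# Record warnings instead of printing them, the way a test might.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    WindowStatSketch("target").fit(regressors={"target_lag_7"})
    WindowStatSketch("target_lag_7").fit(regressors={"target_lag_7"})

messages = [str(w.message) for w in caught]
print(len(messages))  # 1: only the non-regressor input triggers the warning
```

Only the fit on `target` warns; the fit on the lag column stays silent.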

@@ -136,6 +141,12 @@ class MeanTransform(WindowStatisticsTransform):

     .. math::
        MeanTransform(x_t) = \\sum_{i=1}^{window}{x_{t - i}\\cdot\\alpha^{i - 1}}
+
+    This transform, applied to non-regressor column, generates non-regressor column.
Collaborator

I think we should wrap this in a warning block like this:

Warning
-------

For reference, look at ChangePointsTrendTransform.

+    Apply it to regressor columns to get regressor columns too.
+    In the majority of cases you need to generate regressor to use them in the future.
+
+    For example, apply this transform to target lags, not to target directly.
     """

     def __init__(
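
The docstring's advice to apply the transform to target lags rather than to the target can be motivated with a short sketch. It assumes a window that includes the current value and a lag at least as large as the forecast horizon; `lag` and `rolling_mean` are hypothetical helpers for illustration, not etna code.

```python
from typing import List, Optional

Series = List[Optional[float]]


def lag(values: Series, k: int) -> Series:
    """Shift the series forward by k steps; the first k slots are unknown."""
    return [None] * k + values[: len(values) - k]


def rolling_mean(values: Series, window: int) -> Series:
    """Mean over the last `window` values; None if any of them is missing."""
    out: Series = []
    for t in range(len(values)):
        chunk = values[max(0, t - window + 1) : t + 1]
        if len(chunk) < window or any(v is None for v in chunk):
            out.append(None)
        else:
            out.append(sum(chunk) / window)
    return out


history: Series = [3.0, 5.0, 4.0, 6.0, 8.0]
horizon = 2
# Target values at future timestamps are unknown when forecasting.
extended: Series = history + [None] * horizon

# Statistic computed directly on the target: undefined in the future.
direct = rolling_mean(extended, window=2)
# Statistic computed on a lag >= horizon: defined at every future timestamp.
via_lag = rolling_mean(lag(extended, k=horizon), window=2)

print(direct[-horizon:])   # [None, None]
print(via_lag[-horizon:])  # [5.0, 7.0]
```

The feature built from the lagged column is known for the whole horizon, which is the property a regressor needs.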
@@ -235,6 +246,12 @@ def params_to_tune(self) -> Dict[str, BaseDistribution]:
 class StdTransform(WindowStatisticsTransform):
     """StdTransform computes std value for given window.

+    This transform, applied to non-regressor column, generates non-regressor column.
+    Apply it to regressor columns to get regressor columns too.
+    In the majority of cases you need to generate regressor to use them in the future.
+
+    For example, apply this transform to target lags, not to target directly.
+
     Notes
     -----
     Note that ``pd.Series([1]).std()`` is ``np.nan``.
@@ -293,7 +310,14 @@ def _aggregate(self, series: np.ndarray) -> np.ndarray:


 class QuantileTransform(WindowStatisticsTransform):
-    """QuantileTransform computes quantile value for given window."""
+    """QuantileTransform computes quantile value for given window.
+
+    This transform, applied to non-regressor column, generates non-regressor column.
+    Apply it to regressor columns to get regressor columns too.
+    In the majority of cases you need to generate regressor to use them in the future.
+
+    For example, apply this transform to target lags, not to target directly.
+    """

     def __init__(
         self,
@@ -367,7 +391,14 @@ def params_to_tune(self) -> Dict[str, BaseDistribution]:


 class MinTransform(WindowStatisticsTransform):
-    """MinTransform computes min value for given window."""
+    """MinTransform computes min value for given window.
+
+    This transform, applied to non-regressor column, generates non-regressor column.
+    Apply it to regressor columns to get regressor columns too.
+    In the majority of cases you need to generate regressor to use them in the future.
+
+    For example, apply this transform to target lags, not to target directly.
+    """

     def __init__(
         self,
@@ -418,7 +449,14 @@ def _aggregate(self, series: np.ndarray) -> np.ndarray:


 class MaxTransform(WindowStatisticsTransform):
-    """MaxTransform computes max value for given window."""
+    """MaxTransform computes max value for given window.
+
+    This transform, applied to non-regressor column, generates non-regressor column.
+    Apply it to regressor columns to get regressor columns too.
+    In the majority of cases you need to generate regressor to use them in the future.
+
+    For example, apply this transform to target lags, not to target directly.
+    """

     def __init__(
         self,
@@ -469,7 +507,14 @@ def _aggregate(self, series: np.ndarray) -> np.ndarray:


 class MedianTransform(WindowStatisticsTransform):
-    """MedianTransform computes median value for given window."""
+    """MedianTransform computes median value for given window.
+
+    This transform, applied to non-regressor column, generates non-regressor column.
+    Apply it to regressor columns to get regressor columns too.
+    In the majority of cases you need to generate regressor to use them in the future.
+
+    For example, apply this transform to target lags, not to target directly.
+    """

     def __init__(
         self,
@@ -520,7 +565,14 @@ def _aggregate(self, series: np.ndarray) -> np.ndarray:


 class MADTransform(WindowStatisticsTransform):
-    """MADTransform computes Mean Absolute Deviation over the window."""
+    """MADTransform computes Mean Absolute Deviation over the window.
+
+    This transform, applied to non-regressor column, generates non-regressor column.
+    Apply it to regressor columns to get regressor columns too.
+    In the majority of cases you need to generate regressor to use them in the future.
+
+    For example, apply this transform to target lags, not to target directly.
+    """

     def __init__(
         self,
@@ -577,7 +629,14 @@ def _aggregate(self, series: np.ndarray) -> np.ndarray:


 class MinMaxDifferenceTransform(WindowStatisticsTransform):
-    """MinMaxDifferenceTransform computes difference between max and min values for given window."""
+    """MinMaxDifferenceTransform computes difference between max and min values for given window.
+
+    This transform, applied to non-regressor column, generates non-regressor column.
+    Apply it to regressor columns to get regressor columns too.
+    In the majority of cases you need to generate regressor to use them in the future.
+
+    For example, apply this transform to target lags, not to target directly.
+    """

     def __init__(
         self,
@@ -630,7 +689,14 @@ def _aggregate(self, series: np.ndarray) -> np.ndarray:


 class SumTransform(WindowStatisticsTransform):
-    """SumTransform computes sum of values over given window."""
+    """SumTransform computes sum of values over given window.
+
+    This transform, applied to non-regressor column, generates non-regressor column.
+    Apply it to regressor columns to get regressor columns too.
+    In the majority of cases you need to generate regressor to use them in the future.
+
+    For example, apply this transform to target lags, not to target directly.
+    """

     def __init__(
         self,