Add `MeanEncoderTransform` #413

egoriyaa · 2024-06-25T22:33:26Z

Before submitting (must do checklist)

Did you read the contribution guide?
Did you update the docs? We use Numpy format for all the methods and classes.
Did you write any new necessary tests?
Did you update the CHANGELOG?

Proposed Changes

Closing issues

closes #12

github-actions · 2024-06-25T22:37:53Z

🚀 Deployed on https://deploy-preview-413--etna-docs.netlify.app

d-a-bunin · 2024-07-08T14:53:05Z

etna/transforms/encoders/mean_encoder.py

+            global_means = dict(zip(segments, global_means))
+
+            global_means_category = {}
+            for segment in segments:


Couldn't we, in theory, group by both "segment" and in_column to get rid of this loop over segments?

This comment is still valid.
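
A minimal sketch of the suggestion, not the code from this PR: one grouped aggregation over both keys replaces the per-segment loop. The long-format frame and the column names `category` and `target` are assumptions made up for illustration.

```python
import pandas as pd

# Hypothetical long-format data; column names are illustrative assumptions.
df = pd.DataFrame(
    {
        "segment": ["a", "a", "a", "b", "b", "b"],
        "category": ["x", "y", "x", "x", "x", "y"],
        "target": [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
    }
)

# Single pass: mean of the target for every (segment, category) pair.
means = df.groupby(["segment", "category"])["target"].mean()

# Pivot segments into columns, then build {segment: {category: mean}}.
# Pairs that never occur in the data would show up as NaN here.
global_means_category = means.unstack(level="segment").to_dict()

print(global_means_category)
```

Whether this is faster than the loop depends on the number of segments and categories, so it would need to be measured on real data.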

d-a-bunin · 2024-07-09T08:06:18Z

etna/transforms/encoders/mean_encoder.py

+                    intersected_df.loc[segment_df.index, self.out_column] = feature
+                    if self.handle_missing is MissingMode.global_mean:
+                        nan_index = segment_df[segment_df[self.in_column].isnull()].index
+                        expanding_mean = y.expanding().mean().shift().fillna(0)


It isn't very clear that the first values are filled with 0.
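
To illustrate the point, here is a small standalone example of the same pandas chain on toy data (the series values are made up). The shift leaves no history for the first position, and `fillna(0)` silently turns that gap into 0.

```python
import pandas as pd

y = pd.Series([2.0, 4.0, 6.0])

# expanding mean:  [2.0, 3.0, 4.0]
# after shift():   [NaN, 2.0, 3.0]  (each row sees only earlier values)
# after fillna(0): [0.0, 2.0, 3.0]  (the first row becomes 0)
expanding_mean = y.expanding().mean().shift().fillna(0)

# A leading NaN in the target widens the zero-filled prefix.
y_nan = pd.Series([float("nan"), 4.0, 6.0])
with_leading_nan = y_nan.expanding().mean().shift().fillna(0)

print(expanding_mean.tolist())
print(with_leading_nan.tolist())
```

A short comment next to the `fillna(0)` call would make this behavior explicit for readers.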

d-a-bunin · 2024-07-11T10:05:36Z

etna/transforms/encoders/mean_encoder.py

+            global_means = dict(zip(segments, global_means))
+
+            global_means_category = {}
+            for segment in segments:


This comment is still valid.

d-a-bunin · 2024-07-11T10:07:32Z

etna/transforms/encoders/mean_encoder.py

+import numpy as np
+import pandas as pd
+from bottleneck import nanmean
+from pandas import Timestamp


I don't see any good reason for this import.

I don't understand

Why do we need this import? Can't we just use pd.Timestamp?
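
For reference, a tiny sketch showing that the two spellings refer to the same class, so the extra import only saves the `pd.` prefix. The variable name `last_timestamp` is illustrative.

```python
from typing import Optional

import pandas as pd
from pandas import Timestamp  # the import under discussion

# Both names point at the same class object.
same_class = Timestamp is pd.Timestamp

# The annotation can be written with the already imported `pd` namespace.
last_timestamp: Optional[pd.Timestamp] = pd.Timestamp("2024-07-11")

print(same_class, last_timestamp)
```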

d-a-bunin · 2024-07-11T10:07:40Z

etna/transforms/encoders/mean_encoder.py

+
+        self._global_means: Optional[Union[float, Dict[str, float]]] = None
+        self._global_means_category: Optional[Union[Dict[str, float], Dict[str, Dict[str, float]]]] = None
+        self._last_timestamp: Optional[Timestamp] = None


It should have type: Union[Timestamp, int, None].

Can the timestamp be None in TSDataset?

The timestamp can be int.

You could write Optional[Union[Timestamp, int]], but it's probably easier to write Union[Timestamp, int, None].
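
As a quick check, the two spellings are equivalent to the `typing` module, so the choice is purely about readability. The alias name `LastTimestamp` below is a hypothetical name used only for this sketch.

```python
from typing import Optional, Union

import pandas as pd

# Hypothetical alias for the attribute type discussed above.
LastTimestamp = Union[pd.Timestamp, int, None]

# typing flattens nested unions, so both forms compare equal.
equivalent = LastTimestamp == Optional[Union[pd.Timestamp, int]]

# None before fitting, then either a datetime-like or an integer timestamp.
unfitted: LastTimestamp = None
datetime_case: LastTimestamp = pd.Timestamp("2024-07-11")
integer_case: LastTimestamp = 42

print(equivalent)
```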

d-a-bunin · 2024-07-11T10:19:58Z

etna/transforms/encoders/mean_encoder.py

+                categories = pd.unique(df.loc[:, self.idx[:, self.in_column]].values.ravel())
+
+                cumstats = pd.DataFrame(data={"sum": 0, "count": 0, self.in_column: categories})
+                start_index = np.arange(0, len(timestamps) * n_segments, len(timestamps))


What is this for?

These are the indices in the flattened df for one timestamp.
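
A toy illustration of that indexing, assuming the flattened data is laid out segment by segment (all timestamps of the first segment, then all timestamps of the second, and so on). The layout and the sample labels are assumptions for this sketch, not taken from the PR.

```python
import numpy as np

timestamps = ["t0", "t1", "t2"]
n_segments = 2

# Assumed segment-major flattened layout.
flat = np.array(["a@t0", "a@t1", "a@t2", "b@t0", "b@t1", "b@t2"])

# Position of the first timestamp of each segment in the flat array.
start_index = np.arange(0, len(timestamps) * n_segments, len(timestamps))

# Adding an offset selects the same timestamp across all segments.
rows_at_t1 = flat[start_index + 1]

print(start_index.tolist(), rows_at_t1.tolist())
```

With this layout, iterating the offset from 0 to `len(timestamps) - 1` visits one timestamp at a time across every segment.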

codecov · 2024-07-12T12:31:34Z

Codecov Report

Attention: Patch coverage is 97.88732% with 3 lines in your changes missing coverage. Please review.

Project coverage is 86.72%. Comparing base (4a8bbb5) to head (86d0cab).
Report is 2 commits behind head on master.

Files	Patch %	Lines
etna/transforms/encoders/mean_encoder.py	97.85%	3 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##           master     #413       +/-   ##
===========================================
+ Coverage    9.61%   86.72%   +77.10%     
===========================================
  Files         226      227        +1     
  Lines       15594    15753      +159     
===========================================
+ Hits         1500    13662    +12162     
+ Misses      14094     2091    -12003


add MeanEncoderTransform

f9e1228

github-actions bot temporarily deployed to pull request June 25, 2024 22:37 Inactive

d-a-bunin requested changes Jun 27, 2024

Egor Baturin added 3 commits June 28, 2024 17:18

fix fit

9d617c2

Merge branch 'master' into issue-12

5117c65

fix code

65961b7

github-actions bot temporarily deployed to pull request July 2, 2024 14:52 Inactive

final

d097138

github-actions bot temporarily deployed to pull request July 8, 2024 13:20 Inactive

egoriyaa commented Jul 8, 2024

egoriyaa commented Jul 8, 2024

Egor Baturin added 2 commits July 8, 2024 17:04

fix mode name

7da63fb

resolve changelog

da0b4f2

egoriyaa force-pushed the issue-12 branch from f15898f to da0b4f2 Compare July 8, 2024 14:28

resolve changelog

3a0806c

github-actions bot temporarily deployed to pull request July 8, 2024 14:35 Inactive

d-a-bunin requested changes Jul 8, 2024

d-a-bunin reviewed Jul 9, 2024

Egor Baturin added 2 commits July 11, 2024 12:09

fix all

0c4ae1a

fix conflict

5a1d7ab

egoriyaa force-pushed the issue-12 branch from 02b5dd0 to 5a1d7ab Compare July 11, 2024 09:12

github-actions bot temporarily deployed to pull request July 11, 2024 09:17 Inactive

d-a-bunin requested changes Jul 11, 2024

d-a-bunin reviewed Jul 11, 2024

add comments

77f6163

d-a-bunin reviewed Jul 11, 2024

github-actions bot temporarily deployed to pull request July 11, 2024 10:32 Inactive

satisfy mypy

b669ca0

github-actions bot temporarily deployed to pull request July 11, 2024 11:08 Inactive

add tests, fix docs

9a22e43

github-actions bot temporarily deployed to pull request July 11, 2024 22:28 Inactive

egoriyaa requested a review from d-a-bunin July 12, 2024 07:54

d-a-bunin previously approved these changes Jul 12, 2024

fix

5d02777

egoriyaa dismissed d-a-bunin’s stale review via 5d02777 July 12, 2024 11:54

github-actions bot temporarily deployed to pull request July 12, 2024 11:58 Inactive

egoriyaa self-assigned this Jul 12, 2024

egoriyaa requested a review from d-a-bunin July 12, 2024 13:01

fix

86d0cab

d-a-bunin approved these changes Jul 12, 2024

github-actions bot temporarily deployed to pull request July 12, 2024 14:52 Inactive

egoriyaa merged commit 12f19fb into master Jul 12, 2024
16 checks passed

egoriyaa deleted the issue-12 branch September 9, 2024 13:00
