Mean Normalisation Scaling #806

Merged


VascoSch92
Contributor

@VascoSch92 commented Aug 29, 2024

First version of MeanNormalizationScaling, as discussed in #763.

I created a new module, scaling, as discussed.

You will probably add other scalers to this module as well.
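For reference, mean normalization rescales each variable as x' = (x - mean) / (max - min), so values are centred on zero and span an interval of width 1. A minimal pandas sketch of the formula on toy data (the helper name mean_normalize and the data are illustrative, not the PR's code):

```python
import pandas as pd


def mean_normalize(df: pd.DataFrame) -> pd.DataFrame:
    """Apply (x - mean) / (max - min) column-wise (illustrative helper)."""
    return (df - df.mean()) / (df.max() - df.min())


X = pd.DataFrame({"age": [20, 30, 40], "income": [1000.0, 3000.0, 5000.0]})
Xt = mean_normalize(X)
print(Xt)
```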


codecov bot commented Aug 29, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.00%. Comparing base (5dfceb8) to head (b001017).
Report is 5 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #806      +/-   ##
==========================================
+ Coverage   97.98%   98.00%   +0.01%     
==========================================
  Files         107      109       +2     
  Lines        4320     4350      +30     
  Branches      857      709     -148     
==========================================
+ Hits         4233     4263      +30     
  Misses         54       54              
  Partials       33       33              


Collaborator

@solegalli left a comment


Hey @VascoSch92

This is looking really good. Thank you for the first draft.

I think we could tidy it up a bit so that we don't loop in either fit or transform.

Could you take a look?

Thank you!
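To illustrate the suggestion, here is a sketch contrasting a per-column loop with a vectorized version that computes the statistics for all selected columns at once. The data and the names variables, mean_ and range_ are assumptions for illustration, not the code in the PR:

```python
import pandas as pd

X = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 40.0]})
variables = ["a", "b"]

# Loop version: one statistic and one column operation per variable.
Xt_loop = X.copy()
for var in variables:
    mean, rng = X[var].mean(), X[var].max() - X[var].min()
    Xt_loop[var] = (X[var] - mean) / rng

# Vectorized version: pandas computes the statistics for all columns in one
# call and aligns the resulting Series with the DataFrame columns by name.
mean_ = X[variables].mean()
range_ = X[variables].max() - X[variables].min()
Xt_vec = X.copy()
Xt_vec[variables] = (X[variables] - mean_) / range_

# Both approaches give the same result.
pd.testing.assert_frame_equal(Xt_loop, Xt_vec)
```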

Resolved review comments on:
  • feature_engine/scaling/mean_normalization.py (4)
  • tests/test_scaling/test_mean_normalization.py (3)
@VascoSch92
Contributor Author

Hey @solegalli

I addressed your comments.

Please let me know what you think :-)

Collaborator

@solegalli left a comment


Hi @VascoSch92

This is looking really good. The tests are great.

Regarding the logic, I think we can speed this up if we store the range instead of the min and max, and if we use dictionaries instead of dataframes. Could you give that a go?

Feel free to start working on the docstrings and on adding a user guide :)
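A sketch of what storing the range (rather than the min and max) in dictionaries could look like during fit. The attribute names mean_ and range_ are assumed here for illustration; the data is made up:

```python
import pandas as pd

X = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 40.0]})
variables = ["a", "b"]

# If min_ and max_ were stored separately, transform() would have to compute
# max_ - min_ on every call; storing the range once in fit() avoids that.
mean_ = X[variables].mean().to_dict()
range_ = (X[variables].max() - X[variables].min()).to_dict()

print(mean_)
print(range_)
```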

@VascoSch92
Contributor Author

Hey @solegalli

I made the requested changes.
Now:

  • params_ is a dictionary Dict[str, pd.Series] with keys 'mean' and 'range'
  • MeanNormalizationScaling -> MeanNormalizationScaler
  • error message for constant columns is a little better :-)
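The constant-column check matters because a constant variable has max - min = 0, so the scaling would divide by zero. A minimal sketch of such a check, assuming a params_ dictionary with 'mean' and 'range' keys as described above; the helper name fit_params and the error wording are hypothetical, not the PR's message:

```python
import pandas as pd


def fit_params(X: pd.DataFrame) -> dict:
    """Learn mean and range per column; reject constant columns (hypothetical)."""
    params_ = {"mean": X.mean(), "range": X.max() - X.min()}
    # A zero range would make (x - mean) / range divide by zero.
    constant = params_["range"][params_["range"] == 0].index.tolist()
    if constant:
        raise ValueError(
            "Mean normalization divides by (max - min), which is zero for "
            f"constant variables. Drop or exclude these columns: {constant}"
        )
    return params_


params_ = fit_params(pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 40.0]}))
print(params_["range"])

try:
    fit_params(pd.DataFrame({"a": [1.0, 2.0, 3.0], "c": [5.0, 5.0, 5.0]}))
except ValueError as err:
    print(err)
```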

Copy link
Collaborator

@solegalli left a comment


Hey @VascoSch92 really good work here. Thank you so much!

We need to add a few files to create the docs now. Would you be able to do that as well?

Thanks a lot for the hard work.

@solegalli
Collaborator

Last but not least: we need to add the new module to the README and to the front page of the documentation, which lives here: https://github.com/feature-engine/feature_engine/blob/main/docs/index.rst

Thank you!!

@VascoSch92
Contributor Author

Hey @solegalli

  • added the documentation for the new scaler
  • divided params_ into mean_ and range_. Now mean_ and range_ are pd.Series. We could also make them np.array if you prefer. Scikit-learn uses NumPy arrays; should we do the same?
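One consideration for the Series-versus-array question: a pd.Series keeps the variable names, so arithmetic aligns by label, whereas a NumPy array (the form scikit-learn uses for attributes such as mean_) is applied by position. This sketch, on made-up data, shows what happens when the column order changes between fit and transform:

```python
import pandas as pd

X_train = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 40.0]})
mean_series = X_train.mean()            # indexed by column name
mean_array = X_train.mean().to_numpy()  # positional only

# Same columns, different order at transform time.
X_new = X_train[["b", "a"]]

by_name = X_new - mean_series     # aligns on labels: each column gets its own mean
by_position = X_new - mean_array  # positional: a's mean is subtracted from b
print(by_name)
print(by_position)
```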

Resolved review comment on docs/index.rst
Collaborator

@solegalli left a comment


Hey @VascoSch92 thanks for the quick turnaround. The api docs look great.

@VascoSch92
Contributor Author

Hey @solegalli

I switched from pd.Series to dictionaries, and it works :-)
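With the statistics stored as plain dictionaries, transform can still avoid a per-column loop by wrapping them in a pd.Series, which restores label alignment. A sketch under that assumption; the values of mean_ and range_ and the function below are hypothetical, not the merged implementation:

```python
import pandas as pd

# Hypothetical fitted attributes stored as plain dictionaries.
mean_ = {"a": 2.0, "b": 20.0}
range_ = {"a": 2.0, "b": 40.0}


def transform(X: pd.DataFrame) -> pd.DataFrame:
    variables = list(mean_)
    X = X.copy()
    # Wrapping the dicts in pd.Series lets pandas align the statistics with
    # the columns by name; columns not in mean_ (here "c") are left untouched.
    X[variables] = (X[variables] - pd.Series(mean_)) / pd.Series(range_)
    return X


Xt = transform(pd.DataFrame({"a": [1.0, 3.0], "b": [0.0, 40.0], "c": ["x", "y"]}))
print(Xt)
```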

@solegalli
Collaborator

Amazing! Thanks a lot!

We just need to add a description / demo to the user guide folder in the docs and we are good to go then :)

@VascoSch92
Contributor Author

Let me take a look ;-)

@VascoSch92
Contributor Author

@solegalli quick question: in general, Feature-engine uses the scaling transformers from sklearn. Should we also include examples of these transformers?
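For context on that question, here is a sketch (not from the PR) showing how the same scaling could be assembled from scikit-learn's StandardScaler with with_std=False and the data_range_ attribute of a fitted MinMaxScaler, which makes the relationship to the existing sklearn scalers explicit:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 40.0]])

# Centre on the mean without dividing by the standard deviation...
centred = StandardScaler(with_std=False).fit_transform(X)
# ...then divide by the per-feature range (max - min) learned by MinMaxScaler.
data_range = MinMaxScaler().fit(X).data_range_
mean_normalized = centred / data_range

# Compare with the direct formula (x - mean) / (max - min).
direct = (X - X.mean(axis=0)) / (X.max(axis=0) - X.min(axis=0))
print(np.allclose(mean_normalized, direct))
```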

@VascoSch92
Contributor Author

Hey @solegalli

I updated the docs with a demo.

Let me know what you think. It is just a first version :-)

@VascoSch92
Contributor Author

Hey @solegalli :-) did you have time to look at the latest changes?

@solegalli
Collaborator

Pending acceptance of suggested changes: VascoSch92#1

…ormalization

rewords the documentation and adds missing links
@solegalli merged commit ca28618 into feature-engine:main Oct 12, 2024
10 checks passed
2 participants