Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add small utility to profile any function #622

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

Okroshiashvili
Copy link
Contributor

Addresses #592 issue

@Okroshiashvili
Copy link
Contributor Author

@solegalli Whenever you'll find time please have a look

@solegalli
Copy link
Collaborator

Hey @Okroshiashvili

This looks good, thank you!

We need a few more details:

  • Could we add scripts to sample every class? Maybe a script per module (imputation, encoding, etc), within profiling, with the function decorating the class so that we can run them locally and see the profile?
  • Details in the contribute documentation with code examples of how we can use the prifiling module to check the function
  • Details in the contribute documentation telling users how they should use this function when developing a new class, and how they can visualize the results?

In short, if you go here: https://feature-engine.trainindata.com/en/latest/contribute/contribute_code.html

You'll see that we tell users how to run and check the tests: pytest, or code style: mypy feature-engine or flake8 feature-engine tests, and all that functionality is already available for developers.

I've just noticed that we need info to explain how to run coverage :/

And we need to add info to explain how to run the profiler, but the user should be able to hit something like profile feature-engine and then the profiling for all classes, or alternatively, the desired classes, shows up.

@Okroshiashvili
Copy link
Contributor Author

Hi @solegalli

Thanks for your input. I have crazy week 😕 I hope I will return to this issue by the end of the week.

Btw, I'm testing new library for memory profiling as well. There are some difficulties and will update you as soon as I figure out things there.

@solegalli
Copy link
Collaborator

Thank you so much!

@Okroshiashvili
Copy link
Contributor Author

@solegalli

Here is a Python library for memory profiling. Please, have a look and if it looks good then I will work to have it for feature engine

@solegalli
Copy link
Collaborator

Hi @Okroshiashvili

Thanks for the files. And sorry for the delayed response.

I don't understand how I am meant to use these files though.

We have the profiling function in profiling.py

Now, say I want to profile the MeanMedianImputer. How do I do that?

And how should I use profiling.sh locally? and what would be the outcome of executing that file?

It would help if you could add information in the contribute guide (https://github.com/feature-engine/feature_engine/blob/main/docs/contribute/contribute_code.rst) so that a user can actually use this functionality.

@Okroshiashvili
Copy link
Contributor Author

@solegalli

Sorry for that confusion. Let me add appropriate documentation about profiling (time as well as memory) and then decide what to do

@Okroshiashvili
Copy link
Contributor Author

Hi @solegalli

Sorry for the latency 😔

I've updated docs. Please, check and if all good then we can go and merge.

@solegalli
Copy link
Collaborator

Hey @Okroshiashvili

This looks really good! Thank you for adding such clear details in the documentation. That's great.

I am going on hols from tomorrow till Monday 15th. As soon as I am back, I'd like to play around with the code a bit, and then I merge.

Sorry for the delay on my side now ;)

Have a good weekend!

@solegalli
Copy link
Collaborator

Hey @Okroshiashvili

Thank you so much for the contribution and the patience with me coming back to this.

I just tested the code you implemented.

I made a PR: Okroshiashvili#1 with minor rewording on the contribute file.

This is my experience from running the code as in the contribute:

When doing it via the python script, I do get the profile.html.

The content of the profile.html file is very limited. I don't understand how that is helpful. Maybe this is me being completely ignorant about profiling, but I just get speeds? It doesn't really tell me what could be optimized, which is what I was hoping for. I don't understand how I can use that information to improve my code.

When running the file via bash, I still get the profile.html, and then a folder with another folder inside, and in it 2 files. But the app you mention in the documentation didn't manage to open any of those files. Not sure what went wrong.

Maybe I need to do some reading about profiling. Maybe I am expecting more from it that it can give. My idea for this functionality was to to have a function that provides some information about the performance of the code, and ideally how it can be improved.

One user made a PR sometime ago regarding copy() from pandas, and they said that they did profiling, and they discovered that the transformers from feature engine were slow because that command, copy() which comes from pandas is putting a lot of data in memory. This is useful actionable insight. Speeds with nothing else, I am not sure how it helps.

I am happy to admit that my knowledge on the topic is very little though. I might be totally wrong with my expectation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants