Add small utility to profile any function #622

Open, wants to merge 4 commits into base: main
3 changes: 3 additions & 0 deletions .gitignore
@@ -105,6 +105,9 @@ venv.bak/
# mypy
.mypy_cache/

# profiling
/profiles

# Miscelaneous
.idea
.vscode
70 changes: 70 additions & 0 deletions docs/contribute/contribute_code.rst
@@ -449,6 +449,76 @@ you want to know why we prefer tox, this
will tell you everything ;)


Code Profiling
--------------

To profile your code, you can use the **profiling** directory in the root of the repository. It contains two files,
``profiling.py`` and ``profiling.sh``. Both profile code with ``pyinstrument`` (listed in ``test_requirements.txt``), but in
different ways. ``profiling.py`` is a Python module that provides ``profile_function``, a decorator that you apply to the
function or method you want to profile. ``profiling.sh`` is a bash/zsh script that you run from the command line to profile
a whole ``.py`` script. Let us see how to use them, starting with ``profiling.py``.
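To make the decorator pattern concrete without requiring pyinstrument, here is a minimal sketch of the same idea built on the
standard library's ``cProfile`` and ``pstats`` instead. ``cprofile_function`` and its ``sort_by`` and ``limit`` parameters are
hypothetical and not part of this PR. Like ``profile_function``, it is a decorator factory, so it must be called with
parentheses even when you keep the defaults.

```python
import cProfile
import functools
import io
import pstats


def cprofile_function(output_file="profile.txt", sort_by="cumulative", limit=20):
    """Hypothetical stdlib counterpart of ``profile_function``.

    Profiles each call of the decorated function with cProfile and writes a
    plain-text report (the top ``limit`` entries sorted by ``sort_by``) to
    ``output_file``, overwriting the previous report.
    """

    def decorator(function):
        @functools.wraps(function)  # keep the original name and docstring
        def wrapper(*args, **kwargs):
            profiler = cProfile.Profile()
            profiler.enable()
            try:
                return function(*args, **kwargs)
            finally:
                # Stop profiling and write the report even if the call raises.
                profiler.disable()
                buffer = io.StringIO()
                stats = pstats.Stats(profiler, stream=buffer)
                stats.sort_stats(sort_by).print_stats(limit)
                with open(output_file, "w", encoding="utf-8") as f:
                    f.write(buffer.getvalue())

        return wrapper

    return decorator
```

For example, decorating a function with ``@cprofile_function(output_file="report.txt")`` and calling it writes up to 20
entries with the highest cumulative time to ``report.txt``. The ``try``/``finally`` makes sure the profiler is stopped and the
report is saved even when the decorated function raises.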

As an example, I suspect that the ``DropDuplicateFeatures`` class takes more time than other classes, because it iterates
over the columns and checks whether each one is duplicated. So, I will profile ``DropDuplicateFeatures``.

First, I open the module where this class is defined and add the following line at the top of its imports::

    from profiling.profiling import profile_function

Now, I decorate the ``DropDuplicateFeatures.fit`` method with ``profile_function``::

    @profile_function(output_file="profile.html")
    def fit(self, X: pd.DataFrame, y: pd.Series = None):
        ...

Next, create a temporary ``.py`` file in the root directory of the project (so that ``profiling`` can be imported) that runs the code we want to profile.

For example, I create a file named ``temp.py`` with the following code::

    import pandas as pd
    import numpy as np

    from feature_engine.selection import DropDuplicateFeatures


    if __name__ == "__main__":
        rows = 10000
        cols = 60000
        col_names = [f"col_{i}" for i in range(cols)]
        # 10,000 x 60,000 values need up to about 4.8 GB of RAM (8 bytes each);
        # reduce rows/cols if your machine has less memory.
        df = pd.DataFrame(np.random.randint(0, 100, size=(rows, cols)), columns=col_names)

        transformer = DropDuplicateFeatures()
        transformer.fit(df)

        train_t = transformer.transform(df)


Now, I run ``temp.py`` from the root directory of the project::

    $ python temp.py

Because ``output_file`` is a relative path, this creates ``profile.html`` in the current working directory, here the root
directory of the project. The file contains the profiling results of the ``fit`` call (each call overwrites it); open it in
your favorite browser to inspect them.

If you would rather not add an import and a decorator to the source code, you can use ``profiling.sh`` instead. It is a
bash/zsh script that you run from the command line. Let us see how to use it.

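Again, I will profile the ``DropDuplicateFeatures`` class. I create the same ``temp.py`` file as above (the import and the
decorator are not needed this time), then open a terminal in the root directory and run the following command (if the script
is not executable, run it with ``bash profiling/profiling.sh temp.py``)::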

    $ ./profiling/profiling.sh temp.py


This creates a ``profiles/code_profiles`` directory in the root of the project (``profiles`` is ignored by git) containing
two timestamped files: an ``.html`` report that you can open in any browser, and a ``.json`` file that you can load into
`speedscope <https://www.speedscope.app/>`_ to visualize the results. Note that the script runs ``temp.py`` twice, once for
each report.
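``profiling.sh`` profiles a whole script without touching its source. As a rough illustration of how that works, here is a
minimal sketch that does the same with the standard library's ``cProfile`` instead of pyinstrument (the built-in command
``python -m cProfile -o temp.prof temp.py`` does essentially this). ``profile_script`` is a hypothetical helper, not part of
this PR; it executes the file with ``runpy`` under the name ``__main__`` so that the script's
``if __name__ == "__main__":`` block runs.

```python
import cProfile
import runpy


def profile_script(script_path, stats_file):
    """Hypothetical helper: run ``script_path`` as ``__main__`` under cProfile
    and save the raw statistics to ``stats_file``, without editing the script.
    """
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        # run_name="__main__" makes the script's `if __name__ == "__main__":` block run.
        runpy.run_path(script_path, run_name="__main__")
    finally:
        # Stop profiling and save the stats even if the script raises or calls sys.exit().
        profiler.disable()
        profiler.dump_stats(stats_file)
    return stats_file
```

For example, ``profile_script("temp.py", "temp.prof")`` saves the raw statistics, and
``pstats.Stats("temp.prof").sort_stats("cumulative").print_stats(15)`` prints the 15 entries with the highest cumulative time.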


.. note::
   To profile memory usage, you can use the ``memray`` package. See the
   `memray documentation <https://bloomberg.github.io/memray/index.html>`_ for details.
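The note above recommends memray. For a quick, dependency-free number, here is a sketch of a decorator in the same spirit as
``profile_function`` that uses the standard library's ``tracemalloc`` instead (a swapped-in tool, not part of this PR);
``trace_peak_memory`` and its ``last_peak_bytes`` attribute are hypothetical names. Keep in mind that tracemalloc only sees
memory allocated through Python's allocators, whereas memray also tracks native allocations.

```python
import functools
import tracemalloc


def trace_peak_memory(function):
    """Hypothetical decorator: measure the peak extra memory allocated through
    Python's allocators during each call, using tracemalloc, and store it on
    the wrapper's ``last_peak_bytes`` attribute.
    """

    @functools.wraps(function)
    def wrapper(*args, **kwargs):
        already_tracing = tracemalloc.is_tracing()
        if not already_tracing:
            tracemalloc.start()
        tracemalloc.reset_peak()  # requires Python 3.9+
        baseline, _ = tracemalloc.get_traced_memory()
        try:
            return function(*args, **kwargs)
        finally:
            _, peak = tracemalloc.get_traced_memory()
            if not already_tracing:
                tracemalloc.stop()  # leave tracing as we found it
            extra = peak - baseline
            wrapper.last_peak_bytes = extra
            print(f"{function.__qualname__}: peak extra memory {extra / 1024**2:.1f} MiB")

    wrapper.last_peak_bytes = None
    return wrapper
```

Unlike ``profile_function``, this decorator takes no options, so it is applied without parentheses (``@trace_peak_memory``).
Subtracting the baseline means memory that was already traced before the call is not counted.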


Review Process
--------------

29 changes: 29 additions & 0 deletions profiling/profiling.py
@@ -0,0 +1,29 @@
import functools

from pyinstrument.profiler import Profiler


def profile_function(output_file="profile.html"):
    """
    Decorator factory that profiles a function's execution time with pyinstrument.

    Parameters
    ----------
    output_file: path of the HTML report to write. Defaults to "profile.html".
    """

    def decorator(function):
        @functools.wraps(function)
        def wrapper(*args, **kwargs):
            profiler = Profiler()
            profiler.start()
            try:
                return function(*args, **kwargs)
            finally:  # stop and save the report even if the function raises
                profiler.stop()
                with open(output_file, "w", encoding="utf-8") as f:
                    f.write(profiler.output_html())

        return wrapper

    return decorator
7 changes: 7 additions & 0 deletions profiling/profiling.sh
@@ -0,0 +1,7 @@
mkdir -p profiles/code_profiles

timestamp=$(date "+%Y.%m.%d-%H:%M")  # same timestamp for both reports

pyinstrument -r html -o "profiles/code_profiles/performance_profile_${timestamp}.html" "$@"

pyinstrument -r speedscope -o "profiles/code_profiles/speedscope_${timestamp}.json" "$@"
1 change: 1 addition & 0 deletions test_requirements.txt
@@ -7,3 +7,4 @@ coverage>=6.4.4
flake8>=3.9.2
isort>=5.8.0
mypy>=0.740
pyinstrument>=4.4.0