Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Review #3

Open
wants to merge 120 commits into
base: review-1-target
Choose a base branch
from
Open

Review #3

wants to merge 120 commits into from

Conversation

afermg
Copy link
Collaborator

@afermg afermg commented Dec 4, 2024

Ignore this Pull Request for now. I will use it to document my code review.

From slack:

  • I think the file in workspace > analysis > attribution.py may require some review as we re doing all the interpretation from there.
  • Let me know if you have questions, I admit I haven't fully document so don't hesitate !It is segmented in several part, with some """ ------ Title --------"""
  • The one Attribution method should be decently okay since I have mainly adapted code from the captum github.
  • You may eventually have a look but it might be tedious as you would eventually have to look through how attribution class has been implemented in captum.
  • The visualization of attribution function is mainly plotting function. It should not be that big of a deal there might be some optimisation to sugges as of now I have one plotting function per figure I wanna create and therefore there is a lot of redundancy.
  • The Mask creation and DAC score should be looked as it is how the dac curve and mask are created so we want to avoid error here.The test of above code is whenever I test the functions. There should not be much things to check.

@afermg
Copy link
Collaborator Author

afermg commented Dec 4, 2024

I fixed the easily fixable issues with ruff (ruff check --fix). These are the non-fixable. Do make sure of fix these on this branch, then you can just copy the files recursively to the main one (these two branches have an initial empty commit and are thus unrelated to your main branch). We can also cherry pick these last commit (f7b52c8) if that is easier.

[nix-shell:~/projects/jump_attribution/workspace/analysis]$ ruff check
error: Failed to parse gans_profiles_script.py:124:1: missing closing quote in string literal
Data_filtering/get_features.py:37:20: E712 Avoid inequality comparisons to `True`; use `if not pl.col("Metadata_Source").str.contains("_9|_1$"):` for false checks
attribution.py:283:9: E722 Do not use bare `except`
attribution.py:1484:9: E722 Do not use bare `except`
attribution.py:1760:1: E402 Module level import not at top of file
attribution.py:1761:1: E402 Module level import not at top of file
attribution.py:1762:1: E402 Module level import not at top of file
attribution.py:1763:1: E402 Module level import not at top of file
attribution.py:1764:1: E402 Module level import not at top of file
filter_image.py:116:9: E722 Do not use bare `except`
gans_profiles_script.py:124:1: E999 SyntaxError: missing closing quote in string literal
lightning_parallel_training.py:893:12: E721 Do not compare types, use `isinstance()`
lightning_parallel_training.py:903:12: E721 Do not compare types, use `isinstance()`
lightning_parallel_training.py:911:12: E721 Do not compare types, use `isinstance()`
parallel_training.py:54:12: E712 Avoid equality comparisons to `True`; use `if give_matrix:` for truth checks
parallel_training.py:111:12: E712 Avoid equality comparisons to `True`; use `if allow_eval:` for truth checks
Found 15 errors.
No fixes available (3 hidden fixes can be enabled with the `--unsafe-fixes` option).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is way too big. I have a few suggestions:

  • Isolate the plotting functions into their own file
  • Separate the testing section (starting at the second round of inputs) into tan independent file
  • I understand that you pulled and modified classes from captum. Please make it clear which classes/functions come from there by adding a permalink to their source
  • Add docstrings indicating what is the purpose of a given function for the ones you wrote. If they are too many, focus on the ones used on your tests

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a much better solution to import the code is to subclass and override the methods that you are modifying.

fig.savefig(fig_directory / fig_name, dpi=300, bbox_inches='tight')
plt.close()
"""
------- Test of above code ----------
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should definitely be its own script! It will also allow you to reduce the dependency footprint at the top of the file


# # 1) Loading images and create a pytorch dataset

# ## a) Load Images using Jump_portrait
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this to its own script? Or add it on the readme. This is data acquisition, so it is important.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove all the commented lines or, if they actually are instructions on how to download files, make them documentation or their own file.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot code review notebooks in this interface. Please use jupytext 'jupytext --to py:percent X.ipynb' and push that version for review

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot code review notebooks in this interface. Please use jupytext 'jupytext --to py:percent X.ipynb' and push that version for review

@@ -0,0 +1,2 @@
#+title: Readme
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add links to main resources, this will allow you (or anyone re-using this data) to quickly access the documentation/scripts/notes/references.

@@ -0,0 +1,38 @@
"""
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest doing something like this to your python scripts. It gives the reader a quick overview of what the script is about.

@@ -0,0 +1,179 @@
# 1. Visualise image
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is my bad: I usually write the step-by-step of a script before implementing it. Then I would replace it with text showing what the script does. I forgot to do the latter here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot code review notebooks in this interface. Please use jupytext 'jupytext --to py:percent X.ipynb' and push that version for review

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one may have been replaced by data_v2. I'd suggest removing the scripts that are not producing useful data to reduce the signal to noise ratio.

@afermg
Copy link
Collaborator Author

afermg commented Dec 4, 2024

According to my duplication analysis tool, these files are practically the same:

  "filter_image.py": {
       "gans_profiles_script.py": 83.46,
       "image_classifier_script.py": 92.64
   },

Consider refactoring them into one

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants