Review #3

afermg · 2024-12-04T18:09:49Z

Ignore this Pull Request for now. I will use it to document my code review.

From slack:

I think the file in workspace > analysis > attribution.py may require some review, as we are doing all the interpretation from there.
Let me know if you have questions. I admit I haven't fully documented it, so don't hesitate! It is segmented into several parts, each marked with a """ ------ Title --------""" header.
The Attribution method part should be decently okay, since I have mainly adapted code from the captum GitHub repository.
You may eventually have a look, but it might be tedious, as you would have to look through how the attribution classes are implemented in captum.
The visualization of attribution part is mainly plotting functions. It should not be that big of a deal, though there might be some optimisations to suggest: as of now I have one plotting function per figure I want to create, and therefore there is a lot of redundancy.
The Mask creation and DAC score part should be looked at, as it is how the DAC curve and masks are created, so we want to avoid errors here. The Test of above code part is where I test the functions. There should not be many things to check.

afermg · 2024-12-04T18:22:45Z

I fixed the easily fixable issues with ruff (ruff check --fix). These are the non-fixable ones. Make sure to fix these on this branch; then you can just copy the files recursively to the main one (these two branches have an initial empty commit and are thus unrelated to your main branch). We can also cherry-pick this last commit (f7b52c8) if that is easier.

[nix-shell:~/projects/jump_attribution/workspace/analysis]$ ruff check
error: Failed to parse gans_profiles_script.py:124:1: missing closing quote in string literal
Data_filtering/get_features.py:37:20: E712 Avoid inequality comparisons to `True`; use `if not pl.col("Metadata_Source").str.contains("_9|_1$"):` for false checks
attribution.py:283:9: E722 Do not use bare `except`
attribution.py:1484:9: E722 Do not use bare `except`
attribution.py:1760:1: E402 Module level import not at top of file
attribution.py:1761:1: E402 Module level import not at top of file
attribution.py:1762:1: E402 Module level import not at top of file
attribution.py:1763:1: E402 Module level import not at top of file
attribution.py:1764:1: E402 Module level import not at top of file
filter_image.py:116:9: E722 Do not use bare `except`
gans_profiles_script.py:124:1: E999 SyntaxError: missing closing quote in string literal
lightning_parallel_training.py:893:12: E721 Do not compare types, use `isinstance()`
lightning_parallel_training.py:903:12: E721 Do not compare types, use `isinstance()`
lightning_parallel_training.py:911:12: E721 Do not compare types, use `isinstance()`
parallel_training.py:54:12: E712 Avoid equality comparisons to `True`; use `if give_matrix:` for truth checks
parallel_training.py:111:12: E712 Avoid equality comparisons to `True`; use `if allow_eval:` for truth checks
Found 15 errors.
No fixes available (3 hidden fixes can be enabled with the `--unsafe-fixes` option).
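
The rule codes that recur in this output can be illustrated with a minimal sketch. The function and variable names below are hypothetical (only `give_matrix` echoes a name from the ruff output) and the bodies are not taken from the repository; they only show the general shape of a fix for E722, E721 and E712.

```python
from pathlib import Path
from typing import Optional


def read_size(path: Path) -> Optional[int]:
    """E722: catch a specific exception instead of a bare `except`."""
    try:
        return path.stat().st_size
    except OSError:  # parent class of FileNotFoundError and PermissionError
        return None


def describe(value: object) -> str:
    """E721: use isinstance() rather than `type(value) == list`."""
    if isinstance(value, (list, tuple)):
        return "sequence"
    if isinstance(value, dict):
        return "mapping"
    return "other"


def summarise(values, give_matrix=False):
    """E712: test truthiness directly instead of `give_matrix == True`."""
    total = sum(values)
    if give_matrix:
        # Hypothetical behaviour: return the normalised values as one row.
        return [[v / total for v in values]]
    return total
```

One caveat for the E712 hit in Data_filtering/get_features.py: the `if not ...` form that ruff suggests is meant for plain booleans. On a polars expression the negation is normally written with the `~` operator, so that suggestion probably needs adapting by hand.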

afermg · 2024-12-04T18:29:25Z

workspace/analysis/attribution.py

This file is way too big. I have a few suggestions:

Isolate the plotting functions into their own file

Separate the testing section (starting at the second round of inputs) into an independent file

I understand that you pulled and modified classes from captum. Please make it clear which classes/functions come from there by adding a permalink to their source

Add docstrings stating the purpose of each function you wrote. If there are too many, focus on the ones used in your tests

I think a much better solution to import the code is to subclass and override the methods that you are modifying.
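
A minimal sketch of the subclass-and-override idea follows. `BaseAttribution` is a hypothetical stand-in for a library class and does NOT reproduce captum's API; the point is only that the subclass redefines the one step being changed and inherits everything else, so the modification stays small and easy to review.

```python
class BaseAttribution:
    """Hypothetical stand-in for a class imported from a library."""

    def __init__(self, forward_func):
        self.forward_func = forward_func

    def baseline(self, inputs):
        # Library default in this sketch: an all-zero baseline.
        return [0.0 for _ in inputs]

    def attribute(self, inputs):
        # Score each feature by swapping it into the baseline, one at a time.
        base = self.baseline(inputs)
        reference = self.forward_func(base)
        scores = []
        for index, value in enumerate(inputs):
            probe = list(base)
            probe[index] = value
            scores.append(self.forward_func(probe) - reference)
        return scores


class MeanBaselineAttribution(BaseAttribution):
    """Overrides only `baseline`; `attribute` is inherited unchanged."""

    def baseline(self, inputs):
        mean = sum(inputs) / len(inputs)
        return [mean for _ in inputs]
```

With this layout the diff against the upstream class is limited to the overridden method, and a permalink to the upstream source can sit in the subclass docstring.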

afermg · 2024-12-04T18:32:47Z

workspace/analysis/attribution.py

+    fig.savefig(fig_directory / fig_name, dpi=300, bbox_inches='tight')
+    plt.close()
+"""
+------- Test of above code  ----------


This should definitely be its own script! It will also allow you to reduce the dependency footprint at the top of the file

afermg · 2024-12-04T18:34:58Z

workspace/analysis/image_classifier_script.py

+
+# # 1) Loading images and create a pytorch dataset
+
+# ## a) Load Images using Jump_portrait


Move this to its own script? Or add it to the readme. This is data acquisition, so it is important.

afermg · 2024-12-04T18:35:49Z

workspace/analysis/image_classifier_script.py

Remove all the commented lines or, if they actually are instructions on how to download files, make them documentation or their own file.

afermg · 2024-12-04T18:37:02Z

workspace/analysis/mAP_calculations.ipynb

I cannot code review notebooks in this interface. Please use jupytext 'jupytext --to py:percent X.ipynb' and push that version for review

afermg · 2024-12-04T18:37:10Z

workspace/analysis/tutorial_get_features.ipynb

I cannot code review notebooks in this interface. Please use jupytext 'jupytext --to py:percent X.ipynb' and push that version for review
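
Since the same request applies to several notebooks, the conversion could be batched. The sketch below is an assumption about how one might do that: it only wraps the `jupytext --to py:percent` command quoted above, and the helper names (`jupytext_command`, `convert_notebooks`) are hypothetical.

```python
import subprocess
from pathlib import Path


def jupytext_command(notebook: Path) -> list:
    """Build the CLI call quoted in the review for one notebook."""
    return ["jupytext", "--to", "py:percent", str(notebook)]


def convert_notebooks(root: Path, run=subprocess.run) -> list:
    """Convert every notebook under `root`, skipping Jupyter checkpoint copies."""
    converted = []
    for notebook in sorted(root.rglob("*.ipynb")):
        if ".ipynb_checkpoints" in notebook.parts:
            continue
        # `check=True` makes a failed conversion raise instead of passing silently.
        run(jupytext_command(notebook), check=True)
        converted.append(notebook)
    return converted
```

Calling `convert_notebooks(Path("workspace/analysis"))` would then write one `.py` file next to each notebook, assuming jupytext is available on the PATH.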

afermg · 2024-12-04T18:38:48Z

readme.org

@@ -0,0 +1,2 @@
+#+title: Readme


Add links to the main resources. This will allow you (or anyone reusing this data) to quickly access the documentation/scripts/notes/references.

afermg · 2024-12-04T18:39:30Z

workspace/analysis/2024_11_30_csv_to_parquet.py

@@ -0,0 +1,38 @@
+"""


I'd suggest doing something like this in your Python scripts. It gives the reader a quick overview of what the script is about.
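
As an illustration of such a header, here is a hypothetical script (its content is invented for the example and is not the repository's csv_to_parquet script). The module docstring lists the steps, and passing it to `argparse` keeps `--help` in sync with the header.

```python
"""
Count rows per category in a CSV file (hypothetical example script).

1. Read the CSV file given as the first argument.
2. Count how often each value of `--column` occurs.
3. Print the counts, most frequent first.
"""
import argparse
import csv
from collections import Counter


def build_parser() -> argparse.ArgumentParser:
    # Reusing the module docstring as the CLI description avoids duplication.
    parser = argparse.ArgumentParser(
        description=__doc__,
        formatter_class=argparse.RawDescriptionHelpFormatter,
    )
    parser.add_argument("csv_path")
    parser.add_argument("--column", default="label")
    return parser


def count_values(csv_path: str, column: str) -> Counter:
    with open(csv_path, newline="") as handle:
        return Counter(row[column] for row in csv.DictReader(handle))


def main(argv) -> None:
    args = build_parser().parse_args(argv)
    for value, count in count_values(args.csv_path, args.column).most_common():
        print(value, count)
```

In a real script, `main(sys.argv[1:])` would be called under an `if __name__ == "__main__":` guard.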

afermg · 2024-12-04T18:40:33Z

workspace/analysis/2024_11_30_marimo_explorer.py

@@ -0,0 +1,179 @@
+# 1. Visualise image


This is my bad: I usually write the step-by-step of a script before implementing it. Then I would replace it with text showing what the script does. I forgot to do the latter here.

afermg · 2024-12-04T18:40:40Z

workspace/analysis/Data_filtering/Target2_active.ipynb

I cannot code review notebooks in this interface. Please use jupytext 'jupytext --to py:percent X.ipynb' and push that version for review

This one may have been replaced by data_v2. I'd suggest removing the scripts that are not producing useful data to improve the signal-to-noise ratio.

afermg · 2024-12-04T23:40:21Z

According to my duplication analysis tool, these files are practically the same:

  "filter_image.py": {
       "gans_profiles_script.py": 83.46,
       "image_classifier_script.py": 92.64
   },

Consider refactoring them into one file.
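
To re-check such numbers locally after a refactor, a rough pairwise similarity can be computed with the standard library. This is only a sketch using `difflib`; it is not the duplication analysis tool mentioned above, so its percentages will not match that tool's output exactly, and the function names are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity_percent(text_a: str, text_b: str) -> float:
    """Line-based similarity between two sources, from 0 to 100."""
    lines_a = text_a.splitlines()
    lines_b = text_b.splitlines()
    return round(100 * SequenceMatcher(None, lines_a, lines_b).ratio(), 2)


def pairwise_similarity(sources: dict) -> dict:
    """Map each pair of file names to its similarity percentage."""
    return {
        (name_a, name_b): similarity_percent(sources[name_a], sources[name_b])
        for name_a, name_b in combinations(sorted(sources), 2)
    }
```

Here `sources` maps a file name to its text, for example `{p.name: p.read_text() for p in Path(".").glob("*.py")}`.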

afermg and others added 30 commits December 4, 2024 13:05

first commit

d798287

new: add bib and html

5cb2817

fix(onboarding): citations work; update html

8af22ec

change(bib): update bibliography using hardlink

16a877c

change(onboarding): delegate setup to monorepo

c60c2d3

add ref

18ee8ef

use local-bib

94378a5

change bib file

b18bdf2

Adding image.png

844d497

Add plan description

522d3c5

extracting a sub dataset

b2b7458

push code

b26465d

push code again

371750e

Add first classifier pipeline on features

d3a9d2b

update flake

effe645

Add splitter and fixing environment

41b4563

Add custom splitter

115f8e5

Updating flake and adding custom split

14cbcf2

Custom split consistent over kernel and server

f36270b

Flake, req solved. classifier done, portrait start

53d51af

Add image classifier, update flake torch

0ce27fe

MultiGPU training fixed

652dc25

Jamboree day - moa distribution

c5028b3

Add receptive field calculation

4a3a91d

Including loss save in training loop

1323aff

Evaluation loop multi GPU

191c11a

Add visualisation of result from training

c1b17df

start working with lightning

ea1a75d

update requirement.txt with lightning

091493b

Update lightning script

846ea02

HugoHakem and others added 23 commits December 4, 2024 13:08

Adding option to choose baseline for attribution

21c01c6

Update smallest mask

b8822c6

Update custom_dataset & start filter image

4429abd

Debug custom_dataset

be4072d

Debut custom_dataset

f5a6e04

feat(filter_image): plot mapping moa_id to pert

ea92805

feat(flake): add isort

4429fc9

refactor(filter_image): ordering code

064b5c0

feat(filter_image): add channel to rgb and crop

0094998

fix(filter_image): update rgb normalisation

3dbf903

feat(filter_image): otsu and blob detector filter

a62d34f

refactor(filter_img): clarity and update filter

149a8ff

Refactor(filter_image): Update code

f9cd2d6

feat(conv_model): add max channel to VGG_ch

50adf5c

refactor: debugging, running pipeline, new ideas

0fe1d4e

update: refactor code and start interactive viz

167c89c

feat(image_classifier): add enbedding generator

d1ebfdc

Update: file permission executable

06b66ff

deps: update polars; add altair

120288f

feat(viz): add csv->parquet; add marimo viz

9a0c15e

Fix(csv_to_parquet): use PCA and UMAP on embedding

135794b

actions: add duplicate-code-detection

f7b52c8

chore: run ruff autofixes

c807a70

afermg added 4 commits December 4, 2024 17:52

fix(gans_profiles_script): match triple quoting

d13fc49

fix(custom_dataset): use valid unpacking

bc0c918

deps: have nix provide pyright and ruff

8397d8c

actions: fix root dir

04dc936
