Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: add tutorials for PPIs #434

Merged
merged 56 commits into from
Jul 6, 2023
Merged

docs: add tutorials for PPIs #434

merged 56 commits into from
Jul 6, 2023

Conversation

gcroci2
Copy link
Collaborator

@gcroci2 gcroci2 commented May 30, 2023

With this PR, I add two tutorial notebooks for PPI data;

  • data_generation_ppi.ipynb
    • residue level data generation
    • atomic level data generation
  • training_ppi.ipynb
    • classification for GNNs using GraphDatasets
    • classification for CNNs using GridDatasets
  • references to regression across the notebook

@gcroci2 gcroci2 linked an issue May 30, 2023 that may be closed by this pull request
5 tasks
@gcroci2 gcroci2 requested a review from DaniBodor June 2, 2023 15:43
Copy link
Collaborator

@DaniBodor DaniBodor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really great tutorials!
I think the main comment I have is that I would split up the zotero into two separate repo's. One for pdb data and another for hdf5 data. People going through data generation don't need to download 2GB, especially because almost all of it will be deleted in the 1st step anyway.

The HDF5 repo is the only for people that don't want to go through the process of data generation (actually, that one can just be the current one, with all data because the pdbs don't add much).

Apart from that, these are mostly some minor textual suggestions and in a few places make some of the settings more explicit/visible so that users see how/where to find them.

tutorials/data_generation_ppi.ipynb Show resolved Hide resolved
tutorials/data_generation_ppi.ipynb Outdated Show resolved Hide resolved
tutorials/data_generation_ppi.ipynb Outdated Show resolved Hide resolved
tutorials/data_generation_ppi.ipynb Outdated Show resolved Hide resolved
tutorials/data_generation_ppi.ipynb Outdated Show resolved Hide resolved
tutorials/training_ppi.ipynb Outdated Show resolved Hide resolved
tutorials/training_ppi.ipynb Show resolved Hide resolved
tutorials/training_ppi.ipynb Outdated Show resolved Hide resolved
tutorials/training_ppi.ipynb Outdated Show resolved Hide resolved
tutorials/training_ppi.ipynb Show resolved Hide resolved
@gcroci2
Copy link
Collaborator Author

gcroci2 commented Jun 14, 2023

Really great tutorials! I think the main comment I have is that I would split up the zotero into two separate repo's. One for pdb data and another for hdf5 data. People going through data generation don't need to download 2GB, especially because almost all of it will be deleted in the 1st step anyway.

The HDF5 repo is the only for people that don't want to go through the process of data generation (actually, that one can just be the current one, with all data because the pdbs don't add much).

Apart from that, these are mostly some minor textual suggestions and in a few places make some of the settings more explicit/visible so that users see how/where to find them.

  • I edited the data on Zenodo for them to be in two different folders, one for each tutorial (data_raw and data_processed, respectively).
  • I edited the setup - software in both tutorials. I tested a new environment using the tutorials extras_require version of the package, and everything works on my machine.
  • I removed the hdf5explorer dependency which was causing many issues on my Mac installation (with pyqt5 in particular, even if the last time I made it work with the links indicated in the readme; apparently it doesn't work anymore). We're not really using it and at the bottom of the readme there this package is mentioned and there is a link to its documentation, which should include the installation as well. Also, I remember that we were thinking to remove hdf5explorer, but we decided to leave it at the time. Anyway nobody is using it.
  • I implemented the vast majority of your suggestions, I feel it is much more precise now! Thanks for the precious feedback <3 @DaniBodor

Just to be sure, if you can give a fast look to the new edits, then I can merge :)

@gcroci2 gcroci2 requested a review from DaniBodor June 14, 2023 07:10
@gcroci2
Copy link
Collaborator Author

gcroci2 commented Jun 20, 2023

If you'll review this PR again when I am on holiday, feel free to merge it if there is nothing else to add @DaniBodor

@DaniBodor DaniBodor mentioned this pull request Jun 23, 2023
6 tasks
Copy link
Collaborator

@DaniBodor DaniBodor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! 👍

I fixed a few remaining typos/language issues in the notebooks, apart from that it looks good!

I agree with your point about publishing it with a finished run, so people could choose to gloss over it instead of running the notebook. If we do it that way, I think we should state that they can either gloss or clear the notebook and run it themselves. Shall I add a comment about that and run the notebooks, or do you prefer to leave them blank? Or do you think this will prompt too many users to just read and not try it out themselves?

Also, I noticed that you left code snippets of the neural networks inside the md cells. I still think it's quite distracting from the flow to have such a large code block in the middle of a cell for something that is non-essential to understand at a basic level how to run the software. Did you have a particular reason for leaving it or did you miss my previous comment on this?

Regarding Zenodo. Maybe a nice idea to already add a .gitignore file with * in each folder, just to make extra sure that no data inside is tracked. (this is especially useful in case people save the data in a different location than the recommended one). Also, remove the Mac leftovers (DS_Store thing).

Finally, should there be some explanation of how to use notebooks at all (or at least a link explaining it)? Especially for users who don't use notebooks integrated in their IDE, it would be nice to be able to run it from jupyter-lab (like we do for the workshops). Maybe there could be a README for the notebooks specifically, with the generic info like intro and install instructions that is currently on top of both notebooks (then it doesn't need to be repeated), as well as how to use it from IDE or jupyter-lab. I did a draft of this in PR #455 , see if you like it or not.

I made some changes in (mainly to README) in #452 that are pointing here, because I branched off from here to avoid conflicts in the future. That one can either be merged to this first, or can be merged to main after this goes through.

setup.cfg Show resolved Hide resolved
@gcroci2
Copy link
Collaborator Author

gcroci2 commented Jul 4, 2023

Give me a sign when you merge the other two PRs (#455 and #452), if there is nothing else to add I'll merge this then :) @DaniBodor

@DaniBodor
Copy link
Collaborator

It might make sense to wait for #446 to get closed before finalizing this, so we can include those changes (otherwise it can be updated in that PR).

@gcroci2
Copy link
Collaborator Author

gcroci2 commented Jul 6, 2023

It might make sense to wait for #446 to get closed before finalizing this, so we can include those changes (otherwise it can be updated in that PR).

I would update the notebooks in PR #446 directly, together with the rest of the docs

@gcroci2 gcroci2 merged commit 997181c into main Jul 6, 2023
@gcroci2 gcroci2 deleted the 432_add_tutorials_ppi_gcroci2 branch July 6, 2023 08:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add tutorials for PPIs
2 participants