Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include non-python files #8

Open
wants to merge 11 commits into
base: master
Choose a base branch
from

Conversation

susannasiebert
Copy link
Contributor

The previous change did not include non-python files like the model files and pseudoseqs.csv file correctly, leading to errors when trying to run the commands. I apologize for this oversight.

This PR also updates the various import statement to match the required syntax for packages installed on the system.

As a side note, the files currently get installed under /models, /src, /data subfolders under the system's python site-packages. This means that those folders are not specific to bigmhc and there could potentially be conflicts with other packages that might also install files into these subfolders. This could be fixed by restructuring the repository and moving all the existing files/folders under a /bigmhc umbrella folder. This would require some additional changes to the setup.py and the import statements. All the files would then end up being installed under a bigmhc folder under site-packages, which should resolve any conflicts since there shouldn't be any other packages with that name. I can take a stab at this if this is something desired.

@benjaminalbert
Copy link
Collaborator

benjaminalbert commented Feb 7, 2024

@susannasiebert thank you for this contribution! Would you please make the modification that you suggested, putting everything under a bigmhc umbrella folder? And would you please update the imports in the jupyter notebooks as appropriate? I will also roll back to commit 1f9ab07

@susannasiebert
Copy link
Contributor Author

@benjaminalbert will do. I put this PR into draft state and will move it back into "ready for review" state once I have completed the changes.

@susannasiebert
Copy link
Contributor Author

I believe this is ready for review but I was a bit confused about the directory structure in the jupyter notebooks. I'm not sure if all the references to the data directory need updating since it moved under the bigmhc umbrella. @benjaminalbert can you please give it a try on your end and check if it all works?

@susannasiebert susannasiebert marked this pull request as ready for review February 8, 2024 16:11
@benjaminalbert
Copy link
Collaborator

I believe this is ready for review but I was a bit confused about the directory structure in the jupyter notebooks. I'm not sure if all the references to the data directory need updating since it moved under the bigmhc umbrella. @benjaminalbert can you please give it a try on your end and check if it all works?

@susannasiebert Thanks for the fixes. Would you please move the nb dir under the bigmhc umbrella dir? The notebooks rely on the local dir structure.

@susannasiebert
Copy link
Contributor Author

I don't think this is a good idea as it would include all these jupyter files in the bigmhc package that the setup.py creates. Only files needed to run bighmhc should be included, especially since the package is already very large as is (over 2Gb due to the model files). Can we instead update the directories in the jupyter notebooks?

@benjaminalbert
Copy link
Collaborator

The jupyter notebooks are relatively small (< 200KB), and the associated PDB files total about 3MB, so I don't think size is an issue here given that the models are so much larger. However, an alternative could be to include an __init__.py file within the bigmhc umbrella dir, and then replace the datadir vars in the notebooks with:

import bigmhc
datadir = os.path.join(bigmhc.__path__, "data")

After modifying the imports in the notebooks, the sys path would not need to be updated, so this could be removed in attention.ipynb:

srcpath = os.path.abspath("../src")
if srcpath not in sys.path:
    sys.path.append(srcpath)

The installation instructions could first direct users to git clone, then install:

git clone https://github.com/karchinlab/bigmhc.git
pip install ./bigmhc

This would give easy access to the notebooks while still allowing the option of installing directly via pip install git+https://github.com/karchinlab/bigmhc.git.

Would this work for integration with pVACtools?

@susannasiebert
Copy link
Contributor Author

I like your second suggestion and that should work for pVACtools as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants