[WIP] T4 Lysozyme input files #66

Open: wants to merge 2 commits into base: master

andrrizzi
Collaborator

@mkgilson I've uploaded in this branch the receptors in PDB format and ligands in mol2 format.

The receptors are clean (no crystal waters) and they went through MCCE to obtain the most relevant protonation states at the experimental pH. The ligands have been generated from the SMILES and docked into the receptors with OpenEye. The protonation state is the one Epik predicted. Each folder has a JSON file describing each ligand with associated reference and experimental conditions/measurements.

I haven't had the chance to put together a README and to organize the prmtop/inpcrd yet, but as far as I understood, the PDB files are the priority at this stage. I'll try to get the rest in by the end of the week.

@andrrizzi
Collaborator Author

I plan to upload the solvated systems soon. A few questions:

  1. Do we want only the prmtop/inpcrd files here or the PDB files as well?
  2. The set I have prepared includes 14 L99A binders that are not listed in the paper, which brings the total number of L99A binders to 21. Is there a particular reason why the current set has a relatively low number of ligands? I guess what I'm asking is: should I not upload these extra binders or should we update the table in the paper? Also, I haven't prepared the non-binders for now.
  3. Uploading these files for all 29 systems will add ~370MB (~460MB if I add the PDB files as well) to the existing ~387MB. We should probably start coming up with a long-term plan for file storage as the repo is getting closer to the 1GB limit.
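
As a rough check of the storage numbers in point 3, the totals can be sketched as below. This is only an illustration: the sizes are the approximate figures quoted above, and treating the "1GB limit" as 1000 MB is an assumption, as are the variable and function names.

```python
# Approximate sizes in MB, taken from the estimates quoted in the comment above.
EXISTING_MB = 387            # current repository size
NEW_PRMTOP_INPCRD_MB = 370   # all 29 systems, prmtop/inpcrd only
NEW_WITH_PDB_MB = 460        # all 29 systems, prmtop/inpcrd plus PDB files

# Assumption: the "1GB limit" is treated as 1000 MB for this rough check.
LIMIT_MB = 1000


def projected(existing_mb, added_mb, limit_mb=LIMIT_MB):
    """Return (projected total, remaining headroom) in MB."""
    total = existing_mb + added_mb
    return total, limit_mb - total


without_pdb = projected(EXISTING_MB, NEW_PRMTOP_INPCRD_MB)
with_pdb = projected(EXISTING_MB, NEW_WITH_PDB_MB)

print(f"prmtop/inpcrd only: {without_pdb[0]} MB total, {without_pdb[1]} MB headroom")
print(f"with PDB files:     {with_pdb[0]} MB total, {with_pdb[1]} MB headroom")
```

Under these assumptions either option fits below the limit, but with little room left for further benchmark sets.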

@davidlmobley
Member

Thanks, @andrrizzi.

> Do we want only the prmtop/inpcrd files here or the PDB files as well?

I think it'd be good to also have the PDBs since these are very helpful for some codes and also could assist if it is necessary to prepare other formats.

> The set I have prepared includes 14 L99A binders that are not listed in the paper, which brings the total number of L99A binders to 21. Is there a particular reason why the current set has a relatively low number of ligands? I guess what I'm asking is: should I not upload these extra binders or should we update the table in the paper? Also, I haven't prepared the non-binders for now.

The short answer is: Please deposit them, and update the supplemental info to also list details of these. (See also work done on cyclodextrins and some of the other sets). We can update the paper to note this is available.

The main reason the current set has a low number of ligands is that, in this work, we were primarily focused on issues roughly like those addressed in the SAMPLing challenge -- assessing whether and how well methods converge to known gold standard answers (when available), or at least how well they deal with the issues known to be important -- and less on accuracy relative to experiment. If you're interested in method efficiency separated from force field issues, and thus not in comparing to experiment, the number of ligands is less of a factor (as in the case of the SAMPLing challenge).

Eventually I need to update the paper to make this point and the terminology slightly clearer, as John was confused by this too, but I haven't had time.

@davidlmobley
Member

Oh, and also @andrrizzi

> Uploading these files for all 29 systems will add ~370MB (~460MB if I add the PDB files as well) to the existing ~387MB. We should probably start coming up with a long-term plan for file storage as the repo is getting closer to the 1GB limit.

Good point. Any suggestions? One approach would be for me to make a separate data package in eScholarship which would get its own DOI and we could link to that from here. That's not under automatic version control (though I can upload versioned .tar.gz files), but it's free and the UC libraries maintain it so it's safe for the long haul. There are probably also other options but presumably they would be paid.

@andrrizzi
Collaborator Author

Sounds good, thanks!

> Any suggestions?

Nothing specific comes to mind right now. I agree that we want to have versioned input files. If we want to keep things free, an alternative to manual versioning could be to split this into multiple GitHub repos (e.g. benchmarksets/t4lysozyme, benchmarksets/hostguest), although I'm not sure which of the two would be less cumbersome. Otherwise, I think adding 50GB of storage and bandwidth on GitHub costs $5/month (see here).

@davidlmobley
Member

@andrrizzi - we are good to go now, I've added the 50 GB so you should be able to add those files.

@andrrizzi
Collaborator Author

Awesome! Thank you!

@andrrizzi
Collaborator Author

The README/paper update is still missing, and the layout may need a little polishing, but the files are here. I'll try to wrap up this PR next week.

@davidlmobley
Member

@andrrizzi - any updates?

@andrrizzi
Collaborator Author

Sorry, I got caught up in other work. I'll try to find some time next week to finish this up.
