Adding 6 new benchmark systems #3

brycestx · 2018-08-08T21:30:26Z

This PR adding protein and ligand systems for 6 systems that come from JMedChem papers published recently. Readme and documentation is provided. Most of the work came from @jeremy2401 from an internship this summer.

davidlmobley · 2018-08-23T23:33:33Z

@brycestx @jeremy2401 - I see some issues that it would be nice to have sorted out here. Starting with the big picture:

There is nothing in the documentation here which actually says what these systems are and why they are good benchmarks. It would be ideal to instead (a) briefly mention them in the MAIN README.md, and (b) describe in detail in the README.md associated with the clean_dataset directory. Right now the only things listed there are things like "EZH2", "IMPDH", etc. I'd have to look those up to know what they are.
Why are these systems provided? There should be something that discusses that -- what's known about them, what are the challenges, the range of affinities, or why were they selected out of all posssibilities?
Right now VERY little is said about how the systems were prepared
How many compounds are provided? For what targets?
Have these been used in prior computational studies/are they being used?

Minor quibbles:

Current documentation is very odd, e.g. " SMILES of cmp_14 given by the csv was wrong and have been corrected". What CSV? And is, say, cmp_14 from a paper? Or from an internal datafile? There are many other comments that are hard to understand, like " carefull to inhibition mechanism (uncomp)" (???)
"cmp with stereo have been removed" Why? I think I can guess, but should be stated
Some text seems missing, e.g. this phrase appears: "Duplicates, wrong SMILES strings and compounds with no specified stereochemistry have been removed or corrected from The SMILES codes given by the supplementary data and conformers have been generated." Is there a clause missing after "corrected from"?
The markdown format headers aren't spaced properly so they appear odd, see https://github.com/brycestx/soft-benchmarks/tree/master/clean_dataset
It would be helpful to say HOW things were done otherwise it's impossible to reproduce, e.g. "The SMILES codes given by the supplementary data and conformers have been generated." With what tools?

Big picture: Make it so people can tell what you're adding, why you're adding it, and why it might be interesting to them without having to read all of the papers you're depositing.

(Also take note you probably don't have the rights to deposit the papers, only link to them...)

davidlmobley

Changes requested, see comments.

davidlmobley · 2018-08-23T23:34:47Z

Other things that would be worth knowing:

Are the PDBs simulation-ready (e.g. do they have protons? Missing residues? Gaps? Caps?)
How were they prepared/pre-processed? And with what tools?
Do the ligands have 3D geometries and are they posed? With all protons? How were protonation states/tautomers set/selected?

davidlmobley · 2019-07-30T17:35:09Z

@brycestx any chance of follow-up on this?

brycestx · 2019-07-30T17:43:45Z

@davidlmobley I never received any response from @jeremy2401 in regards to your comments. Let me assess bandwidth internally and figure out if we can reproduce and redocument from scratch Jeremy's work.

jeremy2401 and others added 4 commits July 23, 2018 17:58

clean_dataset, from silicontx

dc33da2

hot fix readme

81c1eb8

hot fix

eb7281e

Removing PPAT target because of ambiguities and updating readme for PR

33ba2aa

brycestx requested a review from davidlmobley August 8, 2018 21:31

davidlmobley requested changes Aug 23, 2018

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Adding 6 new benchmark systems #3

Adding 6 new benchmark systems #3

brycestx commented Aug 8, 2018

davidlmobley commented Aug 23, 2018

davidlmobley left a comment

davidlmobley commented Aug 23, 2018

davidlmobley commented Jul 30, 2019

brycestx commented Jul 30, 2019

Adding 6 new benchmark systems #3

Are you sure you want to change the base?

Adding 6 new benchmark systems #3

Conversation

brycestx commented Aug 8, 2018

davidlmobley commented Aug 23, 2018

davidlmobley left a comment

Choose a reason for hiding this comment

davidlmobley commented Aug 23, 2018

davidlmobley commented Jul 30, 2019

brycestx commented Jul 30, 2019