Merge pull request #8 from matteoferla/dev-teo

🚚 merging branch
matteoferla · Feb 13, 2021 · 28e18c3 · 28e18c3
2 parents 8c080de + 10dc8e1
commit 28e18c3
Show file tree

Hide file tree

Showing 263 changed files with 16,162 additions and 4,749 deletions.
diff --git a/README.md b/README.md
@@ -3,13 +3,71 @@ Scaffold hopping between bound compounds by stitching them together like a reani
 <img src="images/fragmenstein.jpg" width="300px">
 
 
-## Aim
-Given a followup molecule (SMILES) and a series of hits it makes a spatially stitched together version of the followup based on the hits.
+## Stitched molecules
+
+Fragmenstein can perform two different tasks.
+
+* **Combine** hits
+* **Place** a given followup molecule (SMILES) based on series of hits
+
 Like Frankenstein's creation it may violate the laws of chemistry.
 Planar trigonal topologies may be tetrahedral, bonds unnaturally long _etc._
-This monstrosity is therefore then energy minimised with strong constraints.
+This monstrosity is therefore then energy minimised with strong constraints within the protein.
+
+## Classes
+
+There are three main classes, named after characters from the Fragmenstein book and movies:
+
+* ``Monster`` makes the stitched together molecules indepent of the protein — [documentation](documentation/monster/monster.md)
+* ``Igor`` uses PyRosetta to minimise in the protein the fragmenstein followup — [documentation](documentation/igor.md)
+* ``Victor`` is a pipeline that calls the parts, with several features, such as warhead switching —[documentation](documentation/victor.md)
+
+NB. In the absence of `pyrosetta` (which requires an academic licence), all bar ``Igor`` work.
+
+Additionally, there are a few minor classes.
+
+One of these is ``mRMSD``, a multiple RMSD variant which does not align and bases which atoms 
+to use on coordinates —[documentation](documentation/mrmsd.md)
 
-Here is [an interactive example of mapped molecules](https://michelanglo.sgc.ox.ac.uk/r/fragmenstein).
+There are two module hosted elsewhere:
+
+* ``Rectifier`` from [molecular_rectifier](https://github.com/matteoferla/molecular_rectifier) is a class that corrects mistakes in the molecule automatically merged by ``Monster``.
+* ``Params`` from [rdkit to params module](https://github.com/matteoferla/rdkit_to_params) parameterises the ligands
+
+### Combine
+It can also merge and link fragment hits by itself and find the best scoring mergers.
+For details about linking see [linking notes](documentation/linking.md)
+It uses the same overlapping position clustering, but also has a decent amount of impossible/uncommon chemistry prevention.
+
+Monster:
+
+    from fragmenstein import Monster
+    monster = Monster(hits=[hits_a, hit_b])
+    monster.combine()
+    monster.positioned_mol # RDKit.Chem.Mol
+
+Victor:
+
+    from fragmenstein import Monster
+    victor = Victor(hits=[hits_a, hit_b], 
+                    pdb_filename='foo.pdb',
+                    covalent_resi=1) # if not covalent, just put the first residue or something.
+    victor.combine()
+    victor.minimised_mol
+
+The two seem similar, but Victor places with Monster and minimises with Igor.
+As a result it has energy scores
+
+    victor.ddG
+
+Fragmenstein is not really a docking algorithm as it does not find the pose with the **lowest energy** 
+within a given volume.
+Consequently, it is a method to find how **faithful** is a given followup to the hits provided.
+Hence the minimised pose should be assessed by the RMSD metric or similar
+and the ∆∆G score used solely as a cutoff —lower than zero.    
+
+## Place
+Here is [an interactive example of placed molecules](https://michelanglo.sgc.ox.ac.uk/r/fragmenstein).
 
 It is rather tolerant to erroneous/excessive submissions (by automatically excluding them)
 and can energy minimise strained conformations.
@@ -21,37 +79,45 @@ For example, note here that the benzene and the pyridine rings overlap, not the
 
 <img src="images/position_over_mcs.jpg" width="300px">
 
-### Side role: follow prediction
-It can also merge fragment hits by itself and find the best scoring mergers.
-It uses the same overlapping position clustering, but also has a decent amount of impossible/uncommon chemistry prevention.
+### Examples
 
-## Not-docking
-As a consequence, it is not really a docking algorithm as it does not find the pose with the lowest energy 
-within a given volume. Consequently, it is a method to find how faithful is a given followup to the hits provided.
-Hence the minimised pose should be assessed by the RMSD metric and the ∆∆G score used solely as a cutoff —lower than zero.
+Monster:
+
+    from fragmenstein import Monster
+    monster = Monster(hits=[hits_a, hit_b])
+    monster.place_smiles('CCO')
+    monster.positioned_mol
+
+Victor:
 
-## Dramatis personae
+    from fragmenstein import Monster
+    victor = Victor(hits=[hits_a, hit_b], pdb_filename='foo.pdb')
+    victor.place('CCO')
+    victor.minimised_mol
 
-> Victor, the pipeline, requires my [rdkit to params module](https://github.com/matteoferla/rdkit_to_params).
+## Other features
 
-There are three main classes, named after characters from the Fragmenstein book and movies:
+* [Covalent hits](documentation/covalents.md)
+* [Logging](documentation/logging_and_debugging.md)
 
-* ``Fragmenstein`` makes the stitched together molecules — [documentation](documentation/fragmenstein.md)
-* ``Igor`` uses PyRosetta to minimise in the protein the fragmenstein followup — [documentation](documentation/igor.md)
-* ``Victor`` is a pipeline that calls the parts, with several features, such as warhead switching —[documentation](documentation/victor.md)
+## Installation
 
-An honourable mention goes to:
+Requires RDKit
 
-* ``mRMSD`` is a multiple RMSD variant which does not align and bases which atoms to use on coordinates —[documentation](documentation/mrmsd.md)
-* ``rectifier`` is a class that corrects mistakes in the molecule automatically merged by ``Fragmenstein``.
+    sudo apt-get install python3-rdkit librdkit1 rdkit-data
 
-In the absence of `pyrosetta` (which requires an academic licence), all bar ``Igor`` work.
+Requires Pyrosetta
+
+    curl -u 👾👾👾:👾👾👾https://graylab.jhu.edu/download/PyRosetta4/archive/release/PyRosetta4.Release.python38.linux/PyRosetta4.Release.python38.linux.release-273.tar.bz2 -o a.tar.bz2
+    tar -xf a.tar.bz2
+    cd PyRosetta4.Release.python38.linux
+    sudo pip3 install .
 
-## Work in progress
+Install from pipy
 
-Some changes to the algorithm may happen, see [wip.md](documentation/wip.md) for more or drop me (matteo) an email.
+    sudo pip3 install fragmenstein
 
-## Laboratory
+## Origin
 
 > See [Fragmenstein and COVID moonshot](documentation/covid.md).
 
@@ -65,3 +131,9 @@ are not encountered in other projects.
 
 For more see the source code or the [Sphinx converted documentation](documentation/sphinx-docs.md).
 
+## Changes
+
+Some changes to the algorithm may happen, 
+see [changelog](documentation/changelog_0.6.md) and [wip.md](documentation/wip.md) for more.
+
+
diff --git a/documentation/_snippets.md b/documentation/_snippets.md
@@ -0,0 +1,58 @@
+These are snippets that need to go somewhere
+# 1
+
+To make Victor carp at any exception, do
+
+    Victor.error_to_catch = ()
+
+This is because `try`/`except` can accept multiple error classes as a tuple.
+
+# 2
+
+`ConnectionError` is meant for comms connection errors, but here it is raised as bad bonding.
+
+# 3
+
+Show the steps
+
+    from IPython.display import display
+
+    monster = Monster([fore, aft]).merge()
+    for m in monster.modifications:
+        display(m)
+        
+Show multiple RDKit mols in nglview
+
+    import nglview as nv
+    from io import StringIO
+
+    def show_mols(*mols: Chem.Mol) -> nv.NGLWidget:
+        view = nv.NGLWidget()
+        for mol in mols:
+            fh = StringIO(Chem.MolToPDBBlock(mol))
+            view.add_component(fh, ext='pdb')
+        return view
+
+# X
+
+One issue is that the ring mergers are not transitive.
+Hit order results in different cases:
+
+    ortho = Chem.MolFromSmiles('Cc1cc(O)ccc1')
+    meta = Chem.MolFromSmiles('Cc1ccc(O)cc1')
+    AllChem.EmbedMolecule(ortho)
+    AllChem.EmbedMolecule(meta)
+    Chem.rdMolAlign.AlignMol(prbMol=ortho,
+                             refMol=meta,
+                             atomMap=[(0,0), (1,1)],
+                             weights=[1,1],
+                            )
+    show_mols(ortho, meta)
+
+These two cresol rings are perpendicular. The merger produces the same compound, methylcatecol
+but one has a more distorted conformation.
+
+    Monster([ortho, meta]).merge().positioned_mol  # decent
+    Monster([meta, ortho]).merge().positioned_mol  # indecent
+
+What are the implications for this?
diff --git a/documentation/changelog_0.6.md b/documentation/changelog_0.6.md
@@ -0,0 +1,75 @@
+## Monster entrypoints
+
+Previously merging in `Monster` was overly complicated and an add-on, now it is cleaner as the initialisation of 
+the object does not do a placement. To do that one has to do:
+
+    monster = Monster(hits)
+    monster.place(mol)
+    monster.positioned_mol
+
+or `place_smiles` for a SMILES.
+
+While to do a merger
+
+    monster = Monster(hits)
+    monster.combine()
+    monster.positioned_mol
+
+Additionally, the attribute `modifications` was added/expanded, so that intermediate steps are stored
+for potential inspection.
+It is a dictionary, but since 3.6 dict is ordered and `.keep_copy(mol, label)` prevents over-writes.
+This includes `scaffold` (the template) and `chimera` (the template with the atom symbols adapted), 
+which are no longer attributes.
+
+Equally viable alternatives are stored in a list
+
+    monster.mol_options
+
+`combine` still calls `simply_merge_hits`, which is used by placement too and merges by fragmentation
+—unlike atom-ring absorption events. The fact that two different approaches are present is just historical.
+
+If one wanted to merge two or more hits, independently of those in `.hits` attribute and without ring collapsing 
+and rectification etc.
+`simply_merge_hits` is still the method to use:
+
+    monster = Monster([])
+    monster.simply_merge_hits([molA, molB])
+
+The `place` method accepts the argument `merging_mode`, by default it is "permissive_none",
+which calls `.no_blending(broad=True)`,
+but "off" (nothing), 
+"full" (`.full_blending()`), 
+"partial" (`.partial_blending()`)
+and "none" (`.no_blending()`)
+are accepted.
+
+## Names
+The names "merge" and "place" were ultra-confusingly ambiguous when it comes to the other methods, especially those of "place".
+
+* Now all the cases where hits are partially combined for placement purposes are called "blending"
+The placement of the (distorted) 3D final monster compound Victor is called "plonk".
+
+* place and position are used synonymously
+* combine and merge are used synonymously
+
+
+## Future
+Victor however still has the merger as a side route.
+
+## Overlapping rings
+
+There was a bug in the code for `Monster` that meant that if two rings from different origins that were not bonded
+—i.e. without side atoms that were absorbed they were not assessed for merging. This has been corrected.
+
+Namely, what happens to rings is controlled in `expand_ring` method, which calls `_add_novel_bonding`, where "novel"
+means that does not share the same "origin" (original molecule). 
+The latter method calls in turn `_get_novel_ringcore_pairs`, which now get ring marking atoms that are close or bonded.
+Closeness is determined by ring atoms (not ring core markers) within 1.5 Å cutoff (viz. `_get_close_nonorigin_ringcores`)
+
+The test case was the merger of a phenylacetic acid and phenylacetamide which overlap in two ring carbons and a terminal oxygen, resulting
+in a phenylene-like ring (where one ring is an oxazine). See tests.
+
+![phenylene](../images/phenylene.png)
+
+
+
diff --git a/documentation/covalents.md b/documentation/covalents.md
@@ -0,0 +1,56 @@
+## Covalent
+
+Covalent forms of a molecule are marked within the `Chem.Mol` instances in fragmenstein with a "dummy atom".
+This is element symbol `*` within RDKit and smiles, but `R` in a mol file and PDB file.
+
+This is essential for the params file (topology file for the ligand).
+
+Consequently, hits are best extracted with `Victor.extract_mols` or `Victor.extract_mol`.
+Fragalysis does not do this, so the hits from there will be treated as regular compounds.
+
+## Monster
+
+A molecule with a single atom is passed to `attachment` argument of `Monster`,
+then the covalent linker if absent in the hits is anchored to that atom.
+If you know a ligand that is covalent `Victor.find_attachment` can be used to extract the attachment atom.
+
+## Warheads
+
+Victor has a cysteine reactive warheads operations
+In the class attribute `.warhead_definitions` are stored conversions and atom names
+(cf. MProVictor for an example).
+
+**acrylamide** `C(=O)C=C` => `C(=O)CC*`
+**chloroacetamide** `C(=O)C[Cl]` => `C(=O)C*`
+**nitrile** `C(#N)` => `C(=N)*`
+**vinylsulfonamide** `S(=O)(=O)C=C` => `S(=O)(=O)CC*`
+**bromoalkyne** `C#C[Br]` => `C(=C)*`
+
+Additionally, `.possible_definitions` contains two (or more) that may be added experimentally (don't currently work).
+
+**aurothiol** `S[Au]P(CC)(CC)CC` => `S[Au]*`
+**aldehyde** `[C:H1]=O` => `C(O)*`
+
+There is a quick way to get a warhead definition too
+
+    Victor.get_warhead_definition(warhead_name)
+
+In terms of Rosetta constraints (restraints), these can be added with 
+
+    Victor.add_constraint_to_warhead(name=constrain_name, constraint=constraint)
+
+For example:
+
+* _chloroacetamide_: `AtomPair  H  145A  OY  1B HARMONIC 2.1 0.2`
+* _nitrile_: `AtomPair  H  145A  NX  1B HARMONIC 2.1 0.2`
+* _acrylamide_: `AtomPair  H  143A  OZ  1B HARMONIC 2.1 0.2`
+* _vinylsulfonamide_: `AtomPair  H  143A  OZ1 1B HARMONIC 2.1 0.2`
+
+Currently, only cysteine details are known to `Victor`. Cf. `.covalent_definitions`.
+
+To convert a react_ive_ SMILES to a dummy-atom–marked react_ed_ SMILES:
+
+    Victor.make_all_warhead_combinations(smiles, warhead_name)
+
+### Untested backdoor
+The default dummy atom can be overridden with `Monster.dummy:Chem.Mol` and `Fragmenstein.dummy_symbol:str`.
diff --git a/documentation/covid.md b/documentation/covid.md
@@ -8,7 +8,7 @@ However, there are some problems:
 * the form does not have a `no inspiration hit` option, so many users submitted `x0072` the first as inspiration when submitting docked libraries.
 * the inspiration hits are common to a group of submissions by a user, even if one went into one and another to another.
 * some pockets have many hits, so a large amount of not fully overlapping hits are submitted
-* some users submitted a mispelt hit
+* some users submitted a mispelt hit code
 
 ## Custom class
 
@@ -28,7 +28,7 @@ For an example of the script used, see [covid.py](../covid.py).
 Note that this script runs on multiple cores. For a fews smiles, which takes about 30 seconds each there is no need.
 Also note that some molecules get stuck due to incorrectly entered inspirations.
 
-For a comparision of how the three method fair with the daset see [three modes compared](three_modes_compared.md).
+For a comparision of how the three method fair with the daset see [three modes compared](monster/three_modes_compared.md).
 
 ## Over-inspired problem
 

diff --git a/documentation/logging_and_debugging.md b/documentation/logging_and_debugging.md
@@ -38,3 +38,8 @@ at each step:
         for i, mod in enumerate(victor.modifications):
             pymol.cmd.read_molstr(Chem.MolToMolBlock(mod, kekulize=False), f'step{i}')
         pymol.cmd.save('test.pse')
+
+The `Monster` class can trip up, and this can happen at various steps.
+`victor.modifications` contains a stack of the various step, such as the collapsed rings to the rectification.
+
+