Skip to content

Commit

Permalink
Merge pull request #8 from matteoferla/dev-teo
Browse files Browse the repository at this point in the history
🚚 merging branch
  • Loading branch information
matteoferla authored Feb 13, 2021
2 parents 8c080de + 10dc8e1 commit 28e18c3
Show file tree
Hide file tree
Showing 263 changed files with 16,162 additions and 4,749 deletions.
120 changes: 96 additions & 24 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,13 +3,71 @@ Scaffold hopping between bound compounds by stitching them together like a reani
<img src="images/fragmenstein.jpg" width="300px">


## Aim
Given a followup molecule (SMILES) and a series of hits it makes a spatially stitched together version of the followup based on the hits.
## Stitched molecules

Fragmenstein can perform two different tasks.

* **Combine** hits
* **Place** a given followup molecule (SMILES) based on series of hits

Like Frankenstein's creation it may violate the laws of chemistry.
Planar trigonal topologies may be tetrahedral, bonds unnaturally long _etc._
This monstrosity is therefore then energy minimised with strong constraints.
This monstrosity is therefore then energy minimised with strong constraints within the protein.

## Classes

There are three main classes, named after characters from the Fragmenstein book and movies:

* ``Monster`` makes the stitched together molecules indepent of the protein — [documentation](documentation/monster/monster.md)
* ``Igor`` uses PyRosetta to minimise in the protein the fragmenstein followup — [documentation](documentation/igor.md)
* ``Victor`` is a pipeline that calls the parts, with several features, such as warhead switching —[documentation](documentation/victor.md)

NB. In the absence of `pyrosetta` (which requires an academic licence), all bar ``Igor`` work.

Additionally, there are a few minor classes.

One of these is ``mRMSD``, a multiple RMSD variant which does not align and bases which atoms
to use on coordinates —[documentation](documentation/mrmsd.md)

Here is [an interactive example of mapped molecules](https://michelanglo.sgc.ox.ac.uk/r/fragmenstein).
There are two module hosted elsewhere:

* ``Rectifier`` from [molecular_rectifier](https://github.com/matteoferla/molecular_rectifier) is a class that corrects mistakes in the molecule automatically merged by ``Monster``.
* ``Params`` from [rdkit to params module](https://github.com/matteoferla/rdkit_to_params) parameterises the ligands

### Combine
It can also merge and link fragment hits by itself and find the best scoring mergers.
For details about linking see [linking notes](documentation/linking.md)
It uses the same overlapping position clustering, but also has a decent amount of impossible/uncommon chemistry prevention.

Monster:

from fragmenstein import Monster
monster = Monster(hits=[hits_a, hit_b])
monster.combine()
monster.positioned_mol # RDKit.Chem.Mol

Victor:

from fragmenstein import Monster
victor = Victor(hits=[hits_a, hit_b],
pdb_filename='foo.pdb',
covalent_resi=1) # if not covalent, just put the first residue or something.
victor.combine()
victor.minimised_mol

The two seem similar, but Victor places with Monster and minimises with Igor.
As a result it has energy scores

victor.ddG

Fragmenstein is not really a docking algorithm as it does not find the pose with the **lowest energy**
within a given volume.
Consequently, it is a method to find how **faithful** is a given followup to the hits provided.
Hence the minimised pose should be assessed by the RMSD metric or similar
and the ∆∆G score used solely as a cutoff —lower than zero.

## Place
Here is [an interactive example of placed molecules](https://michelanglo.sgc.ox.ac.uk/r/fragmenstein).

It is rather tolerant to erroneous/excessive submissions (by automatically excluding them)
and can energy minimise strained conformations.
Expand All @@ -21,37 +79,45 @@ For example, note here that the benzene and the pyridine rings overlap, not the

<img src="images/position_over_mcs.jpg" width="300px">

### Side role: follow prediction
It can also merge fragment hits by itself and find the best scoring mergers.
It uses the same overlapping position clustering, but also has a decent amount of impossible/uncommon chemistry prevention.
### Examples

## Not-docking
As a consequence, it is not really a docking algorithm as it does not find the pose with the lowest energy
within a given volume. Consequently, it is a method to find how faithful is a given followup to the hits provided.
Hence the minimised pose should be assessed by the RMSD metric and the ∆∆G score used solely as a cutoff —lower than zero.
Monster:

from fragmenstein import Monster
monster = Monster(hits=[hits_a, hit_b])
monster.place_smiles('CCO')
monster.positioned_mol

Victor:

## Dramatis personae
from fragmenstein import Monster
victor = Victor(hits=[hits_a, hit_b], pdb_filename='foo.pdb')
victor.place('CCO')
victor.minimised_mol

> Victor, the pipeline, requires my [rdkit to params module](https://github.com/matteoferla/rdkit_to_params).
## Other features

There are three main classes, named after characters from the Fragmenstein book and movies:
* [Covalent hits](documentation/covalents.md)
* [Logging](documentation/logging_and_debugging.md)

* ``Fragmenstein`` makes the stitched together molecules — [documentation](documentation/fragmenstein.md)
* ``Igor`` uses PyRosetta to minimise in the protein the fragmenstein followup — [documentation](documentation/igor.md)
* ``Victor`` is a pipeline that calls the parts, with several features, such as warhead switching —[documentation](documentation/victor.md)
## Installation

An honourable mention goes to:
Requires RDKit

* ``mRMSD`` is a multiple RMSD variant which does not align and bases which atoms to use on coordinates —[documentation](documentation/mrmsd.md)
* ``rectifier`` is a class that corrects mistakes in the molecule automatically merged by ``Fragmenstein``.
sudo apt-get install python3-rdkit librdkit1 rdkit-data

In the absence of `pyrosetta` (which requires an academic licence), all bar ``Igor`` work.
Requires Pyrosetta

curl -u 👾👾👾:👾👾👾https://graylab.jhu.edu/download/PyRosetta4/archive/release/PyRosetta4.Release.python38.linux/PyRosetta4.Release.python38.linux.release-273.tar.bz2 -o a.tar.bz2
tar -xf a.tar.bz2
cd PyRosetta4.Release.python38.linux
sudo pip3 install .

## Work in progress
Install from pipy

Some changes to the algorithm may happen, see [wip.md](documentation/wip.md) for more or drop me (matteo) an email.
sudo pip3 install fragmenstein

## Laboratory
## Origin

> See [Fragmenstein and COVID moonshot](documentation/covid.md).
Expand All @@ -65,3 +131,9 @@ are not encountered in other projects.

For more see the source code or the [Sphinx converted documentation](documentation/sphinx-docs.md).

## Changes

Some changes to the algorithm may happen,
see [changelog](documentation/changelog_0.6.md) and [wip.md](documentation/wip.md) for more.


58 changes: 58 additions & 0 deletions documentation/_snippets.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,58 @@
These are snippets that need to go somewhere
# 1

To make Victor carp at any exception, do

Victor.error_to_catch = ()

This is because `try`/`except` can accept multiple error classes as a tuple.

# 2

`ConnectionError` is meant for comms connection errors, but here it is raised as bad bonding.

# 3

Show the steps

from IPython.display import display

monster = Monster([fore, aft]).merge()
for m in monster.modifications:
display(m)
Show multiple RDKit mols in nglview

import nglview as nv
from io import StringIO

def show_mols(*mols: Chem.Mol) -> nv.NGLWidget:
view = nv.NGLWidget()
for mol in mols:
fh = StringIO(Chem.MolToPDBBlock(mol))
view.add_component(fh, ext='pdb')
return view

# X

One issue is that the ring mergers are not transitive.
Hit order results in different cases:

ortho = Chem.MolFromSmiles('Cc1cc(O)ccc1')
meta = Chem.MolFromSmiles('Cc1ccc(O)cc1')
AllChem.EmbedMolecule(ortho)
AllChem.EmbedMolecule(meta)
Chem.rdMolAlign.AlignMol(prbMol=ortho,
refMol=meta,
atomMap=[(0,0), (1,1)],
weights=[1,1],
)
show_mols(ortho, meta)

These two cresol rings are perpendicular. The merger produces the same compound, methylcatecol
but one has a more distorted conformation.

Monster([ortho, meta]).merge().positioned_mol # decent
Monster([meta, ortho]).merge().positioned_mol # indecent

What are the implications for this?
75 changes: 75 additions & 0 deletions documentation/changelog_0.6.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,75 @@
## Monster entrypoints

Previously merging in `Monster` was overly complicated and an add-on, now it is cleaner as the initialisation of
the object does not do a placement. To do that one has to do:

monster = Monster(hits)
monster.place(mol)
monster.positioned_mol

or `place_smiles` for a SMILES.

While to do a merger

monster = Monster(hits)
monster.combine()
monster.positioned_mol

Additionally, the attribute `modifications` was added/expanded, so that intermediate steps are stored
for potential inspection.
It is a dictionary, but since 3.6 dict is ordered and `.keep_copy(mol, label)` prevents over-writes.
This includes `scaffold` (the template) and `chimera` (the template with the atom symbols adapted),
which are no longer attributes.

Equally viable alternatives are stored in a list

monster.mol_options

`combine` still calls `simply_merge_hits`, which is used by placement too and merges by fragmentation
—unlike atom-ring absorption events. The fact that two different approaches are present is just historical.

If one wanted to merge two or more hits, independently of those in `.hits` attribute and without ring collapsing
and rectification etc.
`simply_merge_hits` is still the method to use:

monster = Monster([])
monster.simply_merge_hits([molA, molB])

The `place` method accepts the argument `merging_mode`, by default it is "permissive_none",
which calls `.no_blending(broad=True)`,
but "off" (nothing),
"full" (`.full_blending()`),
"partial" (`.partial_blending()`)
and "none" (`.no_blending()`)
are accepted.

## Names
The names "merge" and "place" were ultra-confusingly ambiguous when it comes to the other methods, especially those of "place".

* Now all the cases where hits are partially combined for placement purposes are called "blending"
The placement of the (distorted) 3D final monster compound Victor is called "plonk".

* place and position are used synonymously
* combine and merge are used synonymously


## Future
Victor however still has the merger as a side route.

## Overlapping rings

There was a bug in the code for `Monster` that meant that if two rings from different origins that were not bonded
—i.e. without side atoms that were absorbed they were not assessed for merging. This has been corrected.

Namely, what happens to rings is controlled in `expand_ring` method, which calls `_add_novel_bonding`, where "novel"
means that does not share the same "origin" (original molecule).
The latter method calls in turn `_get_novel_ringcore_pairs`, which now get ring marking atoms that are close or bonded.
Closeness is determined by ring atoms (not ring core markers) within 1.5 Å cutoff (viz. `_get_close_nonorigin_ringcores`)

The test case was the merger of a phenylacetic acid and phenylacetamide which overlap in two ring carbons and a terminal oxygen, resulting
in a phenylene-like ring (where one ring is an oxazine). See tests.

![phenylene](../images/phenylene.png)



56 changes: 56 additions & 0 deletions documentation/covalents.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,56 @@
## Covalent

Covalent forms of a molecule are marked within the `Chem.Mol` instances in fragmenstein with a "dummy atom".
This is element symbol `*` within RDKit and smiles, but `R` in a mol file and PDB file.

This is essential for the params file (topology file for the ligand).

Consequently, hits are best extracted with `Victor.extract_mols` or `Victor.extract_mol`.
Fragalysis does not do this, so the hits from there will be treated as regular compounds.

## Monster

A molecule with a single atom is passed to `attachment` argument of `Monster`,
then the covalent linker if absent in the hits is anchored to that atom.
If you know a ligand that is covalent `Victor.find_attachment` can be used to extract the attachment atom.

## Warheads

Victor has a cysteine reactive warheads operations
In the class attribute `.warhead_definitions` are stored conversions and atom names
(cf. MProVictor for an example).

**acrylamide** `C(=O)C=C` => `C(=O)CC*`
**chloroacetamide** `C(=O)C[Cl]` => `C(=O)C*`
**nitrile** `C(#N)` => `C(=N)*`
**vinylsulfonamide** `S(=O)(=O)C=C` => `S(=O)(=O)CC*`
**bromoalkyne** `C#C[Br]` => `C(=C)*`

Additionally, `.possible_definitions` contains two (or more) that may be added experimentally (don't currently work).

**aurothiol** `S[Au]P(CC)(CC)CC` => `S[Au]*`
**aldehyde** `[C:H1]=O` => `C(O)*`

There is a quick way to get a warhead definition too

Victor.get_warhead_definition(warhead_name)

In terms of Rosetta constraints (restraints), these can be added with

Victor.add_constraint_to_warhead(name=constrain_name, constraint=constraint)

For example:

* _chloroacetamide_: `AtomPair H 145A OY 1B HARMONIC 2.1 0.2`
* _nitrile_: `AtomPair H 145A NX 1B HARMONIC 2.1 0.2`
* _acrylamide_: `AtomPair H 143A OZ 1B HARMONIC 2.1 0.2`
* _vinylsulfonamide_: `AtomPair H 143A OZ1 1B HARMONIC 2.1 0.2`

Currently, only cysteine details are known to `Victor`. Cf. `.covalent_definitions`.

To convert a react_ive_ SMILES to a dummy-atom–marked react_ed_ SMILES:

Victor.make_all_warhead_combinations(smiles, warhead_name)

### Untested backdoor
The default dummy atom can be overridden with `Monster.dummy:Chem.Mol` and `Fragmenstein.dummy_symbol:str`.
4 changes: 2 additions & 2 deletions documentation/covid.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@ However, there are some problems:
* the form does not have a `no inspiration hit` option, so many users submitted `x0072` the first as inspiration when submitting docked libraries.
* the inspiration hits are common to a group of submissions by a user, even if one went into one and another to another.
* some pockets have many hits, so a large amount of not fully overlapping hits are submitted
* some users submitted a mispelt hit
* some users submitted a mispelt hit code

## Custom class

Expand All @@ -28,7 +28,7 @@ For an example of the script used, see [covid.py](../covid.py).
Note that this script runs on multiple cores. For a fews smiles, which takes about 30 seconds each there is no need.
Also note that some molecules get stuck due to incorrectly entered inspirations.

For a comparision of how the three method fair with the daset see [three modes compared](three_modes_compared.md).
For a comparision of how the three method fair with the daset see [three modes compared](monster/three_modes_compared.md).

## Over-inspired problem

Expand Down
5 changes: 5 additions & 0 deletions documentation/logging_and_debugging.md
Original file line number Diff line number Diff line change
Expand Up @@ -38,3 +38,8 @@ at each step:
for i, mod in enumerate(victor.modifications):
pymol.cmd.read_molstr(Chem.MolToMolBlock(mod, kekulize=False), f'step{i}')
pymol.cmd.save('test.pse')

The `Monster` class can trip up, and this can happen at various steps.
`victor.modifications` contains a stack of the various step, such as the collapsed rings to the rectification.


Loading

0 comments on commit 28e18c3

Please sign in to comment.