Creating a New Substituent Type

ajs99778 edited this page Aug 24, 2019 · 21 revisions

Overview

AaronTools is packaged with a number of substituents in the built-in Substituent Library. Additional substituents may be easily added to your Personal Aaron Library. If you require more flexibility, substituents can also be built up from ones that are already in the library.

Adding Substituents to $HOME/Aaron_libs/Subs

The AaronTools utility libadd_substituent extracts a substituent from an XYZ file that you provide and adds it to your personal substituent library ($HOME/Aaron_libs/Subs).

In the following sections, the target atom will refer to the atom on the substituent side of the bond connecting the substituent to the rest of the molecule. The avoid atom will refer to the atom on the molecule side of this bond.

To add a new substituent:

  1. Determine the target and avoid atom indices in your XYZ file.
  2. Determine the number of conformers you want Aaron to consider and the rotation angle between each conformer.
  3. Run
 libadd_substituent file.xyz -t <target> -a <avoid> -c <num-confs> <rot-angle> -n <name>

If the number of conformers or the rotation angle is not supplied, a single conformer with zero rotation is assumed. To change either value later, see the Format for Substituent XYZ Files section below.

For example, to extract the phenyl group from the structure above, we would first identify the target atom as 31 and the avoid atom as 27. Because of the symmetry of the phenyl group, we only request two conformers, 90 degrees apart. This is run from the command line as:

 libadd_substituent example.xyz -t 31 -a 27 -c 2 90 -n phenyl

Now our personal library contains $HOME/Aaron_libs/Subs/phenyl.xyz, which can be used in other tasks just like the built-in substituents from the Substituent Library.
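
To make the -c arguments concrete, the short Python sketch below lists the rotation angles implied by a conformer count and a rotation angle. It is only an illustration and not part of AaronTools: the name conformer_angles is made up here, and it assumes that the first conformer is the geometry as stored and that each later conformer adds one more rotation step.

```python
def conformer_angles(num_confs=1, rot_angle=0.0):
    """Angles (in degrees) of each conformer relative to the stored geometry.

    Hypothetical helper, not an AaronTools function. Assumes conformer 0 is
    the unrotated geometry and every later conformer adds rot_angle degrees.
    """
    return [i * rot_angle for i in range(num_confs)]


# Phenyl example from above: -c 2 90
print(conformer_angles(2, 90))

# Defaults described above: one conformer, no rotation
print(conformer_angles())
```

For the phenyl example this gives the unrotated ring plus one copy rotated by 90 degrees, which matches the two conformers requested on the command line.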

Substituent Naming

Note that in this example, the phenyl group is already a built-in substituent. Aaron searches your personal library for a named substituent before it searches the built-in library:

 $HOME/Aaron_libs/Subs
 $QCHASM/AaronTools/Subs

Thus, substituents built using this method will be used instead of built-in ones of the same name.
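
The lookup order can be pictured with a small Python sketch. This is not AaronTools code; find_substituent is a hypothetical helper, and it assumes each substituent is stored as <name>.xyz (as with phenyl.xyz above) in the two directories listed.

```python
import os
from pathlib import Path

# Search order taken from the text: personal library first, then built-in.
LIBRARY_DIRS = ["$HOME/Aaron_libs/Subs", "$QCHASM/AaronTools/Subs"]


def find_substituent(name, search_dirs):
    """Return the first <name>.xyz found in search_dirs, or None.

    Hypothetical helper that only illustrates the precedence rule.
    """
    for directory in search_dirs:
        candidate = Path(directory) / f"{name}.xyz"
        if candidate.is_file():
            return candidate
    return None


# Expand the environment variables; unset ones are left as-is and simply
# will not match any directory.
search_dirs = [Path(os.path.expandvars(d)) for d in LIBRARY_DIRS]
print(find_substituent("phenyl", search_dirs))
```

Because the loop returns on the first match, a phenyl.xyz in the personal library hides the built-in file of the same name.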

Format for Substituent XYZ Files

Substituent files follow the standard XYZ format, with additional conformer information in the comment line. The comment line should be formatted as CF:NumConfs,RotAngle, as seen in this phenyl example:

11
CF:2,90
C      1.408780      0.000000      0.000000
C      2.192902     -0.484731      1.059260
C      2.039680      0.608587     -1.100436
C      3.584822     -0.381150      0.994992
H      1.713119     -0.931768      1.921314
C      3.429329      0.718638     -1.148310
H      1.429491      0.990910     -1.916441
C      4.212647      0.216034     -0.103289
H      4.181388     -0.761232      1.821954
H      3.899334      1.192009     -2.007794
H      5.296754      0.294017     -0.141870
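
As a rough illustration of this layout, the Python sketch below reads the conformer information and coordinates from the phenyl file shown above. The function read_substituent_xyz is hypothetical and not part of AaronTools; it assumes the comment line contains nothing but the CF: field and it parses the rotation angle as a floating-point number.

```python
def read_substituent_xyz(text):
    """Parse substituent XYZ text into (num_confs, rot_angle, atoms).

    Hypothetical reader for the CF:NumConfs,RotAngle comment format.
    """
    lines = text.strip().splitlines()
    num_atoms = int(lines[0])
    comment = lines[1].strip()
    if not comment.startswith("CF:"):
        raise ValueError("comment line should look like CF:NumConfs,RotAngle")
    confs, angle = comment[len("CF:"):].split(",")
    atoms = []
    for line in lines[2:2 + num_atoms]:
        element, x, y, z = line.split()
        atoms.append((element, float(x), float(y), float(z)))
    if len(atoms) != num_atoms:
        raise ValueError("atom count does not match the first line")
    return int(confs), float(angle), atoms


# The phenyl example from this page
example = """11
CF:2,90
C      1.408780      0.000000      0.000000
C      2.192902     -0.484731      1.059260
C      2.039680      0.608587     -1.100436
C      3.584822     -0.381150      0.994992
H      1.713119     -0.931768      1.921314
C      3.429329      0.718638     -1.148310
H      1.429491      0.990910     -1.916441
C      4.212647      0.216034     -0.103289
H      4.181388     -0.761232      1.821954
H      3.899334      1.192009     -2.007794
H      5.296754      0.294017     -0.141870
"""

num_confs, rot_angle, atoms = read_substituent_xyz(example)
print(num_confs, rot_angle, len(atoms))
```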

Building up from Other Substituents

Simple Syntax

Substituents in the library can also be chained together to produce a new substituent. Unlike substituents added with libadd_substituent, these are not saved anywhere; they are created each time you need them in Aaron or AaronTools. As an example, you could use substitute to attach 3,4-dimethoxybenzyl to a methane molecule:

 substitute methane.xyz -s 1=1-{34-OMe-Ph}Me -o out.xyz

AaronTools will attach the OMe substituent to the 3 and 4 positions of the Ph substituent. It will then attach the resulting group to the Me substituent. Finally, substitute will attach this to methane.xyz.

The positions on the substituent are determined by how many bonds a particular atom is from where the substituent attaches to the molecule. Hydrogen atoms and non-hydrogen atoms without a bond to hydrogen are not counted, with the exception of the atom attached to the molecule (e.g. atom 1 on tBu is a quaternary carbon, but atom 2 on Bn is the ortho carbon and not the ipso carbon of the ring). In the case of branching, the path with the shortest bond at the point of branching is considered first, followed by the next chain. While this may not produce the proper enumeration of atoms, it does assign each modifiable position a unique number. Example atom numberings for different substituent foundations are shown below.

(Figures: atom numbering for the Et, Ph, Mes, and COOH foundations.)

When AaronTools builds a substituent from others in the library, it keeps track of which bonds in the substituent are rotatable and how many conformers should be considered for each building block. Built substituents can be useful for screening more flexible substituents. Currently, AaronTools cannot read information about multiple flexible bonds from a substituent's coordinate file, so built substituents are not saved to a file in the substituent library.

The number of rotamers depends on the symmetry of the new substituent compared to the symmetry of its "foundation". Going back to the 1-{34-OMe-Ph}Me substituent, the foundation of the 34-OMe-Ph part is Ph. The Ph substituent has C2 symmetry with respect to the bond that connects it to the rest of the molecule. If a 180° rotation (the angle is set by the order of rotational symmetry) can be applied to this part of the substituent, and the substituents on the phenyl can then be rotated so that the geometry is identical to the pre-rotation geometry, AaronTools will consider only two rotations about the bond between the phenyl ring and whatever it is attached to. The OMe in the 4 position can be rotated by another 180° to line up with the OMe in the unrotated geometry, but the OMe in the 3 position prevents 34-OMe-Ph from lining up with the rotated geometry. AaronTools will therefore consider four rotations for this part of the substituent.

Likewise, the Me substituent is C3 symmetric, but the 34-OMe-Ph breaks this symmetry, so AaronTools determines that the Me part has three rotamers. If substituents were added symmetrically to Me (e.g. 111-Et-Me), only two rotamers would be considered. Each of the OMe substituents has two conformers, as specified in its coordinate file. Running make_conf reveals that AaronTools will consider 48 conformers for this substituent:

make_conf out.xyz -s 1=1-{34-OMe-Ph}Me | grep -c Cf
48
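
The count of 48 is consistent with multiplying the per-part rotation counts quoted above. The Python snippet below only reproduces that arithmetic; treating the total as a plain product is an assumption made for illustration and does not describe how make_conf is implemented.

```python
from math import prod

# Rotation counts quoted in the text for 1-{34-OMe-Ph}Me
rotations = {
    "Me foundation (C3 symmetry broken)": 3,
    "34-OMe-Ph (C2 symmetry broken)": 4,
    "OMe at position 3": 2,
    "OMe at position 4": 2,
}

total = prod(rotations.values())
print(total)  # 48
```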

Several common substituent abbreviations are recognized by AaronTools. The table below lists each abbreviation, the string that would generate the substituent, and the total number of conformers for it. Note that the full set of conformers might not be scanned if you build substituents on top of these abbreviations (e.g. 2-Me-Bn will not have all of the conformers that 1-{2-Me-Ph}Me would, even though the generated structure is the same). Building on an already built substituent produces all of the expected conformers only when symmetry is maintained (e.g. 4-Me-Bn).

abbreviation  name                            string                         conformers
Bn            benzyl                          1-Ph-Me                        6
MePh2         diphenylmethyl                  11-Ph-Me                       12
MePh3         triphenylmethyl                 111-Ph-Me                      16
EtF5          pentafluoroethyl                1-CF3-11-F-Me                  6
iBu           iso-butyl                       1-iPr-Me                       9
nBu           n-butyl                         1-{1-Et-Me}Me                  27
Pr            propyl                          1-Et-Me                        9
Boc           t-butyloxycarbonyl              2-tBu-COOH                     4
CBz           carboxybenzyl                   1-Bn-COOH                      12
MOM           methoxymethyl                   1-OMe-Me                       6
PMB           4-methoxybenzyl                 4-OMe-Bn                       12
Troc          2,2,2-trichloroethyl carbonate  1-{2-{1-{111-Cl-Me}Me}COOH}OH  24
BOM           benzyloxymethyl acetal          1-{1-{1-Bn-OH}Me}OH            72
TBDPS         t-butyldiphenylsilyl ether      1-{1-tBu-11-Ph-SiH3}OH         48
TBS           t-butyldimethylsilyl ether      1-{1-tBu-11-Me-SiH3}OH         12
TIPS          triisopropylsilyl ether         1-{111-iPr-SiH3}OH             108
TES           triethylsilyl ether             1-{111-Et-SiH3}OH              108

As there are many possibilities for built substituents, a few additional instructive examples are listed below.

name                                                 string
o-tolyl                                              2-Me-Ph
trimethyl silyl                                      111-Me-SiH3
4-methoxybenzyl ether                                1-{1-{4-OMe-Ph}Me}OH
4-methoxybenzyl ether                                1-{1-{4-OMe-Ph}-Me}-OH
2-(4-trifluoromethylphenyl)-2-(4-methylphenyl)ethyl  2-{4-CF3-Ph}-2-{4-Me-Ph}Et
the other stereoisomer of the previous substituent   2-{4-Me-Ph}-2-{4-CF3-Ph}Et
3,5-bis(trifluoromethyl)phenyl                       35-CF3-Ph
3,5-bis(trifluoromethyl)phenyl                       3-5-CF3-Ph

Detailed Syntax

You can also use a more explicit notation when building substituents. This may be useful when your substituent has R/S enantiomers (e.g. sec-butyl) or if you want to replace something that is not a hydrogen atom. As an example of this notation, here's how you'd use substitute to attach 3,4-dimethoxybenzyl to methane, where 3,4-dimethoxybenzyl is built from other substituents:

substitute methane.xyz -s 1="foundation=Me positions=4 decorations={{foundation=Ph positions=8,9 decorations={{OMe},{OMe}}}}"

foundation specifies the substituent to which the decorations are added, and positions specifies where the respective decorations should be placed. One important difference between this explicit notation and the more natural notation described above is the position numbering. Instead of looking for H atoms on the 3rd and 4th heavy atoms of Ph, the exact positions of these H's are specified. The 8th and 9th atoms in our Ph.xyz are the meta and para H's, as shown below.

When using this syntax, the decorations section must be enclosed in curly braces. Each individual decoration must also be enclosed in its own set of curly braces.
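
Because the nested braces are easy to miscount, it can help to generate the string programmatically. The Python sketch below is one possible way to do so; detailed is a hypothetical helper rather than an AaronTools function, and it assumes that multiple positions and multiple decorations are each separated by commas, as in the example above.

```python
def detailed(foundation, positions=(), decorations=()):
    """Render a substituent in the explicit foundation/positions/decorations form.

    Hypothetical helper. A substituent with no decorations is written as its
    bare name (e.g. "OMe"); each decoration gets its own braces and the whole
    decorations section gets an outer pair.
    """
    if not decorations:
        return foundation
    pos = ",".join(str(p) for p in positions)
    decs = ",".join("{" + d + "}" for d in decorations)
    return f"foundation={foundation} positions={pos} decorations={{{decs}}}"


# 3,4-dimethoxyphenyl: OMe on atoms 8 and 9 of Ph
dimethoxyphenyl = detailed("Ph", (8, 9), ("OMe", "OMe"))

# 3,4-dimethoxybenzyl: the group above on atom 4 of Me
dimethoxybenzyl = detailed("Me", (4,), (dimethoxyphenyl,))

print(f'substitute methane.xyz -s 1="{dimethoxybenzyl}"')
```

The printed command matches the one given at the start of this section, including the double quotes that keep the spaces inside a single shell argument.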