Extending structures
(bonds, atom charges, etc)
#426
Comments
I think charges would be a useful property to add. Some formats like PDB also allow you to specify the connections between the atoms. I (ab)used this feature in the past for visualizing some of my coarse-grained data, so I think this could be a useful feature as well. A little over a year ago, I talked with some people from Materials Cloud about which properties they would like to see standardized.
For some of these, I do not really know what they are, so I can't really tell how useful they would be. Some other properties that I thought could be useful to add (mostly for use within trajectories, but some are useful for structures too) are:
|
I am mostly interested in chemical connectivity. However, I would expect the definition of a chemical bond and its types to be quite involved. Could we adopt some already existing convention? CML, for instance, defines integer-numbered bond types for orders 1 to 3 (no 4), aromatic, unknown and other. To this list I would add order 4 and zero-order bonds. Anything else? I saw @eimrek's addition to the OPTIMADE paper manuscript about a database of covalent organic networks, so it would be interesting to hear their opinion. Also pinging @BobHanson and @vaitkus for comments. |
I think it will be more informative to allow non-integer bond orders than just having a value of 0. Some d block metal dimers can have a bond order as high as 6, so I think we should allow the bond order to reach that value. |
I agree. V3000 allows for dative and coordinate bonds. Whether you call
these "zero order" or not, is up to you.
[
https://depth-first.com/articles/2021/11/17/ten-reasons-to-adopt-the-v3000-molfile-format/
]
But bonding in general adds significant complexity to a model. Beware!
|
I think it might be quite difficult to agree on a single bonding model that covers every situation so we could start with something simple and then extend it in the future as needed. Some general thoughts on the model:
|
I agree with Antanas. My thought on aromaticity is that -- particularly
with associated 3D structures, as we have in this case -- standard Kekulé
bonding is preferable. Aromaticity is not needed, since the 3D structure is
there, and planarity, bond distances, aromaticity, and such can be easily
derived from that.
This also relates to SMILES (clearly also a bonding model). We should be
recommending non-aromatic SMILES. Explicit double bonds. That preference
comes primarily from the fact that generally these SMILES will be *targets*
-- that is, SMILES that actually represent structures. For searching
structures, one may want an aromatic bonding model for the search
*pattern* (Cc1ccccc1),
but for a *target* one always needs the Kekulé bonding. Because Cc1ccccc1
will match CC1=CC=CC=C1, but CC1=CC=CC=C1 will not (is not supposed to)
match Cc1ccccc1. From
https://www.daylight.com/dayhtml/doc/theory/theory.smarts.html:
*SMILES is interpreted as a molecule, and it is the resultant molecule (not
the SMILES string) which is subject to searching. Similarly, SMARTS is
interpreted as a pattern; it is this pattern (not the SMARTS string) which
is matched against molecules. For instance, the SMILES "C1=CC=CC=C1"
(cyclohexatriene) is interpreted as the benzene molecule. This molecule
will be matched by the SMARTS c1ccccc1, which is interpreted as the pattern
"6 aromatic carbons in a ring". The SMARTS "C1=CC=CC=C1" makes a pattern
("six aliphatic carbons in a ring with alternating single and double
bonds") which will not match benzene. *
My point is not about SMILES, though. It's about bonding. I like this
statement, that SMILES doesn't need any aromatic description to represent
benzene. Same goes for what we are talking about using V3000 or whatever
format.
Bob
Personally, I think they made a fundamental mistake in SMILES to allow
aromatic descriptions there. Really they are much more useful and relevant
in SMARTS, and because of this asymmetry of matching, are just a pain in
SMILES.
Bob
…On Mon, Feb 13, 2023 at 7:08 AM Antanas Vaitkus ***@***.***> wrote:
I think it might be quite difficult to agree on a single bonding model
that covers every situation so we could start with something simple and
then extend it in the future as needed. Some general thoughts on the model:
- It would be nice to be able to specify the bonding without
explicitly assigning the bond type/order (e.g. only provide the
connectivity graph). I guess this could be achieved by using the CML
unknown bond type or something similar.
- Maybe aromaticity should be a separate property of a bond rather
than a bond type? This might be used to convey that certain bonds are
aromatic, but described using the Kekulé notation. Furthermore, the
OpenChemLib <https://github.com/Actelion/openchemlib> library actually
differentiates between aromatic bonds that can resonate (e.g. in benzene)
and the ones that have a more or less fixed bond order (e.g. in thiophene).
Thus it is quite reasonable under some circumstances to define a bond as
both being aromatic and having a specific bond order.
--
Robert M. Hanson
Professor of Chemistry
St. Olaf College
Northfield, MN
http://www.stolaf.edu/people/hansonr
If nature does not answer first what we want,
it is better to take what answer we get.
-- Josiah Willard Gibbs, Lecture XXX, Monday, February 5, 1900
*We stand on the homelands of the Wahpekute Band of the Dakota Nation. We
honor with gratitude the people who have stewarded the land throughout the
generations and their ongoing contributions to this region. We acknowledge
the ongoing injustices that we have committed against the Dakota Nation,
and we wish to interrupt this legacy, beginning with acts of healing and
honest storytelling about this place.*
|
Hi all. @merkys, our covalent organic framework databases don't contain bond orders, and there is currently no intention to add them, as far as I'm aware. @ltalirz @yakutovicha correct me if I'm wrong. Regarding atomic charges: there are multiple methods to calculate them, e.g. Mulliken, Hirshfeld, Bader, ESP-derived, ... (https://mattermodeling.stackexchange.com/questions/1439/what-are-the-types-of-charge-analysis). Would the database provider simply decide which charges to provide? Either way, it would be good to include information about the calculation method. Regarding bond orders, a similar argument applies: there are multiple ways to calculate bond orders, and they can give different results. Additionally, one thing to keep in mind is how to represent non-Kekulé molecules, e.g. triangulene, as well as unpaired electrons and radical sites in general. |
Thanks all for interesting responses. I agree that choosing the right representation for bond type/order will require a lot of thought. Thus I find @vaitkus's suggestion really appealing:
Separating aromaticity from bond type/order is also a good suggestion. How about starting from this: "bonds": [ { "sites": [ 1, 2 ] } ]
I believe @eimrek's suggestion about specifying calculation methods should be promoted to more general level as other properties could benefit from such metadata as well. |
I would prefer a more succinct format. Why duplicate "site" a zillion
times? Maybe just array of arrays.
Suggest array of
[index1, index2, type]
Where type is reserved for future use and could be 0 for placeholder.
Mostly just reacting to needless byte bloat
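To put a rough number on the "byte bloat" concern, here is a minimal sketch comparing the two encodings discussed in this thread. The four-site chain, the 1-based indices and the placeholder type `0` are assumptions made for illustration only; neither layout is an agreed OPTIMADE format.

```python
import json

# Hypothetical bond list for a 4-site chain: (site_i, site_j) pairs,
# 1-based as in the "bonds" example quoted in this thread.
pairs = [(1, 2), (2, 3), (3, 4)]

# Object form: one JSON object per bond with an explicit "sites" key.
as_objects = {"bonds": [{"sites": [i, j]} for i, j in pairs]}

# Compact form: [index1, index2, type], with 0 as a placeholder type.
as_arrays = {"bonds": [[i, j, 0] for i, j in pairs]}


def compact_size(obj):
    """Return the size in bytes of the most compact JSON encoding of obj."""
    return len(json.dumps(obj, separators=(",", ":")).encode("utf-8"))


size_objects = compact_size(as_objects)
size_arrays = compact_size(as_arrays)
print(size_objects, size_arrays)
```

The per-bond overhead of the object form is the repeated key, so the gap grows linearly with the number of bonds; transport compression would likely shrink it, but that is not measured here.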
…On Fri, Feb 17, 2023, 6:51 AM Andrius Merkys ***@***.***> wrote:
Thanks all for interesting responses. I agree that choosing the right
representation for bond type/order will require a lot of thought. Thus I
find @vaitkus <https://github.com/vaitkus>'s suggestion really appealing:
- It would be nice to be able to specify the bonding without
explicitly assigning the bond type/order (e.g. only provide the
connectivity graph). I guess this could be achieved by using the CML
unknown bond type or something similar.
Separating aromaticity from bond type/order is also a good suggestion.
How about starting from this:
"bonds": [ { "sites": [ 1, 2 ] } ]
- sites would be the single REQUIRED property giving a list of sites
participating in a bond. As @JPBergsma <https://github.com/JPBergsma>
noted, sites list could contain more than two sites.
- JSON object describing a single bond could then later be expanded by
introducing properties giving type/order, aromaticity and so on.
I believe @eimrek <https://github.com/eimrek>'s suggestion about
specifying calculation methods should be promoted to more general level as
other properties could benefit from such metadata as well.
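As a rough illustration of the minimal proposal above, the following sketch checks a `bonds` list in which `sites` is the only required key. Several details are assumptions rather than anything agreed in this thread: the function name `validate_bonds`, 1-based site indices, and the rule that a bond must reference at least two distinct sites. Any extra keys on a bond object are ignored, which matches the idea of extending the object later.

```python
def validate_bonds(bonds, n_sites):
    """Return a list of error strings for a proposed "bonds" list.

    Assumptions (not settled in the discussion): site indices are 1-based
    integers, and each bond references at least two distinct sites.
    """
    errors = []
    for k, bond in enumerate(bonds):
        sites = bond.get("sites") if isinstance(bond, dict) else None
        if not isinstance(sites, list):
            errors.append(f"bond {k}: missing required 'sites' list")
            continue
        # Reject non-integer entries (bool is a subclass of int) and
        # indices outside 1..n_sites.
        if any(
            not isinstance(s, int) or isinstance(s, bool) or not 1 <= s <= n_sites
            for s in sites
        ):
            errors.append(f"bond {k}: site index out of range or not an integer")
            continue
        if len(sites) < 2 or len(set(sites)) != len(sites):
            errors.append(f"bond {k}: need at least two distinct sites")
    return errors


# A two-site bond and a three-site bond in a 4-site structure.
print(validate_bonds([{"sites": [1, 2]}, {"sites": [2, 3, 4]}], n_sites=4))
```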
|
I understand the pros of a more succinct representation, but I tried to retain consistency with the other OPTIMADE properties, which use explicit keys. Moreover, the suggested plain list representation would not allow for bonds of more than two atoms. A placeholder value of 0 might be perceived as a zero-order bond by some. It is better to avoid placeholders altogether: if no "type" (or something like it) property is given in a bond object, nothing else but some sort of connectivity should be assumed. |
It might be nice if this design could also capture generic "connectivity" and serve, e.g., a list of sites within some cutoff of another site under periodic boundary conditions (PBCs). Pre-computed neighbour lists can really help accelerate some applications and could allow some kind of local environment/oxidation state searching expressed via correlated list queries (though this might require species data to be added to each bond, which may not be favourable), e.g., "give me all structures that contain SiO4 tetrahedra". It would still be up to the database to decide on the "calculation method", e.g., which distance cutoff to use (a constant, a sum of ionic/vdW radii, etc.). |
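A distance-based neighbour list of the kind described above could be sketched as follows. This is only an illustration under stated assumptions: the function name `neighbour_list` is hypothetical, site indices are 0-based, a single constant cutoff is used, only the 27 nearest periodic images are searched (so the cutoff must be small relative to the cell), and bonds between a site and its own periodic images are skipped.

```python
import itertools
import math


def neighbour_list(lattice, frac_coords, cutoff):
    """Return (i, j, shift, distance) for site pairs within cutoff under PBC.

    lattice: three lattice vectors (rows) in Cartesian coordinates.
    frac_coords: fractional coordinates, one triple per site.
    shift: integer lattice translation applied to site j.
    """
    shifts = list(itertools.product((-1, 0, 1), repeat=3))
    found = []
    for i, j in itertools.combinations(range(len(frac_coords)), 2):
        for shift in shifts:
            # Fractional displacement from site i to the shifted image of j.
            delta = [frac_coords[j][k] + shift[k] - frac_coords[i][k] for k in range(3)]
            # Convert to Cartesian: sum over lattice vectors for each component.
            cart = [sum(delta[k] * lattice[k][c] for k in range(3)) for c in range(3)]
            dist = math.sqrt(sum(x * x for x in cart))
            if dist <= cutoff:
                found.append((i, j, shift, dist))
    return found


# Cubic 10 Å cell: sites 0 and 1 are neighbours only across the cell boundary.
cell = [[10.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 10.0]]
coords = [[0.05, 0.0, 0.0], [0.95, 0.0, 0.0], [0.5, 0.5, 0.5]]
neighbours = neighbour_list(cell, coords, cutoff=1.5)
print(neighbours)
```

Keeping the lattice shift alongside each pair distinguishes a bond inside the cell from one that crosses a boundary, which a bare pair of site indices cannot express.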
Yes, sorry, I was on my phone and, ah, still in bed... Meant to follow that
with:
"That said, the more use of associative arrays, the more easily extended
this will be."
Q: What else do we have that references sites like this?
…On Fri, Feb 17, 2023 at 8:17 AM Andrius Merkys ***@***.***> wrote:
I would prefer a more succinct format. Why duplicate "site" a zillion
times? Maybe just array of arrays.
I understand the pros of a more succinct representation, but I tried to
retain consistency with the other OPTIMADE properties which use explicit
keys. Moreover, suggested plain list representation would not allow for
bonds of more than two atoms. Placeholder value of 0 might be perceived as
zero order bond by some. It is better to avoid placeholders at all, if no
"type" (or something like it) property is given in a bond object, nothing
else but some sort of connectivity should be assumed.
—
Reply to this email directly, view it on GitHub
<#426 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AEHNCWZL7EME6D5CLZLATWTWX6CATANCNFSM6AAAAAARHXTWUA>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
--
Robert M. Hanson
Professor of Chemistry
St. Olaf College
Northfield, MN
http://www.stolaf.edu/people/hansonr
If nature does not answer first what we want,
it is better to take what answer we get.
-- Josiah Willard Gibbs, Lecture XXX, Monday, February 5, 1900
*We stand on the homelands of the Wahpekute Band of the Dakota Nation. We
honor with gratitude the people who have stewarded the land throughout the
generations and their ongoing contributions to this region. We acknowledge
the ongoing injustices that we have committed against the Dakota Nation,
and we wish to interrupt this legacy, beginning with acts of healing and
honest storytelling about this place.*
|
OPTIMADE has |
This might be slightly off-topic, but how does one get atom bonding out of QM calculations? Can the existence of bonds and their types be objectively detected via QM, or would one need some heuristic (e.g., a distance-based criterion) to derive them? Pinging @gmrigna. |
Here's a small overview of QM bond order methods: https://mattermodeling.stackexchange.com/questions/901/what-are-the-types-of-bond-orders/1508 Most of these (or at least the popular ones, Wiberg, Mayer and Laplacian, which I also have some experience with) are fully determined based on the electronic structure (so, the density/density matrix/occupied molecular orbitals/or derived orbitals) and the atom-atom distance is not "explicitly" used. |
Suggestion for a queryable property:
|
OPTIMADE specification v1.0.1 defines a structure as a set of sites, occupied by mixtures of atoms, with each atom described by its chemical type, mass and occupancy (proportion in the mixture). Means for expressing disorder are also in place, defined quite similarly to the CIF standard.
I wonder whether there would be interest in adding more chemical attributes to OPTIMADE structures, such as:
Some of these attributes can be derived algorithmically (connectivity, lone pairs), but derivation algorithms are often based on heuristics and sometimes fail to arrive at the "correct" result. Thus, if these details are available on the provider's side, it would be nice to have them communicated in OPTIMADE attributes.