QM40 Dataset for large molecule QM property prediction #9881

Open
wants to merge 8 commits into
base: master


elilaird

This PR adds the newly released QM40 dataset from the paper [1]. This dataset follows a similar structure to the QM9 dataset, though with a different feature order (details in docstring).

QM40 is a QMx-type dataset of 150K molecules optimized at the B3LYP/6-31G(2df,p) level of theory in Gaussian16. It provides QM parameters, optimized coordinates, Mulliken charges, and local vibrational mode parameters as a quantitative measure of bond strength. The molecules were chosen to represent the real chemical space of drug-like compounds. Each molecule has at most 40 heavy atoms and can contain the following elements: Carbon (C), Fluorine (F), Oxygen (O), Nitrogen (N), Sulfur (S), and Chlorine (Cl).
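
As a rough illustration of the composition limits described above (at most 40 heavy atoms, drawn from C, N, O, F, S, and Cl), here is a minimal sketch of a check on a list of element symbols. It is not part of this PR: the function name `fits_qm40_composition` and the constants are hypothetical, and it assumes hydrogen is allowed and not counted as a heavy atom.

```python
from collections import Counter

# Heavy elements listed in the PR description. Assumption: hydrogen is also
# allowed and is not counted toward the heavy-atom limit.
QM40_HEAVY_ELEMENTS = {"C", "N", "O", "F", "S", "Cl"}
MAX_HEAVY_ATOMS = 40


def fits_qm40_composition(symbols):
    """Return True if the element symbols satisfy the stated QM40 limits.

    Hypothetical helper for illustration only; not an API of this PR.
    """
    counts = Counter(symbols)
    # Drop hydrogens so that only heavy atoms remain.
    heavy = {element: n for element, n in counts.items() if element != "H"}
    # Every heavy element must be in the allowed set.
    if not set(heavy) <= QM40_HEAVY_ELEMENTS:
        return False
    # The total number of heavy atoms must not exceed the limit.
    return sum(heavy.values()) <= MAX_HEAVY_ATOMS


# Ethanol (2 C, 1 O, 6 H) is well within the limits.
ethanol = ["C", "C", "O"] + ["H"] * 6
print(fits_qm40_composition(ethanol))
```

A check like this could serve as a sanity test on loaded molecules, but the dataset's actual filtering criteria are defined in the paper [1], not here.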

[1] Madushanka, A., Moura, R.T. & Kraka, E. QM40, Realistic Quantum Mechanical Dataset for Machine Learning in Molecular Science. Sci Data 11, 1376 (2024). https://doi.org/10.1038/s41597-024-04206-y

@elilaird elilaird requested a review from wsad1 as a code owner December 20, 2024 19:39
2 participants