QM40 Dataset for large molecule QM property prediction #9881

elilaird · 2024-12-20T19:39:00Z

This PR adds the newly released QM40 dataset from the paper [1]. This dataset follows a similar structure to the QM9 dataset, though with a different feature order (details in docstring).

QM40 is a QMx-type dataset of 150K molecules optimized at the B3LYP/6-31G(2df,p) level of theory in Gaussian16. For each molecule it provides QM parameters, optimized coordinates, Mulliken charges, and local vibrational mode parameters as a quantitative measure of bond strength. The 150,000 molecules were chosen to represent the real chemical space of drug-like compounds. Each molecule has at most 40 heavy atoms and can contain the following elements: Carbon (C), Fluorine (F), Oxygen (O), Nitrogen (N), Sulfur (S), and Chlorine (Cl).
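As a rough illustration of the composition limits described above, the sketch below checks whether a list of element symbols stays within them. It is not part of this PR's code: the function name `fits_qm40_constraints` and the constants are hypothetical, and treating hydrogen as allowed but not counted as a heavy atom is an assumption.

```python
# Heavy-atom elements listed in the PR description.
QM40_HEAVY_ELEMENTS = {"C", "N", "O", "F", "S", "Cl"}
# Upper bound on heavy atoms stated in the PR description.
MAX_HEAVY_ATOMS = 40


def fits_qm40_constraints(symbols):
    """Return True if the element symbols respect the stated limits.

    Assumption: hydrogens ("H") are permitted and are not counted
    as heavy atoms. This helper is hypothetical, not the PR's API.
    """
    heavy = [s for s in symbols if s != "H"]
    if len(heavy) > MAX_HEAVY_ATOMS:
        return False
    return all(s in QM40_HEAVY_ELEMENTS for s in heavy)
```

For example, chloromethane (`["C", "H", "H", "H", "Cl"]`) passes the check, while any molecule containing bromine does not.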

[1] Madushanka, A., Moura, R.T. & Kraka, E. QM40, Realistic Quantum Mechanical Dataset for Machine Learning in Molecular Science. Sci Data 11, 1376 (2024). https://doi.org/10.1038/s41597-024-04206-y


Eli J Laird and others added 5 commits October 4, 2024 21:30

QM40 Dataset

33d71cb

updated process function

72baf23

updated unit conversions for y

9f45e12

updated QM40 processing script and docstring

2a6ac8f

merged changes

cc6c781

elilaird requested a review from wsad1 as a code owner December 20, 2024 19:39

Eli J Laird and others added 3 commits December 20, 2024 13:41

updated changelog

6c53c59

[pre-commit.ci] auto fixes from pre-commit.com hooks

ee6f352

for more information, see https://pre-commit.ci

Merge branch 'master' into qm40-dataset

08a7680

akihironitta added feature dataset labels Dec 21, 2024

