
Unofficial mini implementation of 'Learning Continuous and Data-Driven Molecular Descriptors' paper. A concise codebase replicating the core architecture as a lightweight reference.



lianghsun/miniCDDD


mini-CDDD (Continuous and Data-Driven Descriptors)

This repository is an unofficial mini version of the original work presented in the paper:

"Learning Continuous and Data-Driven Molecular Descriptors by Translating Equivalent Chemical Representations" by Robin Winter, Floriane Montanari, Frank Noe, and Djork-Arne Clevert.

The aim of this mini version is to recreate the core architecture from the original paper with concise code, serving as a lightweight reference. Please refer to the official repository for the complete and original implementation.

Installing

To set up the environment, install the dependencies listed in requirements.txt with the following command:

pip install -r requirements.txt

Training was performed with TensorFlow 2.11, but the code does not rely on features exclusive to that version. Feel free to adjust requirements.txt to fit your setup.

Datasets

While the original paper and repository used the ZINC12 and PubChem datasets, this repo aims to stay minimal and therefore uses the widely available ZINC250k dataset. You can, however, substitute your own training dataset.
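Whichever dataset is used, each SMILES string has to be split into tokens and mapped to integer ids before it can be fed to the Seq2Seq model. The sketch below is one minimal way to do this, assuming a regex tokenizer in the style commonly used for SMILES; it is not necessarily how main.ipynb or the original CDDD codebase tokenizes. The names `tokenize`, `build_vocab`, `encode` and the special tokens `<pad>`, `<sos>`, `<eos>` are hypothetical.

```python
import re

# Regex in the style widely used for SMILES tokenization (an assumption, not
# necessarily the tokenizer in main.ipynb). Bracket atoms such as [nH] or
# [C@@H] stay whole, and the two-letter halogens Cl and Br become one token.
SMILES_PATTERN = re.compile(
    r"(\[[^\]]+\]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)

# Hypothetical special tokens for padding and sequence boundaries.
SPECIAL_TOKENS = ["<pad>", "<sos>", "<eos>"]


def tokenize(smiles: str) -> list:
    """Split a SMILES string into tokens, failing loudly on unknown characters."""
    tokens = SMILES_PATTERN.findall(smiles)
    # findall silently skips characters the regex does not cover, so check
    # that the tokens reassemble the full input.
    if "".join(tokens) != smiles:
        raise ValueError(f"Untokenizable SMILES: {smiles!r}")
    return tokens


def build_vocab(smiles_list) -> dict:
    """Map every token seen in the corpus to an id; special tokens come first."""
    symbols = sorted({tok for s in smiles_list for tok in tokenize(s)})
    return {tok: i for i, tok in enumerate(SPECIAL_TOKENS + symbols)}


def encode(smiles: str, vocab: dict) -> list:
    """Wrap the token ids of one SMILES string in <sos> ... <eos>."""
    ids = [vocab[tok] for tok in tokenize(smiles)]
    return [vocab["<sos>"]] + ids + [vocab["<eos>"]]


print(tokenize("CC(=O)Nc1ccc(Cl)cc1"))
```

Keeping `<pad>` at id 0 is a convenient convention for padding and masking, not a requirement.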

Quick Start

The main.ipynb notebook implements the model variant that the paper reports as best-performing. Only the most essential components are retained so that the code structure stays easy to follow. The implementation draws on both the paper and a comparison with the original codebase. For instance, while the paper mentions predicting $9$ molecular descriptors with the classifier, the original codebase uses only $7$; this implementation follows the codebase. In addition, this implementation does not yet adopt the paper's approach of placing sequences of different lengths into separate buckets. The notebook ends by saving the trained models, such as the Encoder + Classifier and the Seq2Seq model.
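The bucketing approach mentioned above, which this implementation does not yet include, groups sequences of similar length so that each batch is padded only to its own longest member rather than to the longest sequence in the whole dataset. Below is a minimal pure-Python sketch of the idea; `bucket_batches`, its bucket boundaries, and the batch size are illustrative assumptions, not values taken from the paper or the original codebase. In TensorFlow, `tf.data.Dataset.bucket_by_sequence_length` offers similar built-in behavior.

```python
import random
from collections import defaultdict


def bucket_batches(sequences, boundaries, batch_size, pad_id=0, seed=0):
    """Group token-id sequences into length buckets and return padded batches.

    boundaries: sorted upper length limits, e.g. [20, 40, 60]. A sequence of
    length L goes into the first bucket whose boundary is >= L, or into an
    extra overflow bucket if it is longer than every boundary.
    """
    buckets = defaultdict(list)
    for seq in sequences:
        idx = next((i for i, b in enumerate(boundaries) if len(seq) <= b), len(boundaries))
        buckets[idx].append(seq)

    rng = random.Random(seed)
    batches = []
    for members in buckets.values():
        rng.shuffle(members)
        for start in range(0, len(members), batch_size):
            chunk = members[start:start + batch_size]
            # Pad only up to the longest sequence in this batch.
            width = max(len(s) for s in chunk)
            batches.append([s + [pad_id] * (width - len(s)) for s in chunk])

    # Shuffle batch order so training does not see buckets in sequence.
    rng.shuffle(batches)
    return batches


toy = [[1] * n for n in (3, 5, 18, 25, 70)]
for batch in bucket_batches(toy, boundaries=[20, 40, 60], batch_size=2):
    print(len(batch), "rows padded to width", len(batch[0]))
```

Every sequence still appears exactly once; short sequences simply never share a batch with, for example, a 70-token one.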

Todo

  • Implement beam search for Seq2Seq
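Beam search for the Seq2Seq decoder is still listed as a Todo. As a rough, framework-agnostic sketch (not code from this repository), beam search keeps the `beam_width` highest-scoring partial sequences at each step instead of only the single most likely token that greedy decoding keeps. `step_log_probs` is a hypothetical callable that would wrap the trained decoder, and `toy_decoder` is made up purely to show a case where the two strategies disagree. This variant scores by summed log-probability without length normalization.

```python
import math
from typing import Callable, List, Sequence, Tuple


def beam_search(
    step_log_probs: Callable[[List[int]], Sequence[float]],
    sos_id: int,
    eos_id: int,
    beam_width: int = 3,
    max_len: int = 20,
) -> List[Tuple[List[int], float]]:
    """Return up to beam_width (sequence, total log-prob) pairs, best first.

    step_log_probs(prefix) returns log-probabilities over the whole
    vocabulary for the next token given the prefix (hypothetical interface).
    """
    beams = [([sos_id], 0.0)]
    finished = []
    for _ in range(max_len):
        # Extend every live beam by every possible next token.
        candidates = []
        for seq, score in beams:
            for tok, lp in enumerate(step_log_probs(seq)):
                candidates.append((seq + [tok], score + lp))
        # Keep only the beam_width highest-scoring extensions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_width]:
            (finished if seq[-1] == eos_id else beams).append((seq, score))
        if not beams:
            break
    # Beams that hit max_len without <eos> are returned too.
    finished.extend(beams)
    finished.sort(key=lambda c: c[1], reverse=True)
    return finished[:beam_width]


# Toy decoder over {0: <sos>, 1: <eos>, 2: "A", 3: "B"} that looks only at
# the last token; the probabilities are invented for illustration.
_TOY_PROBS = {
    0: [0.0, 0.1, 0.5, 0.4],    # after <sos>
    2: [0.0, 0.4, 0.3, 0.3],    # after "A"
    3: [0.0, 0.9, 0.05, 0.05],  # after "B"
}


def toy_decoder(prefix: List[int]) -> List[float]:
    probs = _TOY_PROBS.get(prefix[-1], [0.25] * 4)
    # Clamp zeros so math.log stays finite.
    return [math.log(max(p, 1e-12)) for p in probs]


print(beam_search(toy_decoder, sos_id=0, eos_id=1, beam_width=1)[0][0])  # [0, 2, 1]
print(beam_search(toy_decoder, sos_id=0, eos_id=1, beam_width=2)[0][0])  # [0, 3, 1]
```

With `beam_width=1` the search reduces to greedy decoding: it commits to "A" (p = 0.5) and ends with total probability 0.2, whereas a width of 2 also keeps "B" and finds the better sequence with probability 0.4 × 0.9 = 0.36.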
