mini-CDDD (Continuous and Data-Driven Descriptors)

This repository is an unofficial mini version of the original work presented in the paper:

"Learning Continuous and Data-Driven Molecular Descriptors by Translating Equivalent Chemical Representations" by Robin Winter, Floriane Montanari, Frank Noé, and Djork-Arné Clevert.

The aim of this mini version is to recreate the core architecture from the original paper with concise code, serving as a lightweight reference. Please refer to the official repository for the complete and original implementation.

Installing

To set up the environment, install the dependencies listed in requirements.txt:

pip install -r requirements.txt

The model was trained with TensorFlow 2.11, but the code does not rely on features exclusive to that version. Feel free to adjust requirements.txt to suit your environment.

Datasets

The original paper and repository used the ZINC15 and PubChem datasets. Because this repo aims to be a minimal implementation, it uses the widely available ZINC250k dataset instead (250k_rndm_zinc_drugs_clean_3.csv in this repository). You can substitute your own training dataset.
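As a rough illustration of preparing the dataset, the sketch below reads SMILES strings from a CSV file using only the standard library. It assumes the file has a column named `smiles` and that entries may carry stray whitespace or newlines; the function name `load_smiles` and the sample rows (including their numeric values) are made up for this example and are not taken from main.ipynb.

```python
import csv
import io


def load_smiles(csv_file, column="smiles"):
    """Read SMILES strings from an open CSV file, dropping blanks and duplicates.

    The column name is an assumption about the file layout.
    """
    seen = set()
    smiles = []
    for row in csv.DictReader(csv_file):
        # Strip whitespace and embedded newlines around each entry.
        s = (row.get(column) or "").strip()
        if s and s not in seen:
            seen.add(s)
            smiles.append(s)
    return smiles


# Tiny in-memory stand-in for the real file; the numeric values are placeholders.
sample = io.StringIO('smiles,logP\n"CCO\n",0.0\nc1ccccc1,0.0\n"CCO\n",0.0\n')
print(load_smiles(sample))  # ['CCO', 'c1ccccc1']
```

For the real file, something like `with open("250k_rndm_zinc_drugs_clean_3.csv", newline="") as f: smiles = load_smiles(f)` should work if the column assumption holds.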

Quick Start

The main.ipynb notebook implements the model variant that the paper reports as performing best. Only the most critical elements are retained so that the code structure stays easy to follow. The implementation is based on both the original paper and a comparison with the original codebase. For instance, the paper mentions predicting 9 molecular descriptors with the classifier, but the original codebase uses only 7; this implementation follows the codebase. This implementation does not yet include the paper's bucketing approach, in which token sequences of different lengths are placed in separate buckets. At the end of the notebook, the trained models, including the Encoder + Classifier and the Seq2Seq model, are saved.
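The length bucketing mentioned above, which this implementation does not yet include, could look roughly like the following. This is a minimal pure-Python sketch, not the paper's or the original codebase's method: the function names, the bucket boundaries, and the padding id of 0 are all assumptions chosen for illustration.

```python
from bisect import bisect_left


def bucket_by_length(sequences, boundaries):
    """Group token-id sequences by length.

    Bucket i holds sequences with len <= boundaries[i]; one extra bucket
    (index len(boundaries)) holds anything longer. `boundaries` must be sorted.
    """
    buckets = {}
    for seq in sequences:
        idx = bisect_left(boundaries, len(seq))
        buckets.setdefault(idx, []).append(seq)
    return buckets


def pad_bucket(bucket, pad_id=0):
    """Pad every sequence in one bucket to that bucket's longest sequence."""
    width = max(len(seq) for seq in bucket)
    return [list(seq) + [pad_id] * (width - len(seq)) for seq in bucket]


# Hypothetical token-id sequences of different lengths.
seqs = [[1, 2], [3, 4, 5], [6], [7, 8, 9, 10, 11]]
buckets = bucket_by_length(seqs, boundaries=[2, 4])
print(buckets[0])              # [[1, 2], [6]]
print(pad_bucket(buckets[0]))  # [[1, 2], [6, 0]]
```

Because each batch is padded only to the longest sequence in its own bucket, short molecules are not padded out to the length of the longest one in the dataset. In a TensorFlow input pipeline, `tf.data.Dataset.bucket_by_sequence_length` may be a more natural fit than hand-rolled bucketing.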

Todo

Implement beam search for Seq2Seq
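As a starting point for this item, here is a generic beam search sketch in plain Python. It is not tied to the notebook's Seq2Seq model: `step_fn` is a hypothetical callback that, given the tokens decoded so far, returns a dict mapping each candidate next token to its log-probability. In practice it would wrap one decoder step. The toy transition table exists only to show a case where beam search beats greedy decoding.

```python
import math


def beam_search(step_fn, start_token, end_token, beam_width=3, max_len=20):
    """Return the highest-scoring token sequence found by beam search.

    Scores are summed log-probabilities with no length normalization.
    """
    beams = [(0.0, [start_token])]
    finished = []
    for _ in range(max_len):
        # Expand every live hypothesis by one token.
        candidates = [
            (score + logp, seq + [tok])
            for score, seq in beams
            for tok, logp in step_fn(seq).items()
        ]
        if not candidates:
            break
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        # Keep the top beam_width; completed hypotheses leave the beam.
        for score, seq in candidates[:beam_width]:
            (finished if seq[-1] == end_token else beams).append((score, seq))
        if not beams:
            break
    finished.extend(beams)  # fall back to unfinished hypotheses if needed
    return max(finished, key=lambda c: c[0])[1]


# Toy next-token probabilities keyed by the last token (made up for this demo).
TOY = {
    "<s>": {"A": 0.6, "B": 0.4},
    "A": {"X": 0.5, "Y": 0.5},
    "B": {"</s>": 0.9, "X": 0.1},
    "X": {"</s>": 1.0},
    "Y": {"</s>": 1.0},
}


def toy_step(prefix):
    return {tok: math.log(p) for tok, p in TOY[prefix[-1]].items()}


print(beam_search(toy_step, "<s>", "</s>", beam_width=1))  # ['<s>', 'A', 'X', '</s>']
print(beam_search(toy_step, "<s>", "</s>", beam_width=2))  # ['<s>', 'B', '</s>']
```

With a beam width of 1 the search is greedy and commits to "A" (total probability 0.3), while a width of 2 keeps "B" alive and finds the more probable sequence (0.36). Since scores are not length-normalized, shorter outputs are favored; whether that matters for SMILES decoding would need to be checked.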
