This repository is an unofficial mini version of the original work presented in the paper:
"Learning Continuous and Data-Driven Molecular Descriptors by Translating Equivalent Chemical Representations" by Robin Winter, Floriane Montanari, Frank Noé, and Djork-Arné Clevert.
The aim of this mini version is to recreate the core architecture from the original paper with concise code, serving as a lightweight reference. Please refer to the official repository for the complete and original implementation.
To set up the environment, install the dependencies listed in requirements.txt with the following command:
pip install -r requirements.txt
Although training was performed with TensorFlow 2.11, we did not use any features exclusive to that version. Feel free to adjust requirements.txt to fit your needs.
While the original paper and repository used the ZINC12 and PubChem datasets, this repo aims for a minimal implementation and therefore uses the commonly available ZINC250 dataset. You can substitute your own training dataset.
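If you swap in your own dataset, the SMILES strings need to be read and converted into token ids before training. The sketch below is one way to do that with the standard library only; it is not the notebook's actual preprocessing. The CSV layout (a header row with a `smiles` column), the simplified tokenizer regex, the special tokens, and all function names are assumptions made for illustration.

```python
import csv
import re

# Assumption: a simplified SMILES tokenizer. It keeps bracket atoms, the
# two-letter halogens Br/Cl, and %NN ring closures as single tokens, and
# treats every other character as its own token.
TOKEN_PATTERN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize(smiles):
    """Split a SMILES string into a list of tokens."""
    return TOKEN_PATTERN.findall(smiles)


def load_smiles(csv_path, column="smiles"):
    """Read non-empty SMILES strings from a CSV file that has a header row.

    The column name is hypothetical; change it to match your file.
    """
    with open(csv_path, newline="") as handle:
        rows = csv.DictReader(handle)
        return [row[column].strip() for row in rows if row[column].strip()]


def build_vocab(smiles_list, specials=("<pad>", "<sos>", "<eos>")):
    """Map each token to an integer id, with special tokens first."""
    tokens = sorted({tok for smi in smiles_list for tok in tokenize(smi)})
    return {tok: idx for idx, tok in enumerate(list(specials) + tokens)}


def encode(smiles, vocab):
    """Convert a SMILES string to token ids wrapped in <sos> ... <eos>."""
    body = [vocab[tok] for tok in tokenize(smiles)]
    return [vocab["<sos>"]] + body + [vocab["<eos>"]]
```

A typical use would be `smiles = load_smiles("my_dataset.csv")` (a placeholder path), then `vocab = build_vocab(smiles)` and `encode(smiles[0], vocab)`. Padding to a fixed length and batching are left out, since they depend on how the notebook feeds data to the model.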
The main.ipynb notebook implements the model variant that the paper reports as the best performing. We've retained only the most critical elements so that the code structure is easy to follow. The implementation is based on both the original paper and a comparison with the original codebase. For instance, while the paper mentions predicting
- Implement beam search for Seq2Seq