enzymemap

Python package to atom-map, correct and suggest enzymatic reactions

Cite us

If you use EnzymeMap, please cite our publication "EnzymeMap: Curation, validation and data-driven prediction of enzymatic reactions" by E. Heid, D. Probst, W. H. Green and G. K. H. Madsen.

News

August 2023: VERSION 2: A new version of EnzymeMap has been released. It includes important bugfixes for isomerase reactions and for some reactions containing protons, and it adds protein information. The added protein information makes the raw and processed files rather large. If your application does not require protein information, delete the respective columns and then drop duplicates.
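The column removal and deduplication described above could be sketched with pandas roughly as follows. The column name `protein_refs` and the toy rows are hypothetical placeholders, not names confirmed by the EnzymeMap files; inspect the header of the actual CSV and pass the real protein-related columns instead.

```python
import pandas as pd


def drop_protein_info(df, protein_columns):
    """Remove protein-related columns, then drop rows that became duplicates."""
    # errors="ignore" skips any listed column that is not present in the frame.
    trimmed = df.drop(columns=list(protein_columns), errors="ignore")
    return trimmed.drop_duplicates().reset_index(drop=True)


# Toy data; the column names are placeholders chosen for illustration only.
example = pd.DataFrame(
    {
        "mapped": ["A>>B", "A>>B", "C>>D"],
        "ec_num": ["1.1.1.1", "1.1.1.1", "2.2.2.2"],
        "protein_refs": ["P1", "P2", "P3"],
    }
)
slim = drop_protein_info(example, ["protein_refs"])
print(len(example), "->", len(slim))
```

Dropping the columns first matters: two rows that differ only in their protein annotation become identical only after that annotation is gone.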

Database

To use the EnzymeMap database directly, use data/processed_reactions.csv.gz (corresponds to the newest version, currently v2) or download EnzymeMap from Zenodo.

Within Python (with a valid enzymemap installation), you can also run enzymemap.get_data(), which likewise corresponds to the newest version, currently v2.
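If you prefer to read the compressed file yourself, a minimal sketch with pandas might look like this. The helper name `load_reactions` is made up for illustration, and the default path assumes you are in the root of the cloned repository.

```python
import pandas as pd


def load_reactions(path="data/processed_reactions.csv.gz"):
    """Read the gzip-compressed reaction table into a DataFrame."""
    # pandas infers gzip compression from the ".gz" suffix by default.
    return pd.read_csv(path)
```

Calling `load_reactions()` and then looking at `.columns` and `.head()` on the result is a quick way to see which fields the current version provides.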

Installation

Download enzymemap from GitHub:

git clone https://github.com/hesther/enzymemap.git
cd enzymemap

Set up a conda environment (or install the packages in environment.yml in any other way convenient to you):

conda env create -f environment.yml
conda activate enzymemap

Install the enzymemap package:

pip install -e .

Reproduce our study: Recreate EnzymeMap

Extract BRENDA in the data folder (run tar -xzvf brenda_2023_1.txt.tar.gz in the data folder).

Go to the scripts folder and run

python make_raw.py

to produce data/raw_reactions.csv, data/compound_to_smiles.json and ec_nums.csv. This step processes BRENDA entries and resolves all trivial names to SMILES. You might need to download a new opsin.jar from the internet that is suitable for your system. We also provide the three processed files, so you can continue with the following steps without running make_raw.py.

Then, for each EC number run process.py, for example to process EC number 1.1.3.2:

python process.py 1.1.3.2

This produces data/processed_reactions_1.1.3.2.csv. Run this for all EC numbers (it is best to parallelize this over many cores). You can also run the individual calculations on different machines. Once all calculations are done, run
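One possible way to spread the per-EC runs over several cores on a single machine is sketched below. It is not part of the repository: the helper names `build_command` and `run_all` are hypothetical, it assumes it is started from the scripts folder, and it leaves open how you obtain the list of EC numbers, since the layout of ec_nums.csv is not described here.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def build_command(ec_number, script="process.py"):
    """Return the command line that processes one EC number."""
    return [sys.executable, script, ec_number]


def run_all(ec_numbers, max_workers=4, runner=subprocess.run):
    """Run process.py once per EC number, several at a time.

    Threads are sufficient here because each job is a separate Python
    process; the threads only wait for those processes to finish.
    """
    def job(ec):
        return ec, runner(build_command(ec)).returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = dict(pool.map(job, ec_numbers))
    # Return the EC numbers whose run exited with a non-zero status.
    return sorted(ec for ec, code in results.items() if code != 0)
```

For example, `failed = run_all(["1.1.3.2", "1.1.3.4"], max_workers=8)` would start both runs and return the EC numbers that need to be rerun. On a cluster, a job array with one EC number per task serves the same purpose.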

python concatenate.py

to combine the results for all EC numbers into one dataframe. You have now recreated EnzymeMap.
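To check intermediate results before all EC numbers have finished, the per-EC files can be stacked by hand. The sketch below only illustrates the idea with pandas and is not the repository's concatenate.py, whose exact behavior is not described here; the function name and the default `../data` directory (relative to the scripts folder) are assumptions.

```python
import glob
import os

import pandas as pd


def concatenate_processed(data_dir="../data", pattern="processed_reactions_*.csv"):
    """Stack all per-EC result files found in data_dir into one DataFrame."""
    paths = sorted(glob.glob(os.path.join(data_dir, pattern)))
    if not paths:
        raise FileNotFoundError(f"no files matching {pattern} in {data_dir}")
    frames = [pd.read_csv(p) for p in paths]
    # ignore_index=True renumbers the rows instead of repeating per-file indices.
    return pd.concat(frames, ignore_index=True)
```

The pattern matches only the per-EC files such as processed_reactions_1.1.3.2.csv, so an already concatenated processed_reactions.csv.gz in the same folder is not read a second time.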

Reproduce our study: Train and evaluate machine learning models

Run the scripts analysis_preprocess.py (process data), analysis_temprel.py (train template relevance model, use conda environment from templatecorr), analysis_chemprop.py (train CGR-chemprop model, use conda environment from chemprop) and analysis_plot (plot results).

Reproduce our study: Additional benchmarks

Follow the instructions in the additional_benchmarks folder to process KEGG and MetaCyc.

Copyright

Acknowledgements

Project based on the Computational Molecular Science Python Cookiecutter version 1.6.
