Merge pull request #8 from milicazmarkovic/feat/docker-compose-up-run-correct

Added docker support for running the script
hesther authored Aug 9, 2024
2 parents 7095b4b + c882f26 commit e3d438e
Showing 3 changed files with 37 additions and 0 deletions.
15 changes: 15 additions & 0 deletions Dockerfile
@@ -0,0 +1,15 @@
FROM mambaorg/micromamba:1.4.7

USER root
# Keep the base environment activated
ARG MAMBA_DOCKERFILE_ACTIVATE=1
RUN apt update && apt -y install git gcc g++ make

# Use micromamba to resolve conda-forge, much faster than conda
RUN micromamba install -y python=3.8.17 pip=23.2.1 rdkit=2020.09.5 -c conda-forge
RUN micromamba install -y numpy pandas joblib tqdm -c conda-forge
RUN micromamba install -y rdchiral_cpp=1.1.2 -c conda-forge

WORKDIR /app

COPY . /app
15 changes: 15 additions & 0 deletions README.md
@@ -47,6 +47,21 @@ python correct.py --path data/uspto_50k --reaction_column rxn_smiles --name temp

where `--reaction_column rxn_smiles` specifies the name of the column containing reaction SMILES, `--name template` sets the name of the column for the extracted templates in the output file (here to "template"), `--nproc 20` parallelizes the program over 20 processes, `--drop_extra_cols` causes additional helper columns during extraction (canonical reactant SMILES, templates at radius 0 and 1) to be dropped before saving the dataframe to file, and `--data_format csv` specifies the input format of the data, as well as the output format.
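
To make the roles of these flags concrete, here is a minimal `argparse` sketch that declares the same options. It is a hypothetical reconstruction for illustration only, not the actual contents of `correct.py`: the function name `build_parser`, the help strings, and the `None` defaults are all assumptions, and the script's real defaults may differ.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags described above (not the real correct.py)."""
    parser = argparse.ArgumentParser(description="Extract and correct reaction templates")
    # Defaults are placeholders; the real script's defaults are not shown in this README.
    parser.add_argument("--path", default=None, help="dataset location, e.g. data/uspto_50k")
    parser.add_argument("--reaction_column", default=None, help="column containing reaction SMILES")
    parser.add_argument("--name", default=None, help="output column name for the extracted templates")
    parser.add_argument("--nproc", type=int, default=None, help="number of parallel processes")
    # A boolean switch: present means the helper columns are dropped before saving.
    parser.add_argument("--drop_extra_cols", action="store_true", help="drop helper columns")
    parser.add_argument("--data_format", default=None, help="input and output format, e.g. csv")
    return parser


args = build_parser().parse_args(
    ["--path", "data/uspto_50k", "--reaction_column", "rxn_smiles",
     "--name", "template", "--nproc", "20", "--drop_extra_cols", "--data_format", "csv"]
)
print(args)
```

The point of the sketch is only that `--drop_extra_cols` behaves as an on/off switch while the other options take a value.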

### Extract and correct templates with Docker

If you run into dependency problems or architecture incompatibilities (e.g. installing rdchiral_cpp can be problematic on Apple M1 Macs), you can run `correct.py` in a Docker container instead, using the following steps:
1. Build the Docker image from the Dockerfile:
```
docker build -t templatecorr .
```
2. Run the container with Docker Compose:
```
docker compose up
```

`docker-compose.yaml` contains a default setup that runs the `correct.py` script on the test data. To use your own data, change the `--path` argument in `command`. Note that the host `./data` directory is mounted as a volume at `/app/data` in the container, so the resulting CSV is written to the `data` folder, just as when running the script directly.
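
As an alternative to editing the compose file, the service's command can be overridden at launch time with `docker compose run`. The following is a minimal sketch of how such an invocation could be assembled. The helper `compose_run_command` and the dataset location `data/my_reactions` are hypothetical; the service name comes from `docker-compose.yaml`, and the extra flags are the ones used in the README example above.

```python
import shlex


def compose_run_command(data_path, service="correct-templates", extra_args=()):
    """Build a `docker compose run` invocation that runs correct.py on data_path.

    data_path is resolved inside the container relative to /app, so the data
    should live under the host ./data directory that the compose file mounts.
    """
    return [
        "docker", "compose", "run", "--rm", service,
        "python", "correct.py", "--path", data_path,
        *extra_args,
    ]


# "data/my_reactions" is a hypothetical dataset location, not part of the repository.
cmd = compose_run_command(
    "data/my_reactions",
    extra_args=["--reaction_column", "rxn_smiles", "--data_format", "csv"],
)
# Print a shell-quoted command line; it could also be passed to subprocess.run(cmd, check=True).
print(shlex.join(cmd))
```

Because the compose file mounts `./data` at `/app/data`, anything the script writes under `data/` in the container should appear in the host's `data` folder.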


### Use to retrain a template relevance model

If you want to use the template correction code together with the [template-relevance](https://gitlab.com/mefortunato/template-relevance) GitLab repository, there is a simple drop-in replacement: in your workflow, instead of using bin/process.py from the template-relevance repository, use temprel_scripts/process.py (same usage, same arguments). NEW: Optional additional parameters specify the radius and the presence of special groups (default: `--radius 1` with special groups; to omit special groups from the templates, use `--no_special_groups`).
7 changes: 7 additions & 0 deletions docker-compose.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
services:
  correct-templates:
    image: templatecorr:latest
    command: [
      "python", "correct.py", "--path", "data/uspto_50k"]
    volumes:
      - ./data:/app/data
