JANUS: Parallel Tempered Genetic Algorithm Guided by Deep Neural Networks for Inverse Molecular Design

This repository contains code for the paper: JANUS: Parallel Tempered Genetic Algorithm Guided by Deep Neural Networks for Inverse Molecular Design.

Originally by: AkshatKumar Nigam, Robert Pollice, Alán Aspuru-Guzik

Updated by: Gary Tom

Prerequisites:

Use Python 3.7 or later.

You will need to install RDKit (version >= 2020.03.1) separately. The easiest way to do this is with conda.

JANUS uses SELFIES version 1.0.3. If you want to use a different version, pip install your desired version; this package will still be compatible. Note that you will have to change your input alphabets to work with other versions of SELFIES.
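Since the RDKit and SELFIES versions both matter, it can help to check what is installed before running anything. The following is a minimal sketch and not part of JANUS: the helper names `installed_version`, `version_tuple` and `meets_minimum` are made up for illustration. It relies on `importlib.metadata`, which is in the standard library from Python 3.8 onward, and it may report `None` for a conda-installed RDKit that carries no pip metadata.

```python
from importlib import metadata  # standard library on Python 3.8+


def installed_version(package: str):
    """Return the installed version string of `package`, or None if not found."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_tuple(version: str):
    """Turn '2020.03.1' into (2020, 3, 1); stops at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(version: str, minimum: str) -> bool:
    """Compare two dotted version strings numerically, part by part."""
    return version_tuple(version) >= version_tuple(minimum)


# Report what is installed (None means no package metadata was found).
for name in ("rdkit", "selfies"):
    print(name, installed_version(name))
```

For example, `meets_minimum("2020.09.1", "2020.03.1")` is true, while a 2019 release of RDKit would fail the check.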

Major changes:

Support the use of any version of SELFIES (please check your installation).
Improved multiprocessing. The fitness function itself is not parallelized, in case it already spawns multiple processes.
GPU acceleration of neural networks.
Early stopping for classifier.
Included SMILES filtering option.
Additional hyperparameters for controlling JANUS. The defaults used in the paper are given in the tests directory.

How to run:

Install JANUS using

pip install janus-ga

An example script showing how to use JANUS is in tests/example.py:

from janus import JANUS, utils
from rdkit import Chem, RDLogger
from rdkit.Chem import AllChem, RDConfig, Descriptors
RDLogger.DisableLog("rdApp.*")

import selfies

def fitness_function(smi: str) -> float:
    """ User-defined function that takes in individual smiles 
    and outputs a fitness value.
    """
    # logP fitness
    return Descriptors.MolLogP(Chem.MolFromSmiles(smi))

def custom_filter(smi: str) -> bool:
    """ Function that takes in a SMILES string and returns a boolean.
    True indicates the SMILES PASSES the filter.
    """
    # SMILES length filter: reject empty strings and strings longer than 81 characters
    return 0 < len(smi) <= 81

# all parameters to be set, below are defaults
params_dict = {
    # Number of iterations that JANUS runs for
    "generations": 200,

    # The number of molecules for which fitness calculations are done, 
    # exploration and exploitation each have their own population
    "generation_size": 5000,
    
    # Number of molecules that are exchanged between the exploration and exploitation populations
    "num_exchanges": 5,

    # Callable filtering function (None defaults to no filtering)
    "custom_filter": custom_filter,

    # Fragments from starting population used to extend alphabet for mutations
    "use_fragments": True,

    # An option to use a classifier as selection bias
    "use_classifier": True,
}

# Set your SELFIES constraints (below used for manuscript)
default_constraints = selfies.get_semantic_constraints()
new_constraints = default_constraints
new_constraints['S'] = 2
new_constraints['P'] = 3
selfies.set_semantic_constraints(new_constraints)  # update constraints

# Create JANUS object.
agent = JANUS(
    work_dir = 'RESULTS',                                   # where the results are saved
    fitness_function = fitness_function,                    # user-defined fitness for given smiles
    start_population = "./DATA/sample_start_smiles.txt",   # file with starting smiles population
    **params_dict
)

# Alternatively, you can get hyperparameters from a yaml file
# Descriptions for all parameters are found in default_params.yml
params_dict = utils.from_yaml(
    work_dir = 'RESULTS',  
    fitness_function = fitness_function, 
    start_population = "./DATA/sample_start_smiles.txt",
    yaml_file = 'default_params.yml',       # default yaml file with parameters
    **params_dict                           # overwrite yaml parameters with dictionary
)
agent = JANUS(**params_dict)

# Run according to parameters
agent.run()     # RUN IT!

Within this file are examples for:

A function for calculating property values (see function fitness_function).
Custom filtering of SMILES (see function custom_filter).
Initializing JANUS from dictionary of parameters.
Generating hyperparameters from provided yaml file (see function janus.utils.from_yaml).

You can run the script with the provided test files:

cd tests
python ./example.py
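A custom filter can combine several rules, as long as it still takes one SMILES string and returns a boolean. The sketch below keeps the length limit of 81 from the example script and adds one hypothetical rule: rejecting SMILES that contain `.`, which marks disconnected components. The constant names are made up for illustration. The filter is written as a plain module-level function so that it stays picklable; whether JANUS actually passes the filter to worker processes is an assumption here.

```python
MAX_SMILES_LENGTH = 81       # same limit as the example script
BANNED_SUBSTRINGS = (".",)   # hypothetical rule: reject multi-component SMILES


def custom_filter(smi: str) -> bool:
    """Return True only if the SMILES passes every rule."""
    # Rule 1: non-empty and not longer than the length limit.
    if not 0 < len(smi) <= MAX_SMILES_LENGTH:
        return False
    # Rule 2: none of the banned substrings may appear.
    return not any(token in smi for token in BANNED_SUBSTRINGS)
```

It is passed to JANUS in the same way as in the example script, through the `custom_filter` entry of the parameter dictionary.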

Important parameters the user should provide:

work_dir: directory for outputting results
fitness_function: fitness function defined for an input smiles that will be maximized
start_population: path to a text file of starting SMILES, one per line
generations: number of evolution iterations to perform
generation_size: number of molecules in the populations per generation
custom_filter: filter function checked after mutation and crossover, returns True for accepted molecules
use_fragments: toggle adding fragments from starting population to mutation alphabet
use_classifier: toggle using classifier for selection bias

See tests/default_params.yml for detailed description of adjustable parameters.
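The `start_population` file is a plain text file with one SMILES per line. The following is a minimal sketch of how such a file could be prepared from a list of SMILES. The helper `write_start_population` is hypothetical and not part of JANUS, and dropping duplicates and blank entries is a choice made for this illustration, not a documented requirement.

```python
from pathlib import Path
from typing import Iterable


def write_start_population(smiles: Iterable[str], path: str) -> int:
    """Write unique, non-empty SMILES to `path`, one per line.

    Returns the number of SMILES written. Input order is preserved.
    """
    seen = set()
    unique = []
    for smi in smiles:
        smi = smi.strip()
        if smi and smi not in seen:
            seen.add(smi)
            unique.append(smi)

    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text("".join(smi + "\n" for smi in unique))
    return len(unique)
```

The resulting path can then be given as `start_population` when creating the JANUS object.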

Outputs:

All results from running JANUS will be stored in specified work_dir.

The following files will be created:

fitness_explore.txt: Fitness values for all molecules from the exploration component of JANUS.
fitness_local_search.txt: Fitness values for all molecules from the exploitation component of JANUS.
generation_all_best.txt: SMILES and fitness value for the best molecule encountered in every generation (iteration).
init_mols.txt: List of molecules used to initialize JANUS.
population_explore.txt: SMILES for all molecules from the exploration component of JANUS.
population_local_search.txt: SMILES for all molecules from the exploitation component of JANUS.
hparams.json: Hyperparameters used for initializing JANUS.
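Because the hyperparameters are saved as JSON, two runs can be compared with the standard library alone. The sketch below assumes that hparams.json holds a single flat JSON object. The helpers `load_hparams` and `changed_hparams` are hypothetical and not part of JANUS.

```python
import json
from pathlib import Path


def load_hparams(work_dir: str) -> dict:
    """Read hparams.json from a JANUS results directory."""
    with open(Path(work_dir) / "hparams.json") as handle:
        return json.load(handle)


def changed_hparams(old: dict, new: dict) -> dict:
    """Map each key whose value differs between runs to an (old, new) pair."""
    keys = set(old) | set(new)
    return {
        key: (old.get(key), new.get(key))
        for key in sorted(keys)
        if old.get(key) != new.get(key)
    }
```

For instance, `changed_hparams(load_hparams("RESULTS_A"), load_hparams("RESULTS_B"))` would list the settings that differ between two result directories (both directory names are placeholders).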

Paper Results/Reproducibility:

Our code and results for each experiment in the paper can be found here:

Experiment 4.1: https://drive.google.com/file/d/1rscIyzpTvtyiEkoP1WsF-XtSHJGQStUU/view?usp=sharing
Experiment 4.3: https://drive.google.com/file/d/1tlIdfSWwzVeJ5kZ98l8G6osE9zf9wP1f/view?usp=sharing
GuacaMol: https://drive.google.com/file/d/1FqetwNg6VVc-C3eiPoosGZ4-47WpYBAt/view?usp=sharing

Questions, problems?

Open a GitHub issue 😄. Please be as clear and descriptive as possible. Feel free to reach out directly: (akshat[DOT]nigam[AT]mail[DOT]utoronto[DOT]ca, rob[DOT]pollice[AT]utoronto[DOT]ca)

License

Apache License 2.0
