JANUS: Parallel Tempered Genetic Algorithm Guided by Deep Neural Networks for Inverse Molecular Design
This repository contains code for the paper: JANUS: Parallel Tempered Genetic Algorithm Guided by Deep Neural Networks for Inverse Molecular Design.
Originally by: AkshatKumar Nigam, Robert Pollice, Alán Aspuru-Guzik
Updated by: Gary Tom
Use Python 3.7 or later.
You will need to install RDKit (version >= 2020.03.1) separately. The easiest way to do this is with conda.
JANUS uses SELFIES version 1.0.3. If you want to use a different version, pip install your desired version; this package will still be compatible. Note that you will have to change your input alphabets to work with other versions of SELFIES.
- Support the use of any version of SELFIES (please check your installation).
- Improved multiprocessing. The fitness function itself is not parallelized, in case that function already spawns multiple processes.
- GPU acceleration of neural networks.
- Early stopping for classifier.
- Included SMILES filtering option.
- Additional hyperparameters for controlling JANUS. Defaults used in the paper are given in the tests directory.
Install JANUS using
pip install janus-ga
An example script showing how to use JANUS is found in tests/example.py:
from janus import JANUS, utils
from rdkit import Chem, RDLogger
from rdkit.Chem import AllChem, RDConfig, Descriptors
RDLogger.DisableLog("rdApp.*")
import selfies
def fitness_function(smi: str) -> float:
    """ User-defined function that takes in individual smiles
    and outputs a fitness value.
    """
    # logP fitness
    return Descriptors.MolLogP(Chem.MolFromSmiles(smi))

def custom_filter(smi: str):
    """ Function that takes in a smile and returns a boolean.
    True indicates the smiles PASSES the filter.
    """
    # smiles length filter
    if len(smi) > 81 or len(smi) == 0:
        return False
    else:
        return True
# all parameters to be set, below are defaults
params_dict = {
    # Number of iterations that JANUS runs for
    "generations": 200,
    # The number of molecules for which fitness calculations are done,
    # exploration and exploitation each have their own population
    "generation_size": 5000,
    # Number of molecules that are exchanged between the exploration and exploitation
    "num_exchanges": 5,
    # Callable filtering function (None defaults to no filtering)
    "custom_filter": custom_filter,
    # Fragments from starting population used to extend alphabet for mutations
    "use_fragments": True,
    # An option to use a classifier as selection bias
    "use_classifier": True,
}
# Set your SELFIES constraints (below used for manuscript)
default_constraints = selfies.get_semantic_constraints()
new_constraints = default_constraints
new_constraints['S'] = 2
new_constraints['P'] = 3
selfies.set_semantic_constraints(new_constraints) # update constraints
# Create JANUS object.
agent = JANUS(
    work_dir = 'RESULTS',                                   # where the results are saved
    fitness_function = fitness_function,                    # user-defined fitness for given smiles
    start_population = "./DATA/sample_start_smiles.txt",    # file with starting smiles population
    **params_dict
)
# Alternatively, you can get hyperparameters from a yaml file
# Descriptions for all parameters are found in default_params.yml
params_dict = utils.from_yaml(
    work_dir = 'RESULTS',
    fitness_function = fitness_function,
    start_population = "./DATA/sample_start_smiles.txt",
    yaml_file = 'default_params.yml',       # default yaml file with parameters
    **params_dict                           # overwrite yaml parameters with dictionary
)
agent = JANUS(**params_dict)
# Run according to parameters
agent.run() # RUN IT!
Within this file are examples for:
- A function for calculating property values (see function fitness_function).
- Custom filtering of SMILES (see function custom_filter).
- Initializing JANUS from a dictionary of parameters.
- Generating hyperparameters from the provided yaml file (see function janus.utils.from_yaml).
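As a further illustration of the custom_filter interface, the sketch below builds a filter that keeps the length rule from the example script and also rejects SMILES containing unwanted substrings. The helper name make_filter, the banned argument, and the tokens passed to it are hypothetical choices for this sketch and are not part of JANUS; only the contract (take a SMILES string, return True if it passes) comes from the example script.

```python
from typing import Callable, Sequence


def make_filter(max_len: int = 81, banned: Sequence[str] = ()) -> Callable[[str], bool]:
    """Build a filter function; True means the SMILES passes.

    `make_filter` and `banned` are hypothetical names used only in this sketch.
    """
    def custom_filter(smi: str) -> bool:
        # Same length rule as in the example script.
        if len(smi) == 0 or len(smi) > max_len:
            return False
        # Plain substring check on the string, not a substructure match.
        return not any(token in smi for token in banned)

    return custom_filter


# Toy rule: reject strings that contain these charged-atom tokens.
no_charge_filter = make_filter(max_len=81, banned=("[N+]", "[O-]"))
```

The returned function can then be passed as "custom_filter": no_charge_filter in params_dict.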
You can run the file with the provided test files:
cd tests
python ./example.py
Important parameters the user should provide:
- work_dir: directory for outputting results
- fitness_function: fitness function defined for an input SMILES that will be maximized
- start_population: path to a text file of starting SMILES, one per line
- generations: number of evolution iterations to perform
- generation_size: number of molecules in the populations per generation
- custom_filter: filter function checked after mutation and crossover, returns True for accepted molecules
- use_fragments: toggle adding fragments from starting population to mutation alphabet
- use_classifier: toggle using classifier for selection bias
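Since start_population is a plain text file with one SMILES per line, a file for your own molecules can be prepared with a few lines of Python. The following is a minimal sketch: the helper name write_start_population is hypothetical, and dropping blank and duplicate entries is a choice made here, not a documented JANUS requirement.

```python
from pathlib import Path
from typing import Iterable, List


def write_start_population(smiles: Iterable[str], path: str) -> List[str]:
    """Write unique, non-empty SMILES to `path`, one per line.

    Hypothetical helper for this sketch; returns the list that was written.
    """
    seen = set()
    unique = []
    for smi in smiles:
        smi = smi.strip()
        # Skip blank entries and repeats, keeping first-seen order.
        if smi and smi not in seen:
            seen.add(smi)
            unique.append(smi)
    Path(path).write_text("\n".join(unique) + "\n")
    return unique
```

For example, write_start_population(["CCO", "c1ccccc1"], "my_start_smiles.txt") produces a file whose path can be given as start_population.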
See tests/default_params.yml for detailed descriptions of the adjustable parameters.
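Because the fitness_function is maximized, a property that should be minimized needs its sign flipped before it is handed to JANUS. The sketch below shows one way to do that; as_maximization and toy_cost are hypothetical names, and toy_cost is a stand-in for a real property calculation.

```python
from typing import Callable


def as_maximization(cost: Callable[[str], float]) -> Callable[[str], float]:
    """Wrap a lower-is-better function so that higher is better."""
    def fitness_function(smi: str) -> float:
        # Negating the cost turns minimization into maximization.
        return -cost(smi)

    return fitness_function


def toy_cost(smi: str) -> float:
    # Hypothetical stand-in for a real property:
    # distance of the SMILES string length from 10 characters.
    return abs(len(smi) - 10)


fitness_function = as_maximization(toy_cost)
```

With this wrapper, strings closer to the target length receive a higher fitness, which is the direction JANUS optimizes.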
All results from running JANUS will be stored in the specified work_dir.
The following files will be created:
- fitness_explore.txt: Fitness values for all molecules from the exploration component of JANUS.
- fitness_local_search.txt: Fitness values for all molecules from the exploitation component of JANUS.
- generation_all_best.txt: SMILES and fitness value for the best molecule encountered in every generation (iteration).
- init_mols.txt: List of molecules used to initiate JANUS.
- population_explore.txt: SMILES for all molecules from the exploration component of JANUS.
- population_local_search.txt: SMILES for all molecules from the exploitation component of JANUS.
- hparams.json: Hyperparameters used for initializing JANUS.
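To check which settings a finished run actually used, hparams.json can be read back with the standard json module. This is a minimal sketch that assumes the file sits directly inside work_dir, as the list above suggests; load_hparams is a hypothetical helper name, and the keys in the file are whatever JANUS wrote.

```python
import json
from pathlib import Path
from typing import Any


def load_hparams(work_dir: str) -> Any:
    """Return the parsed contents of <work_dir>/hparams.json.

    Hypothetical helper; assumes the file is at the top level of work_dir.
    """
    with open(Path(work_dir) / "hparams.json") as f:
        return json.load(f)
```

For the example script, load_hparams("RESULTS") would return the stored hyperparameters for inspection or for comparison between runs.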
Our code and results for each experiment in the paper can be found here:
- Experiment 4.1: https://drive.google.com/file/d/1rscIyzpTvtyiEkoP1WsF-XtSHJGQStUU/view?usp=sharing
- Experiment 4.3: https://drive.google.com/file/d/1tlIdfSWwzVeJ5kZ98l8G6osE9zf9wP1f/view?usp=sharing
- GuacaMol: https://drive.google.com/file/d/1FqetwNg6VVc-C3eiPoosGZ4-47WpYBAt/view?usp=sharing
Make a GitHub issue 😄. Please be as clear and descriptive as possible. Please feel free to reach out to us directly: (akshat[DOT]nigam[AT]mail[DOT]utoronto[DOT]ca, rob[DOT]pollice[AT]utoronto[DOT]ca)