This repository contains the code to simulate a database of phylogenetic trees for a machine learning project in which a neural network is trained to solve phylodynamics problems.
To generate the database, run the main script:

```sh
python main.py <path/to/config.json>
```

The whole simulation is configured by the JSON file provided at the command line; for a small test example, try the `config/debugging.json` file. There is a schema for valid configurations described here. Some example configurations are provided:
- debugging
- Charmander (see here)
- Charmander with a contemporaneous sample
- Charmeleon (see here)
- Charizard (see here)
Additional information about these datasets is given below.
Two scripts, `visualisation.py` and `visualisation_temporal.py`, can be used to visualise the output of a simulation. They need to be modified so that Python knows the location of the config, and are then run without arguments:

```sh
python visualisation.py
python visualisation_temporal.py
```
Note that the latter only applies to simulations which are configured to report temporal data (that is, `report_temporal_data` is set to `true` in the config).
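For reference, a minimal configuration sketch is shown below. Only the `report_temporal_data` key is taken from this document; the other key is a hypothetical placeholder, so consult the schema and the example configurations for the real set of fields:

```json
{
    "report_temporal_data": true,
    "num_simulations": 10
}
```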
The following demonstrates how to use the database in Python. Don’t forget to close the database connection after using it!
```python
import pickle

import h5py

# Open the simulated dataset; close the connection when done.
db_conn = h5py.File("dataset.hdf5", "r")
for k in db_conn.keys():
    out_grp = db_conn[k]["output"]
    # Reproduction number through time: the values and the times at which they change.
    r0_vals = out_grp["parameters"]["r0"]["values"][()]
    r0_chng = out_grp["parameters"]["r0"]["change_times"][()]
    prev = out_grp["present_prevalence"][()]
    print(f"Record {k} prevalence {prev}")
    # The simulated tree is stored as a pickled object.
    tree = pickle.loads(db_conn[k]["input"]["tree"][...].tobytes())
db_conn.close()
```
If you want a GUI, the HDFCompass tool provides a simple way to inspect the generated HDF5 file. Some basic information about the simulation is stored as attributes in the HDF5 file, including the date of creation and the size of the dataset.
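These attributes can also be read directly with h5py. The helper below is a sketch: the exact attribute names depend on how the simulation wrote them, so it simply iterates over whatever is stored at the top level of the file:

```python
import h5py


def print_dataset_metadata(path):
    """Print the top-level attributes of a simulation HDF5 file.

    The attribute names (e.g. creation date, dataset size) are set by the
    simulation, so we iterate over whatever is present rather than assume
    specific keys.
    """
    with h5py.File(path, "r") as db:
        attrs = dict(db.attrs)
        for name, value in attrs.items():
            print(f"{name}: {value}")
        return attrs
```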
A conda environment with all the packages needed to run the simulations can be created from the `environment.yaml` file by running the following command:

```sh
conda env create -f environment.yaml
```
BEAST2 is used to simulate the data. If you don't have BEAST2 installed, the script `scr/setuplib.sh` will download and install it for you. Once BEAST2 is installed, you will need to install remaster through BEAUti.
There is a sequence of configurations: Charmander, Charmeleon and Charizard. These all use the same model but are of increasing size and use progressively broader distributions over the simulation parameters.
This is intended as a toy dataset. It has an 800-100-100 training-validation-testing split. The parameters are nearly constant through time.
This is very similar to the Charmander configuration, but instead of serial sampling there is a single contemporaneous sample at the present.
This is intended as a small dataset. It has a 1600-200-200 training-validation-testing split. The parameters vary significantly through time.
This is intended as a plausible dataset for use in training a useful neural network. It has an 8000-1000-1000 training-validation-testing split (although 11000 simulations are attempted to adjust for failures). The parameters vary significantly through time.