BenGen is a fully reproducible, automatic and scalable benchmarking prototype, which provides consistently annotated and community-sharable results.
BenGen currently supports benchmarking multiple sequence aligners (MSAs), but it can easily be adapted to benchmark other bioinformatics methods.
Nextflow is the backbone of BenGen and defines the benchmarking workflow.
Aligner tools are packaged as Docker images and distributed through Docker Hub. Each image is assigned a unique ID, which guarantees that the containers are immutable and that the benchmark remains fully replicable over time.
Docker provides a container runtime for local and cloud environments. Singularity performs the same role in the context of HPC clusters.
An RDF database, based on the EDAM ontology vocabulary, stores metadata about each component of the benchmark. This makes it possible to automate the benchmark and to provide a consistent, machine-readable description of the incorporated data, algorithms and their results.
GitHub stores and tracks code changes in a consistent manner. It also provides a friendly and well-known user interface that enables third parties to contribute their own tools with ease.
To run BenGen on your machine, Docker and Nextflow must be installed.
First, clone the BenGen repository:
git clone https://github.com/cbcrg/bengen
Then move into the bengen directory and run make to build all the required images:
cd bengen && make
You are now ready to use BenGen!
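The setup steps above can be wrapped in a small script that fails early when a tool is missing. This is a minimal sketch, not part of the repository: the function names check_prereqs and setup_bengen are hypothetical, and it also checks git and make because the setup steps call them.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not shipped with BenGen) wrapping the setup steps.

# Report each tool that is not found on PATH and return non-zero if any is missing.
# With no arguments, check the two tools the README lists as required.
check_prereqs() {
  local tools=("$@")
  if [ "${#tools[@]}" -eq 0 ]; then
    tools=(docker nextflow)
  fi
  local tool missing=0
  for tool in "${tools[@]}"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Clone the repository and build the images, stopping at the first failure.
setup_bengen() {
  check_prereqs git make docker nextflow || return 1
  git clone https://github.com/cbcrg/bengen && cd bengen && make
}
```

For example, save the functions to a file (the name setup_bengen.sh is hypothetical) and run source setup_bengen.sh && setup_bengen. Sourcing the file, rather than executing it, leaves your shell inside the bengen directory afterwards.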
To run BenGen in automatic mode, first complete the steps in the Getting started section, then trigger the computation locally with the following command:
nextflow run query.nf
Tip: use the -resume option to reuse results that were already computed, which is useful when you run BenGen multiple times:
nextflow run query.nf -resume
In this mode, the RDF metadata database is queried, and the datasets, methods and scoring functions are selected and run automatically. The selection is defined by the SPARQL query in the query.rq file, which selects only the eligible combinations that can be run. Finally, the results are stored in RDF format in the scores.ttl file.
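A small wrapper can run the automatic mode and confirm that results were produced. This is a minimal sketch: the function name run_automatic is hypothetical, and it assumes scores.ttl is written to the directory from which query.nf is launched, which the description above does not state explicitly.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper for BenGen's automatic mode.
# Assumption: scores.ttl is written to the current working directory.
run_automatic() {
  # -resume lets Nextflow reuse cached results from earlier runs.
  if ! nextflow run query.nf -resume; then
    echo "query.nf failed" >&2
    return 1
  fi
  # -s is true only if the file exists and is not empty.
  if [ ! -s scores.ttl ]; then
    echo "scores.ttl is missing or empty" >&2
    return 1
  fi
  echo "results written to scores.ttl"
}
```

Run it from the bengen directory. Checking for a non-empty scores.ttl only confirms that something was written; it does not validate the RDF content.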
To run BenGen manually, choosing the datasets, scoring functions and methods yourself, use the bengen.nf script:
nextflow run bengen.nf
Tip: use the -resume option to reuse results that were already computed, which is useful when you run BenGen multiple times:
nextflow run bengen.nf -resume
To try BenGen on a small subset of the data, which is faster and gives a quick overview of how it works, use the following command:
nextflow run bengen.nf --scores DEMO/scores_demo.txt --methods DEMO/methods_demo.txt --dataset_folder "benchmarking_datasets_demo"
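The demo command can be wrapped so that it fails with a clear message when it is not launched from the bengen directory. This is a sketch: run_demo is a hypothetical name, the paths are the ones from the command above, and adding -resume relies on the same caching behaviour described in the tips.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper for the demo run; paths are relative to the bengen directory.
run_demo() {
  local f
  # Fail early if the demo lists are not where the command expects them.
  for f in DEMO/scores_demo.txt DEMO/methods_demo.txt; do
    if [ ! -f "$f" ]; then
      echo "missing $f (run this from the bengen directory)" >&2
      return 1
    fi
  done
  # -resume reuses any tasks already completed by a previous demo run.
  nextflow run bengen.nf \
    --scores DEMO/scores_demo.txt \
    --methods DEMO/methods_demo.txt \
    --dataset_folder "benchmarking_datasets_demo" \
    -resume
}
```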
The overall benchmark is driven by a configuration file that defines the following components:
- params.dataset: defines which dataset to use. Currently, only the datasets provided in the benchmark_dataset directory are allowed. To use all of them, set params.dataset="*".
- params.renderer: chooses the renderer to use among the ones provided (csv, html, json).
- params.out: sets the name of the output file.
Example of configuration file content:
docker.enabled = true
params.dataset = "balibase-v3.01"
params.renderer = "csv"
params.out = "output.${params.renderer}"
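Instead of editing the configuration by hand, a small helper can generate it. This is a sketch: write_bengen_config is a hypothetical name, it writes the same four settings as the example above, and it rejects any renderer other than csv, html or json. The generated file can then be loaded with Nextflow's standard -c option.

```shell
#!/usr/bin/env bash
# Hypothetical helper that prints a BenGen configuration to stdout.
# Usage: write_bengen_config [DATASET] [RENDERER]
write_bengen_config() {
  local dataset="${1:-*}"      # "*" selects all provided datasets
  local renderer="${2:-csv}"
  # Only the renderers provided by BenGen are accepted.
  case "$renderer" in
    csv|html|json) ;;
    *) echo "unsupported renderer: $renderer" >&2; return 1 ;;
  esac
  # Unquoted EOF expands $dataset and $renderer; \${...} stays literal for Nextflow.
  cat <<EOF
docker.enabled = true
params.dataset = "$dataset"
params.renderer = "$renderer"
params.out = "output.\${params.renderer}"
EOF
}
```

Example usage:
write_bengen_config balibase-v3.01 html > my.config
nextflow run bengen.nf -c my.config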
Important: the bengen directory contains the methods.txt and scores.txt files.
They define which aligners and which scoring functions are run.
Edit them by adding or removing lines, one per aligner or score you want to run, in the form bengen/NameOfAlignerOrScore.
Example of methods.txt:
bengen/mafft
bengen/tcoffee
bengen/clustalo
Example of scores.txt:
bengen/qscore
bengen/baliscore
Note: to see which aligners and scores are already integrated in the project, look in the boxes and boxes_score directories, respectively, inside the bengen directory.
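Before a run, it can help to check that every entry in methods.txt and scores.txt corresponds to an integrated tool. This is a minimal sketch under a loudly stated assumption: it takes the name after "bengen/" to be the directory name under boxes (for methods) or boxes_score (for scores), which the README implies but does not confirm. The function name check_list is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical check that each entry of a list file has a matching directory.
# Assumption: "bengen/NAME" maps to BOX_DIR/NAME (boxes/ or boxes_score/).
check_list() {
  local list_file="$1" box_dir="$2" line name status=0
  # The "|| [ -n ... ]" part also handles a last line without a trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    line="${line%$'\r'}"              # tolerate Windows line endings
    if [ -z "$line" ]; then
      continue                        # skip blank lines
    fi
    name="${line#bengen/}"
    if [ ! -d "$box_dir/$name" ]; then
      echo "no directory $box_dir/$name for entry '$line'" >&2
      status=1
    fi
  done < "$list_file"
  return "$status"
}
```

Example usage, from the bengen directory:
check_list methods.txt boxes
check_list scores.txt boxes_score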
You can integrate your new MSA into BenGen with a script that does the work for you automatically.
The add.sh script is located in the bengen directory that you cloned.
Arguments (all required):
- -n: name of your MSA
- -m: complete path to your metadata file
- -t: complete path to your template file
Example:
bash add.sh -n MSA-NAME -m /complete/path/to/your/metadatafile -t /complete/path/to/your/templatefile
You can find more information on how to create the metadata and template files in the documentation.
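Because add.sh expects complete paths, a thin wrapper can catch relative or missing paths before the script runs. This is a sketch: add_msa is a hypothetical name, it uses the -n, -m and -t arguments listed above, and it treats "complete path" as "absolute path".

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around add.sh that checks the arguments before calling it.
# Run it from the bengen directory, where add.sh lives.
add_msa() {
  local name="$1" metadata="$2" template="$3" f
  if [ "$#" -ne 3 ] || [ -z "$name" ]; then
    echo "usage: add_msa NAME METADATA_FILE TEMPLATE_FILE" >&2
    return 1
  fi
  for f in "$metadata" "$template"; do
    # add.sh expects complete (absolute) paths.
    case "$f" in
      /*) ;;
      *) echo "not a complete path: $f" >&2; return 1 ;;
    esac
    if [ ! -f "$f" ]; then
      echo "file not found: $f" >&2
      return 1
    fi
  done
  bash add.sh -n "$name" -m "$metadata" -t "$template"
}
```

If your files are in the current directory, you can pass "$PWD/yourfile" to turn a relative name into a complete path.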
If you wish to contribute to the project, you can integrate your new MSA into the public project.
Follow these steps:
- Clone the repository and modify it by adding your new MSA
- Open a pull request to merge your changes into the project
- Upload the Docker images to Docker Hub
The project maintainer will then receive a notification and accept the contribution if it is relevant to the project. The maintainer then triggers the computation, and the new results are shown on a public HTML page.
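The git and Docker parts of these steps can be sketched as one function. This is a hypothetical sketch, not the project's documented procedure: it assumes you work in a clone of your own fork (remote "origin"), that the aligner's Docker build context with its Dockerfile is boxes/NAME, and that DOCKER_USER is your Docker Hub account. It pushes the image before the pull request is opened, and the pull request itself is still opened on GitHub.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the contribution steps for a new aligner.
# Assumptions: "origin" is your fork, boxes/NAME is the Docker build context,
# and DOCKER_USER is your Docker Hub account.
contribute_msa() {
  if [ "$#" -ne 2 ]; then
    echo "usage: contribute_msa NAME DOCKER_USER" >&2
    return 1
  fi
  local name="$1" docker_user="$2"
  local branch="add-$name" image="$docker_user/$name"
  # Docker Hub repository names must be lowercase.
  case "$image" in
    *[[:upper:]]*) echo "image name must be lowercase: $image" >&2; return 1 ;;
  esac
  git checkout -b "$branch" &&
    git add -A &&
    git commit -m "Add $name aligner" &&
    git push origin "$branch" &&
    docker build -t "$image" "boxes/$name" &&
    docker push "$image"
}
```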