BenchmarkDR

BenchmarkDR is a modular end-to-end pipeline to train and evaluate machine learning prediction models on genomic short-read sequencing data. The predicted phenotype can be either a binary classification (e.g. susceptible to a drug/ resistant) or a regression (e.g. minimum inhibitory concentration (MIC)).

Installation

Clone this repo and set up an environment containing Snakemake

git clone https://github.com/WGS-TB/BenchmarkDR.git
conda activate base
mamba create -c conda-forge -c bioconda -n snakemake snakemake

Usage

To run the pipeline on your own data, you have to structure your input data (bacterial read and labels) in following way:

your_bacterium_1  
|  
|---- data (place paired-end short reads here)   
|   
|---- reference   
|         |                          
|         |                          
|        reference_for_your_bacterium_1.fasta   
|    
|---- AllLabels.csv    


To adapt configuration of the pipeline, there are different config files. It is necessary to adapt the config in the main folder:

|-- workflow    
|      |--rules    
|      |    |-- prediction    
|      |             |-- models_config     
|      |             |-- ...
|      |-- ...      
|    
|-- config          
|-- Snakefile

PATH_DATA: "path to folder containing your bacterial folders"

BACTERIA: ["your_bacterium_1"]

OUTPUT_DIR:  "path to folder where you would like to save all files created by the pipeline"

REPRESENTATION: ["gene_presence_absence", "snps"] # "kmer"

MODE: "MIC" # "Classification"

DRUGS: ["drug_1", ...] Column names of in your AllLabels.csv file

METHODS: ["sklearn_LinR", "sklearn_LinR_l1", "sklearn_LinR_l2", "sklearn_LinR_elasticnet", "sklearn_SVMR", "sklearn_GBTR", "sklearn_RFR"] #Classification ["sklearn_LR_l1", "sklearn_LR_l2", "sklearn_LR_elasticnet", "sklearn_SGD_l1", "sklearn_SGD_l2", "sklearn_SGD_elasticnet", "sklearn_DT", "sklearn_RFC", "sklearn_ET", "sklearn_ADB", "sklearn_GBTC", "sklearn_GNB", "sklearn_CNB", "sklearn_SVM_l1", "sklearn_SVM_l2", "sklearn_KNN", "INGOT"]

OPTIMIZATION: "None" # 'GridSearchCV', 'RandomizedSearchCV' and "None"

snakemake -s Snakefile --use-conda

Name		Name	Last commit message	Last commit date
Latest commit History 115 Commits
.snakemake		.snakemake
workflow		workflow
.gitignore		.gitignore
README.md		README.md
Snakefile		Snakefile
config.yaml		config.yaml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

BenchmarkDR

Installation

Usage

About

Releases

Packages

Contributors 2

Languages

WGS-TB/BenchmarkDR

Folders and files

Latest commit

History

Repository files navigation

BenchmarkDR

Installation

Usage

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages