scalepopgen

Pipeline to carry out comprehensive population genomic analyses.

scalepopgen is a fully automated nextflow-based pipeline that takes vcf files or plink-generated bed files as input and employs a variety of open-source tools to carry out comprehensive population genomic analyses. Additionally, python and R scripts have been developed to combine and (wherever possible) plot the results of the various analyses.

Broadly, the pipeline consists of four “sub-workflows”, which can be run separately or in combination with each other.

Setup and test the pipeline

The pipeline can be run on any Linux operating system and requires these three dependencies: Java, nextflow (Di Tommaso et al., 2017), and a software container or environment system such as conda, mamba, singularity, or docker. It can be run on a local Linux system as well as on high performance computing (HPC) clusters. All other software dependencies of the pipeline are handled by nextflow once it is installed: the user installs only the three dependencies listed above, and nextflow automatically downloads the remaining tools needed for the analyses. scalepopgen was built and tested on nextflow version 22.10.6.5843, conda version 23.1.0 and singularity version 3.8.6.
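
As a quick pre-flight check, the three dependencies listed above can be verified from Python before launching the pipeline. The sketch below is not part of scalepopgen: the function name `find_missing_dependencies` is hypothetical, and it assumes the tools are installed under their usual command names (`java`, `nextflow`, and one of `conda`, `mamba`, `singularity`, or `docker`) and are reachable through the `PATH`.

```python
import shutil

# Executable names are assumptions: they are the usual command names for
# the tools listed above, not something scalepopgen itself defines.
REQUIRED = ("java", "nextflow")
ENV_SYSTEMS = ("conda", "mamba", "singularity", "docker")


def find_missing_dependencies(which=shutil.which):
    """Return a list of human-readable problems; an empty list means OK.

    `which` is injectable so the check can be exercised without the
    tools actually being installed.
    """
    problems = [
        f"{tool} not found on PATH" for tool in REQUIRED if which(tool) is None
    ]
    # Only one container or environment system is needed, not all of them.
    if not any(which(tool) is not None for tool in ENV_SYSTEMS):
        problems.append("none of " + ", ".join(ENV_SYSTEMS) + " found on PATH")
    return problems


if __name__ == "__main__":
    for problem in find_missing_dependencies():
        print("missing:", problem)
```

An empty result only says that the commands exist; it does not check their versions against the ones the pipeline was tested on.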

To test the pipeline, simply run the following command:

nextflow run scalepopgen.nf -profile mamba,test_genstruct

The output folder will be created here: "…/test_genstruct_out/". The folder will contain interactive plots for PCA, the Fst-based NJ tree, and the IBS-based NJ tree. It will also contain plots for the “ADMIXTURE” analyses. These plots can be customized using the yaml file present inside the "./parameters/plots/" folder. A description of the inputs and outputs of the test run can be found here.

Creating a yaml file of the parameters

The workflow implements many programs and tools, with the aim of enabling users to perform a wide range of analyses. This brings with it a large number of parameters that need to be set for each sub-workflow. To make this easier for users, we developed a graphical user interface (GUI). The GUI is available as an executable within the "scp_config_generator" folder of this repo. With the GUI, you can specify analyses and their options by moving through the tabs of each workflow section marked with the arrow.

Once you have selected and specified the parameters for the analyses you want to perform, simply click on "File" and then "Save as" to save them as a yml file.

After that, run the pipeline with the command:

nextflow run scalepopgen.nf -params-file analyses.yml -profile <conda,mamba,singularity,docker> -qs <number of processes>
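
When the pipeline is launched from a script rather than typed by hand, the command above can be assembled as an argument list. This is a minimal sketch, not part of scalepopgen: the function names `build_run_command` and `run_pipeline` are hypothetical, and the only flags used are the ones that appear in the command above. The profile check covers just the four profiles named in that command.

```python
import subprocess
from pathlib import Path

# Profiles named in the command above; other profiles are not covered here.
KNOWN_PROFILES = {"conda", "mamba", "singularity", "docker"}


def build_run_command(params_file, profile, queue_size, script="scalepopgen.nf"):
    """Assemble the nextflow command shown above as an argument list."""
    if profile not in KNOWN_PROFILES:
        raise ValueError(f"unknown profile: {profile!r}")
    if int(queue_size) < 1:
        raise ValueError("queue_size must be a positive integer")
    return [
        "nextflow", "run", script,
        "-params-file", str(params_file),
        "-profile", profile,
        "-qs", str(int(queue_size)),
    ]


def run_pipeline(params_file, profile, queue_size):
    """Run the pipeline; raises CalledProcessError on a non-zero exit."""
    # Fail early with a clear error if the parameter file is missing.
    if not Path(params_file).is_file():
        raise FileNotFoundError(params_file)
    subprocess.run(build_run_command(params_file, profile, queue_size), check=True)
```

For example, `run_pipeline("analyses.yml", "mamba", 4)` would run the same command as above with the mamba profile and four processes. Passing a list to `subprocess.run` avoids shell quoting problems when the parameter file path contains spaces.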

A great advantage of this workflow is the interactive plots generated using bokeh, which are stored in the output folder of the respective analysis. They give the user a graphical interpretation of the results, making it possible to get an immediate impression of the genomic patterns of the analyzed samples. As an example, please take a look at the interactive plots created with cattle data in all the different analyses offered by the workflow.
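
To browse the interactive plots after a run, the output folder can be scanned for them. The sketch below assumes the plots are saved as standalone `.html` files, which is bokeh's usual file output but is not confirmed here for scalepopgen. The function names `find_interactive_plots` and `open_plots` are hypothetical.

```python
import webbrowser
from pathlib import Path


def find_interactive_plots(output_dir):
    """Return all .html files under output_dir, sorted by path.

    Assumption: the interactive plots are standalone .html files.
    """
    return sorted(Path(output_dir).rglob("*.html"))


def open_plots(output_dir):
    """Open each plot found under output_dir in the default web browser."""
    for plot in find_interactive_plots(output_dir):
        webbrowser.open(plot.resolve().as_uri())
```

Calling `find_interactive_plots` on the output folder of an analysis lists the candidate files without opening anything, which is useful on an HPC cluster where no browser is available.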

Extra notes

Note that the read me associated with the workflow will be extensively updated in the coming days. Before using the tiles for your publication, do not forget to include the attribution. This package comes with tiles from Esri; therefore, these rules must be followed: https://developers.arcgis.com/documentation/mapping-apis-and-services/deployment/basemap-attribution/. The papers to cite are mentioned in the respective read me documentation.

To do

1). complete the read me for all the sub-workflows.
2). test the signature of selection extensively.
3). add validation for all the features of the workflow.
4). a gui to make parameter files

BioInf2305/scalepopgen_v0
