
DGE2


Introduction

DGE2 is a Nextflow pipeline built using code and infrastructure developed and maintained by the nf-core initiative. It performs differential gene expression analysis on data that has been preprocessed with the nf-core/rnaseq pipeline (v3+) using the default star_salmon alignment. The pipeline:

  1. Takes salmon quantification files and a metadata file as input
  2. Performs differential gene expression analysis over a specific design or, if none is specified, over all possible designs from the metadata file
  3. Generates summary plots (PCA, volcano, heatmap) and txt files, as well as a summary HTML report
  4. Runs gene set enrichment analysis on the preRanked list of genes from the DGE results

Usage

Note

If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.

If you have run the nf-core/rnaseq pipeline with the default aligner (star_salmon), you should have a results/star_salmon/ folder containing several folders and files, including a quant.sf file for each sample and a tx2gene.tsv file that maps transcript identifiers to gene identifiers:

results/star_salmon/SAMPLE_1/quant.sf
results/star_salmon/SAMPLE_2/quant.sf
results/star_salmon/SAMPLE_3/quant.sf
results/star_salmon/SAMPLE_4/quant.sf
results/star_salmon/SAMPLE_5/quant.sf
results/star_salmon/SAMPLE_6/quant.sf
results/star_salmon/tx2gene.tsv
[... other files and folders...]

In the above example, you would pass the results/ folder to the DGE2 pipeline using the --inputdir argument.
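Before launching, it can help to confirm which samples are present in the input directory. The following is a minimal sketch, not part of DGE2: `list_samples` is a hypothetical helper that assumes the layout shown above (`<inputdir>/star_salmon/<SAMPLE>/quant.sf`) and prints the name of every sample folder that contains a quant.sf file.

```shell
# Hypothetical helper (not provided by DGE2).
# Prints one sample name per line for every folder under
# <inputdir>/star_salmon/ that contains a quant.sf file.
list_samples() {
    for ls_dir in "$1"/star_salmon/*/; do
        # Folders without quant.sf (and an unmatched glob) are skipped.
        if [ -f "${ls_dir}quant.sf" ]; then
            basename "$ls_dir"
        fi
    done
}
```

With the example layout above, `list_samples results` would be expected to print SAMPLE_1 through SAMPLE_6, one per line.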

Additionally, you will need to prepare a metadata.txt file that looks as follows:

SampleID	Levels  Status
SAMPLE_1	high  ctr
SAMPLE_2	high  ctr
SAMPLE_3	med  ctr
SAMPLE_4	low  case
SAMPLE_5	low  case
SAMPLE_6	low  case

This should be a txt file in which the first column contains the sample IDs and the remaining columns (one or more) give the conditions for each sample. The sample IDs must match the sample folders in the results/star_salmon/ folder of the inputdir.
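The matching requirement can be checked before running the pipeline. The sketch below is an illustration, not part of DGE2: `check_samples` is a hypothetical helper that assumes the metadata file has a single header line, whitespace-separated columns, and sample IDs without spaces or glob characters, and that the quantification files sit at `<inputdir>/star_salmon/<SampleID>/quant.sf`.

```shell
# Hypothetical helper (not provided by DGE2).
# Usage: check_samples <inputdir> <metadata file>
# Reports every sample ID in the metadata file that has no quant.sf
# under <inputdir>/star_salmon/, and returns non-zero if any is missing.
check_samples() {
    cs_inputdir=$1
    cs_metadata=$2
    cs_missing=0
    # Skip the header line and blank lines; take the first column.
    for cs_id in $(awk 'NR > 1 && NF > 0 { print $1 }' "$cs_metadata"); do
        if [ ! -f "$cs_inputdir/star_salmon/$cs_id/quant.sf" ]; then
            echo "missing quant.sf for sample: $cs_id"
            cs_missing=$((cs_missing + 1))
        fi
    done
    [ "$cs_missing" -eq 0 ]
}
```

For example, `check_samples results metadata.txt && echo "all samples found"` prints the confirmation only when every sample listed in the metadata has a quant.sf file.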

Now, you can run the pipeline using:

nextflow run lconde-ucl/DGE2 \
   -profile <docker/singularity/.../institute> \
   --inputdir <PATH/TO/INPUTDIR/> \
   --metadata <PATH/TO/METADATA> \
   --outdir <OUTDIR>

For more details and further functionality, please refer to the usage documentation.

Pipeline output

The pipeline produces text files and plots with the DGE and GSEA results, as well as an HTML report that contains a summary of the DGE results. For more details about the output files and reports, please refer to the output documentation.

Credits

DGE2 was developed by Lucia Conde in 2024. It is a DSL2 version of an older (DSL1) DGE pipeline developed in 2019.

Contributions and Support

If you would like to contribute to this pipeline, please see the contributing guidelines.

Citations

This pipeline uses code and infrastructure developed and maintained by the nf-core initiative, and reused here under the MIT license.

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.

Additional references for the tools and data used in this pipeline are listed in CITATIONS.
