EMseq nextflow pipeline

This pipeline processes raw sequencing data from EM-seq (Enzymatic Methyl-seq) experiments. It is based on the nf-core framework.

Quick Start

  1. Install nextflow

Also make sure that fastqc, multiqc, trim galore and bismark are installed on your system.

  1. Install fastqc
  2. Install multiqc
  3. Install trim galore
  4. Install bismark
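
As an optional pre-flight check, the sketch below reports which of the required executables are not on your PATH. The helper name `missing_tools` is hypothetical and not part of this repository. The executable names used in the usage note (`nextflow`, `fastqc`, `multiqc`, `trim_galore`, `bismark`) are the ones the upstream tools normally install, so adjust them if your installation differs.

```shell
# missing_tools NAME...
# Prints the name of every executable that cannot be found on PATH,
# one per line, and returns 1 if at least one is missing.
# (Hypothetical helper, not shipped with the pipeline.)
missing_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            rc=1
        fi
    done
    return "$rc"
}
```

Call it as `missing_tools nextflow fastqc multiqc trim_galore bismark`. No output and a zero exit status mean all five tools were found. If the tools live in a conda environment, activate that environment before running the check.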

Clone repo

git clone https://github.com/Ephantus-Wambui/EMseq_nextflow_pipeline.git

Run pipeline

Before running the pipeline ensure you have the following files in the data directory:

  1. genome_test directory which contains TMEB117_chr16.fasta reference genome

  2. high yield and low yield fastq files
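
The two requirements above can be verified with a small script before launching anything. This is a minimal sketch, not part of the repository: the helper name `check_inputs` is hypothetical, and the list of FASTQ extensions it searches for is an assumption, since the exact file names of the high yield and low yield reads are not stated here.

```shell
# check_inputs [DATA_DIR]
# Returns 0 when the reference genome exists and at least one FASTQ file
# is present somewhere under DATA_DIR (default: data).
# (Hypothetical helper, not shipped with the pipeline.)
check_inputs() {
    data_dir=${1:-data}
    genome="$data_dir/genome_test/TMEB117_chr16.fasta"

    # -s is true only for a file that exists and is not empty.
    if [ ! -s "$genome" ]; then
        echo "missing or empty reference genome: $genome" >&2
        return 1
    fi

    # Assumption: FASTQ files use one of these common extensions.
    n_fastq=$(find "$data_dir" -type f \( -name '*.fastq' -o -name '*.fq' \
        -o -name '*.fastq.gz' -o -name '*.fq.gz' \) | wc -l | tr -d ' ')
    if [ "$n_fastq" -eq 0 ]; then
        echo "no FASTQ files found under $data_dir" >&2
        return 1
    fi

    echo "found reference genome and $n_fastq FASTQ file(s)"
}
```

Run `check_inputs data` from the repository root, or pass another path if your data directory lives elsewhere. The check only confirms that files exist; it does not validate their contents.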

Once everything is in place, activate the conda environment that contains the fastqc, multiqc, trim galore and bismark dependencies.

conda activate EMseq # activate conda environment

Before running the EMseq_pipeline.nf script, first run the QC pipeline to check the quality of the fastq files.

**Note:** cd into the scripts folder before running the commands below.

nextflow run EMseq_fastqc.nf

After running the QC pipeline, run the EMseq pipeline to align the reads to the reference genome and generate methylation calls.

nextflow run EMseq_pipeline.nf

**Note:** Adjust the trim galore parameters according to the fastqc results before running EMseq_pipeline.nf.

Output

The pipeline will generate the following directories in the output directory:

  1. High yield and low yield directories containing the individual fastqc reports for the fastq files.

  2. High yield and low yield directories containing the multiqc reports that summarise the fastqc reports.

  3. High yield and low yield directories containing the trimmed fastq files.

  4. High yield and low yield directories containing the bismark alignment reports.

  5. High yield and low yield directories containing the bismark methylation calls.

Pipeline summary

  1. QC pipeline (EMseq_fastqc.nf): generates fastqc reports for the fastq files and a multiqc report that summarises them.

  2. EMseq pipeline (EMseq_pipeline.nf): aligns the reads to the reference genome and generates methylation calls.

Contributors

  1. Ephantus Wambui