EMseq nextflow pipeline

This pipeline is designed to process raw data from EM-seq experiments. It is based on the nf-core framework.

Quick Start

Ensure that FastQC, MultiQC, Trim Galore and Bismark are installed on your system.
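A quick way to confirm that the tools are on your PATH is a small shell check like the one below. This is only a sketch and is not part of the repository: the function name `check_tools` is hypothetical, the executable names (`fastqc`, `multiqc`, `trim_galore`, `bismark`) are the names these tools usually install under, and `nextflow` is included on the assumption that it is needed to run the `.nf` scripts.

```shell
# Report which command-line tools are available on PATH.
# check_tools is a hypothetical helper, not something shipped with the pipeline.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "MISSING: $tool"
            missing=1
        fi
    done
    # Return non-zero if at least one tool was not found.
    return "$missing"
}

# Executable names below are assumptions based on typical installs.
check_tools nextflow fastqc multiqc trim_galore bismark \
    || echo "Install the missing tools before continuing." >&2
```

If you use the conda environment described below, run the check after activating it, since the tools may only be on your PATH inside that environment.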

git clone https://github.com/Ephantus-Wambui/EMseq_nextflow_pipeline.git

Before running the pipeline ensure you have the following files in the data directory:

Once everything is in place, activate the conda environment that contains the FastQC, MultiQC, Trim Galore and Bismark dependencies.

conda activate EMseq # activate conda environment

Before running the EMseq_pipeline.nf script, first run the QC pipeline to check the quality of the fastq files.

**Note: cd into the scripts folder before running the commands below.**

nextflow run EMseq_fastqc.nf

After running the QC pipeline, run the EMseq pipeline to align the reads to the reference genome and generate methylation calls.

nextflow run EMseq_pipeline.nf

**Note: Adjust the Trim Galore parameters according to the FastQC results before running the pipeline.**
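As a rough illustration of the kind of Trim Galore options that are commonly adjusted after reviewing FastQC reports, the sketch below prints a command line with 5' and 3' clipping values. Everything here is an assumption for illustration: the helper name `trim_galore_cmd`, the file names, the clip values, and the use of paired-end mode. Where these options are actually set inside EMseq_pipeline.nf is not shown in this README, so treat this as a guide to the flags rather than to the script.

```shell
# Print a Trim Galore command line for one paired-end sample.
# trim_galore_cmd is a hypothetical helper; it only prints, it does not run anything.
trim_galore_cmd() {
    r1="$1"       # forward reads
    r2="$2"       # reverse reads
    out_dir="$3"  # directory for the trimmed files
    clip5="$4"    # bases to remove from the 5' end of each read
    clip3="$5"    # bases to remove from the 3' end of each read
    echo "trim_galore --paired --clip_R1 $clip5 --clip_R2 $clip5 --three_prime_clip_R1 $clip3 --three_prime_clip_R2 $clip3 -o $out_dir $r1 $r2"
}

# Illustrative values only; pick clip lengths from your own FastQC reports.
trim_galore_cmd sample_R1.fastq.gz sample_R2.fastq.gz trimmed 10 10
```

Printing the command first makes it easy to review the values before copying them into the pipeline script or running the command by hand.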

For both the high yield and low yield samples, the pipeline will create directories in the output directory containing the following:

Individual FastQC reports for the fastq files.
MultiQC reports summarizing the FastQC reports.
Trimmed fastq files.
Bismark alignment reports.
Bismark methylation calls.

QC pipeline: generates FastQC reports for the fastq files and a MultiQC report summarizing them.
EMseq pipeline: aligns the reads to the reference genome and generates methylation calls.
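
For orientation, the alignment and methylation-calling stage usually corresponds to Bismark commands along the lines printed by the sketch below. This is not taken from EMseq_pipeline.nf: the helper name `bismark_plan`, the paths, the paired-end mode, and the chosen extractor options are all assumptions, and the BAM file name follows what is assumed to be Bismark's default naming for paired-end Bowtie 2 alignments.

```shell
# Print the Bismark commands for one paired-end sample.
# bismark_plan is a hypothetical helper; it only prints, it does not run anything.
bismark_plan() {
    genome_dir="$1"  # folder containing the reference FASTA
    r1="$2"          # trimmed forward reads
    r2="$3"          # trimmed reverse reads
    out_dir="$4"     # output directory

    # Assumed default BAM name: R1 basename without .gz/.fq/.fastq,
    # followed by _bismark_bt2_pe.bam
    base="$(basename "$r1")"
    base="${base%.gz}"
    base="${base%.fq}"
    base="${base%.fastq}"

    # One-time step: build the bisulfite-converted genome index.
    echo "bismark_genome_preparation $genome_dir"
    # Align the trimmed read pairs to the reference genome.
    echo "bismark --genome $genome_dir -1 $r1 -2 $r2 -o $out_dir"
    # Extract methylation calls from the alignment.
    echo "bismark_methylation_extractor --paired-end --bedGraph --gzip -o $out_dir $out_dir/${base}_bismark_bt2_pe.bam"
}

# Illustrative paths only.
bismark_plan genome trimmed/sample_R1_val_1.fq.gz trimmed/sample_R2_val_2.fq.gz results
```

Reviewing the printed commands against the Bismark reports in the output directory can help when checking which options the pipeline actually used.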