This Nextflow pipeline processes raw sequencing data from EM-seq (Enzymatic Methyl-seq) experiments. It is based on the nf-core framework.
- Install Nextflow

Also make sure that FastQC, MultiQC, Trim Galore and Bismark are installed on your system:
- Install FastQC
- Install MultiQC
- Install Trim Galore
- Install Bismark
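Before going further, it can help to confirm that every tool is actually on your `PATH`. The following is a minimal POSIX shell sketch; the helper name `missing_tools` is hypothetical, and it assumes the executables are called `nextflow`, `fastqc`, `multiqc`, `trim_galore` and `bismark` (the names the standard packages install).

```shell
#!/bin/sh
# Hypothetical helper: print (one per line) every requested command that
# cannot be found on PATH. Prints nothing if all of them are available.
missing_tools() {
    for tool in "$@"; do
        # command -v succeeds only if the name resolves to a command
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Executable names as installed by each tool's standard package (assumption)
missing=$(missing_tools nextflow fastqc multiqc trim_galore bismark)
if [ -n "$missing" ]; then
    echo "Missing tools:" >&2
    echo "$missing" >&2
else
    echo "All required tools found."
fi
```

Run it after activating the conda environment, since that is where the tools are expected to live.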
Clone the [EMseq pipeline](https://github.com/Ephantus-Wambui/EMseq_nextflow_pipeline.git) repository:
`git clone https://github.com/Ephantus-Wambui/EMseq_nextflow_pipeline.git`
Before running the pipeline, make sure the data directory contains:
- a `genome_test` directory with the `TMEB117_chr16.fasta` reference genome
- the high-yield and low-yield FASTQ files
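A quick pre-flight check of that layout can save a failed run. This is a hedged sketch rather than part of the pipeline: the function name `check_data_dir` and the default path `data` are hypothetical, and since the exact high-yield and low-yield file names are not given here, it only assumes the FASTQ files end in `.fastq`, `.fq`, `.fastq.gz` or `.fq.gz`.

```shell
#!/bin/sh
# Hypothetical pre-flight check of the data directory layout listed above.
# Prints one line per problem and returns non-zero if anything is missing.
# Assumes FASTQ files end in .fastq, .fq, .fastq.gz or .fq.gz.
check_data_dir() {
    dir=$1
    status=0

    # -s: the reference FASTA exists and is not empty
    if [ ! -s "$dir/genome_test/TMEB117_chr16.fasta" ]; then
        echo "missing reference genome: $dir/genome_test/TMEB117_chr16.fasta"
        status=1
    fi

    # Count FASTQ files anywhere under the data directory
    n_fastq=$(find "$dir" -type f \( -name '*.fastq' -o -name '*.fq' \
        -o -name '*.fastq.gz' -o -name '*.fq.gz' \) 2>/dev/null | wc -l | tr -d ' ')
    if [ "$n_fastq" -eq 0 ]; then
        echo "no FASTQ files found under: $dir"
        status=1
    fi
    return "$status"
}

# "data" is an assumed default; pass another path as the first argument
check_data_dir "${1:-data}" && echo "data directory looks complete"
```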
Once everything is in place, activate the conda environment that contains FastQC, MultiQC, Trim Galore and Bismark:
`conda activate EMseq`
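If the `EMseq` environment does not exist yet, one way to create it is from a conda environment file. The sketch below is an assumption, not the author's recorded setup: it writes a file listing the Bioconda packages `nextflow`, `fastqc`, `multiqc`, `trim-galore` and `bismark` with unpinned versions, and the helper name `write_env_file` is hypothetical. It only prints the `conda env create` step instead of running it.

```shell
#!/bin/sh
# Hypothetical helper: write a conda environment file for the EMseq tools.
# Package names are the Bioconda ones; versions are deliberately unpinned.
write_env_file() {
    cat > "$1" <<'EOF'
name: EMseq
channels:
  - conda-forge
  - bioconda
dependencies:
  - nextflow
  - fastqc
  - multiqc
  - trim-galore
  - bismark
EOF
}

env_file=${1:-environment.yml}
write_env_file "$env_file"
# Creating the environment downloads packages, so it is left to the user
echo "Wrote $env_file. Next: conda env create -f $env_file && conda activate EMseq"
```

Pinning versions in the file would make results easier to reproduce across machines.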
Before running `EMseq_pipeline.nf`, run the QC pipeline to check the quality of the FASTQ files.
**Note:** `cd` into the `scripts` folder first.
`nextflow run EMseq_fastqc.nf`
After the QC pipeline finishes, run the EMseq pipeline to align the reads to the reference genome and generate methylation calls:
`nextflow run EMseq_pipeline.nf`
**Note:** Adjust the Trim Galore parameters according to the FastQC results before running the EMseq pipeline.
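To try out Trim Galore settings on a single sample before changing the pipeline, a small dry-run helper can print the corresponding command. This is a hypothetical sketch: it assumes paired-end reads, the function name `trim_cmd` and the output folder `trim_test` are made up, and the values are placeholders (20 is Trim Galore's default for both `--quality` and `--length`). How these parameters are actually passed into `EMseq_pipeline.nf` depends on that script.

```shell
#!/bin/sh
# Hypothetical sketch: print a Trim Galore command for one paired-end sample
# so settings can be tried out before being copied into the pipeline.
# Placeholder values; choose real ones from the FastQC/MultiQC reports.
QUALITY=20      # Phred cutoff for 3' quality trimming (--quality)
MIN_LENGTH=20   # drop reads shorter than this after trimming (--length)
CLIP_R1=0       # bases to remove from the 5' end of read 1 (--clip_R1)
CLIP_R2=0       # bases to remove from the 5' end of read 2 (--clip_R2)

trim_cmd() {
    cmd="trim_galore --paired --quality $QUALITY --length $MIN_LENGTH"
    cmd="$cmd --fastqc --output_dir ${3:-trim_test}"
    # Only add 5' clipping options when a non-zero value is set
    [ "$CLIP_R1" -gt 0 ] && cmd="$cmd --clip_R1 $CLIP_R1"
    [ "$CLIP_R2" -gt 0 ] && cmd="$cmd --clip_R2 $CLIP_R2"
    printf '%s %s %s\n' "$cmd" "$1" "$2"
}

if [ "$#" -ge 2 ]; then
    trim_cmd "$@"
else
    echo "usage: $0 R1.fastq.gz R2.fastq.gz [output_dir]" >&2
fi
```

Running the printed command on one sample and checking the FastQC report it produces (via `--fastqc`) is a cheap way to settle on values before a full run.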
Running the pipelines creates the following in the output directory, with separate high-yield and low-yield directories for each:
- individual FastQC reports for the FASTQ files
- a MultiQC report summarizing the FastQC reports
- trimmed FASTQ files
- Bismark alignment reports
- Bismark methylation calls
The two pipelines do the following:
- QC pipeline (`EMseq_fastqc.nf`): generates FastQC reports for the FASTQ files and a MultiQC report summarizing them.
- EMseq pipeline (`EMseq_pipeline.nf`): aligns the reads to the reference genome and generates methylation calls.
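Once the EMseq pipeline has produced its Bismark alignment reports, a small table of headline numbers can make the high-yield and low-yield runs easier to compare. This is a hedged sketch: the function name `summarise_bismark` and the default directory `output` are hypothetical, and it assumes the reports end in `_report.txt` and contain Bismark's tab-separated `Mapping efficiency:` and `C methylated in ... context:` lines. Other `*_report.txt` files (for example Trim Galore trimming reports) are skipped because they lack a mapping-efficiency line.

```shell
#!/bin/sh
# Hypothetical helper: tabulate mapping efficiency and per-context
# methylation from Bismark alignment reports found under a directory.
summarise_bismark() {
    printf 'report\tmapping\tCpG\tCHG\tCHH\n'
    find "${1:-output}" -type f -name '*_report.txt' 2>/dev/null | sort |
    while read -r report; do
        # Fields are tab-separated: "label:<TAB>value"
        awk -F'\t' -v name="$(basename "$report")" '
            /^Mapping efficiency:/          { map = $2 }
            /^C methylated in CpG context:/ { cpg = $2 }
            /^C methylated in CHG context:/ { chg = $2 }
            /^C methylated in CHH context:/ { chh = $2 }
            # Only alignment reports carry a mapping efficiency line
            END { if (map != "") printf "%s\t%s\t%s\t%s\t%s\n", name, map, cpg, chg, chh }
        ' "$report"
    done
}

summarise_bismark "${1:-output}"
```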