This is an early version of POREquality, an R Markdown script designed to run as part of a Nanopore local basecalling pipeline. POREquality reads Nanopore sequencing summary files and generates an aesthetically pleasing HTML report to facilitate the visualization of key metrics.
Reasons to use POREquality:
- Produce professional sequencing reports after any locally basecalled MinION or GridION run.
- Visually inspect information contained in the sequencing summary.
- Share sequencing quality control reports with third parties.
- Diagnose problematic or under-performing runs.
POREquality has so far only been tested on Ubuntu, although it should (in theory) run on other operating systems provided the dependencies are met. POREquality requires pandoc, which we recommend installing via your package manager. The following R packages are currently required:
- data.table
- flexdashboard
- dplyr
- plyr
- ggplot2
- knitr
- optparse
- RColorBrewer
- reshape2
required.packages <- c("data.table","flexdashboard","dplyr","plyr","ggplot2","knitr","optparse","RColorBrewer","reshape2")
new.packages <- required.packages[!(required.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages)
git clone https://github.com/carsweshau/POREquality
cd POREquality
sudo apt-get install pandoc
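Before the first run it can help to confirm that the command-line dependencies are actually on the PATH. The following is a minimal sketch, not part of POREquality itself; the helper name `check_deps` is hypothetical.

```shell
# Report any required command-line tools that are not on PATH.
# Returns 0 when every tool is found, 1 otherwise.
check_deps() {
    local missing=0
    local tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example (not executed here): check_deps git pandoc Rscript || exit 1
```

Placing the example call at the top of a pipeline script would make a missing pandoc or R installation fail early, rather than partway through rendering.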
The (rather boorish) bash code below could be placed in a script and run via cron:
# count MinKNOW processes that are currently running a sequencing experiment
NUMBER_OF_ACTIVE_RUNS=$(ps -ef | grep MinKNOW | grep experiment | grep sequencing | grep -v grep | wc -l)
if [ "$NUMBER_OF_ACTIVE_RUNS" -gt 0 ]; then
    exit 1 # files are still being written, will check later via cron
fi
cd /data/basecalled || exit 1 # assumes GridION data structure
for run in *; do
    # skip plain files (e.g. summaries from earlier passes); only run directories are processed
    if [[ -f $run ]]; then
        continue
    fi
    if [[ $run != "workspace" ]]; then
        # build one merged summary per run, keeping only the first header line
        if [ ! -f "${run}_summary.txt" ]; then
            cat "${run}"/GA?0000/seq*.txt > "${run}_raw_summary.txt" # creating an intermediate file is distasteful here, you could grep off a header and append to your liking
            awk ' /^filename/ && FNR > 1 {next} {print $0} ' "${run}_raw_summary.txt" > "${run}_summary.txt" && rm "/data/basecalled/${run}_raw_summary.txt"
        fi
        # render the report only if it does not exist yet
        if [ ! -f "/data/reports/${run}.html" ]; then
            Rscript -e "rmarkdown::render('/home/USER/POREquality/POREquality.Rmd', output_file=paste('/data/reports/${run}.html',sep=''))" -i "/data/basecalled/${run}_summary.txt" -o /data/reports
        fi
    fi
done
Alternatively, you can run the Rscript command directly, supplying the required sequencing summary.
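For a one-off run, the two steps from the cron script (merge the per-flowcell summaries, then render) could be wrapped as below. This is only a sketch under stated assumptions: the function names `merge_summaries` and `render_report` are hypothetical, header lines are assumed to start with `filename` (as in the awk pattern above), and the `-i`/`-o` arguments simply mirror the invocation in the cron script. Passing several files straight to awk avoids the intermediate raw summary file.

```shell
# Merge several sequencing summary files into one stream, keeping only the
# first header line. Assumes header lines start with "filename".
merge_summaries() {
    awk '/^filename/ && NR > 1 { next } { print }' "$@"
}

# Render a report for one merged summary (requires Rscript and rmarkdown).
# $1 = path to POREquality.Rmd, $2 = merged summary file, $3 = output directory
render_report() {
    local rmd=$1
    local summary=$2
    local outdir=$3
    local name
    name=$(basename "$summary" _summary.txt)
    Rscript -e "rmarkdown::render('${rmd}', output_file='${outdir}/${name}.html')" \
        -i "$summary" -o "$outdir"
}

# Example (not executed here; RUN is a placeholder for a run directory name):
#   merge_summaries /data/basecalled/RUN/GA?0000/seq*.txt > /data/basecalled/RUN_summary.txt
#   render_report /home/USER/POREquality/POREquality.Rmd /data/basecalled/RUN_summary.txt /data/reports
```

Because `NR` counts lines across all input files, only a header on the very first line survives; later header lines are dropped in the same way as in the two-step cat and awk version.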
- Ensure the new refactored code accepts any ONT sequencing summary gracefully
- Add PromethION support (physical flowcell layout, ensure compatible with existing workflows, etc.)
- Simplify installation of POREquality via dependency management like Packrat
- Add in bream log support for interrogation of drift voltages, etc.
- Refactor R code to use fewer packages and embrace data.table to enable key-value/set operations for performance
As this is my first release, I would greatly appreciate any feedback to improve POREquality! I welcome the Nanopore community to offer insight and to contribute to the ongoing development of POREquality by submitting either issues or pull requests.
I would like to thank Dr. Martin Smith for the patient encouragement, as well as the rest of the Genomic Technologies Group, Dr. Kirston Barton and James Ferguson for all their hard work and advice.
Furthermore, the wider Nanopore community is a fantastic and welcoming place, and there are many aspects of POREquality which could not exist were it not for the hard work of many others providing this environment.