
Releases: XMaLab/SequencErr

SequencErr Supplementary Data

09 Dec 23:29

SequencErr: measuring and suppressing sequencer errors in next-generation sequencing

Supplementary data and code used to generate the figures in the manuscript

There is currently no method to precisely measure the errors that occur in the sequencing instrument, which is critical for next-generation sequencing applications aimed at discovering the genetic makeup of heterogeneous cellular populations. We propose a novel computational method, SequencErr, to address this challenge by measuring base concordance in the overlapping region between forward and reverse reads. Analysis of 3,777 public datasets from 75 research institutions in 18 countries revealed the sequencer error rate to be ~10 per million (pm), with 1.4% of sequencers and 2.7% of flow cells having error rates >100 pm. At the flow cell level, error rates are elevated in the bottom surfaces, and >90% of HiSeq and NovaSeq flow cells have at least one outlier error-prone tile. By sequencing a common DNA library on different sequencers, we demonstrate that sequencers with high error rates have reduced overall sequencing accuracy, and that removal of outlier error-prone tiles improves sequencing accuracy. Our study revealed novel insights into the nature of DNA sequencing errors incurred in sequencers. Our method can be used to assess, calibrate, and monitor sequencer accuracy, and to computationally suppress sequencer errors in existing datasets.
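
To make the idea of "base concordance in the overlapping region" concrete, here is a minimal sketch in Python. It is not the SequencErr implementation: the function names, the ungapped overlap geometry (both reads cover the same fragment, and the overlap length is already known), the skipping of `N` bases, and the per-million rate definition are all assumptions made for illustration.

```python
from typing import Tuple

# Hypothetical helper names; nothing here is taken from the SequencErr code base.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_discordant(fwd: str, rev: str, overlap: int) -> Tuple[int, int]:
    """Count mismatches in the overlap between a forward and a reverse read.

    Assumes an ungapped overlap: the last `overlap` bases of the forward read
    cover the same fragment positions as the first `overlap` bases of the
    reverse-complemented reverse read.
    Returns (mismatches, bases_compared); pairs containing 'N' are skipped.
    """
    if overlap <= 0:
        return 0, 0
    fwd_part = fwd[len(fwd) - overlap:]
    rev_part = reverse_complement(rev)[:overlap]
    mismatches = 0
    compared = 0
    for a, b in zip(fwd_part, rev_part):
        if a == "N" or b == "N":
            continue  # no-calls carry no concordance information
        compared += 1
        if a != b:
            mismatches += 1
    return mismatches, compared


def rate_per_million(mismatches: int, compared: int) -> float:
    """Discordant bases per million compared bases (assumed definition)."""
    return 1e6 * mismatches / compared if compared else 0.0


# Toy example: a 10-base fragment sequenced with 8-base reads overlaps by 6 bases.
fragment = "ACGTACGTAC"
fwd_read = fragment[:8]
rev_read = reverse_complement(fragment[-8:])
clean = count_discordant(fwd_read, rev_read, overlap=6)

# Simulate one miscalled base at the end of the forward read.
bad_fwd = fwd_read[:-1] + "A"
noisy = count_discordant(bad_fwd, rev_read, overlap=6)
```

Because both reads derive from the same DNA fragment, a disagreement in the overlap cannot come from the sample itself, which is why this kind of comparison isolates errors introduced during sequencing. A mismatch alone does not say which of the two reads is wrong; the sketch only counts disagreements.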

SequencErr on St. Jude Cloud

https://platform.stjude.cloud/workflows/sequencerr

USAGE

Please refer to the README file(s) for usage details.

SequencErr Supplementary Data

09 Dec 22:39
