Condition report
* Rename parameter

* Add output file type

* case class hygiene

* Add output file to config

* Write summary file

* Update existing tests

* Test for summary writer

* Add file to command line parser

* Update readme and manual

* Set version to 3.11.0-SNAPSHOT

* Update changelog
mtomko authored Feb 13, 2024
1 parent 0bcc8a5 commit 2261927
Showing 12 changed files with 213 additions and 42 deletions.
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,8 @@
# Changelog

## 3.11.0
* Machine-parseable condition barcode summary file

## 3.10.0
* More efficient and memory-safe sampling technique for unexpected sequence reporting
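
Since the new summary file is described as machine-parseable, a downstream reader might look roughly like the sketch below. It assumes only what the `QualityWriter` change in this commit shows: a tab-separated header row followed by one row per sample barcode. The `SummaryReader` object and its method names are hypothetical and are not part of PoolQ.

```scala
import java.nio.file.{Files, Path}
import scala.jdk.CollectionConverters._

// Hypothetical reader for the condition barcode counts summary file.
// Assumption: the first non-empty line is a tab-separated header row.
object SummaryReader {

  // Turns header + data lines into one Map per data row, keyed by column name.
  def parse(lines: Seq[String]): Seq[Map[String, String]] =
    lines.filter(_.nonEmpty) match {
      case header +: rows =>
        val columns = header.split("\t", -1).toSeq
        rows.map(row => columns.zip(row.split("\t", -1).toSeq).toMap)
      case _ => Seq.empty
    }

  // Convenience wrapper that reads the whole file into memory first.
  def read(path: Path): Seq[Map[String, String]] =
    parse(Files.readAllLines(path).asScala.toSeq)
}
```

Keeping every value as a `String` avoids guessing at the numeric formats PoolQ uses for the percentage and normalized columns; callers can convert the fields they need.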

15 changes: 7 additions & 8 deletions README.md
@@ -27,7 +27,6 @@ other information that can be used to troubleshoot experiments. These include ma
locations, matching correlations between barcodes, and lists of frequently-occurring unknown barcodes.

## Documentation

For information on how to run PoolQ and its various modes and options, please see the
[manual](docs/MANUAL.md). We also maintain a [changelog](CHANGELOG.md) listing updates made to PoolQ.

@@ -41,13 +40,13 @@ associated licenses.
PoolQ was completely rewritten for version 3. The new code is faster and the codebase is much cleaner
and more maintainable. We have taken the opportunity to make other changes to PoolQ as well.

- There are substantial changes to the command-line interface for the program.
- The default counts file format has changed slightly, although there is a command-line
argument that indicates that PoolQ 3 should write a backwards-compatible counts file. The differences
are in headers only; file parsers should be able to adapt easily.
- The quality file has changed somewhat. Importantly, the definition of certain statistics has changed
slightly, so quality metrics cannot be directly compared between the new and old versions. In addition,
we no longer provide normalized match counts.
* There are substantial changes to the command-line interface for the program.
* The default counts file format has changed slightly, although there is a command-line
argument that indicates that PoolQ 3 should write a backwards-compatible counts file. The differences
are in headers only; file parsers should be able to adapt easily.
* The quality file has changed somewhat. Importantly, the definition of certain statistics has changed
slightly, so quality metrics cannot be directly compared between the new and old versions. In addition,
we no longer provide normalized match counts.

See the [manual](docs/MANUAL.md) for complete details on the differences between versions 2 and 3.

9 changes: 5 additions & 4 deletions docs/MANUAL.md
@@ -2,7 +2,7 @@

PoolQ is a counter for indexed samples from next-gen sequencing of pooled DNA.

_This documentation covers PoolQ version 3.10.0 (last updated 02/12/2024)._
_This documentation covers PoolQ version 3.11.0 (last updated 02/13/2024)._

## Background

@@ -559,7 +559,7 @@ PoolQ you will need a Java 8 JDK. You can download an appropriate JRE or JDK fro
You can download PoolQ from an as yet undetermined location. The file you download is a ZIP file
that you will need to unzip. In most cases, this is as simple as right-clicking on the zip file, and
selecting something like "extract contents" from the popup menu. This will create a new folder on
your computer named `poolq-3.10.0`, with the following contents:
your computer named `poolq-3.11.0`, with the following contents:

- `poolq3.jar`
- `poolq3.bat`
@@ -610,7 +610,7 @@ You can run PoolQ from any Windows, Mac, or Linux machine, but it requires some
how to launch programs from the command line on your given operating system.

1. Open a terminal window for your operating system
2. Change directories to the `poolq-3.10.0` directory
2. Change directories to the `poolq-3.11.0` directory

- On Windows, run:

@@ -627,7 +627,7 @@ how to launch programs from the command line on your given operating system.
If you successfully launched PoolQ, you should see a usage message explaining all of the
command-line options:

poolq3 3.10.0
poolq3 3.11.0
Usage: poolq [options]

--row-reference <file> reference file for row barcodes (i.e., constructs)
@@ -652,6 +652,7 @@ command-line options:
--umi-counts-dir <file>
--umi-barcode-counts-dir <file>
--quality <file>
--condition-barcode-counts-summary <file>
--counts <file>
--normalized-counts <file>
--barcode-counts <file>
19 changes: 17 additions & 2 deletions src/main/scala/org/broadinstitute/gpp/poolq3/PoolQ.scala
@@ -33,6 +33,7 @@ import org.broadinstitute.gpp.poolq3.reports.{
}
import org.broadinstitute.gpp.poolq3.types.{
BarcodeCountsFileType,
ConditionBarcodeCountsSummaryFileType,
CountsFileType,
LogNormalizedCountsFileType,
OutputFileType,
@@ -49,7 +50,14 @@ object PoolQ {
private[this] val log: Logger = getLogger

private[this] val AlwaysWrittenFiles: Set[OutputFileType] =
Set(CountsFileType, QualityFileType, LogNormalizedCountsFileType, BarcodeCountsFileType, RunInfoFileType)
Set(
CountsFileType,
QualityFileType,
ConditionBarcodeCountsSummaryFileType,
LogNormalizedCountsFileType,
BarcodeCountsFileType,
RunInfoFileType
)

final def main(args: Array[String]): Unit =
PoolQConfig.parse(args) match {
@@ -169,7 +177,14 @@ object PoolQ {
config.reportsDialect
)
_ = log.info(s"Writing quality file ${config.output.qualityFile}")
_ <- QualityWriter.write(config.output.qualityFile, state, rowReference, colReference, config.isPairedEnd)
_ <- QualityWriter.write(
config.output.qualityFile,
config.output.conditionBarcodeCountsSummaryFile,
state,
rowReference,
colReference,
config.isPairedEnd
)
_ <- umiInfo.fold(().pure[Try])(_ => UmiQualityWriter.write(config.output.umiQualityFile, state))
_ = log.info(s"Writing log-normalized counts file ${config.output.normalizedCountsFile}")
normalizedCounts = LogNormalizedCountsWriter.logNormalizedCounts(counts, rowReference, colReference)
@@ -74,6 +74,7 @@ final case class PoolQOutput(
normalizedCountsFile: Path = Paths.get("lognormalized-counts.txt"),
barcodeCountsFile: Path = Paths.get("barcode-counts.txt"),
qualityFile: Path = Paths.get("quality.txt"),
conditionBarcodeCountsSummaryFile: Path = Paths.get("condition-barcode-counts-summary.txt"),
correlationFile: Path = Paths.get("correlation.txt"),
unexpectedSequencesFile: Path = Paths.get("unexpected-sequences.txt"),
umiQualityFile: Path = Paths.get("umi-quality.txt"),
@@ -253,6 +254,11 @@ object PoolQConfig {
val _ =
opt[Path]("quality").valueName("<file>").action((f, c) => c.copy(output = c.output.copy(qualityFile = f)))

val _ =
opt[Path]("condition-barcode-counts-summary")
.valueName("<file>")
.action((f, c) => c.copy(output = c.output.copy(conditionBarcodeCountsSummaryFile = f)))

val _ = opt[Path]("counts").valueName("<file>").action((f, c) => c.copy(output = c.output.copy(countsFile = f)))

val _ = opt[Path]("normalized-counts").valueName("<file>").action { (f, c) =>
@@ -16,17 +16,39 @@ import org.broadinstitute.gpp.poolq3.reference.Reference

object QualityWriter {

class TeeWriter(w1: PrintWriter, w2: PrintWriter) {

def print(s: String): Unit = {
w1.print(s)
w2.print(s)
}

def println(s: String): Unit = {
w1.println(s)
w2.println(s)
}

def println(): Unit = {
w1.println()
w2.println()
}

}

def write(
file: Path,
qualityFile: Path,
conditionBarcodeCountsSummaryFile: Path,
state: State,
rowReference: Reference,
colReference: Reference,
isPairedEnd: Boolean
): Try[Unit] =
Using(new PrintWriter(file.toFile)) { writer =>
val barcodeLocationStats =
if (isPairedEnd) {
s"""Reads with no construct barcode: ${state.rowBarcodeNotFound + state.revRowBarcodeNotFound - state.neitherRowBarcodeFound}
Try {
Using.resources(new PrintWriter(qualityFile.toFile), new PrintWriter(conditionBarcodeCountsSummaryFile.toFile)) {
case (qualityWriter, cbcsWriter) =>
val barcodeLocationStats =
if (isPairedEnd) {
s"""Reads with no construct barcode: ${state.rowBarcodeNotFound + state.revRowBarcodeNotFound - state.neitherRowBarcodeFound}
|
|Reads with no forward construct barcode: ${state.rowBarcodeNotFound}
|Max forward construct barcode index: ${state.rowBarcodeStats.maxPosStr}
@@ -38,15 +60,15 @@ object QualityWriter {
|Min reverse construct barcode index: ${state.revRowBarcodeStats.minPosStr}
|Avg reverse construct barcode index: ${decOptFmt(state.revRowBarcodeStats.avg)}""".stripMargin

} else {
s"""Reads with no construct barcode: ${state.rowBarcodeNotFound}
} else {
s"""Reads with no construct barcode: ${state.rowBarcodeNotFound}
|Max construct barcode index: ${state.rowBarcodeStats.maxPosStr}
|Min construct barcode index: ${state.rowBarcodeStats.minPosStr}
|Avg construct barcode index: ${decOptFmt(state.rowBarcodeStats.avg)}""".stripMargin
}
}

val header =
s"""Total reads: ${state.reads}
val header =
s"""Total reads: ${state.reads}
|Matching reads: ${state.matches}
|1-base mismatch reads: ${state.matches - state.exactMatches}
|
@@ -55,25 +77,29 @@ object QualityWriter {
|$barcodeLocationStats
|""".stripMargin

writer.println(header)
qualityWriter.println(header)

writer.println(s"Read counts for sample barcodes with associated conditions:")
writer.println(
s"Barcode\tCondition\tMatched (Construct+Sample Barcode)\tMatched Sample Barcode\t% Match\tNormalized Match"
)
colReference.allBarcodes.foreach { colBarcode =>
val data = perBarcodeQualityData(state, rowReference, colReference, colBarcode)
writer.println(data.mkString("\t"))
}
qualityWriter.println(s"Read counts for sample barcodes with associated conditions:")

// use a TeeWriter for the next section of the report
val tw = new TeeWriter(qualityWriter, cbcsWriter)
tw.println(
s"Barcode\tCondition\tMatched (Construct+Sample Barcode)\tMatched Sample Barcode\t% Match\tNormalized Match"
)
colReference.allBarcodes.foreach { colBarcode =>
val data = perBarcodeQualityData(state, rowReference, colReference, colBarcode)
tw.println(data.mkString("\t"))
}

writer.println()
writer.println("Read counts for most common sample barcodes without associated conditions:")
val unepectedBarcodeFrequencies =
state.unknownCol.keys.map(barcode => BarcodeFrequency(barcode, state.unknownCol.count(barcode))).toSeq
topN(unepectedBarcodeFrequencies, 100).foreach { case BarcodeFrequency(barcode, count) =>
writer.println(barcode + "\t" + count.toString)
qualityWriter.println()
qualityWriter.println("Read counts for most common sample barcodes without associated conditions:")
val unepectedBarcodeFrequencies =
state.unknownCol.keys.map(barcode => BarcodeFrequency(barcode, state.unknownCol.count(barcode))).toSeq
topN(unepectedBarcodeFrequencies, 100).foreach { case BarcodeFrequency(barcode, count) =>
qualityWriter.println(barcode + "\t" + count.toString)
}
qualityWriter.println()
}
writer.println()
}

private[this] def decOptFmt(d: Option[Double]): String = d.map(Decimal00Format.format).getOrElse("N/A")
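
The tee pattern introduced in this change can be tried in isolation. The following is a minimal sketch, not PoolQ code: it writes to in-memory `StringWriter` buffers instead of files, and `TeeSketch.render` along with the two-column header is a hypothetical stand-in for the real report. Like the diff, it uses `Using.resources` so both writers are closed together, wrapped in `Try` to surface failures.

```scala
import java.io.{PrintWriter, StringWriter}
import scala.util.{Try, Using}

// Forwards each line to both underlying writers, as the TeeWriter in the diff does.
final class TeeWriter(w1: PrintWriter, w2: PrintWriter) {
  def println(s: String): Unit = {
    w1.println(s)
    w2.println(s)
  }
}

object TeeSketch {

  // Hypothetical helper: returns (quality report text, summary text).
  def render(rows: Seq[Seq[String]]): Try[(String, String)] = {
    val qualityBuf = new StringWriter
    val summaryBuf = new StringWriter
    Try {
      Using.resources(new PrintWriter(qualityBuf), new PrintWriter(summaryBuf)) { (quality, summary) =>
        // This line goes only to the human-readable report.
        quality.println("Read counts for sample barcodes with associated conditions:")
        // The table header and rows go to both outputs.
        val tee = new TeeWriter(quality, summary)
        tee.println("Barcode\tCondition")
        rows.foreach(row => tee.println(row.mkString("\t")))
        // The trailing blank line again goes only to the report.
        quality.println()
      }
      (qualityBuf.toString, summaryBuf.toString)
    }
  }
}
```

The design choice this illustrates is that the summary file is a strict subset of the quality file: only the section written through the tee appears in both, so the two outputs cannot drift apart.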
@@ -8,6 +8,7 @@ package org.broadinstitute.gpp.poolq3.types
trait OutputFileType extends Product with Serializable
case object CountsFileType extends OutputFileType
case object QualityFileType extends OutputFileType
case object ConditionBarcodeCountsSummaryFileType extends OutputFileType
case object LogNormalizedCountsFileType extends OutputFileType
case object BarcodeCountsFileType extends OutputFileType
case object CorrelationFileType extends OutputFileType
@@ -5,4 +5,4 @@
*/
package org.broadinstitute.gpp.poolq3.types

case class PoolQSummary(runSummary: PoolQRunSummary, outputFiles: Set[OutputFileType])
final case class PoolQSummary(runSummary: PoolQRunSummary, outputFiles: Set[OutputFileType])
@@ -23,6 +23,7 @@ class UnlabeledConditionsTest extends CatsEffectSuite with TestResources {
barcodeCountsFile <- tempFile[IO]("barcode-counts", ".txt")
normalizedCountsFile <- tempFile[IO]("normcounts", ".txt")
qualityFile <- tempFile[IO]("quality", ".txt")
conditionBarcodeCountsSummaryFile <- tempFile[IO]("condition-barcode-counts-summary", ".txt")
correlationFile <- tempFile[IO]("correlation", ".txt")
unexpectedSequencesFile <- tempFile[IO]("unexpected", ".txt")
runInfoFile <- tempFile[IO]("runinfo", ".txt")
@@ -32,6 +33,7 @@ class UnlabeledConditionsTest extends CatsEffectSuite with TestResources {
normalizedCountsFile = normalizedCountsFile,
barcodeCountsFile = barcodeCountsFile,
qualityFile = qualityFile,
conditionBarcodeCountsSummaryFile = conditionBarcodeCountsSummaryFile,
correlationFile = correlationFile,
unexpectedSequencesFile = unexpectedSequencesFile,
runInfoFile = runInfoFile