doc improvemnts (#3434)
antgonza authored Sep 13, 2024
1 parent 64a0c80 commit 49e4c1a
Showing 3 changed files with 8 additions and 9 deletions.
2 changes: 1 addition & 1 deletion CHANGELOG.md
@@ -11,7 +11,7 @@ Deployed on September 23rd, 2024
* Initial changes in `qiita_client` to have more accurate variable names: `QIITA_SERVER_CERT` -> `QIITA_ROOTCA_CERT`. Thank you @charles-cowart!
* Added `get_artifact_html_summary` to `qiita_client` to retrieve the summary file of an artifact.
* Re-added github actions to `https://github.com/qiita-spots/qiita_client`.
* `Woltka v0.1.4, paired-end` superseded `Woltka v0.1.4` in `qp-woltka`; [more information](https://qiita.ucsd.edu/static/doc/html/processingdata/woltka_pairedend.html). Thank you to @qiyunzhu for the benchmarks!
* `Woltka v0.1.6, paired-end` superseded `Woltka v0.1.6` in `qp-woltka`; [more information](https://qiita.ucsd.edu/static/doc/html/processingdata/woltka_pairedend.html). Thank you to @qiyunzhu for the benchmarks!
* Other general fixes, like [#3424](https://github.com/qiita-spots/qiita/pull/3424), [#3425](https://github.com/qiita-spots/qiita/pull/3425).


@@ -125,7 +125,7 @@ Note that the command produces up to 5 output artifacts based on the aligner and


.. note::
Woltka 0.1.4 only produces per-genome, per-gene and functional profiles as we are moving
Woltka 0.1.6 only produces per-genome, per-gene and functional profiles as we are moving
to Operational Genomic Units (OGUs), which have higher resolution than taxonomic units
for community ecology, and were shown to deliver stronger biological signals in
downstream analyses. For more information please read: `Phylogeny-Aware Analysis of
@@ -6,16 +6,16 @@ Benchmarks created by Qiyun Zhu (@qiyunzhu) on Aug 1, 2024.
Summary
-------

I tested alternative read pairing schemes in the analysis of shotgun metagenomic sequencing data. Sequencing reads were aligned against a reference microbial genome database as unpaired or paired, with or without singleton and/or discordant alignments suppressed. A series of synthetic datasets were used in the analysis.
I tested alternative read pairing schemes in the analysis of shotgun metagenomic sequencing data. Sequencing reads were aligned against a reference microbial genome database as unpaired or paired. A series of synthetic datasets were used in the analysis.

The results reveal that treating reads as paired is always advantageous over unpaired. Suppressing singleton alignments further increases the accuracy of results, despite the cost of lower mapping rate. Suppressing discordant alignments has no obvious impact on the result. Regardless of accuracy, the downstream community ecology analyses are not obviously impacted by the choice of parameters.
The results reveal that treating reads as paired is always advantageous over unpaired. Regardless of accuracy, the downstream community ecology analyses are not obviously impacted by the choice of parameters.

Therefore, I recommend the general adoption of paired alignments as a standard procedure. I also endorse suppressing singleton and discordant alignments, but note the favor of further tests on whether they may reduce sensitivity with complex communities.
Therefore, I recommend the general adoption of paired alignments as a standard procedure.

Alignment parameters
--------------------

Sequencing data were aligned using Bowtie2 v2.5.1 in the very sensitive mode against the WoL2 database. They were treated as either unpaired or paired-end:
Sequencing data were aligned using Bowtie2 v2.5.1 in the "very sensitive" mode against the WoL2 database. They were treated as either unpaired or paired-end:

- SE: Reads are treated as unpaired (Bowtie2 input: -U merged.fq)
- PE: Reads are treated as paired (Bowtie2 input: -1 fwd.fq, -2 rev.fq)
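
As an aside to the hunk above, the two input schemes can be sketched as Bowtie2 argument lists. This is only an illustration, not the command used in the benchmark: the index prefix ``WoL2`` and the output name ``out.sam`` are assumed placeholders, while the read file names are taken from the list above. The command is built but not executed.

```python
import shlex


def bowtie2_command(index, mode, out_sam="out.sam"):
    """Build a Bowtie2 argument list for the SE or PE input scheme.

    `index` and `out_sam` are hypothetical placeholders; the read file
    names mirror the ones quoted in the documentation above.
    """
    base = ["bowtie2", "--very-sensitive", "-x", index, "-S", out_sam]
    if mode == "SE":
        # Unpaired: all reads are passed as a single merged file.
        return base + ["-U", "merged.fq"]
    if mode == "PE":
        # Paired-end: forward and reverse reads are passed separately.
        return base + ["-1", "fwd.fq", "-2", "rev.fq"]
    raise ValueError(f"unknown mode: {mode!r}")


se_cmd = bowtie2_command("WoL2", "SE")
pe_cmd = bowtie2_command("WoL2", "PE")

# Print shell-quoted command lines for inspection only.
print(shlex.join(se_cmd))
print(shlex.join(pe_cmd))
```

The only difference between the two schemes in this sketch is how the reads are handed to the aligner; the sensitivity preset and the index stay the same.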
@@ -30,11 +30,10 @@ Five synthetic datasets were generated with 25 samples each consisting of random

The results of the five Bowtie2 parameter sets were compared using nine metrics:

Three metrics that only rely on each result.
Two metrics that only rely on each result.

- Mapping rate (%)
- Number of taxa
- Entropy (i.e., Shannon index, but without subsampling)

Six metrics that rely on comparing each result against the ground truth (higher is better):

@@ -59,4 +58,4 @@ The results revealed:
#. PE outperforms SE in all metrics. Most importantly, it reduces false positive rate (higher precision) while retaining mapping rate. Meanwhile, the sensitivity (recall) of identifying true taxa is not obviously compromised (note the y-axis scale).
#. PE.NU the two additional parameters had minimum effect on the result and make the alignment step faster. This may suggest that the additional parameters are safe to use.

Therefore, I would recommend adopting paired alignment in preference to unpaired alignment. I may suggest no mixing as it has improved accuracy, but the potential adverse effect of lower mapping rate may be further explored before making a compelling recommendation. Although not having a visible effect, no discordance may be added for logical coherency.
Therefore, I would recommend adopting paired alignment in preference to unpaired alignment.
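
The precision and recall mentioned in the results can be illustrated with a minimal sketch that compares detected taxa against a ground-truth set by presence or absence. The exact metric definitions used in the benchmark are not shown in this diff, so the function below is an assumption, and the genome identifiers are made-up toy values, not benchmark results.

```python
def precision_recall(observed, truth):
    """Presence/absence precision and recall of detected taxa.

    precision = true positives / all observed taxa
    recall    = true positives / all true taxa
    """
    observed, truth = set(observed), set(truth)
    true_positives = len(observed & truth)
    precision = true_positives / len(observed) if observed else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Toy data (hypothetical): four true genomes, with spurious extra hits.
truth = {"G1", "G2", "G3", "G4"}
more_false_hits = {"G1", "G2", "G3", "G4", "G5", "G6"}
fewer_false_hits = {"G1", "G2", "G3", "G4", "G5"}

p_more, r_more = precision_recall(more_false_hits, truth)
p_fewer, r_fewer = precision_recall(fewer_false_hits, truth)
print(f"more false hits:  precision={p_more:.2f} recall={r_more:.2f}")
print(f"fewer false hits: precision={p_fewer:.2f} recall={r_fewer:.2f}")
```

In this toy case, dropping one spurious taxon raises precision while recall stays the same, which is the kind of pattern the results describe for paired over unpaired alignment.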
