Chimera

frederic-mahe edited this page Nov 23, 2014 · 1 revision

Enhancements

Multi-threading for the --uchime_denovo command?

Chimera detection is a necessary but time-consuming operation. Multi-threading would speed it up, but it is not implemented yet.

Open Questions

When to perform chimera detection?

  • at the sample level?
  • at the study level (after merging all samples)?
  • at the OTU level (after clustering, on representative sequences)?

Robert Edgar wrote (http://www.drive5.com/usearch/manual/uchime_pool.html):

I recommend that you combine samples for de novo chimera detection. (Previously, I have recommended that PCR runs should be processed separately, but I now realize that it is probably better to combine runs). The reasoning is as follows: the main concern is false negatives (FNs), because undetected chimeras are usually more harmful in a biological analysis than false positives (FPs). FNs occur if a chimera has read abundance that is greater than or equal one of its parents. Combining runs is likely to increase the abundance of a parent (because the same species occurs in multiple samples), but is unlikely to increase the abundance of a chimera. If the chimera is reproduced in a second sample, the parents must also be present and are likely to be present in higher abundances. So pooling should tend to reduce FNs, and there is no reason I can think of why FPs would tend to increase.

We can test that hypothesis. On a multi-sample study, build an occurrence table for amplicons (the number of times each amplicon appears in each sample). Run chimera detection on the pooled amplicon dataset (all amplicons from all samples) and, for each chimera, verify that in each sample where it occurs its parents were present and more abundant than the chimera.
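
The per-sample check could look roughly like the Python sketch below. It is only an illustration: the names `occurrences`, `chimeras` and `check_chimera_parents` are hypothetical, the toy counts are made up, and the sketch assumes the chimera-to-parents mapping has already been extracted from the chimera detection output.

```python
from typing import Dict, List, Tuple

# Hypothetical occurrence table: amplicon -> sample -> number of reads.
occurrences: Dict[str, Dict[str, int]] = {
    "A": {"s1": 100, "s2": 40},
    "B": {"s1": 80, "s2": 0},
    "C": {"s1": 5, "s2": 3},
}

# Hypothetical detection result: chimera -> (parent A, parent B).
chimeras: Dict[str, Tuple[str, str]] = {"C": ("A", "B")}


def check_chimera_parents(
    occurrences: Dict[str, Dict[str, int]],
    chimeras: Dict[str, Tuple[str, str]],
) -> Dict[str, List[Tuple[str, str, int, int]]]:
    """Report samples where a parent is absent or not more abundant.

    Each reported tuple is (sample, parent, parent_count, chimera_count).
    """
    violations: Dict[str, List[Tuple[str, str, int, int]]] = {}
    for chimera, parents in chimeras.items():
        bad = []
        for sample, count in occurrences.get(chimera, {}).items():
            if count == 0:
                continue  # chimera absent from this sample, nothing to check
            for parent in parents:
                parent_count = occurrences.get(parent, {}).get(sample, 0)
                if parent_count <= count:
                    bad.append((sample, parent, parent_count, count))
        if bad:
            violations[chimera] = bad
    return violations


violations = check_chimera_parents(occurrences, chimeras)
for chimera, cases in violations.items():
    for sample, parent, parent_count, count in cases:
        print(f"{chimera} in {sample}: parent {parent} has "
              f"{parent_count} reads vs {count} for the chimera")
```

With this toy data, chimera C is reported in sample s2, where parent B is absent. In a real study, many such cases would argue against the assumption that parents are always present and more abundant wherever a chimera occurs.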

Chimera detection on OTU representatives could be tested too. The advantage is the vast reduction in the number of amplicons to process. The potential problem is a loss of sensitivity due to that reduction (fewer potential parents).