-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Upgrade to DADA2 >= 1.7.3 #85
Comments
@benjjneb, can you let us know which bioconductor release cycle you expect this to be available in, that way we can track it in the appropriate QIIME 2 release cycle. Thanks! |
This won't get to release until the next Bioconductor release (BioC 3.7) which will be around April. Given that there will also be a delay in the BioC release propagating to bioconda, this would make sense for a June/July Q2 release. |
Still waiting on the May Bioconductor release to propagate to bioconda. I think progress will show up on this issue: bioconda/bioconda-recipes#8947 Once it migrates, I'm planning to update the R scripts, and to address several other outstanding issues at the same time including #87, #93, #97, #99 And note to self: Add an extra step catching and removing very low depth sequences before learning the error rates. |
@benjjneb, in the new granular pipeline scheme, would it be possible to provide a way to handle sample pooling? This came up recently in the context of
We're going to be trying to address TypeMap and similar typing related things in the next release cycle. If I need to implement a way to dependently type the output, that should be possible (especially with a real use-case now). |
Yes. Although given the frustrating speed penalty that bioconda-dada2 has, pseudo-pooling might make more sense.
Have to admit, I don't really follow this. Basically option to a different return type if pool=TRUE (or pool=psuedo)? |
Yep! Basically we'd just figure out when that property can be added, and tools that need it can require it as an input. Everything else will continue not caring like normal. |
Bioconda is now at 1.10: http://bioconda.github.io/recipes/bioconductor-dada2/README.html @benjjneb Is the speed penalty still there? This blocks qiime2/q2-alignment#64 |
Interesting. It would actually be easier to just jump ahead to 1.10 as the edge case merging bugs were mostly worked out between 1.8 and 1.10.
It will be a little better than now, but as far as I coiuld tell the speed penalty is pretty hard to overcome completely, because it is a result of how bioconda compiles code (old version of gcc, low levels of optimization to handle generic hardware). |
I wonder if the recent bump to gcc7 on conda-forge and bioconda will help us out here... |
That's what I meant. If it's not Having The reason why I can't easily build a SINA version that matches the pinned libs is that we followed Conda-Forge in the move to GCC7. Since that brought the ABI change required to support C++11, all C++ libs except for libstdc++ are incompatible with the old ones. |
Didn't realize this had happened! I will definitely re-profile the speed issue with the 1.10 bioconda builds. fingers crossed would be awesome if the performance delta narrows. |
It's still happening I suppose. We are rebuilding more or less on an as-needed-basis because of resource constraints. Conda-Forge had been rebuilding for a while, and this January did "the label switch". That means all the GCC7 package previously in |
@epruesse A brief update, we have started getting segfault error reports for the dada2 package that appear to be coming from conda-installed versions of the 1.10 package, i.e. the new package version that is being built with GCC7. See for example: benjjneb/dada2#684 |
I've just run some initial tests on the bioconda version of DADA2 1.10 built with GCC7, and it appears that the speed penalty versus natively installed DADA2 is now essentially gone. I will do some more detailed testing, but this would be a major quality-of-life improvement for folks using DADA2 via QIIME2 (up to a 10x speedup) and strongly suggests skipping the 1.8 package version and going straight to 1.10. Is that in the realm of possibility for the next Q2 release? |
@benjjneb Did you solve the segfault error or it didn't happen with your samples? |
Yep! There's still a full month or so on the next release. The dates are also probably something we can flex if it comes down to it, the speed improvement would certainly be worth it. |
Works fine in my initial testing, but I haven't tried to reproduce the reported bioconda-specific segfault yet. Will try soon.
OK, I am going to move forward with testing and updating the Q2 scripts based on package version 1.10 then. I'll need help from your side figuring out the Q2 conda recipes and any python updates to the plugin though. |
Sounds like a plan! Feel free to structure the R scripts into as many or few as you need. I can adapt the Python side to work correctly. I think we had talked about breaking up the Re segfault: Is it possible that it's just a blanket runtime conflict? e.g. glibc on CentOS 5 vs anything compiled in the last decade? |
Just wanted to ping this thread with the DADA2 issue that is open on segfaults with bioconda-dada2-1.10: benjjneb/dada2#684 In testing on my local machine, the bioconda install of 1.10 seems fine, but clearly there is something going on here because there are multiple people with the same report. |
It might be not just 1.10. (segfault in 1.8) bioconda/bioconda-recipes#13847 In 1.8, it appears to have been a libc++ issue? I can't tell. We need someone proficient in R for this. |
Ok updates - my best guess is a concurrency bug somewhere in |
As noted here, when that new release lands, we should update the pinned version here, as well as revert the SSE changes made in January 2018 in the wake of SSE-gate.
The text was updated successfully, but these errors were encountered: