Change krona wrapper to read from summary report instead of read report #777
Comments
Actually, we can just change krona to interpret the kraken report instead, which would run almost instantaneously. The only slight drawback is that it will report each taxon as having 1 read with weight = number of reads. (Krona has a feature that multiplies the read count by an additional weight variable for each taxon.)
Wait, I've always wondered why krona couldn't run directly off the summary txt file, but you're saying it can. I don't understand the drawback then -- what would the difference be from the current behavior?
The only difference would be krona reporting n_i reads with weight 1 versus 1 read with weight n_i.
Visually, the pies would look the same... that seems worthwhile... does it even need a taxonomy db anymore at that point?
It still does but krona only needs its |
What, informationally, is in the tab file that isn’t already in the summary report file?
Hm. I kind of like the idea of omitting the
According to this, it looks like the taxonomy.tab file is just:
According to the Kraken manual, the report file is:
So to create krona's
I'm changing this particular Issue to focus on the new direction discussed here in the comments and will backlog it for some future time. For reference, here was the original thought:
I will separately implement a quick and dirty change that invokes krona from GNU parallel within the same WDL task (instead of a bash for loop) just to improve things for now until we have time for this larger issue.
In our current invocations in the WDL workflow, we run kraken, and then krona, on a bunch of samples to save on DB staging time and such. Currently, the krona portion is run in a for-loop after kraken completes.
I'm observing that the krona portion sometimes takes longer than the kraken portion (e.g. multiple hours for krona vs. 1 hr for kraken on a lane of a 2500 high output run). This is both dumb and very costly: krona is only utilizing a single core on an instance that is sized very large for kraken's sake.
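As a rough illustration of the interim idea (one krona invocation per sample, run concurrently rather than in a serial loop), here is a minimal Python sketch. It is not the GNU parallel change itself: `run_per_sample` and `placeholder_command` are hypothetical names, and the placeholder command only prints the sample name because the actual krona command line is not shown in this issue.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_per_sample(samples, build_command, max_workers=4):
    """Run one external command per sample concurrently.

    Returns a dict mapping each sample name to its exit code.
    Threads are enough here because the work happens in child processes.
    """
    def run_one(sample):
        completed = subprocess.run(
            build_command(sample), capture_output=True, text=True
        )
        return sample, completed.returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run_one, samples))


def placeholder_command(sample):
    # Hypothetical stand-in for the real per-sample krona invocation,
    # which is an assumption and not taken from this issue.
    return [sys.executable, "-c", f"print({sample!r})"]


if __name__ == "__main__":
    exit_codes = run_per_sample(["sampleA", "sampleB"], placeholder_command)
    print(exit_codes)
```

Setting `max_workers` to the core count of the instance would keep the machine sized for kraken busy during the krona step.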
Edit: this issue will now focus on changing the krona entry point to run off of the kraken summary report file exclusively (with no other inputs; see comments below for details).
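To make the new direction concrete, here is a minimal sketch of reading a kraken summary report and emitting one row per taxon. It assumes the standard six-column, tab-separated kraken report (percentage, clade reads, directly assigned reads, rank code, NCBI taxid, name indented two spaces per level). The output layout (a count followed by the tab-separated lineage, which is the kind of input Krona's plain-text importer accepts) is one possible target and an assumption, not necessarily the entry point this issue ends up using. Function names are hypothetical.

```python
def kraken_report_to_rows(report_lines):
    """Yield (direct_reads, lineage) for every taxon with directly assigned reads.

    The lineage is rebuilt from the indentation of the name column:
    two leading spaces per taxonomic level.
    """
    lineage = []
    for line in report_lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        fields = line.split("\t")
        if len(fields) < 6:
            continue  # skip malformed lines
        direct_reads = int(fields[2])
        raw_name = fields[5]
        depth = (len(raw_name) - len(raw_name.lstrip(" "))) // 2
        # Drop lineage entries at this depth or deeper, then add this taxon.
        del lineage[depth:]
        lineage.append(raw_name.strip())
        if direct_reads > 0:
            yield direct_reads, list(lineage)


def rows_to_krona_text(rows):
    """Format rows as 'count<TAB>level1<TAB>level2...' lines."""
    return "\n".join(
        "\t".join([str(count)] + lineage) for count, lineage in rows
    )


if __name__ == "__main__":
    # Tiny made-up report, for illustration only.
    sample = [
        " 50.00\t5\t5\tU\t0\tunclassified",
        " 50.00\t5\t0\t-\t1\troot",
        " 50.00\t5\t1\t-\t131567\t  cellular organisms",
        " 40.00\t4\t4\tD\t2\t    Bacteria",
    ]
    print(rows_to_krona_text(kraken_report_to_rows(sample)))
```

Using the directly assigned count (column 3) rather than the clade count (column 2) avoids counting a read once for every ancestor in its lineage.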