-
Notifications
You must be signed in to change notification settings - Fork 110
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add parameter to exclude unbinned data from post-binning steps #708
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a great idea!
I've tweaked the phrasing of the parameter and given more guidance to the parameter name.
Please update the CHANGELOG and I think then we might be good to go.
The only thing check is that the bin_summary steps don't fail for some reason because some contigs are missing for some reason
Co-authored-by: James A. Fellows Yates <[email protected]>
Thank you for the review! I believe everything should be good to go now. As for a possible bin summary failure, I don’t think it will case issues, because DEPTHS and every other bin summary input uses the same channel. I did a test with a few samples and didn't have any problem. On that note, I think it should be worth considering making bin summary more flexible, allowing it to handle cases where not all inputs have identical bins. So, if a QC bin fails, we could generate the table with a warning rather than an error. This should prevent issues like #603, but obviously, if a QC process fails the user should have to look into it. However, this could be a discussion to have in another PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much @dialvarezs !
This pr adds a new parameter
exclude_unbins_from_postbinning
, with default valuefalse
.The reasoning behind this is that as stated in #649, Prokka can take a really long time with unbinned data, and also qc metrics and taxonomic classification normally are not meaningful for non-bins.
With the default set to
false
, this parameter doesn't introduce any breaking change to the current workflow, but adds flexibility to focus on binned data on post-binning steps.This PR also fixes one minor issue I found in line 827 of
mag.nf
, where the channel passed toGUNC
wasch_input_bins_for_qc
instead ofch_input_bins_for_gunc
(the same, but without eukarya domain).Closes #649 .
PR checklist
nf-core pipelines lint
).nextflow run . -profile test,docker --outdir <OUTDIR>
).nextflow run . -profile debug,test,docker --outdir <OUTDIR>
).docs/usage.md
is updated.docs/output.md
is updated.CHANGELOG.md
is updated.README.md
is updated (including new tool citations and authors/contributors).