Hi
Thanks for a great tool.
Have you considered a scoring scheme that takes the presence of e.g. rRNA genes into account, to prioritize bins meeting more of the MiMAG requirements? (https://www.nature.com/articles/nbt.3893/tables/1)
I just ran a mix of short-read and long-read bins through dRep and was surprised that some of the short-read bins scored higher than the matching long-read bins. It turns out that the much improved N50 came with a slight increase in contamination (likely from overextension). This can be fixed by changing the weights, since short-read bins tend to have artificially low contamination scores. However, I think that scoring the presence of rRNA+tRNA genes would be a nice add-on to the current model.
Best regards
Rasmus
Hi,
I have seen the same issue with long-read bins scored lower than short-read-only bins. When adjusting the scores, would you lower the contamination weight or increase the N50 weight? If you have some suggested values that would be great!
Camilla
(I agree that it would be good to score presence of rRNA+tRNA genes - if they are in a long contig or are found to match the genome.)
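To get a feel for how the contamination and N50 weights trade off, here is a minimal sketch. It assumes a simplified score of the form `completeness_weight * completeness - contamination_weight * contamination + n50_weight * log10(N50)`. The function name, the default weights, and the two example bins are illustrative assumptions, not values taken from dRep's source or from the bins discussed in this thread, so check them against the dRep documentation before relying on them.

```python
import math


def simplified_score(completeness, contamination, n50,
                     completeness_weight=1.0,
                     contamination_weight=5.0,
                     n50_weight=0.5):
    """Hypothetical, simplified quality score (not dRep's full formula)."""
    return (completeness_weight * completeness
            - contamination_weight * contamination
            + n50_weight * math.log10(n50))


# Made-up bins: same completeness, the long-read bin has a 100x larger N50
# but one percentage point more contamination.
short_read_bin = {"completeness": 95.0, "contamination": 0.5, "n50": 20_000}
long_read_bin = {"completeness": 95.0, "contamination": 1.5, "n50": 2_000_000}

# Scores with the assumed default weights.
default_short = simplified_score(**short_read_bin)
default_long = simplified_score(**long_read_bin)

# Scores with a lower contamination weight and a higher N50 weight.
relaxed = {"contamination_weight": 1.0, "n50_weight": 2.0}
relaxed_short = simplified_score(**short_read_bin, **relaxed)
relaxed_long = simplified_score(**long_read_bin, **relaxed)

print(f"default weights: short={default_short:.2f} long={default_long:.2f}")
print(f"relaxed weights: short={relaxed_short:.2f} long={relaxed_long:.2f}")
```

Under these assumptions, a 100-fold N50 improvement adds only `2 * n50_weight` points because of the logarithm, while each percentage point of contamination costs `contamination_weight` points. That is why, in this sketch, changing either weight (or both) can flip the ranking.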
Thanks for the feedback. I haven't used many long-read bins in my own research so I wasn't aware of this issue, and thanks for bringing it up.
It's a good point about including rRNA / tRNA genes in the scoring algorithm in accordance with MiMAG. I'll look into it for the next dRep version. In the meantime, you could replicate this functionality using the --extra_weight_table option, which lets you add to the scores based on rRNA/tRNA genes identified using external programs.
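As a rough sketch of that workaround, the snippet below turns per-genome rRNA/tRNA annotations into a weight table. It assumes the table is a tab-separated file with two columns and no header (genome name, extra score), and that the genome names match the FASTA file names given to dRep; please confirm both against `dRep dereplicate -h`. The helper names, the bonus values, and the example annotations are hypothetical. The rRNA set and the tRNA threshold follow the MiMAG high-quality criteria (5S, 16S and 23S rRNA genes plus at least 18 tRNAs), and the annotations themselves would come from external tools such as barrnap or tRNAscan-SE.

```python
import csv
import io

RRNA_GENES = ("5S", "16S", "23S")  # rRNA genes required by MiMAG high quality
MIN_TRNAS = 18                     # minimum tRNA count required by MiMAG


def extra_score(rrna_found, trna_count, rrna_bonus=2.0, trna_bonus=2.0):
    """Hypothetical bonus: points per rRNA gene found, plus a tRNA bonus."""
    score = rrna_bonus * sum(gene in rrna_found for gene in RRNA_GENES)
    if trna_count >= MIN_TRNAS:
        score += trna_bonus
    return score


def write_weight_table(annotations, handle):
    """Write 'genome<TAB>extra score' rows, without a header line."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for genome, (rrna_found, trna_count) in sorted(annotations.items()):
        writer.writerow([genome, extra_score(rrna_found, trna_count)])


# Made-up annotations: genome name -> (set of rRNA genes found, tRNA count).
annotations = {
    "long_read_bin.fa": ({"5S", "16S", "23S"}, 45),
    "short_read_bin.fa": (set(), 15),
}

buffer = io.StringIO()
write_weight_table(annotations, buffer)
table_text = buffer.getvalue()
print(table_text, end="")
```

To use it, write `table_text` to a file and pass that file's path to --extra_weight_table. How large the bonuses should be relative to the other score terms is a judgment call, so it is worth checking how the rankings change on a few known bin pairs.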