drep to consider rRNA genes? #115

Open
Kirk3gaard opened this issue Apr 7, 2021 · 2 comments

Comments

@Kirk3gaard

Hi

Thanks for a great tool.

Have you considered a scoring scheme taking the presence of e.g. rRNA genes into account to prioritize bins meeting more of the MiMAG requirements? (https://www.nature.com/articles/nbt.3893/tables/1)

I just ran a mix of short-read and long-read bins through dRep and was surprised that some of the short-read bins got a higher score than the matching long-read bins. It turns out that the much improved N50 came with a slight increase in contamination (likely overextension). This can be fixed by changing the weights, since short-read bins tend to have artificially low contamination scores. However, I think that scoring the presence of rRNA and tRNA genes could be a nice addition to the current model.

Best regards
Rasmus

@camillaln

Hi,
I have seen the same issue with long-read bins scored lower than short-read-only bins. When adjusting the scores, would you lower the contamination weight or increase the N50 weight? If you have some suggested values that would be great!
Camilla
(I agree that it would be good to score presence of rRNA+tRNA genes - if they are in a long contig or are found to match the genome.)
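
To make the weight trade-off concrete, here is a minimal sketch of how the contamination and N50 weights interact. It assumes a simplified form of the scoring formula described in the dRep documentation (completeness, contamination, and log10(N50) terms only; the strain heterogeneity, size, and centrality terms are left out), with defaults of 1, 5, and 0.5 for the three weights. The function name and the two example bins are hypothetical, and the numbers are made up for illustration, not taken from this thread.

```python
import math

def simplified_score(completeness, contamination, n50,
                     com_w=1.0, con_w=5.0, n50_w=0.5):
    """Simplified dRep-style score (assumption: only three of the
    documented terms are modelled here)."""
    return com_w * completeness - con_w * contamination + n50_w * math.log10(n50)

# Hypothetical bins: same completeness, the long-read bin has a much
# better N50 but slightly higher contamination.
short_read_bin = dict(completeness=95.0, contamination=0.5, n50=20_000)
long_read_bin = dict(completeness=95.0, contamination=2.0, n50=2_000_000)

for label, weights in [
    ("assumed defaults", {}),
    ("lower contamination weight", {"con_w": 0.5}),
    ("higher N50 weight", {"n50_w": 5.0}),
]:
    s = simplified_score(**short_read_bin, **weights)
    l = simplified_score(**long_read_bin, **weights)
    winner = "long-read" if l > s else "short-read"
    print(f"{label}: short={s:.2f} long={l:.2f} -> {winner} bin wins")
```

Because N50 enters on a log scale, a 100-fold N50 improvement adds only two log units, so under these assumptions either change can flip the ranking, but the contamination weight has the more direct effect.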

@MrOlm
Owner

MrOlm commented Apr 7, 2021

Hi Rasmus and Camilla,

Thanks for the feedback. I haven't used many long-read bins in my own research, so I wasn't aware of this issue; thanks for bringing it up.

It's a good point about including rRNA / tRNA genes in the scoring algorithm in accordance with MiMAG; I'll look into it for the next dRep version. In the meantime, you could replicate this functionality using the --extra_weight_table option to add to scores based on rRNA/tRNA genes identified using external programs.

Best,
Matt
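
The workaround above could be sketched roughly as follows. This assumes that --extra_weight_table expects a tab-separated, header-less file with two columns (genome name and extra score), which is how the dRep help text describes it as far as I can tell. The function names, the bonus values, and the input dictionary are hypothetical; the rRNA/tRNA calls would come from whatever external annotation tool you already use. The threshold of 18 tRNAs and the 5S/16S/23S requirement follow the MiMAG high-quality draft criteria.

```python
import csv

def mimag_bonus(has_5s, has_16s, has_23s, n_trna,
                rrna_bonus=2.0, trna_bonus=2.0, min_trna=18):
    """Extra score for MiMAG-style marker genes.
    The bonus sizes are arbitrary assumptions; tune them to your data."""
    bonus = rrna_bonus * sum([has_5s, has_16s, has_23s])
    if n_trna >= min_trna:
        bonus += trna_bonus
    return bonus

def write_extra_weight_table(annotations, path):
    """Write 'genome<TAB>extra_score' lines with no header.
    `annotations` maps genome file name -> (has_5s, has_16s, has_23s, n_trna)."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for genome, (has_5s, has_16s, has_23s, n_trna) in sorted(annotations.items()):
            writer.writerow([genome, mimag_bonus(has_5s, has_16s, has_23s, n_trna)])

# Hypothetical annotation results for two bins of the same organism.
example = {
    "long_read_bin.fa": (True, True, True, 45),
    "short_read_bin.fa": (False, False, False, 12),
}
write_extra_weight_table(example, "extra_weights.tsv")
```

The resulting file would then be passed along the lines of `dRep dereplicate out_dir -g bins/*.fa --extra_weight_table extra_weights.tsv`. The genome names in the first column presumably need to match the file names given to `-g`, so it is worth checking that against the dRep documentation before relying on it.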
