TheiaProk_ONT failing raw read screening #157

Closed
emmadoughty opened this issue Aug 21, 2023 · 1 comment
Labels: done (This issue has been addressed)

Comments

@emmadoughty
Contributor

Many samples in the bacterial_training_data_ont dataset in https://app.terra.bio/#workspaces/theiagen-validations/Theiagen_Doughty_Sandbox/data fail raw read screening under the default TheiaProk_ONT settings. NB: the results currently in the data table were generated with modified parameters in order to obtain assemblies.

Under default settings, the genome sizes of many of these samples are massively overestimated, with estimates much larger than the biggest known bacterial genome. After conversations with a couple of people, the likely cause is the lower read accuracy of ONT data: sequencing errors increase the number of k-mers discovered, and the genome size estimation does not account for this.
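To illustrate the suspected mechanism, here is a toy simulation. It is only a sketch under stated assumptions: it assumes the estimator behaves roughly like a count of distinct k-mers, it models substitution errors only (real ONT reads also contain indels), and it reads one strand only. It is not the tool the workflow uses, and all function names and parameters are hypothetical.

```python
import random


def simulate_reads(genome, n_reads, read_len, error_rate, rng):
    """Sample reads from the genome, substituting bases at a per-base error rate."""
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(genome) - read_len + 1)
        read = list(genome[start:start + read_len])
        for i in range(read_len):
            if rng.random() < error_rate:
                # Replace the base with one of the three other bases.
                read[i] = rng.choice([b for b in "ACGT" if b != read[i]])
        reads.append("".join(read))
    return reads


def count_kmers(reads, k):
    """Count how many times each k-mer occurs across all reads."""
    counts = {}
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            counts[kmer] = counts.get(kmer, 0) + 1
    return counts


def estimate_size(counts, min_count=1):
    """Naive size estimate (assumption): number of k-mers seen at least min_count times."""
    return sum(1 for c in counts.values() if c >= min_count)


rng = random.Random(0)
genome = "".join(rng.choice("ACGT") for _ in range(5000))  # toy 5 kb "genome"
k = 15

# About 40x coverage with a 5% per-base error rate (hypothetical values).
noisy_counts = count_kmers(simulate_reads(genome, 400, 500, 0.05, rng), k)

naive_noisy = estimate_size(noisy_counts)                  # counts every k-mer
filtered_noisy = estimate_size(noisy_counts, min_count=3)  # drops rare k-mers

print("true size:", len(genome))
print("naive estimate from noisy reads:", naive_noisy)
print("estimate ignoring k-mers seen fewer than 3 times:", filtered_noisy)
```

In this toy setting most erroneous k-mers occur only once or twice, so counting them all inflates the estimate many-fold, while ignoring low-abundance k-mers brings it back close to the true size. Whether the same holds for the real samples would need checking against the actual estimator and data.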

Suggested fixes: remove or replace the genome size estimation tool so that good-quality data passes screening under default settings, OR make recommendations to users about sequencing chemistry/basecallers so that TheiaProk_ONT can be run successfully under default settings.

@sage-wright
Member

Closed via #164

@sage-wright added the "done" (This issue has been addressed) label on Dec 19, 2023