Discrepancy in Output Counts in Genomad #119

Open
F4NG666 opened this issue Aug 29, 2024 · 1 comment

F4NG666 commented Aug 29, 2024

Hi,

I hope this message finds you well.

I am currently using Genomad for analyzing a dataset of 39,910 sequences. However, I’ve noticed discrepancies in the output files that I need clarification on:

- The summary file contains only 38,449 rows.
- The taxonomy file generated by the annotation module contains 39,888 rows.

Could you please help me understand why there is a difference in the number of rows between the input sequences and these output files? Specifically, I would like to know where and why the sequences might have been removed or filtered out.

Thank you for your assistance!

Best regards,
Fang

@apcamargo
Owner

The summary files should only include sequences classified as viruses (<prefix>_virus_summary.tsv) or plasmids (<prefix>_plasmid_summary.tsv). Sequences not present in the summary were either not classified as viruses or plasmids, or they were classified but didn't pass the post-classification filters. These filters can be disabled by using the --relaxed flag.

The taxonomy file only contains sequences that were assigned to a taxon. Sequences missing from this file did not match any taxonomically-informative markers. If you expected all sequences to match a marker, you can try increasing the search sensitivity (e.g., -s 7), but this will increase execution time and memory usage.
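
To see exactly which input sequences are absent from a given output file, one option is to compare sequence IDs directly. The following is a minimal sketch, not part of geNomad itself. It assumes that the first column of each TSV holds the sequence name, that the file has a header row, and that provirus rows are named like `<contig>|provirus_<start>_<end>` (so the part before `|` is the parent contig). The helper names `fasta_ids`, `tsv_ids`, and `missing_from` are hypothetical.

```python
import csv


def fasta_ids(fasta_path):
    """Return the set of sequence IDs (header text up to the first whitespace)."""
    ids = set()
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    ids.add(fields[0])
    return ids


def tsv_ids(tsv_path):
    """Return the set of IDs found in the first column of a TSV with a header row."""
    ids = set()
    with open(tsv_path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        next(reader, None)  # skip the header row
        for row in reader:
            if row:
                # Assumption: provirus rows carry a "|provirus_..." suffix;
                # keep only the parent contig name.
                ids.add(row[0].split("|")[0])
    return ids


def missing_from(fasta_path, tsv_path):
    """Return a sorted list of input sequence IDs that do not appear in the TSV."""
    return sorted(fasta_ids(fasta_path) - tsv_ids(tsv_path))
```

For example, `missing_from("input.fna", "out/input_summary/input_virus_summary.tsv")` would list the inputs with no row in the virus summary; the paths here are placeholders, so adjust them to your own output directory. Because the virus and plasmid summaries are separate files, a sequence is only unaccounted for if it is missing from both.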
