The counts differ because influenza sequences are no longer submitted at the serotype level.
Currently, the best method for retrieving H5N1 sequences is to use a query similar to the one shown by @anna-parker above, but we can be more specific and query at the species level, Alphainfluenzavirus influenzae, instead of the genus:
Search NCBI Virus using taxonomy = Alphainfluenzavirus influenzae, then
Thanks, that makes sense. I didn't know that assigning to subtypes was abandoned. Would be useful indeed if the genotype download was possible at some point!
Describe the bug
There seem to be H5N1 sequences that are not returned when querying via the most general H5N1 taxonomy.
When downloading all sequences for H5N1 taxonomy ID 102793, I get 30833 sequences with H5N1 in the name.
When downloading all influenza sequences with taxonomy ID 197911, I get 64338 sequences that have H5N1 in the name.
This is unexpected. Half of the H5N1 sequences seem to be wrongly classified.
I would expect that when querying for a taxon id, one gets all the sequences in that taxon and in its children.
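For reference, the counts above can be checked with something like the following. This is only a minimal sketch: it assumes the downloads are plain FASTA files with the subtype written in each header line, and the function names and file names are hypothetical, not part of any NCBI tooling.

```python
import re
from typing import Iterable


def count_matching_records(lines: Iterable[str], pattern: str = "H5N1") -> int:
    """Count FASTA records whose header line contains `pattern` (case-insensitive)."""
    regex = re.compile(re.escape(pattern), re.IGNORECASE)
    # Only header lines (those starting with ">") are inspected; sequence lines are skipped.
    return sum(1 for line in lines if line.startswith(">") and regex.search(line))


def count_in_file(path: str, pattern: str = "H5N1") -> int:
    """Apply count_matching_records to a FASTA file on disk."""
    with open(path, encoding="utf-8") as handle:
        return count_matching_records(handle, pattern)
```

Running `count_in_file("taxon_102793.fasta")` and `count_in_file("taxon_197911.fasta")` (hypothetical file names for the two downloads) would give the two numbers being compared. Matching on the header text is a rough proxy for subtype, so it can miss records whose names omit the subtype.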
To Reproduce
Steps to reproduce the behavior: