Silva database? #5
Great, I have tried this so far (see the screenshot below). The taxonomy file has two columns: the first column is the accession and the second column is the taxonomy, as follows: 129138 Bacteria;Proteobacteria;Gammaproteobacteria;Pseudomonadales;Pseudomonadaceae;Pseudomonas;Pseudomonas amygdali What is wrong? There are two columns in the taxonomy file.
Taxa should be tab-separated, as shown in Example 4.
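For illustration, a minimal Python sketch of one way to turn the two-column layout quoted above into one tab-separated column per taxon. The function name `lineage_to_columns` is hypothetical and not part of TaxonKit; it assumes the accession is separated from the lineage by whitespace and that taxa are joined with semicolons, as in the quoted line.

```python
def lineage_to_columns(line: str) -> str:
    """Convert 'ACCESSION<whitespace>taxon1;taxon2;...' into one tab-separated row."""
    # Split only on the first run of whitespace, so that spaces inside
    # species names such as "Pseudomonas amygdali" are preserved.
    accession, lineage = line.strip().split(maxsplit=1)
    # Drop empty fields left by a trailing semicolon, if any.
    taxa = [t.strip() for t in lineage.split(";") if t.strip()]
    return "\t".join([accession] + taxa)


example = (
    "129138 Bacteria;Proteobacteria;Gammaproteobacteria;Pseudomonadales;"
    "Pseudomonadaceae;Pseudomonas;Pseudomonas amygdali"
)
row = lineage_to_columns(example)
print(row)
```

Each output row then has the accession in the first column and one taxon per following column.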
If you need to map the SILVA TaxIDs to the created ones.
Great!!!
Sure, please follow the usage and examples.
I have the following problem. For a particular ID (1824050977) I don't have a corresponding mapping (see the screenshot). However, it is in names.dmp. The taxid map was generated as in the post above, with the option "-A 1". Why is it missing?
It just maps custom IDs to TaxIDs of the taxa of the lowest rank, e.g. species. If you need to map to taxa of other ranks, like the genus Staphylococcus, here's the way (csvtk is needed).
Great. In this case, for example, 609216830, which is a Streptococcus, is matched to Bacteria. I would expect an assignment at the family/genus level at least.
So, mapping IDs only to taxa of species rank is reasonable. Back to the previous concern: why did you want to map IDs to the genus Staphylococcus?
Mapping to species is enough, because the complete lineage of the species can be retrieved with the TaxID of the species. Could you please show how you query the LCA? What were the TaxIDs used? What's the direct purpose?
Yes, sure. For each read I use LCA to assign the common ancestor of the 10 hits, so for one read I get one hit. Once I get that hit, I would like to resolve the taxonomy (phylum, class, order ... species). For some reads, because of the error rate, the resolution cannot be achieved at the species level. That's why in some cases (like the Staphylococcus case above) the best hit (the output from LCA) is a genus. But it could be a family, order, class or phylum. Does this make sense?
Good, here you've assigned LCA to each read.
No, you can just use taxonkit lineage or taxonkit reformat -I to retrieve lineage via the LCAs, no matter what the rank they are. There's no need to query with the
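To illustrate why a TaxID of any rank is enough to recover the lineage, here is a minimal Python sketch that walks parent pointers in nodes.dmp-style records. It is not how TaxonKit is implemented; the helper names `parse_nodes` and `lineage` are hypothetical, and the TaxIDs and ranks in the toy records are made up for the example.

```python
def parse_nodes(lines):
    """Parse nodes.dmp-style lines into {taxid: (parent_taxid, rank)}."""
    nodes = {}
    for line in lines:
        # Fields in nodes.dmp are separated by "\t|\t"; splitting on "|"
        # and stripping whitespace handles that layout.
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 3 or not fields[0]:
            continue
        nodes[int(fields[0])] = (int(fields[1]), fields[2])
    return nodes


def lineage(taxid, nodes):
    """Return [(taxid, rank), ...] from the root down to the given TaxID."""
    path = []
    while True:
        parent, rank = nodes[taxid]
        path.append((taxid, rank))
        if parent == taxid:  # the root node is its own parent
            break
        taxid = parent
    return path[::-1]


# Toy records with made-up TaxIDs (assumption, not real data).
toy = [
    "1\t|\t1\t|\tno rank\t|",
    "10\t|\t1\t|\tsuperkingdom\t|",
    "20\t|\t10\t|\tgenus\t|",
    "30\t|\t20\t|\tspecies\t|",
]
nodes = parse_nodes(toy)
print(lineage(30, nodes))  # query with a species-level TaxID
print(lineage(20, nodes))  # query with a genus-level TaxID works the same way
```

The same walk works whether the starting TaxID is a species, a genus, or a higher rank, which is why no separate mapping per rank is needed.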
Thanks for the update. There is one strange behaviour with LCA: I was expecting the LCA to go up to the Staphylococcus genus level, but it went up to Bacilli. I assume something is wrong with my database. Here is the pipeline to build the database:
I don't know what is wrong...
The LCA seems to be 1845768359 (Bacilli), without a doubt. BTW, the process could be simplified.
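As a rough illustration of how an LCA can end up at class level, here is a minimal Python sketch of a lowest-common-ancestor computation over a toy parent map. The TaxIDs and tree below are made up, the function names are hypothetical, and this is only one possible cause, not a diagnosis of the database above: a single hit outside the genus is enough to push the LCA of all hits up to a higher rank.

```python
def ancestors(taxid, parent):
    """Return the chain from taxid up to the root (the root is its own parent)."""
    chain = [taxid]
    while parent[taxid] != taxid:
        taxid = parent[taxid]
        chain.append(taxid)
    return chain


def lca(taxids, parent):
    """Lowest common ancestor of a non-empty list of TaxIDs."""
    taxids = list(taxids)
    common = set(ancestors(taxids[0], parent))
    for t in taxids[1:]:
        common &= set(ancestors(t, parent))
    # Walking up from the first TaxID, the first node shared by all
    # ancestor chains is the deepest common ancestor.
    for node in ancestors(taxids[0], parent):
        if node in common:
            return node


# Toy tree with made-up TaxIDs (assumption):
# 1 root; 2 class; 3 and 4 genera under the class;
# 5 and 6 species in genus 3; 7 species in genus 4.
parent = {1: 1, 2: 1, 3: 2, 4: 2, 5: 3, 6: 3, 7: 4}

same_genus = lca([5, 6], parent)         # all hits within genus 3
with_outlier = lca([5, 6, 7], parent)    # one hit from genus 4
print(same_genus, with_outlier)
```

Checking which of the hits for a read fall outside the expected genus is one way to see where the LCA is being pulled up.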
Hi,
Thanks for the gtdb-taxdump. I'm working on SILVA; is there a taxdump available?