Incomplete taxonomic paths in .phyloFlash.NTUfull_abundance.csv output #184

Open
chassenr opened this issue Jun 20, 2023 · 3 comments

@chassenr

Hi @HRGV and @kbseah ,

I have been using phyloFlash to explore the eukaryotic component of some metagenomes and noticed that the taxonomic paths in the .phyloFlash.NTUfull_abundance.csv output table are all truncated to 7 levels, which is not sufficient for eukaryotes. In my particular example, I am interested in the taxonomic composition of Chytridiomycota (fungi), but the taxonomic path is not resolved beyond this level (phylum). Is there a quick fix for this that I can implement myself? Are you planning to change this in upcoming phyloFlash versions? I know that eukaryotic taxonomic paths are a nightmare (especially if you want to align them with prokaryotic ones), but maybe the tax_slv_ssu_138.1.txt file would help to pick a corresponding set of taxonomic ranks for both prokaryotes and eukaryotes in the output?

Thanks!

Cheers,
Christiane

@kbseah
Contributor

kbseah commented Jun 20, 2023

Hi Christiane, thanks for pointing this out. As you note, this is a tricky issue because eukaryotic taxonomic paths are longer and have inconsistent lengths in the SILVA taxonomy (and in the NCBI taxonomy too).

One possibility I see is to use the PR2 taxonomy paths instead, which are standardized to 9 levels: https://pr2database.github.io/pr2database/articles/pr2_02A_silva.html

I haven't checked, though, what fraction of the SILVA eukaryotic sequences also appear in PR2. Some groups may not be represented in PR2 because PR2 relies on expert curation for specific taxonomic groups.

Can't make any promises about when a new phyloFlash version will come out. As a stop-gap we could work on a SILVA database with modified taxonomy paths. Will keep this in mind.

@chassenr
Author

Hi @kbseah, thanks for your fast reply. Is there maybe a way to work with the existing phyloFlash output and just parse the SAM file differently to create the NTU table with the complete paths (independent of phyloFlash)? Just as a quick fix? I tried to identify the corresponding code in the Perl scripts, but since I am not a Perl person that was a bit difficult for me...

@kbseah
Contributor

kbseah commented Jun 23, 2023

Hi Christiane, I think the best option for now is to simply parse the SAM file. It contains the SILVA accessions and header lines, which include the taxonomy paths, which you can then summarize at the level you wish.
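
A minimal sketch of that approach in Python, for readers who want to avoid the Perl code. It rests on assumptions that should be checked against your own SAM file: that the reference name (column 3) of each alignment holds the SILVA header in the form `accession taxonomy;path;...`, with the path separated from the accession by whitespace. The function name `summarize_sam_taxonomy` and the `depth` parameter are made up for this illustration and are not part of phyloFlash. The sketch counts primary alignment records per path, so the numbers will not necessarily match the way phyloFlash counts reads for its NTU tables.

```python
from collections import Counter

def summarize_sam_taxonomy(sam_lines, depth=None):
    """Count mapped primary alignments per taxonomy path.

    sam_lines: iterable of SAM text lines (e.g. an open file handle).
    depth: number of leading ranks to keep; None keeps the full path.
    Assumption: RNAME looks like 'ACCESSION.start.stop Rank1;Rank2;...'.
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # SAM header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        rname = fields[2]
        if flag & 0x4 or rname == "*":
            continue  # read is unmapped
        if flag & 0x900:
            continue  # secondary or supplementary alignment
        parts = rname.split(None, 1)  # accession, then taxonomy path
        if len(parts) < 2:
            continue  # no taxonomy path in the reference name
        ranks = [r for r in parts[1].strip().split(";") if r]
        if depth is not None:
            ranks = ranks[:depth]
        counts[";".join(ranks)] += 1
    return counts
```

To use it, pass an open file handle, for example `with open("sample.sam") as fh: counts = summarize_sam_taxonomy(fh)` (the file name is a placeholder), then write `counts.items()` to a CSV. Leaving `depth` unset keeps every rank that SILVA provides, which is what the eukaryotic paths need.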
