Should we support the OPAL format as output? #113

Open
Midnighter opened this issue Jul 2, 2023 · 10 comments
Labels
enhancement New feature or request

Comments

@Midnighter
Contributor

In order to use the OPAL tool for analysis and visualization, it might be useful to convert any supported profiler to that format.

@Midnighter Midnighter added the enhancement New feature or request label Jul 2, 2023
@mattheatley

Might this be reproducing taxonkit's profile2cami? https://bioinf.shenwei.me/taxonkit/usage/#profile2cami

@jfy133
Contributor

jfy133 commented Nov 17, 2023

Some of those commands look really nice, I didn't know of that tool, thanks @mattheatley! Agreed, maybe it's not necessary if it's supported elsewhere? We should consider whether our output is compatible as input to profile2cami, though.

@mattheatley

mattheatley commented Nov 17, 2023

So you're pretty much already there with the standard taxpasta output. But instead of the two columns (taxid and count), you'd need to provide taxid and abundance (i.e., percentage), which is the input required by taxonkit. To be honest, it's probably more useful to keep the counts in general, so maybe just provide the abundances as an extra output.
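As a rough sketch of that conversion, assuming a tab-separated profile with a header row and two columns named `taxonomy_id` and `count` (the column names are an assumption here), the counts can be turned into percentages and written out as taxid/abundance pairs. Whether taxonkit's profile2cami accepts exactly this layout has not been checked.

```python
import csv
import io
from decimal import Decimal


def counts_to_abundance(rows: list[tuple[str, int]]) -> list[tuple[str, Decimal]]:
    """Convert (taxid, count) pairs into (taxid, percentage) pairs."""
    total = sum(count for _, count in rows)
    if total == 0:
        raise ValueError("profile has no counts")
    # Decimal avoids binary floating-point rounding in the division.
    return [(taxid, Decimal(count) * 100 / Decimal(total)) for taxid, count in rows]


def convert_tsv(text: str) -> str:
    """Read an assumed taxid/count TSV with a header; emit taxid/abundance TSV."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    next(reader)  # skip the header line
    rows = [(taxid, int(count)) for taxid, count in reader]
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for taxid, pct in counts_to_abundance(rows):
        writer.writerow([taxid, pct])
    return out.getvalue()


example = "taxonomy_id\tcount\n562\t75\n1280\t25\n"
print(convert_tsv(example))
```

This keeps the count-based profile untouched and produces the abundances as a separate output, in line with the suggestion above.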

@jfy133
Contributor

jfy133 commented Nov 17, 2023

     b) Abundance (could be percentage, automatically detected or use -p/--percentage).

Raw counts are 'sequence' abundance anyway, so maybe we are already there, then?

@jfy133
Contributor

jfy133 commented Nov 17, 2023

But it could be good to test whether the two tools are compatible; then we could update the docs to point people to taxonkit :)

@mattheatley

I think they may be talking about proportion vs. percentage rather than counts, but I'm not totally sure.

@Midnighter
Copy link
Contributor Author

I have been considering an option to report fractions instead of counts from taxpasta for quite some time. So it seems that small change would already make the output compatible with taxonkit's profile2cami.

@mattheatley

Maybe don't do away with counts altogether, though? I actually find them more useful, because you can't convert from abundances back to counts. An additional output would be great, though. At the moment I convert taxpasta outputs to abundances and then to CAMI, so this would cut out a stage. But there can be rounding issues, so maybe calculate them using decimals?
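A small illustration of why abundances alone can't be turned back into counts, using Python's standard-library `decimal` module as one possible way of "calculating them using decimals". The taxids and counts are made up for the example, and storing the total alongside the abundances is a hypothetical design choice, not something taxpasta currently does.

```python
from decimal import Decimal


def to_fractions(counts: dict[str, int]) -> dict[str, Decimal]:
    """Relative abundance of each taxon as a Decimal fraction of the total."""
    total = sum(counts.values())
    return {taxid: Decimal(n) / Decimal(total) for taxid, n in counts.items()}


def back_to_counts(fractions: dict[str, Decimal], total: int) -> dict[str, int]:
    """Recover integer counts; this only works if the total was kept."""
    return {taxid: int((f * total).to_integral_value()) for taxid, f in fractions.items()}


shallow = {"562": 10, "1280": 30}
deep = {"562": 1000, "1280": 3000}

# Different sequencing depths give identical abundances, so the counts are lost.
assert to_fractions(shallow) == to_fractions(deep)

# Keeping the total next to the abundances makes the conversion reversible.
assert back_to_counts(to_fractions(deep), total=4000) == deep
```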

@Midnighter
Contributor Author

By the way, @mattheatley, I don't know if this is clear enough from the documentation: Some of the original profiler output is actually given as fractions, which we multiply with a big number in order to obtain integers. So in those cases, it would be more faithful to the original result to only report fractions.
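To illustrate why reporting fractions directly can be more faithful, here is a sketch with a hypothetical scale factor `SCALE` (the factor taxpasta actually uses is not stated in this thread): very small fractions can round to zero, and others only come back approximately.

```python
from decimal import Decimal

# Hypothetical scale factor, chosen only for illustration.
SCALE = 1_000_000


def fraction_to_pseudo_count(fraction: Decimal, scale: int = SCALE) -> int:
    """Multiply a fraction by a large constant and round to the nearest integer."""
    return int((fraction * scale).to_integral_value())


def pseudo_count_to_fraction(count: int, scale: int = SCALE) -> Decimal:
    """Undo the scaling; precision lost by rounding cannot be recovered."""
    return Decimal(count) / scale


rare = Decimal("0.0000004")  # a very low-abundance taxon
common = Decimal("0.1234567")

print(fraction_to_pseudo_count(rare))  # 0: the rare taxon disappears
print(pseudo_count_to_fraction(fraction_to_pseudo_count(common)))  # 0.123457
```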

@paulzierep

One major issue at the moment is that only leaf counts are supported by taxonkit: shenwei356/taxonkit#99 (comment)
If this is fixed, I think it would work seamlessly with taxpasta.
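Until then, one possible workaround, sketched here under the assumption that a child-to-parent map of the taxonomy is available (the `parent` dictionary and its taxids below are hypothetical), is to keep only the taxa that have no descendant in the profile. Note that this discards any counts assigned to internal nodes, so it changes the profile rather than just reformatting it.

```python
def leaf_taxa(profile: dict[str, int], parent: dict[str, str]) -> dict[str, int]:
    """Keep only taxa in the profile that have no descendant in the profile."""
    ancestors: set[str] = set()
    for taxid in profile:
        node = parent.get(taxid)
        # Walk up the lineage, marking every ancestor; stop at a node already
        # seen (its ancestors are then marked too) or when no parent is known.
        while node is not None and node not in ancestors:
            ancestors.add(node)
            node = parent.get(node)
    return {taxid: n for taxid, n in profile.items() if taxid not in ancestors}


# Hypothetical toy lineage; the root points to itself.
parent = {
    "species_1": "genus_1",
    "genus_1": "family_1",
    "family_1": "root",
    "root": "root",
}
profile = {"family_1": 100, "species_1": 60}
print(leaf_taxa(profile, parent))
```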


4 participants