-t clade_profiles MetaPhlAn2 for DESeq2 #2

Open
mghanbari opened this issue Jul 24, 2017 · 2 comments

Comments

@mghanbari

Hi
In your presentation "Statistical analysis for metagenomic data" on June 6-7, 2016, you mentioned that

Note: better to use metaphlan2 option:
-t clade_profiles
to generate normalized counts instead of relative abundance

I did so and now have the results. But the resulting file shows normalized values for different markers per clade, so how should I get one number per clade for downstream DESeq2 analysis? Should I average the markers per clade?

Thanks for the great presentations.

Regards
Mahdi

@lwaldron
Member

Dear Mahdi,

Thanks for pointing that out to me; I actually didn't realize that the -t clade_profiles option returns per-marker counts rather than per-clade counts. I am going to update my advice based on the curatedMetagenomicData pipeline and how we've done differential abundance analysis from it. You can see the exact options that curatedMetagenomicData uses on line 45 here, which do not involve the -t clade_profiles option. What I've done then is to divide these % abundances by 100 and multiply by read depth to get a normalized estimate of read counts. See the section "Estimating Absolute Raw Count Data" in the curatedMetagenomicData vignette.
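
For readers following along, the conversion described above (divide the % abundance by 100, then multiply by read depth) can be sketched roughly as below. This is only an illustration in plain Python, not the curatedMetagenomicData code: the function name `estimate_counts`, the clade names, the abundances, and the read depth are all made-up assumptions. The result is rounded because DESeq2 expects integer counts.

```python
def estimate_counts(percent_abundance, read_depth):
    """Estimate per-clade read counts for one sample.

    percent_abundance: dict mapping clade name -> relative abundance in percent
    read_depth: total number of reads for the sample
    Both arguments are hypothetical inputs for illustration.
    """
    return {
        clade: round(pct / 100 * read_depth)  # % -> proportion -> estimated reads
        for clade, pct in percent_abundance.items()
    }


# Hypothetical sample: relative abundances (in %) that sum to 100.
sample = {
    "s__Bacteroides_vulgatus": 62.5,
    "s__Escherichia_coli": 25.0,
    "s__Eubacterium_rectale": 12.5,
}
depth = 1_000_000  # assumed read depth for this sample

counts = estimate_counts(sample, depth)
print(counts)
```

Each sample would be scaled by its own read depth, so the same % abundance yields a larger estimated count in a more deeply sequenced sample.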

@edoardopasolli and @nsegata, does this make sense to you?

@mghanbari
Author

Thank you for your comments. I'll go with your suggestion. I was wondering if you could also include a tutorial in a future presentation on how to control for more than one confounding factor. Also, given the increasing number of time-series analyses in microbiome studies, it would help to cover how to analyze this kind of data with the DESeq2 package. Although there is an example in the DESeq2 vignette, your explanation from the point of view of microbiome studies would be great.
