Interpretation of Wochenende Results #98
Comments
Hi @arpit20328, thanks for your interest in Wochenende. The docs are here, but you probably found them already: https://github.com/MHH-RCUG/nf_wochenende/wiki/Interpreting-Wochenende-output To answer your question: use RPMM, which basically combines the first two normalizations. The key problem, though, is that you do not have many reads aligned (max 154). We typically see thousands to 100k+ reads aligned. Why is this? Maybe your sample is from soil or another biome which does not fit well with our current reference bacteria (mostly tested on clinical metagenomes to date, e.g. lung samples)? Maybe there is nothing in your sample, or it is 16S? Wochenende is only for WGS metagenomics. How many reads are you supplying? Is human DNA the key source of "contamination", i.e. is this a clinical sample?
Thanks @colindaven for your reply. My data is actually WGS paired-end data from Illumina. Previously I only sent a snapshot, not the complete data. When our FASTQ files were processed by Wochenende, it produced a sorted CSV file of 320 rows. Yes, it is a clinical sample. I do not have the read numbers to hand right now, but the input is around 6.6 GB of paired-end .fastq.gz files.
According to this file, Wochenende found around 31 million reads in our FASTQ data.
Thanks for that. I deleted the comment since it may be sensitive information and likely shouldn't have been there :-). Congrats, it looks like you have some interesting results. The numbers of reads are very interesting, but the distribution of reads along the chromosomes (and, for multi-chromosomal organisms, whether all chromosomes are covered) is the real indicator that the species is there and is not just a false positive. You can try to plot the information, but this requires an R server/installation which we couldn't easily fit into the Wochenende conda install instructions. Maybe you have an R server, can install the required software and run the plotting, or you can do the plotting yourself based upon our scripts (see the I hope the
Yes, this is true, but most are human (the |
I see. Thanks @colindaven for the detailed reply. I will need more of your input in the coming weeks. We are running a clinical trial here in Mumbai, India, and we feel Wochenende fits our study. Great! Thanks again, have a great weekend... oops, I mean a great Wochenende...
No problem. Yes, we're happy to help out where we can. The trial sounds very interesting! :-)
@colindaven Hi again. For many FASTQ files in our sample we are not getting a value in the "bacteria_per_human_cell" column. So, as per our previous communication, we are taking "RPMM" as our abundance. Can you please describe what RPMM is and what the formula behind it is? Can you suggest how RPMM could be rescaled to give the relative or absolute abundance of each species out of 100? Thanks
Hi @arpit20328 Nice job. I don't know why your samples are not sufficient to calculate the bacteria per human cell parameter. Do they now have any/enough human reads mapped? You're right that the docs are a bit unspecific on normalization, so I improved and extended them here with examples: https://github.com/MHH-RCUG/nf_wochenende/wiki/Interpreting-Wochenende-output Please let me know if that is sufficient for your needs. I would really recommend using the raspir function to remove false positives, and the plotting functionality to check the distribution of reads along one or more chromosomes before you decide whether the taxon is present or not. These do require extra work to get running, but at least the raspir function should work quite well for the nf_wochenende repo (with Nextflow). It likely will not work with the older Wochenende repo without some hacking to adjust it to your compute environment. Maybe @irosenboom can also share some experience of what constitutes a high or low RPMM, and how these values can be affected, especially by short genome lengths etc. Cheers
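For anyone reading along, here is a rough Python sketch of the idea, not the Wochenende implementation. It assumes that the "first two normalizations" mentioned earlier in this thread are reads per million reference bases and reads per million reads in the experiment, so that RPMM is the read count divided by both; please check the wiki page linked above for the authoritative definition. The names `TaxonRow`, `rpmm` and `relative_abundance_percent`, and all example numbers, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TaxonRow:
    """One row of a reporting table (hypothetical, simplified layout)."""
    species: str
    chr_length: int   # length of the reference sequence in bases
    read_count: int   # reads assigned to this reference


def rpmm(read_count: int, chr_length: int, total_reads: int) -> float:
    """Reads per million reads in the experiment, per million reference bases.

    ASSUMPTION: this is how the two normalizations combine; verify against
    the Wochenende wiki before relying on it.
    """
    if chr_length <= 0 or total_reads <= 0:
        raise ValueError("chr_length and total_reads must be positive")
    reads_per_million_reads = read_count / (total_reads / 1e6)
    return reads_per_million_reads / (chr_length / 1e6)


def relative_abundance_percent(rows: list[TaxonRow], total_reads: int) -> dict[str, float]:
    """Rescale RPMM so the listed taxa sum to 100.

    This is a relative share among the rows passed in, not an absolute
    abundance. Drop human rows first if a share of microbes is wanted.
    """
    scores = {r.species: rpmm(r.read_count, r.chr_length, total_reads) for r in rows}
    total = sum(scores.values())
    if total == 0:
        return {species: 0.0 for species in scores}
    return {species: 100.0 * value / total for species, value in scores.items()}


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    rows = [
        TaxonRow("species_A", chr_length=2_000_000, read_count=100),
        TaxonRow("species_B", chr_length=5_000_000, read_count=100),
    ]
    for species, pct in relative_abundance_percent(rows, total_reads=10_000_000).items():
        print(f"{species}: {pct:.1f}%")
```

Note that a short genome with few reads can get a high RPMM, so the percentages should only be computed for taxa that have already passed a false-positive check such as raspir or a look at the read distribution.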
@colindaven thanks. IMPORTANT: I removed the human reference sequences from the Wochenende recommended database, since I was only interested in pathogen identification and not human DNA. I think I should have used the complete Wochenende reference FASTA. Your thoughts?
Ah, please don't do this. You'll get human reads massively and erroneously mapping to bacteria, and so many, many false positives. In clinical samples 90-99% of DNA is human in my experience, and it needs to be excluded. When using the proper version with the human sequences you'll get very, very different results.
I have the following output from my paired end FASTQ files. This output came after I ran "bash runbatch_Wochenende_reporting.sh
Can anyone tell me which column represents the abundance value of each species (row in this matrix) ?