FASTA DB requirements #23
Hi Tobias, Triqler needs decoys to calculate q-values. However, the PSMs in the report.tsv output from DIA-NN are usually not mapped to decoy proteins. To circumvent this, DIA-NN can be run with a spectral library that includes shuffled entrapment sequences. To do this, you first add shuffled entrapment sequences to your FASTA file before constructing the spectral library. These shuffled entrapment sequences are basically shuffled amino acid sequences of the proteins in the FASTA file. Alternatively, you could use OpenSwathDecoyGenerator to add decoys to your spectral library, but this method crashed on a couple of the data sets I tried it on. I am not sure why. Hope this clarifies.
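A minimal sketch of the FASTA step described above, assuming whole-protein shuffling and a hypothetical `decoy_` header prefix. Check which decoy prefix your Triqler run is configured to recognize, and that DIA-NN carries the prefix through to the protein IDs for your header format. The function names are illustrative, not part of DIA-NN or Triqler.

```python
import random


def read_fasta(lines):
    """Parse FASTA lines into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def add_shuffled_decoys(records, prefix="decoy_", seed=42):
    """Append one shuffled copy of every target, giving a 50/50 target-decoy DB.

    The "decoy_" prefix is a hypothetical choice; use whatever decoy prefix
    your downstream tool expects.
    """
    rng = random.Random(seed)  # fixed seed so the database is reproducible
    decoys = []
    for header, seq in records:
        residues = list(seq)
        rng.shuffle(residues)  # same amino acid composition, random order
        decoys.append((prefix + header, "".join(residues)))
    return records + decoys


def write_fasta(records, width=60):
    """Serialize (header, sequence) tuples back to FASTA text."""
    out = []
    for header, seq in records:
        out.append(">" + header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"


# Usage on two toy entries (sequences are made up for illustration).
example = """>sp|P00001|TOY1
MKTAYIAKQRQISFVKSHFSR
>sp|P00002|TOY2
MSTNPKPQRKTKRNTNRRPQDVK
"""
database = add_shuffled_decoys(read_fasta(example.splitlines()))
print(write_fasta(database))
```

This shuffles whole proteins; other tools shuffle at the peptide level (for example keeping cleavage sites in place), which changes which decoy peptides end up in the library.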
Hmmm... How would I do this when DIA-NN was run in library-free mode? I thought DIA-NN already uses decoys internally, because it outputs a …
The library-free search starts the in silico digestion from a target-only FASTA database. I guess decoy generation happens at the peptide or library level. One can write the resulting spectral library to disk, and it contains a column …
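To illustrate what peptide-level decoy generation can look like, here is a sketch of a simple tryptic digest (cleave after K/R, not before P, no missed cleavages) followed by pseudo-reversal, a common generic strategy that reverses each peptide but keeps its C-terminal residue. This is an assumption for illustration only: DIA-NN's own decoy method is not described in this thread and may differ, and the length limits are hypothetical defaults.

```python
import re


def tryptic_digest(protein, min_len=7, max_len=30):
    """Cleave after K or R unless the next residue is P (no missed cleavages).

    Splitting on a zero-width pattern requires Python 3.7 or later.
    """
    peptides = re.split(r"(?<=[KR])(?!P)", protein)
    return [p for p in peptides if min_len <= len(p) <= max_len]


def pseudo_reverse(peptide):
    """Reverse all residues except the last, so the decoy keeps its K/R C-terminus."""
    return peptide[-2::-1] + peptide[-1]


# Example sequence for illustration.
protein = "MKWVTFISLLFLFSSAYSRGVFRRDAHKSEVAHRFKDLGEENFKALVLIAFAQYLQQCPFEDHVK"
for target in tryptic_digest(protein):
    print(target, "->", pseudo_reverse(target))
```

Keeping the C-terminal K/R means target and decoy peptides share the same cleavage residue, so decoys look like plausible tryptic peptides.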
Indeed, DIA-NN already uses decoy peptides internally to compute the FDRs. However, these decoy peptides cannot be printed to the output report.tsv. I am not entirely sure what … See …
Well, I guess those floats are the scores and evidence values of the corresponding decoy entry. Instead of adding a new line for each decoy, it just denotes how the decoy scored (skipping the details of how the decoy entity is structured).
Hmm, interesting... I thought about that too, but I could not find any information about how to threshold the score. Perhaps the same threshold applies as for Mass.Evidence, where values between 0.5 and 1.0 are considered decoys. Perhaps Decoy.Evidence could be mapped to a binary indicator for decoy PSMs, and the proteins belonging to these peptides could then be marked as decoys. Let me think about this. Perhaps @MatthewThe can give some more feedback on this?
Let's ask Vadim what it really contains ;-) I also couldn't find any documentation on this.
Do I understand Clemens's suggestion correctly? He generates a target + decoy FASTA DB with a specific decoy prefix (50% target + 50% decoy) and runs it through DIA-NN (which internally generates decoys of the decoys) only to get the decoys reported explicitly? That sounds pretty wild! And if the decoy function uses sequence reversal, a decoy of a decoy turns into a target again.
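The point about reversal can be checked directly: plain sequence reversal is its own inverse, so a reversed decoy of a reversed decoy is the original target again, whereas a second random shuffle generally does not undo the first. This is a minimal sketch with a hypothetical peptide; it says nothing about which method DIA-NN actually uses internally.

```python
import random


def reverse_decoy(seq):
    """Plain reversal: applying it twice returns the input."""
    return seq[::-1]


def shuffle_decoy(seq, rng):
    """Random permutation of the residues (composition is preserved)."""
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)


target = "LVNEVTEFAK"  # hypothetical target peptide
rng = random.Random(1)

reversed_twice = reverse_decoy(reverse_decoy(target))
shuffled_twice = shuffle_decoy(shuffle_decoy(target, rng), rng)

print("reverse of reverse:", reversed_twice, reversed_twice == target)
print("shuffle of shuffle:", shuffled_twice, shuffled_twice == target)
```

The first line always prints `True`; the second usually prints `False`, although a shuffle can occasionally reproduce the target by chance.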
Hmmm... it seems like that is redundant information. Having a FASTA file with a 50/50 target-decoy ratio is correct. However, you might need to generate a separate spectral library before running DIA-NN in library mode. I can't recall whether it worked with a FASTA file without a spectral library, but it definitely works with a spectral library. Hahaha, that's just funny :D However, I'm not sure it works that way when they generate their decoy peptides.
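Before building the library, a quick sanity check that the concatenated FASTA really has a 50/50 target-decoy ratio might look like this, again assuming a hypothetical `decoy_` header prefix. The function name is illustrative.

```python
def target_decoy_counts(fasta_text, prefix="decoy_"):
    """Count target and decoy entries by checking each header for the decoy prefix."""
    targets = decoys = 0
    for line in fasta_text.splitlines():
        if not line.startswith(">"):
            continue  # sequence line
        if line[1:].startswith(prefix):
            decoys += 1
        else:
            targets += 1
    return targets, decoys


# Toy database: two targets and their shuffled decoys (made-up sequences).
example = """>sp|P00001|TOY1
MKTAYIAKQR
>sp|P00002|TOY2
MSTNPKPQRK
>decoy_sp|P00001|TOY1
QKAYRTIAKM
>decoy_sp|P00002|TOY2
KPSRNQKPMT
"""
targets, decoys = target_decoy_counts(example)
print(f"{targets} targets, {decoys} decoys, balanced: {targets == decoys}")
```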
In your manuscript entitled "Triqler for Protein Summarization of Data from Data-Independent Acquisition Mass Spectrometry" you state that:
"The pipeline generated decoys for FDR calculations, which were discarded after DIA-NN processing. To circumvent the lack of decoys in output for Triqler, we concatenated shuffled entrapment sequences in the FASTA database."
Could you explain what these shuffled entrapment sequences are? Is this something one needs to add if the DIA-NN reports are to be usable with Triqler?