FASTA DB requirements #23

tobiasko · 2023-04-18T10:44:42Z

In your manuscript entitled "Triqler for Protein Summarization of Data from Data-Independent Acquisition Mass Spectrometry" you state that:

"The pipeline generated decoys for FDR calculations, which were discarded after DIA-NN processing. To circumvent the lack of decoys in output for Triqler, we concatenated shuffled entrapment sequences in the FASTA database."

Could you explain what these shuffled entrapment sequences are? Is this something one needs to add to make DIA-NN reports usable with Triqler?

patruong · 2023-04-18T12:13:49Z

Hi Tobias,

Triqler needs decoys to calculate q-values. However, the PSMs in the report.tsv output of DIA-NN are usually not mapped to decoy proteins. To circumvent this, DIA-NN can be run with a spectral library that includes shuffled entrapment sequences: you first add shuffled entrapment sequences to your FASTA file and then construct the spectral library from that FASTA file. These shuffled entrapment sequences are simply the amino acid sequences of the proteins in the FASTA file with their residues shuffled.
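A minimal sketch of that FASTA step, assuming a hypothetical decoy prefix `decoy_` (it has to match whatever decoy prefix Triqler is configured to look for) and a whole-protein shuffle. This is an illustration, not the exact script used for the manuscript:

```python
import random

def read_fasta(lines):
    """Parse FASTA lines into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def shuffled_entrapment(records, prefix="decoy_", seed=42):
    """Make one shuffled copy per protein; the header gets the (assumed) decoy prefix."""
    rng = random.Random(seed)  # fixed seed so the database is reproducible
    entrapment = []
    for header, seq in records:
        residues = list(seq)
        rng.shuffle(residues)  # same length and composition, scrambled order
        entrapment.append((prefix + header, "".join(residues)))
    return entrapment

def to_fasta(records, width=60):
    """Serialise (header, sequence) tuples back to FASTA text."""
    out = []
    for header, seq in records:
        out.append(">" + header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"

if __name__ == "__main__":
    targets = read_fasta([">sp|P1|PROT1", "MKTAYIAKQRQISFVK", ">sp|P2|PROT2", "MPEPTIDEKR"])
    # Concatenate targets and their shuffled copies (50% target, 50% entrapment).
    print(to_fasta(targets + shuffled_entrapment(targets)), end="")
```

With a real database you would pass an open file handle (e.g. a hypothetical `proteome.fasta`) to `read_fasta`, write the combined output to a new file and build the spectral library from that. Shuffling whole proteins also moves the K/R cleavage sites; some decoy generators instead shuffle within each digested peptide and keep the C-terminal residue fixed, so that decoy peptides have the same length distribution as the targets.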

Alternatively, you could use OpenSwathDecoyGenerator to add decoys to your spectral library, but this approach has crashed on a couple of the data sets I have tried it on. I am not sure why.

Hope this clarifies.

tobiasko · 2023-04-18T12:34:57Z

Hmmm... How would I do this when DIA-NN was run in library-free mode? I thought DIA-NN already uses decoys internally, because it outputs Decoy.Evidence and Decoy.CScore for each feature in the main report. Can't Triqler use these?

tobiasko · 2023-04-18T13:25:23Z

The library-free search starts the in silico digestion from a target-only FASTA database, so I guess decoy generation happens at the peptide or library level. One can write the resulting spectral library to disk, and it contains a Decoy column. I therefore guess the library is supplemented with decoy entries/transitions.

patruong · 2023-04-18T13:27:20Z

Indeed, DIA-NN already uses decoy peptides internally to compute the FDRs. However, these decoy peptides cannot be written to the output report.tsv.

I am not entirely sure what Decoy.Evidence and Decoy.CScore are used for, but they are floats, whereas Triqler decides whether a PSM is a decoy by parsing the prefix of its protein name, i.e. a binary indicator.
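To make the mismatch concrete: a prefix check turns each PSM's protein field into a yes/no label, while Decoy.Evidence and Decoy.CScore are continuous numbers with no documented cut-off. The sketch below is hypothetical, not Triqler's actual code. The `decoy_` prefix, the `;` separator for multiple protein IDs, and the rule that peptides shared between targets and decoys count as targets are all assumptions:

```python
def is_decoy_psm(protein_ids, decoy_prefix="decoy_", sep=";"):
    """Binary decoy flag derived from a protein identifier field.

    Assumption: a PSM counts as a decoy only if *all* proteins it maps to
    carry the decoy prefix; shared target/decoy peptides are treated as targets.
    """
    proteins = [p.strip() for p in protein_ids.split(sep) if p.strip()]
    return bool(proteins) and all(p.startswith(decoy_prefix) for p in proteins)

if __name__ == "__main__":
    for field in ["sp|P1|PROT1", "decoy_sp|P1|PROT1", "decoy_sp|P1|PROT1;sp|P2|PROT2"]:
        print(field, is_decoy_psm(field))
```

Nothing comparable can be computed from a float such as Decoy.Evidence without first choosing a threshold, which is the open question in this thread.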

See
DIA-NN-generated decoy peptides: vdemichev/DiaNN#6
DIA-NN cannot output the internally generated decoys as decoy proteins: vdemichev/DiaNN#117
DIA-NN cannot output the internally generated decoy peptides: vdemichev/DiaNN#468

tobiasko · 2023-04-18T13:33:21Z

Well, I guess those floats are the score and evidence values of the corresponding decoy entry. Instead of adding a new line for each decoy, the report just records how the decoy scored (skipping the details of how the decoy entity is structured).

patruong · 2023-04-18T13:46:29Z

Hmm, interesting... I thought about that too, but I could not find any information about how to threshold the score. Perhaps the same threshold applies as for Mass.Evidence, where values between 0.5 and 1.0 are considered decoys. Perhaps Decoy.Evidence could be mapped to a binary decoy indicator for each PSM, and the proteins these peptides belong to could then be marked as decoys. Let me think about this. Perhaps @MatthewThe can give some more feedback on this?

tobiasko · 2023-04-18T13:53:03Z

Let's ask Vadim what it really contains ;-) I also couldn't find any documentation on this.

tobiasko · 2023-04-18T13:57:57Z

Do I understand Clemens's suggestion correctly? He generates a target + decoy FASTA database with a specific decoy prefix (50% target + 50% decoy) and runs it through DIA-NN (which internally generates decoys of the decoys) only to get explicit decoy reporting? That sounds pretty wild! And if the decoy function uses sequence reversal, a decoy of a decoy turns into a target again.
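The "decoy of a decoy" concern can be checked with a few lines. The sketch below assumes two common reversal schemes (full reversal, and reversal that keeps the C-terminal residue in place) plus a random shuffle. Which scheme DIA-NN actually uses is not stated in this thread, so these functions are illustrative only:

```python
import random

def reverse_decoy(seq):
    """Full reversal of the peptide sequence."""
    return seq[::-1]

def pseudo_reverse_decoy(seq):
    """Reverse all residues except the C-terminal one (e.g. K/R stays in place)."""
    return seq[:-1][::-1] + seq[-1:]

def shuffle_decoy(seq, seed=None):
    """Random permutation of the residues."""
    residues = list(seq)
    random.Random(seed).shuffle(residues)
    return "".join(residues)

if __name__ == "__main__":
    target = "LGEYGFQNALIVR"
    # Both reversal schemes are their own inverse: applying them twice
    # turns the "decoy of a decoy" back into the original target.
    assert reverse_decoy(reverse_decoy(target)) == target
    assert pseudo_reverse_decoy(pseudo_reverse_decoy(target)) == target
    # A shuffle has no such guarantee; a second independent shuffle
    # generally yields yet another sequence.
    print(shuffle_decoy(shuffle_decoy(target, seed=1), seed=2))
```

In other words, with a reversal-based decoy function a reversed decoy protein in the FASTA would yield internal "decoys" identical to the original targets, whereas shuffled entrapment sequences avoid this.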

patruong · 2023-04-18T14:07:10Z

Hmm... it seems like that is redundant information.

Having a FASTA file with a 50/50 target-decoy ratio is correct. However, you might need to generate a separate spectral library before running DIA-NN in library mode. I can't recall whether it worked with a FASTA file and no spectral library, but it definitely works with a spectral library.

Hahaha, that's just funny :D... However, I'm not sure it works that way when DIA-NN generates its decoy peptides.
