-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MaxQuant Input #10
Comments
We just added some converters for MaxQuant: https://github.com/statisticalbiotechnology/triqler/wiki/Converters It uses a MaxQuant output file called evidence.txt and it's best to raise the FDR threshold to include more decoys. We have run some tests ourselves, but it's still a bit experimental so it would be interesting to see your results! |
Thank you for your quick reply. I did see the converter for MaxQuant, which is very helpful. But the problem is that the protein names in my evidence file never contain the prefix My question now is: is there a specific setting in MaxQuant that I need to turn on to get those decoy peptides? If not, I do find some proteins that have the prefix
|
That's correct, the REV__ proteins correspond to the decoys and you can use the command you suggested. The only thing is that at the default FDR of 1% you will get very few decoys, so we recommend setting the FDR threshold to e.g. 100%. This will both allow triqler to potentially use more peptides and also results in better calibration of the error estimates. |
Thank you very much, that information is very helpful already. And a second question, what options are there for normalizing the intensities between samples? The example data that I am trying to analyze has a fairly large effect to one side (1/3 of all proteins are upregulated in one condtion), which causes a bias in the remaining proteins. Is there any option in triqler to correct for this if I want to identify differentially abundant proteins? |
You have to set both the PSM and peptide level FDR to 100%. EDIT: Apparently one also has to set the protein level FDR to 100% to get all the decoys in MaxQuant. There is currently only one option for normalization, which assumes that the majority of proteins do not change between samples. In your case, this might indeed not be ideal and it would be wise to add the |
Thanks, that is good to know.
Yes, I agree that proper normalization is quite challenging, especially if the effect is so large in one direction. |
I just ran some tests and it seems that you also need to set the protein level FDR to 100% to get all the decoys in MaxQuant, sorry for the inconvenience.. Regarding the normalization, in the same spirit as your approach, it would be possible to implement a function that only uses peptides that do not vary "too much" for normalization, e.g. less than log2 fold changes, where can be set as a parameter. Do you think that is a reasonable approach? |
Thank you again for helping me getting the program running :)
Yes, if there is a set for which we a priori know that they shouldn't differ between samples than one should use those peptides for the normalization. I have now managed to run MaxQuant with the right settings and also do the downstream analysis with triqler. |
Alright, I'll look into this alternative normalization procedure, is it correct that you're currently skipping the normalization in the converter? I also realized that the set you're looking at is fractionated, which would probably completely mess up our current normalization procedure anyway. Our Also, are you using matches-between-runs? So far we have tested without MBR, as we expect there to be extra errors coming from this process which are not yet included in Triqler. |
Yes, exactly.
Okay, thanks that is good to know.
Thank you for the correction. I will go through those papers again to make sure that I don't confuse the terms. Though, the FDR problem persists.
I run MaxQuant with MBR, but filter out all rows in the evidence file that were quantified with MBR and not directly measured, because I got parsing errors on the complete file. But thank you for the link to your paper, I will also add that one to my reading list :) |
Alright, that all sounds good! I'm still working on the alternative normalization procedure. Very interesting to see your results, how far off are the FDR estimates? Would you be able to share your Triqler log file and perhaps even the Triqler input/output? We have previously had some problems estimating hyperparameters on mixed species datasets, due to the bi/multimodality of the distributions. |
Great. Well, quite a bit but that is what I observe for all tools if the data is not normalized. Here is the log file from triqler (slurm.bx06-17.39308207.log) which I ran with the following command:
For the input / output I will try to see if I find a good platform to share, because they are quite big (evidence: 1.1 GB, triqler_input: 265 MB, triqler_result: 15 MB). |
I uploaded the files to the owncloud of my institute: https://oc.embl.de/index.php/s/Ag9smCQYuyR8DDS |
Thanks a lot! I downloaded the files and will take a look tomorrow. From the logs I already noticed that something is going wrong with calculating the peptide and protein identification error probabilities. Also, |
Okay interesting. Thanks for letting me know.
From the documentation, I thought that it would be a filter on the fold change and everything with a smaller fold change would be ignored. Is that wrong? I agree that for a biological analysis it can make sense to ignore significant proteins with a very small fold change, but for the benchmarking it should be fine to keep them. |
The I looked at your data and there are indeed some issues that cause Triqler some problems:
|
Hi Matthew,
I am currently trying to run triqler on some existing dataset (PXD000279) for a benchmarking experiment.
My problem is that I don't manage to get the input into a format that is acceptable to triqler. My biggest problem is that, as far as I understand, triqler needs the peptides from decoy proteins to estimate the error rate. But I cannot figure out how to make MaxQuant output the decoy hits. I do see that MaxQuant provides a column called
Reverse
, is that one equivalent?Best Regards, Constantin
The text was updated successfully, but these errors were encountered: