Peptide identifications from mass spectrometry are often stored in proprietary, software-specific data formats. To support standardized data exchange, the Proteomics Standards Initiative (PSI) has agreed on open formats such as mzIdentML (http://www.psidev.info/mzidentml) and mzTab (http://www.psidev.info/mztab). PoGo, a peptide-to-genome mapping tool for proteogenomic applications, requires a four-column input file that bioinformaticians can easily generate from mass spectrometry identification files. Here we provide a FileConverter that extracts sample information and peptide identifications from mzIdentML and mzTab files. It uses the file parsers provided by the ms-data-core-api (https://github.com/PRIDE-Utilities/ms-data-core-api).
A ready-to-use JAR file can be downloaded from ftp://ftp.sanger.ac.uk/pubs/teams/17/software/FileConverter/
From mzIdentML or mzTab input, the FileConverter generates a four-column file containing peptide identifications with annotated post-translational modifications, the associated sample or experiment names, the number of peptide-to-spectrum matches (PSMs), and quantitative information. The output file can then be used to map the peptides to a reference genome with PoGo (http://www.sanger.ac.uk/science/tools/pogo).
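For downstream checks it can help to read the four-column output back into typed records. The following is a minimal sketch, not part of the FileConverter itself. It assumes the file is tab-separated and that the columns appear in the order sample, peptide, PSM count, quantitative value; neither the delimiter nor the column order is stated above, so verify both against a real output file. The class name `PeptideRow` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * One row of the four-column file.
 * ASSUMPTION: tab-separated, column order = sample, peptide, PSMs, quant.
 */
final class PeptideRow {
    final String sample;
    final String peptide;
    final int psms;
    final double quant;

    PeptideRow(String sample, String peptide, int psms, double quant) {
        this.sample = sample;
        this.peptide = peptide;
        this.psms = psms;
        this.quant = quant;
    }

    /** Parses one line; throws IllegalArgumentException if it is malformed. */
    static PeptideRow parse(String line) {
        // limit -1 keeps trailing empty fields so the column count is exact
        String[] cols = line.split("\t", -1);
        if (cols.length != 4) {
            throw new IllegalArgumentException(
                "expected 4 columns, got " + cols.length);
        }
        // NumberFormatException is a subclass of IllegalArgumentException
        return new PeptideRow(
            cols[0].trim(),
            cols[1].trim(),
            Integer.parseInt(cols[2].trim()),
            Double.parseDouble(cols[3].trim()));
    }

    /** Parses all non-blank lines, optionally skipping a header line first. */
    static List<PeptideRow> parseAll(List<String> lines, boolean hasHeader) {
        List<PeptideRow> rows = new ArrayList<>();
        for (int i = hasHeader ? 1 : 0; i < lines.size(); i++) {
            String line = lines.get(i);
            if (!line.trim().isEmpty()) {
                rows.add(parse(line));
            }
        }
        return rows;
    }
}
```

The lines can be read with `java.nio.file.Files.readAllLines` and passed to `PeptideRow.parseAll`. Whether the output starts with a header line is also an assumption to check, which is why it is left as a parameter.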
To run the FileConverter, enter the following command in a command prompt or Unix shell:
java -jar FileConverter.jar PATH/TO/FILE
Positional arguments:
| PATH/TO/FILE | Path to the input file in mzid, mzIdentML or mzTab format. |
- Perez-Riverol Y, Uszkoreit J, Sanchez A, Ternent T, Del Toro N, Hermjakob H, Vizcaíno JA, Wang R. (2015). ms-data-core-api: An open-source, metadata-oriented library for computational proteomics. Bioinformatics. 2015 Apr 24
Christoph Schlaffner ([email protected])