MzIdentML Converter Modifications #77

Open
sureshhewabi opened this issue Sep 16, 2024 · 8 comments
Labels
CrossLinkingValidationLib Changes related with Crosslinking validations

Comments

@sureshhewabi
Collaborator

sureshhewabi commented Sep 16, 2024

We need this library to expose each piece of its functionality through command-line options, including:

1. Validation of crosslinking mzIdentML (mzID) files. #78
2. Command-line support #80
3. Data generation for PDBDev reports. #79

sureshhewabi added the CrossLinkingValidationLib label on Sep 16, 2024

@colin-combe

Hi @sureshhewabi, @aozalevsky, @ypriverol

we already have command line support in https://github.com/PRIDE-Archive/xi-mzidentml-converter/blob/python3/parser/process_dataset.py using the standard python library argparse.

Perhaps click is better; its documentation has a section explaining why it is not based on argparse.

Anyway, there are multiple solutions for making the command line interface. We can use click if it seems the best.
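For illustration, here is a minimal sketch of how the three pieces of functionality could map onto argparse subcommands. The program name, subcommand names, and the `--output` flag are all assumptions made up for this example; they are not taken from the existing process_dataset.py.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a CLI with one subcommand per piece of functionality."""
    # "mzid-tool" is a hypothetical program name.
    parser = argparse.ArgumentParser(prog="mzid-tool")
    subcommands = parser.add_subparsers(dest="command", required=True)

    # Hypothetical subcommand: validate a crosslinking mzIdentML file.
    validate = subcommands.add_parser("validate", help="validate an mzIdentML file")
    validate.add_argument("mzid_file", help="path to the mzIdentML file")

    # Hypothetical subcommand: emit data for reports.
    report = subcommands.add_parser("report", help="emit sequences and residue pairs")
    report.add_argument("mzid_file", help="path to the mzIdentML file")
    report.add_argument("--output", default="-", help="output path, '-' for stdout")

    return parser


# Parse an explicit argument list instead of sys.argv to show the result.
args = build_parser().parse_args(["validate", "example.mzid"])
print(args.command, args.mzid_file)
```

A click version would look similar in shape (a group with one command per function), so the choice between the two libraries does not need to change how the functionality is split up.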

I think validation will consist mainly of running the parser and seeing if it works or not. But it will need to be modified so it doesn't try to write stuff anywhere. Also, we can improve its error messages so we know why it failed.
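As a rough sketch of that idea: run a parser over the file, write nothing, and return the reason for any failure. The `validate` function, the `ValidationResult` class, and the stand-in `check_well_formed` parser are hypothetical names for this example; the stand-in only checks XML well-formedness with the standard library and is not the real converter, which would be passed in as `parse` instead.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ValidationResult:
    """Outcome of one validation run (hypothetical structure)."""
    path: str
    errors: List[str] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not self.errors


def check_well_formed(path: str) -> None:
    """Stand-in parser: only checks that the file is well-formed XML."""
    for _event, element in ET.iterparse(path):
        element.clear()  # discard parsed elements to keep memory use low


def validate(path: str, parse: Callable[[str], None] = check_well_formed) -> ValidationResult:
    """Run `parse` on `path` without writing anything and collect any failure."""
    result = ValidationResult(path)
    try:
        parse(path)
    except (OSError, ET.ParseError, ValueError, KeyError) as exc:
        # Keep the exception type and message so the user can see why it failed.
        result.errors.append(f"{type(exc).__name__}: {exc}")
    return result
```

Returning a result object rather than raising keeps the same function usable from both a command-line wrapper (print the errors, set the exit code) and from library code.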

Let's think more about 'Data Generation for PDBDev reports':

  • what information do you want to get back and in what format?
  • how do you want to call it? - is this a case of using it like a library, i.e. a dependency of the code that generates the reports, rather than calling it on command line? (Is the code that generates the reports in python?)

cheers,
C

@colin-combe

Also, IMP may have a need to extract crosslinking data from mzIdentML files? This might be related?

FYI, our converter is based on the pyteomics library. It adds a way of getting crosslink info from what's returned by pyteomics; it's not a 'from scratch' implementation of mzIdentML parsing.

@aozalevsky

@colin-combe Ideally, I'd like to get an output similar to the current API output. Basically, we need sequences (some ID + sequence) + residue pairs. Keeping the JSON formatted output would be nice, too.

Calling (import + call) as a library would be ideal, but making a subprocess CLI call is also acceptable.
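To make the request concrete, here is a minimal sketch of what such a JSON payload could look like when produced from library code. All class and field names (`Sequence`, `ResiduePair`, `seq_id_1`, `position_1`, and so on) are assumptions for illustration and do not reflect the current API output; the sequence and positions are made-up placeholder values.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class Sequence:
    id: str        # some identifier for the sequence
    sequence: str  # the amino-acid sequence itself


@dataclass
class ResiduePair:
    seq_id_1: str    # id of the sequence holding the first linked residue
    position_1: int  # residue position in the first sequence
    seq_id_2: str    # id of the sequence holding the second linked residue
    position_2: int  # residue position in the second sequence


def to_json(sequences: List[Sequence], pairs: List[ResiduePair]) -> str:
    """Serialise sequences and residue pairs into one JSON document."""
    payload = {
        "sequences": [asdict(s) for s in sequences],
        "residue_pairs": [asdict(p) for p in pairs],
    }
    return json.dumps(payload, indent=2)


# Placeholder data, not taken from any real dataset.
example = to_json(
    [Sequence(id="seq1", sequence="MKTAYIAK")],
    [ResiduePair(seq_id_1="seq1", position_1=2, seq_id_2="seq1", position_2=8)],
)
print(example)
```

With an import-and-call interface the caller could work with the objects directly and only serialise when needed; a subprocess CLI call would receive the same JSON on stdout.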

@colin-combe

> Calling (import + call) as a library would be ideal

Yes, I think that's better. And you were totally right in what you said in the meeting about there being several benefits to it being like this (not just a way of addressing the private submission questions). It was never deliberately not a library.

Anyway, I'll take a look at this next week,
cheers,
C

@ypriverol

Validation is the priority, followed by the data structure and the JSON for PDBDev reports. We have to test the validation on the command line and create some documentation for users who want to start testing their dataset files. @sureshhewabi it would probably be good to have a separate issue for this and link it to this one.

@sureshhewabi
Collaborator Author

Thanks everyone. As we discussed in the meeting yesterday, let's create separate issues for the separate tasks and then delegate the tasks among us. We can also keep this as the main issue that links to the other tasks so we can track the progress.

sureshhewabi changed the title from "Command-line support for the feature" to "MzIdentML Converter Modifications" on Sep 19, 2024

@aozalevsky

> Also, IMP may have a need to extract crosslinking data from mzIdentML files? This might be related?

I had a chat with Ben, the main IMP developer in our lab. He agreed it would be a neat addition to the current functionality (dealing with csv/xls lists).

@colin-combe

I updated #79 and #78 to reflect the status of the version in PR #84.

Any comments on how to better organise/structure the main process_dataset.py file are welcome (or just general Python style suggestions).


4 participants