Add ELIB/DLIB spectral library support #12

Open
wants to merge 2 commits into
base: master

Collaborator

@wfondrie wfondrie commented Dec 7, 2020

This pull request adds a module for parsing the ELIB and DLIB spectral libraries, src/ann_solo/sqlite_parsers.py. These are SQLite3 formats from EncyclopeDIA and are defined here. The PR also changes the logging level to INFO.
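For readers unfamiliar with these formats, here is a minimal sketch of what reading such a library can look like. It is not the code in `src/ann_solo/sqlite_parsers.py`. It assumes an `entries` table with `PeptideModSeq`, `PrecursorCharge`, `PrecursorMz`, `MassArray`, and `IntensityArray` columns, and that the two arrays are zlib-compressed big-endian doubles and floats respectively; check these assumptions against the format definition before relying on them. The function names are hypothetical.

```python
import sqlite3
import struct
import zlib


def decode_array(blob, fmt_char):
    """Decompress a zlib blob and unpack it as big-endian numbers.

    fmt_char is a struct format character: "d" (8-byte double) or
    "f" (4-byte float). The encoding itself is an assumption here.
    """
    raw = zlib.decompress(blob)
    count = len(raw) // struct.calcsize(">" + fmt_char)
    return list(struct.unpack(f">{count}{fmt_char}", raw))


def read_entries(conn):
    """Yield (peptide, charge, precursor_mz, mz_values, intensities).

    The table and column names are assumed, not taken from the PR.
    """
    query = (
        "SELECT PeptideModSeq, PrecursorCharge, PrecursorMz, "
        "MassArray, IntensityArray FROM entries"
    )
    for pep, charge, mz, mass_blob, intensity_blob in conn.execute(query):
        yield (
            pep,
            charge,
            mz,
            decode_array(mass_blob, "d"),
            decode_array(intensity_blob, "f"),
        )
```

Usage would be along the lines of `for entry in read_entries(sqlite3.connect("library.dlib")): ...`, where the file name is a placeholder.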

This module should be easy to expand in the future to also parse BLIB libraries from Bibliospec (as requested in #2).

I'm still working on benchmarking, but it seems good so far.

@wfondrie
Collaborator Author

One thing I envisioned this PR enabling is the use of Prosit libraries with ANN-SoLo. However, there are a couple of hiccups in doing so:

  1. The web interface currently requires a CSV file specifying the peptides for which it should generate spectra.
  2. There is currently no way to annotate peptides as decoys in Prosit. Thus, the dlib file that it returns must be annotated after generation.

Would it be out of scope for ANN-SoLo to also contain a few utility functions to prepare a FASTA file for Prosit? For (1), I would propose adding a function to generate this CSV file from a FASTA file, similar to the functionality already provided by EncyclopeDIA. To solve (2), I think there are a couple of options:

  1. Add a function that modifies the dlib file to properly indicate decoy peptide spectra.
  2. Add an optional decoy_spectral_library_filename that specifies decoy peptide spectra, implying that spectral_library_filename only defines targets.
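To make the first option concrete, here is a rough sketch of annotating decoys in place. It assumes the library has a `peptidetoprotein` table with `PeptideSeq` and `isDecoy` columns, and that the caller already knows which peptide sequences were submitted as decoys. Both the schema and the `mark_decoys` name are assumptions for illustration, not something this PR implements.

```python
import sqlite3


def mark_decoys(conn, decoy_peptides):
    """Flag the given peptide sequences as decoys.

    Assumes a `peptidetoprotein` table with `PeptideSeq` and `isDecoy`
    columns (hypothetical schema). Returns the number of rows updated.
    """
    before = conn.total_changes
    conn.executemany(
        "UPDATE peptidetoprotein SET isDecoy = 1 WHERE PeptideSeq = ?",
        [(pep,) for pep in decoy_peptides],
    )
    conn.commit()
    return conn.total_changes - before
```

The second option would avoid touching the file at all, at the cost of one more configuration parameter.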

What are your thoughts? The CSV and annotating a dlib could alternatively be provided by another package.
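As a rough illustration of the CSV step, a helper might look like the sketch below. It assumes the Prosit input uses the columns `modified_sequence`, `collision_energy`, and `precursor_charge`, uses a simple tryptic digest with no missed cleavages, and picks a 7 to 30 residue length window, charges 2 and 3, and a collision energy of 30 as placeholder defaults. All function names are hypothetical and modifications are not handled.

```python
import csv
import io
import re


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA file."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def tryptic_peptides(sequence, min_len=7, max_len=30):
    """Cleave after K or R unless followed by P; no missed cleavages."""
    for pep in re.split(r"(?<=[KR])(?!P)", sequence):
        if min_len <= len(pep) <= max_len:
            yield pep


def write_prosit_csv(fasta_handle, out_handle, charges=(2, 3),
                     collision_energy=30):
    """Write one CSV row per unique peptide and charge state."""
    writer = csv.writer(out_handle, lineterminator="\n")
    # Column names are an assumption about the Prosit input format.
    writer.writerow(
        ["modified_sequence", "collision_energy", "precursor_charge"]
    )
    seen = set()
    for _, sequence in read_fasta(fasta_handle):
        for pep in tryptic_peptides(sequence):
            if pep in seen:
                continue
            seen.add(pep)
            for charge in charges:
                writer.writerow([pep, collision_energy, charge])
```

Decoy peptides (for example, reversed sequences) could be appended to the same CSV, as long as the list of decoys is kept so the returned library can be annotated afterwards.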

@bittremieux
Owner

Yes, I totally agree. Prosit compatibility has been on my wish list / TODO list for quite some time.

My preference would be an end-to-end solution. Rather than requiring manual steps (generating a CSV, submitting it to the Prosit web interface, and then converting the output), it would be nicer if ANN-SoLo had the option to generate a spectral library (and its index) directly from a FASTA file using a built-in Prosit.

Prosit is available as open source, so it should be possible, although it might further complicate the installation instructions, which are already a bit advanced.

@wfondrie
Collaborator Author

That is a good goal, but yikes, that does complicate installation! Do you know whether they have a programmatic API for their web server? That might be an alternative way to go if they do.

Either way, I'll probably make a small separate package to handle these things for now.

@bittremieux bittremieux mentioned this pull request Oct 7, 2022
5 tasks