Fuzzy locations break document validity #200

Open
jakebeal opened this issue Mar 26, 2023 · 6 comments
Labels
bug Something isn't working

Comments

@jakebeal
Contributor

The current implementation of the SBOL/Genbank converter in sbol3_genbank_conversion.py uses a custom class Location_GenBank_Extension to represent GenBank's fuzzy location field.

Unfortunately, the current implementation breaks document validity, because it tags the objects with type http://sbols.org/v3#locationPosition, which is not a valid SBOL type. This causes any SBOL library not already adjusted to accept the extension (e.g., pySBOL3) to deem the document invalid (and possibly also crash during loading, depending on implementation).

Changing the location's type to the intended http://www.ncbi.nlm.nih.gov/genbank#locationPosition does not fix the problem either, because the SequenceFeature's hasLocation property then points to something that the library doesn't recognize as an SBOL Location.
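To make the two failure modes concrete, here is a minimal stdlib sketch of how a library without the extension might check a SequenceFeature's hasLocation targets. The registry `KNOWN_LOCATION_TYPES` and the function `check_feature_locations` are hypothetical illustrations, not pySBOL3's actual validation code; the only assumption about SBOL3 is that its concrete Location subclasses are Range, Cut, and EntireSequence.

```python
# Core SBOL3 namespace, plus the GenBank namespace discussed in this issue.
SBOL3 = "http://sbols.org/v3#"
GENBANK = "http://www.ncbi.nlm.nih.gov/genbank#"

# Hypothetical registry: a library without the extension knows only the
# concrete core Location subclasses.
KNOWN_LOCATION_TYPES = {SBOL3 + "Range", SBOL3 + "Cut", SBOL3 + "EntireSequence"}


def check_feature_locations(location_types: dict, feature_locations: list) -> list:
    """Return validation errors for the hasLocation values of one SequenceFeature.

    location_types maps each location identity to its rdf:type URI;
    feature_locations lists the identities the feature's hasLocation points to.
    """
    errors = []
    for loc_id in feature_locations:
        loc_type = location_types.get(loc_id)
        if loc_type is None:
            errors.append(f"{loc_id}: dangling hasLocation reference")
        elif loc_type in KNOWN_LOCATION_TYPES:
            continue
        elif loc_type.startswith(SBOL3):
            # First failure mode: an invented type inside the SBOL namespace.
            errors.append(f"{loc_id}: unknown SBOL type {loc_type}")
        else:
            # Second failure mode: a foreign type is fine as RDF, but the
            # library cannot tell that the hasLocation target is a Location.
            errors.append(f"{loc_id}: hasLocation target is not a known Location")
    return errors


# Either typing of the fuzzy location yields an error for the feature.
print(check_feature_locations({"loc1": SBOL3 + "locationPosition"}, ["loc1"]))
print(check_feature_locations({"loc1": GENBANK + "locationPosition"}, ["loc1"]))
```

Moving the type out of the SBOL namespace only changes which error is reported; the feature is still rejected.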

@jakebeal jakebeal added the bug Something isn't working label Mar 26, 2023
@jakebeal
Contributor Author

jakebeal commented Mar 26, 2023

A workaround for now will be to simply not use this class and instead fall back to the old converter's behavior, which just truncated fuzzy ranges into sbol:Range objects.
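A rough sketch of what "truncating fuzzy ranges into sbol:Range objects" means, assuming the simple GenBank form `start..end` where `<` marks a true start somewhere before the stated position and `>` marks a true end somewhere after it. `ExactRange` and `truncate_fuzzy_range` are hypothetical stand-ins, not the converter's actual code, and `join`/`complement` forms are deliberately rejected rather than handled.

```python
import re
from typing import NamedTuple


class ExactRange(NamedTuple):
    """Hypothetical stand-in for an sbol:Range: 1-based, inclusive start and end."""
    start: int
    end: int


# Matches the simple GenBank form "start..end", where either end may carry a
# fuzzy marker ("<" or ">").
_SIMPLE_LOCATION = re.compile(r"^([<>]?)(\d+)\.\.([<>]?)(\d+)$")


def truncate_fuzzy_range(location: str) -> ExactRange:
    """Drop the '<'/'>' markers and keep the stated coordinates as exact bounds.

    The fuzziness is lost: "<1..>100" becomes the same Range as "1..100".
    """
    match = _SIMPLE_LOCATION.match(location.strip())
    if match is None:
        raise ValueError(f"unsupported location form: {location!r}")
    _, start, _, end = match.groups()
    return ExactRange(int(start), int(end))


print(truncate_fuzzy_range("<1..>100"))
```

The cost of this workaround is visible in the docstring: after truncation, a partial feature is indistinguishable from a complete one.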

Some possible better resolutions for the future:

@jakebeal
Contributor Author

On deeper inspection, it looks like we're already partway to a solution via the section approach, since the offending SequenceFeature objects are linked via NCBI:fuzzyFeatures. So if the SequenceFeature were changed to an extension class NCBI:fuzzySequenceFeature, that would be sufficient to break the chain of invalidity, since no sbol:SequenceFeature would then reference an unrecognized Location.
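A small sketch of the proposed retyping, written as plain (subject, predicate, object) triples. Two assumptions here are not settled in this thread: that the `NCBI:` prefix expands to the GenBank namespace named above, and that the extension class would keep using `sbol:hasLocation`. The helper names are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SBOL3 = "http://sbols.org/v3#"
# Assumption: the NCBI: prefix expands to the GenBank namespace named in this issue.
NCBI = "http://www.ncbi.nlm.nih.gov/genbank#"


def fuzzy_feature_triples(component: str, feature: str, location: str) -> list:
    """Hypothetical triples for one fuzzy feature under the proposed retyping."""
    return [
        # The feature is already linked from the Component via NCBI:fuzzyFeatures.
        (component, NCBI + "fuzzyFeatures", feature),
        # Proposed change: an extension class instead of sbol:SequenceFeature.
        (feature, RDF_TYPE, NCBI + "fuzzySequenceFeature"),
        # Assumption: the extension class reuses sbol:hasLocation.
        (feature, SBOL3 + "hasLocation", location),
        (location, RDF_TYPE, NCBI + "locationPosition"),
    ]


def core_sequence_features(triples: list) -> set:
    """Subjects a core-only library would load and validate as sbol:SequenceFeature."""
    return {s for s, p, o in triples if p == RDF_TYPE and o == SBOL3 + "SequenceFeature"}


print(core_sequence_features(fuzzy_feature_triples("ex:c", "ex:f", "ex:l")))
```

With the retyping, a core-only library finds no SequenceFeature to validate, so the unrecognized location is never checked as a Location.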

jakebeal added a commit that referenced this issue Mar 26, 2023
…s be 1) add a non-standard sbol#locationPosition type, and 2) reference to an unrecognized Location from a SequenceFeature.

Temporarily address this by reverting to the prior behavior of truncating fuzzy ranges into ranges.
@tcmitchell
Collaborator

Is this possibly a job for an extension? Could it work by extending Location or one of its children using the pySBOL3 CustomIdentified class? See https://pysbol3.readthedocs.io/en/stable/extensions.html#example-2-extend-a-core-class for more info.

@jakebeal
Contributor Author

Unfortunately, I don't currently see how to do it:

  • The fuzzy range semantics are not compatible with any of the children. Range is closest, but will fail because there are validation rules based on the length of a range, which require exact start and end positions. This is why we didn't extend Range in the first place.
  • Because Location is an abstract class, it's not clear whether any library that isn't using the extension can meaningfully instantiate a generic Location. In pySBOL3, for example, there is no builder registered for Location, so extending Location results in errors when working with a document with pySBOL3 (i.e., without the extension loaded).
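To illustrate why length-based checks break on fuzzy endpoints, here is a sketch that models each position as an interval of possible values. `FuzzyPosition`, `length_bounds`, and `end_not_before_start` are hypothetical; they are not SBOL validation rules, just the kind of comparison such rules rely on.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class FuzzyPosition:
    """Hypothetical position known only to lie within [lowest, highest]."""
    lowest: int
    highest: int


def length_bounds(start: FuzzyPosition, end: FuzzyPosition) -> Tuple[int, int]:
    """Smallest and largest possible length of an inclusive, 1-based range."""
    return end.lowest - start.highest + 1, end.highest - start.lowest + 1


def end_not_before_start(start: FuzzyPosition, end: FuzzyPosition) -> Optional[bool]:
    """True/False when decidable, None when it depends on the unknown positions."""
    if end.lowest >= start.highest:
        return True
    if end.highest < start.lowest:
        return False
    return None


# Exact endpoints: a single, checkable length.
print(length_bounds(FuzzyPosition(5, 5), FuzzyPosition(9, 9)))
# Start somewhere in 10-20, end somewhere in 15-30: the check is undecidable.
print(end_not_before_start(FuzzyPosition(10, 20), FuzzyPosition(15, 30)))
```

With exact endpoints the intervals collapse to single values and every check has a definite answer; with fuzzy endpoints even "end is not before start" can be unanswerable.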

@tcmitchell
Collaborator

That's unfortunate. I'm glad you explored the possibility. I have opened a pySBOL3 issue to understand this limitation better: SynBioDex/pySBOL3#427

@jakebeal
Contributor Author

Patch has been merged; issue will stay open for a better resolution.
