SEP 058 -- Aliases

SEP
Title Aliases
Authors Jacob Beal [email protected]
Editor
Type Data Model
SBOL Version 3.2
Status Draft
Created 22-Apr-2023
Last modified
Issue #124

Abstract

Genetic parts often have more than one identifier, either for historical reasons or because there are "nickname" versions of an identifier. This SEP proposes a means of tracking the relationship between the canonical name for a part (or other object) and a set of aliases that can be used to redirect to it from other identities.

  1. Rationale

Genetic parts often have more than one identifier. Sometimes this is due to "nickname" conventions, e.g., the iGEM part "J23101" is a short form of the less-frequently used "BBa_J23101". Other times, a part may have had one identifier historically, but later been renamed to another identifier. Similar things may also happen with objects other than genetic parts.

Currently, these types of relationships in SBOL are handled via ad hoc tables or special-case code. In other systems of identifiers, however, there is generally some sort of system for linking from alternative identities to a canonical entity (e.g., "aliases", "links", "redirects", "forwarding"). Just as in other systems, adding an alias capability to SBOL will allow tools to more consistently work with objects that have multiple identifiers or that change identifier over time.

  2. Specification

The specification will be modified by adding an Alias class, described between TopLevel and Sequence, and by allowing any reference to a TopLevel to also refer to an Alias.

Alias

An Alias is a TopLevel object that is used to redirect references to an identity to another, preferred identity.

An Alias MAY be used in place of any other TopLevel object, but SHOULD NOT be if the preferred identity is known. As such, whenever a reference to a TopLevel object is discovered to be an Alias object, that reference SHOULD be updated to the value of the referent property of the Alias.

The referent property

An Alias object MUST have precisely one referent property, which contains the URI for a TopLevel object to which references to the Alias are intended to be redirected.

Any chain of Alias object referent property values MUST NOT be circular, i.e., following referent values MUST ultimately lead to a TopLevel that is not an Alias.

The referent SHOULD point to the most persistent and canonical identity available for the object to which it refers. Complementarily, this implies that the referent SHOULD NOT itself be an Alias.

A referent that is an Alias MAY occur in some circumstances, e.g., in the case where the referent of an Alias has its identity changed, leaving behind a forwarding Alias, but as soon as this is discovered the referent value SHOULD be updated (just like any other TopLevel reference to an Alias).
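
The redirection described above can be sketched in a few lines of Python. This is only an illustration, not part of the proposal: the function name `resolve_alias` and the `referents` dictionary (an in-memory stand-in mapping each Alias identity to its referent URI) are assumptions made for the example.

```python
def resolve_alias(identity, referents):
    """Follow referent values from `identity` until a non-Alias identity is reached.

    `referents` is a hypothetical mapping from each known Alias identity to the
    URI in its referent property. Identities absent from the mapping are treated
    as non-Alias TopLevel objects.
    """
    seen = set()
    current = identity
    while current in referents:
        if current in seen:
            # A circular chain violates the MUST NOT rule for referent chains.
            raise ValueError(f"circular alias chain at {current}")
        seen.add(current)
        current = referents[current]
    return current
```

An identity that is not an Alias is returned unchanged, so a tool could apply the function to every TopLevel reference it encounters.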

Modified Validation Rules

sbol3-10112: Each property of type IRI that is listed with a non-TopLevel reference type in Table 23 (i.e., child objects) MUST refer to an object of the type listed.

sbol3-10113: Each property of type IRI that is listed with a TopLevel reference type in Table 23 MUST refer to either an object of the type listed or an object of type Alias with a referent that resolves to an object of the type listed.
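
As a rough sketch of how a validator might apply the modified sbol3-10113, the function below accepts a reference if it points either directly at an object of the expected type or at an Alias whose referent chain ends at such an object. The `objects` dictionary layout and the choice to return `None` for references that cannot be retrieved are assumptions of this example, not requirements of the proposal.

```python
def reference_is_valid(target, expected_type, objects):
    """Check a TopLevel reference in the spirit of the modified sbol3-10113.

    `objects` is a hypothetical mapping from identity to a dict with a "type"
    key and, for Alias objects, a "referent" key.
    Returns True or False, or None when the chain leaves the known objects
    and the rule cannot be evaluated.
    """
    seen = set()
    current = target
    while True:
        obj = objects.get(current)
        if obj is None:
            return None  # object not available; nothing can be concluded
        if obj["type"] != "Alias":
            return obj["type"] == expected_type
        if current in seen:
            return False  # circular chain never resolves to the listed type
        seen.add(current)
        current = obj["referent"]
```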

Table 22 will include a new row:

| Class | Parent | SBOL properties | Reference |
| --- | --- | --- | --- |
| Alias | TopLevel | referent | (Alias section) |

Table 23 will include a new row:

| Class | Property | Cardinality | Type | Referred Type | Reference |
| --- | --- | --- | --- | --- | --- |
| Alias | referent | EXACTLY ONE | IRI | TopLevel | (Alias section) |

New Validation Rules for the Alias Class

(weak REQUIRED) sbol3-10401: An Alias object MUST NOT form a cyclical chain of references via its referent property and those of other Alias objects. Reference: (Alias section)

(RECOMMENDED) sbol3-10402: The referent property of an Alias SHOULD NOT be the IRI for an object of type Alias. Reference: (Alias section)
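
One possible way to check both new rules over the Alias objects in a single document is sketched below. The function name `check_alias_rules`, the `referents` mapping, and the `(rule, identity)` report format are all illustrative assumptions. The sketch reports sbol3-10401 for any Alias whose chain runs into a cycle, including an Alias that merely leads into one.

```python
def check_alias_rules(referents):
    """Report violations of sbol3-10401 and sbol3-10402.

    `referents` is a hypothetical mapping from each Alias identity in the
    document to its referent URI. Returns a list of (rule, alias_identity).
    """
    problems = []
    for alias, referent in referents.items():
        # sbol3-10402 (RECOMMENDED): the referent should not itself be an Alias.
        if referent in referents:
            problems.append(("sbol3-10402", alias))
        # sbol3-10401: following referents must never revisit an Alias.
        seen = {alias}
        current = referent
        while current in referents:
            if current in seen:
                problems.append(("sbol3-10401", alias))
                break
            seen.add(current)
            current = referents[current]
    return problems
```

Only Alias objects present in the document can be checked this way. A chain that continues into another document would need those objects to be retrieved first.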

  3. Example or Use Case

Example: Multiple identities

iGEM promoter BBa_J23101 is best accessed as a Component from SynBioHub with identity https://synbiohub.org/public/igem/BBa_J23101. It can also be referred to on parts.igem.org as either J23101 or BBa_J23101, so two Alias objects are created:

  • Alias identity http://parts.igem.org/J23101 has a referent of https://synbiohub.org/public/igem/BBa_J23101

  • Alias identity http://parts.igem.org/BBa_J23101 has a referent of https://synbiohub.org/public/igem/BBa_J23101
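
To make the two Alias objects above concrete, the sketch below writes them out as RDF triples in N-Triples form. The term IRIs `http://sbols.org/v3#Alias` and `http://sbols.org/v3#referent` are assumptions that follow the pattern of existing SBOL 3 terms. This SEP does not state them explicitly. Other properties that a TopLevel object would carry are omitted for brevity.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SBOL3 = "http://sbols.org/v3#"
ALIAS_CLASS = SBOL3 + "Alias"   # assumed IRI for the proposed class
REFERENT = SBOL3 + "referent"   # assumed IRI for the proposed property

CANONICAL = "https://synbiohub.org/public/igem/BBa_J23101"


def alias_triples(alias_identity, referent):
    """Return the two triples that describe one Alias in this sketch."""
    return [
        (alias_identity, RDF_TYPE, ALIAS_CLASS),
        (alias_identity, REFERENT, referent),
    ]


def to_ntriples(triples):
    """Serialize IRI-only triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


triples = (
    alias_triples("http://parts.igem.org/J23101", CANONICAL)
    + alias_triples("http://parts.igem.org/BBa_J23101", CANONICAL)
)
print(to_ntriples(triples))
```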

Example: Removing duplicates

In the 2022 iGEM distribution, software limitations meant that many copies of the plasmid pSB1C5 were created: https://github.com/iGEM-Engineering/iGEM-distribution/Anderson%20Promoters/pSB1C5, https://github.com/iGEM-Engineering/iGEM-distribution/Fluorescent%20Reporter%20Proteins/pSB1C5, https://github.com/iGEM-Engineering/iGEM-distribution/RBS%20Collection/pSB1C5, etc.

A vectors package is created with a canonical copy of pSB1C5 with identity https://github.com/iGEM-Vectors/pSB1C5. Each of the duplicates can then be replaced by an Alias with the same identity and a referent of https://github.com/iGEM-Vectors/pSB1C5.

Note that tools using one of the 2022 packages will no longer have a copy of pSB1C5 locally in the package document. Thus, if the tool needs to access information about pSB1C5, it will need to locate and download a copy of https://github.com/iGEM-Vectors/pSB1C5, as indicated by the referent.

Example: Change of identity in a dependency

Consider a Component for a composite part http://myparts.org/MyProject that uses pSB1C5 as a plasmid backbone, referring to it via its child object http://myparts.org/MyProject/SubComponent1. Specifically, the instanceOf property of SubComponent1 points to one of the copies of pSB1C5 in the iGEM 2022 distribution: https://github.com/iGEM-Engineering/iGEM-distribution/Anderson%20Promoters/pSB1C5.

A tool working with http://myparts.org/MyProject needs to retrieve information about pSB1C5 and retrieves https://github.com/iGEM-Engineering/iGEM-distribution/Anderson%20Promoters/pSB1C5 from the iGEM 2022 distribution. At this point, the tool discovers that it has retrieved an Alias with a referent of https://github.com/iGEM-Vectors/pSB1C5, and retrieves that object, obtaining pSB1C5 under its new identity from its new location.

The tool then suggests an update of the instanceOf property of SubComponent1 from https://github.com/iGEM-Engineering/iGEM-distribution/Anderson%20Promoters/pSB1C5 to https://github.com/iGEM-Vectors/pSB1C5, so that it once again points to a Component rather than an Alias.
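
The workflow in this example might look roughly like the following. The `STORE` dictionary stands in for whatever mechanism a real tool would use to retrieve objects by identity, and `retrieve` is a hypothetical helper. Neither is defined by SBOL or by this SEP.

```python
OLD = "https://github.com/iGEM-Engineering/iGEM-distribution/Anderson%20Promoters/pSB1C5"
NEW = "https://github.com/iGEM-Vectors/pSB1C5"

# Hypothetical stand-in for retrieving objects by identity.
STORE = {
    OLD: {"type": "Alias", "referent": NEW},
    NEW: {"type": "Component"},
}


def retrieve(reference, fetch):
    """Fetch `reference`, following Alias referents.

    Returns (identity, object) for the first non-Alias object reached.
    """
    seen = set()
    identity = reference
    obj = fetch(identity)
    while obj["type"] == "Alias":
        if identity in seen:
            raise ValueError(f"circular alias chain at {identity}")
        seen.add(identity)
        identity = obj["referent"]
        obj = fetch(identity)
    return identity, obj


sub_component = {
    "identity": "http://myparts.org/MyProject/SubComponent1",
    "instanceOf": OLD,
}
identity, component = retrieve(sub_component["instanceOf"], STORE.__getitem__)
if identity != sub_component["instanceOf"]:
    # The tool only suggests the change; applying it is left to the user.
    print(f"Suggest updating instanceOf of {sub_component['identity']} to {identity}")
```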

  4. Backwards Compatibility

Aliases are purely an addition to the data model, so all previously valid SBOL3 documents will remain valid after the adoption of aliases.

If a document makes use of aliases, however, it will pose problems for SBOL 3.1 tools that are not alias-aware:

  • Documents containing Alias objects will be judged invalid SBOL 3.1, since they contain objects with a previously unknown type in the SBOL namespace.
  • Objects that reference an alias will be judged invalid SBOL 3.1, because they point to an alias instead of an object of the expected type.
  • Tools will behave erroneously when encountering an alias where they expect the object that the alias is standing in for.

For these reasons, in order to make aliases work, it will be important to have a "resolve aliases" utility that:

  1. replaces references to aliases with references to the alias' referent, and

  2. optionally removes all aliases from the document.
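
A minimal sketch of such a utility is shown below, assuming a simplified document model in which each object is a dict with a "type" key, Alias objects carry a "referent" key, and all other objects list their outgoing TopLevel references under a "references" key. This flat model and the function name `resolve_aliases` are assumptions made for illustration. A real utility would operate on actual SBOL properties.

```python
def resolve_aliases(objects, remove_aliases=False):
    """Return a copy of `objects` with references to aliases redirected.

    `objects` is a hypothetical mapping from identity to object dict.
    When `remove_aliases` is True, Alias objects are dropped from the result.
    """
    # Collect the referent of every Alias in the document.
    referents = {
        identity: obj["referent"]
        for identity, obj in objects.items()
        if obj["type"] == "Alias"
    }

    def final_identity(identity):
        # Follow the chain of referents to a non-Alias identity.
        seen = set()
        while identity in referents:
            if identity in seen:
                raise ValueError(f"circular alias chain at {identity}")
            seen.add(identity)
            identity = referents[identity]
        return identity

    resolved = {}
    for identity, obj in objects.items():
        if obj["type"] == "Alias":
            if not remove_aliases:
                resolved[identity] = dict(obj)
            continue
        updated = dict(obj)
        updated["references"] = [
            final_identity(ref) for ref in obj.get("references", [])
        ]
        resolved[identity] = updated
    return resolved
```

Running the utility with `remove_aliases=True` would yield a document with no Alias objects and no references to them, which is the form a tool that is not alias-aware could consume.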

  5. Discussion


The proposed means of representing aliases is only one of a number of possible alternatives. Here are other alternatives that were considered, and their advantages and disadvantages.

Alternative: Raw Triple

Rather than creating an object for an alias, it would be possible to represent an alias with a single triple, e.g., Alias isAliasOf Referent.

While this would be more compact, it would depart from the object model that is used for all other SBOL representations. Moreover, it would not allow commentary or provenance information to be added to aliases.

Alternative: Alias object has two properties: referent and referred

In this approach, the reference stays the same, but we add a "backward-pointing" link that has cardinality [1..n]. This requires a collection of aliases to query: when a lookup for a TopLevel fails, the collection is searched for an Alias whose referred property contains that URI, and the lookup is then redirected to that Alias's referent.

This approach has a validation problem, because the same object can appear in the referred field of multiple Alias objects with different referent values. If those Alias objects are in the same document, this can be detected as a validation error. If they are in different documents, however, conflicting redirects can go undetected (e.g., database 1 says J23101 should be retrieved from SynBioHub, while database 2 says J23101 is a totally different sequence in NCBI).

Alternative: Alias property on TopLevel

Rather than making an alias a separate object, it could instead be represented as a new hasAlias property on TopLevel. This would also be compact, and in this case, the alias would remain bundled with the TopLevel that it references, e.g., Referent hasAlias Alias.

With this structure, however, an alias cannot be looked up unless one already knows the referent. This defeats one of the key purposes of aliases, allowing forwarding of references.

One variation of how this could be used is in fact to have the alias object be of the same type as its referent, e.g., J23101 is a Component of type DNA with referent BBa_J23101. This has the problem that J23101 is then a fork of BBa_J23101, and can contain conflicting information.

Alternative: Aliases for child objects

Aliases might also be used for child objects. For all cases besides ComponentReference, however, references to child objects happen only within their containing TopLevel object. Moreover, with ComponentReference, a refersTo property that points outside of the containing TopLevel is always coupled with the other TopLevel via the inChildOf property.

Thus, any redirection that would be needed for a child object can always be obtained via the corresponding redirect to its containing TopLevel.

  6. Competing SEPs

None

Copyright

CC0
To the extent possible under law, SBOL developers have waived all copyright and related or neighboring rights to SEP 058. This work is published from: United States.