SEP | |
---|---|
Title | Aliases |
Authors | Jacob Beal [email protected] |
Editor | |
Type | Data Model |
SBOL Version | 3.2 |
Status | Draft |
Created | 22-Apr-2023 |
Last modified | |
Issue | #124 |
Genetic parts often have more than one identifier, either for historical reasons or because there are "nickname" versions of an identifier. This SEP proposes a means of tracking the relationship between the canonical name for a part (or other object) and a set of aliases that can be used to redirect to it from other identities.
- 1. Rationale
- 2. Specification
- 3. Example or Use Case
- 4. Backwards Compatibility
- 5. Discussion
- 6. Competing SEPs
- References
- Copyright
Genetic parts often have more than one identifier. Sometimes this is due to "nickname" conventions, e.g., the iGEM part "J23101" is a short form of the less-frequently used "BBa_J23101". Other times, a part may have had one identifier historically, but later been renamed to another identifier. Similar things may also happen with objects other than genetic parts.
Currently, these types of relationships in SBOL are handled via ad hoc tables or special-case code. In other systems of identifiers, however, there is generally some sort of system for linking from alternative identities to a canonical entity (e.g., "aliases", "links", "redirects", "forwarding"). Just as in other systems, adding an alias capability to SBOL will allow tools to more consistently work with objects that have multiple identifiers or that change identifier over time.
The specification will be modified by adding an Alias
class to the specification between TopLevel
and Sequence
and by allowing any reference to a TopLevel
to also refer to an Alias
.
An Alias
is a TopLevel
object that is used to redirect references to an identity to another, preferred identity.
An Alias
MAY be used in place of any other TopLevel
object, but SHOULD NOT be if the preferred identity is known. As such, whenever a reference to a TopLevel
object is discovered to be an Alias
object, that reference SHOULD be updated to the value of the referent
property of the Alias
.
An Alias
object MUST have precisely one referent
property, which contains the URI
for a TopLevel
object to which references to the Alias
are intended to be redirected.
Any chain of Alias
object referent
property values MUST NOT be circular, i.e., following referent
values MUST ultimately lead to a TopLevel
that is not an Alias
.
The referent
SHOULD point to the most persistent and canonical identity available for the object to which it refers. Complementarily, this implies that the referent
SHOULD NOT itself be an Alias
.
A referent
that is an Alias
MAY occur in some circumstances, e.g., in the case where the referent
of an Alias
has its identity changed, leaving behind a forwarding Alias
, but as soon as this is discovered the referent
value SHOULD be updated (just like any other TopLevel
reference to an Alias
).
sbol3-10112: Each property of type IRI that is listed with a non-TopLevel
reference type in Table 23 (i.e., child objects) MUST refer to an object of the type listed.
sbol3-10113: Each property of type IRI that is listed with a TopLevel
reference type in Table 23 MUST refer to either an object of the type listed or an object of type Alias
with a referent that resolves to an object of the type listed.
Table 22 will include a new row:
Class | Parent | SBOL properties | Reference |
---|---|---|---|
Alias | TopLevel | referent | (Alias section) |
Table 23 will include a new row:
Class | Property | Cardinality | Type | Referred Type | Reference |
---|---|---|---|---|---|
Alias | referent | EXACTLY ONE | IRI | TopLevel | (Alias section) |
(weak REQUIRED) sbol3-10401: An Alias
object MUST NOT form a cyclical chain of references via its referent
property and those of other Alias
objects.
Reference: (Alias section)
(RECOMMENDED) sbol3-10402: The referent
property of an Alias
SHOULD NOT be the IRI for an object of type Alias
.
Reference: (Alias section)
iGEM promoter BBA_J23101 is best accessed as a Component
from SynBioHub with identity https://synbiohub.org/public/igem/BBa_J23101
.
It can also be referred to on parts.igem.org
as either J23101
or BBa_J23101
, so two Alias
objects are created:
-
Alias
identityhttp://parts.igem.org/J23101
has areferent
ofhttps://synbiohub.org/public/igem/BBa_J23101
-
Alias
identityhttp://parts.igem.org/BBa_J23101
has areferent
ofhttps://synbiohub.org/public/igem/BBa_J23101
In the 2022 iGEM distribution, software limitations meant that many copies of the plasmid pSB1C5 were created: https://github.com/iGEM-Engineering/iGEM-distribution/Anderson%20Promoters/pSB1C5
, https://github.com/iGEM-Engineering/iGEM-distribution/Fluorescent%20Reporter%20Proteins/pSB1C5
, https://github.com/iGEM-Engineering/iGEM-distribution/RBS%20Collection/pSB1C5
, etc.
A vectors package is created with a canonical copy of pSB1C5 with identity https://github.com/iGEM-Vectors/pSB1C5
. Each of the duplicates can then be replaced by an Alias
with the same identity and a referent
of https://github.com/iGEM-Vectors/pSB1C5
Note that tools using one of the 2022 packages will no longer have a copy of pSB1C5 locally in the package document. Thus, if the tool needs to access information about pSB1C5, it will need to locate and download a copy of the https://github.com/iGEM-Vectors/pSB1C5
as indicated by the referent
.
Consider a Component
for a composite part http://myparts.org/MyProject
that uses pSB1C5 as a plasmid backbone, refering to this via its child object http://myparts.org/MyProject/SubComponent1
. Specifically, the instanceOf
property of SubComponent1
points to one of the copies of pSB1C5 in the iGEM 2022 distribution: https://github.com/iGEM-Engineering/iGEM-distribution/Anderson%20Promoters/pSB1C5
.
A tool working with http://myparts.org/MyProject
needs to retrieve information about pSB1C5 and retrieves https://github.com/iGEM-Engineering/iGEM-distribution/Anderson%20Promoters/pSB1C5
from the iGEM 2022 distribution. At this point, the tool discovers that it has retrieved an Alias
with a referent
of https://github.com/iGEM-Vectors/pSB1C5
, and retrieves that object, obtaining pSB1C5 under its new identity from its new location.
The tool then suggests an update of the instanceOf
property of SubComponent1
from https://github.com/iGEM-Engineering/iGEM-distribution/Anderson%20Promoters/pSB1C5
to https://github.com/iGEM-Vectors/pSB1C5
, so that it once again points to a Component
rather than an Alias
.
Aliases are purely an addition to the data model, so all previously valid SBOL3 documents will remain valid after the adoption of aliases.
If a document makes use of aliases, however, it will pose problems for SBOL 3.1 tools that are not alias-aware:
- Alias objects will be judged invalid SBOL 3.1, since they contain objects with a previously unknown type in the SBOL namespace
- Objects that reference an alias will be judged invalid SBOL 3.1, because they point to an alias instead of an object of the expected type.
- Tools will behave erroneously when encountering an alias where they expect the object that the alias is standing in for.
For these reasons, in to make aliases work, it will be important to have a "resolve aliases" utility that:
-
replaces references to aliases with references to the alias' referent, and
-
optionally removes all aliases from the document.
The proposed means of representing aliases is only one of a number of possible alternatives. Here are other alternatives that were considered, and their advantages and disadvantages.
Rather than creating an object for an alias, it would be possible to represent an alias with a single triple, e.g., Alias isAliasOf Referent
While this would be more compact, it would depart from the object model that is used for all other SBOL representations. Moreover, it would not allow commentary or provenance information to be added to aliases.
In this approach, the reference stays the same, but we add a "backward-pointing" link that has cardinality [1..n]
. Under this approach, you need to have a collection of aliases to query, and when looking up a TopLevel, if you do not find the TopLevel, then query the collection of aliases, to see if there is a referred
property with that URI, which is then redirected to the referent
to look up.
This approach has a validation problem, because you can put the same object in the referred
field of multiple different Alias objects, with different referent
value. If you have those Alias object in the same document, then you could detect this as a validation error, but this means that you could have conflicting redirects that are not detectable (i.e., database 1 says J23101 should be retrieved from SynBioHub, while database 2 says J23101 is a totally different sequence in NCBI.
Rather than making an alias a separate object, it could instead be represented as a new hasAlias
property on TopLevel
.
This would also be compact, and in this case, the alias would remain bundled with the TopLevel that it references, e.g., Referent hasAlias Alias
.
With this structure, however, an alias cannot be looked up unless one already knows the referent. This defeats one of the key purposes of aliases, allowing forwarding of references.
One variation of how this could be used is in fact to have the alias object be of the same type as its referent, e.g., J23101 is a Component of type DNA with referent BBa_J23101. This has the problem that J23101 is then a fork of BBa_J23101, and can contain conflicting information.
Aliases might also be used for child objects. For all cases besides ComponentReference
, however, references to child objects happen only within their containing TopLevel
object. Moreover, with ComponentReference
a refersTo
property outside of the containing TopLevel
is always coupled with the other TopLevel
vis the inChildOf
property.
Thus, any redirection that would be needed for a child object can always be obtained via the corresponding redirect to its containing TopLevel
.
None
To the extent possible under law,
SBOL developers
has waived all copyright and related or neighboring rights to
SEP 058.
This work is published from:
United States.