Skip to content

Latest commit

 

History

History
312 lines (185 loc) · 27.6 KB

sep_020.md

File metadata and controls

312 lines (185 loc) · 27.6 KB

SEP 020 -- Linking experimental results with Biological clones and replicates

SEP
Title SEP 020 -- Linking experimental results with Biological clones and replicates
Authors Raik Gruenberg ([email protected])
Editor Nicholas Roehner ([email protected])
Type Data Model
SBOL Version 2.3
Replaces SEP 014, SEP 016, SEP 017, SEP 019
Status Draft
Created 09-Dec-2017
Last modified 02-Nov-2017
Issue #47

Abstract

Experimental information recorded for one engineered clone of cells or a batch of biomaterial is often not representative for the behavior of another clone or batch. The reason are additional variables (known or unknown) such as mutations or stochastic biochemical events that lie outside the scope of the actual design. Bioengineers are therefore used to tracking the identity of clones, batches or replicates of biological materials whenever experimental results are collected. The Implementation class (previously defined in SEP 19) is here modified in order to allow a reliable and straightforward identification of cell lines, bacterial clones and other replicates of biomaterial. In contrast to SEP 19, the present proposal introduces a mandatory and direct link ('design') from Implementation to its intended design. SEP 20 therefore guarantees that we always know What a clone or batch is. Information about How something was constructed may be added using prov-O (provenance ontology) records just as described in SEP 19.

Table of Contents

1 Rationale

If I am given a sample of something, the most important question is: "What is it?". SEP 19 gives no less than three different ways of answering this question: (1) prov:wasDerivedFrom, (2) built, (3) prov:generatedBy + Activity. This would be confusing enough but, in fact, none of these redundant links is mandatory and it is not clear which one is preferred. Therefore, while SEP 19 makes a very useful first attempt to record workflow information ("How was this made?" or "How was this measured?") the, arguably, more basic information about "What is this?" may be ambiguous, in conflict with other statements or completely missing. This is the main issue that is being addressed by the present proposal.

The main focus of SEP 20 lies therefore on the Implementation class. However, the SBOL editors have demanded a clear choice between SEP 19 and SEP 20 and I am therefore also including workflow-related information in this proposal. These are taken directly from SEP 19 and should be interpreted as preliminary guidelines, to be tested and further refined by practical use.

The main goals of this proposal are therefore:

  • Identify cell lines (clones, replicates, batches) as needed for the exchange of biomaterials (e.g. from AddGene, ACTC, partsregistry) and as also currently required by Journals for experimental papers
  • Allow to attach experimental results to clones/batches
  • Guarantee that each record of a clone or batch always points to exactly one authorative design
  • Workflow & provenance information make up an additional layer of meta information rather than being required
  • Ensure that the link from Experimental data through clone/batch to (intended) design is as direct and unambiguous as possible and does not require iterations over chains of events / activities (with the associated risk of missing links)

2 Specification

2.1 Implementation class

An Implementation represents a real, physical instance of a synthetic biological construct which may be associated with, typically more than one, actual laboratory samples. Note that the representation of individual samples (as in LIMS systems) remains outside the scope of SBOL. Whether or not two samples are considered to belong to the same batch or at which point they are split into separate Implementation instances is decided by the user and will be a matter of community best practices in a given area of bioengineering.

Implementation MUST be linked back to its intended design, either a ModuleDefinition or ComponentDefinition, using a mandatory field design (this is the main difference to SEP 19). An Implementation MAY also link, via an optional field built, to an additional ModuleDefinition or ComponentDefinition which represents a more detailed picture of its actual structure as it was realized in the laboratory.

Implementation class UML diagram

Figure 1: Diagram of the Implementation class and its associated properties

2.1.1 Implementation.design

Each Implementation record MUST have exactly one design property which MUST point to a ComponentDefinition or a ModuleDefinition. This Component/ModuleDefinition is interpreted as the intended design. Sister clones of a bioengineering experiment will therefore always point to the same design ComponentDefinition regardless of whether or not they accurately implement this design.

If, after creation of a clone, additional design information becomes available that equally applies to all clones of an experiment, then the design ComponentDefinition instance may be updated in-place. Alternatively the design field of all affected Implementations may be changed to point to a new version of the Component/ModuleDefinition. In both cases, previous versions and reasons for the change can, optionally, be tracked using established prov-O versioning (e.g. ComponentDefinition.wasDerivedFrom) as already prescribed by SBOL.

2.1.2 Implementation.built

An optional Implementation.built property references one ComponentDefinition that describes the, possibly deviating, actual physical composition of a realized construct. built MUST NOT reference the same ComponentDefinition that design points to. Instead, if no extra or deviating structural information exists for a given clone, the built field is simply not to be used at all. [This is a difference to SEP 19]

PREFERABLY, the built structure should be specific for this particular Implementation. For example, it may be based on the sequencing of this particular clone. Implementation.built SHOULD NOT be used to introduce general corrections to the original design (see 2.1.1 above how this should instead be handled).

2.1.3 Implementation Identity

Once created, the uri identifier of an Implementation instance MUST NOT change. The Implementation record is mutable and may change over time as new information is added or corrected. However, since Implementation represents a physical object in the real world, changes to the information about this physical object MUST NOT lead to a change of its unique identifier.

2.2 Best Practices

The following Best Practices are currently considered guide lines which may change in future SBOL releases.

2.2.1 Attaching Experimental Data

Implementation MAY link to experimental data using the attachment field approved in SEP 18. It is RECOMMENDED that sbol:Attachment instances containing experimental data are further annotated by prov-O activity records as shown in the following Figure.

Experimental Data

Figure 2: Association of experimental data with Implementation (Clones/Batches) and intended Design.

Such Attachments MAY instead also be grouped into Collections. Details will have to be determined by practical use.

2.2.2 Describing DNA assembly

Detailed information about how a given construct or organism was built in the lab can be encoded in a prov-O:Activity record. [Arguably, this should be described in a separate SEP. The detailed description of experiments and protocols is beyond the scope of this SEP]. All sister clones of a construction experiment SHOULD point to the same Activity record. Clones that implement the same design but were produced independently (e.g. in a different lab) SHOULD NOT point to the same Activity record, even if the assembly protocol happened to be identical.

Assembly Method

Figure 3: Representation of construct or organism assembly.

The Assembly Activity MAY also have a Usage:design record pointing to the design ComponentDefinition. This link is redundant with the Implementation.design field and therefore optional [A difference to SEP 19]. If present, it MUST point to the same Component/ModuleDefinition instance as the Implementation.design field.

2.2.3 Link to ancestral clones or batches

prov:wasDerivedFrom references are used to express that one batch or clone was directly created from another one. It is RECOMMENDED that this should be accompanied by a prov:Activity record that outlines how one batch was derived from the other. However, we do not yet have a standard for formulating these Activity records. Examples are: transformation of DNA into bacteria to generate a new cell stock or purification of DNA or proteins from a given cell stock. Also the simple splitting of cell lines or inoculation of a new culture from a given bacterial stock may or may not warrant the definition of a new Implementation record.

Ancestry

Figure 4: Documentation of simple cell line or clone ancestry.

It is RECOMMENDED that Implementation.wasDerivedFrom is exclusively used for experimental strain/batch ancestry relations. This is in contrast to SEP 19, where the same term is also used to link Implementation to its design ComponentDefinition.

Note: This solution may create conflicts or ambiguities with SBOL versioning. Other alternatives are being discussed.

2.2.4 Integration with Prov-O and workflow information

The use of prov-O vocabulary together with Implementation is encouraged but optional. Parsing of prov-O information MUST NOT be needed for (1) the simple identification of a clone, (2) neither for determining what structure (a.k.a. sequence) it is supposed to have. Further details on the use of prov-O for workflow documentation are suggested in the Example section below.

2.2.5 Versioning versus Provenance Semantics

In line with SEP 19 (see detailed discussion in 2.3.2), we RECOMMEND as a best practice that objects linked by Activities not be successive versions (of the computational record) of each other, though we leave this at the discretion of users and library developers.

2.3 Validation Rules

Not much is needed beyond the strict data model definition:

  • Implementation.built SHOULD NOT point to the same ComponentDefinition as Implementation.design (Warning)

  • If a "Built" Activity referenced by Implementation.wasGeneratedBy contains a Usage:design record, then this Usage:design MUST point to the same ComponentDefinition or ModuleDefinition as Implementation.design. (Error)

SEP 19 prescribes a host of additional validation rules related to prov-O usage. Some of these may apply here as well.

3 Examples

3.1 Plasmids and Bacterial clones

The first example covers the, presumably, most common use case of transmitting information about one particular plasmid which may be shipped between labs or obtained from the iGEM registry or AddGene:

Batch of plasmid DNA

Example 1a: This example covers the, presumably, most common use case of transmitting information about one particular plasmid which may be shipped between labs or obtained from the iGEM registry or AddGene. Two scenarios are shown. Top: the by far, most common use case where the plasmid corresponds exactly to the given design. Bottom: a scenario as it may, for example, happen in intermediate stages of assembly processes, where a plasmid is tracked and found to not exactly correspond to its design.

It is instructive to compare the above example to exactly the same information expressed according to SEP 19. Most UML diagrams in SEP 19 are highly simplified. So here is the same example with everything spelled out according to SEP 19 and the current prov-O documentation:

SEP 19 plasmid DNA

SEP 19 Example: The exact same data as in Example 1a shown in the notation suggested by SEP 19. Note that the same information expressed by a single design field in SEP 20, requires, in SEP 19, 6 prov-O fields, 2 prov-O class instances, and one novel sbol:"design" term.

Returning to SEP 20, the above (correct) plasmid may be transformed into an E. coli expression strain for protein production. A cell stock of the single colony picked for the expression culture will be kept for future reference. Let's again assume something went wrong and we later discover that there was a genomic mutation in this particular clone. In other words, the cell stock faithfully implements the core of the design (the expression plasmid) but its genome deviates from the original design (and from its sister clones). It may still be used in subsequent experiments and the deviation is documented using the built link:

Bacterial clone

Example 1b: A clone of a bacterial strain harboring the same expression plasmid but with a novel genomic mutation.

Note: SBOL currently has no clear rule for the identification of organisms. Presumably, a plasmid within a cell will be represented by a ComponentDefinition for the plasmid, contained within a ModuleDefinition somehow representing the cell. The given scheme is thus merely a suggestion.

3.2 Design-Build-Test workflow

Competing SEP 19 emphasizes the documentation of an idealized design - build - test workflow. I am skeptical as to how realistic this is at this point. For example, automated model building or automated sequence design from models is still the exception. In my experience, models are (if at all) built after a design has been made. I therefore find it problematic to connect Experimental Results -> Implementation, Implementation -> Design and Design -> Model with prov:wasDerivedFrom links as suggested in SEP 19. In 95% of use cases, this will be confusing or outright false. Moreover, SBOL already specifies proper fields for connecting Design and Model as well as Attachment.

The following is therefore, according to this proposal, the minimal representation of the actual result of a design - build - test workflow:

Minimal workflow representation

Example 2a: A minimal workflow representation without use of prov-O meta data. This example provides the essential link from experimental result to clone, design and theoretical model.

In-line with SEP 19, rich meta data may be added to annotate the factual workflow result with "activity" records:

Full workflow representation

Example 2b: A tentative full workflow representation. Prov-O activities provide an orthogonal layer of meta information on top of the native SBOL chain of objects. Note: the prov:usage terms "design", "build", "test", "learn" have been suggested in sboldev discussions and are defined in SEP 19. "biomaterial" would be a new term which, in my view, better represents the relation between experimental protocols and clones/batches. The OBI ontology for biomedical investigations contains many terms that may be of more specific use here.

Note: This example has been adapted from Example 2 in SEP 19. The main difference is that wasDerivedFrom links are replaced by design, model and attachment. Other changes to the prov-O layout are suggestions only.

3.3 Assembly Provenance

Please refer to section 2.2.2.

3.4 Pedigree of bacterial clones in iGEM Interlab Study

The following example is taken 1:1 from SEP 19. There is only two modifications: (1) designs are linked via design rather than built., (2) there is no "wasDerivedFrom" link between Implementation and design.

iGEM interlab study

Example 4: An initial plasmid kit based on part designs from the iGEM Registry is replicated and split into different samples. Note, contrary to the use (in blue) in this example, Implementation is not actually supposed to be equaled to "Sample". Implementation defines clones or batches that are typically distributed over many samples and only occasionally will each sample be unique enough to warrant its own Implementation. However, as SBOL currently does not have a term for "Sample", the given scheme is a workaround.

3.5 Co-Transfection of Constructs for CRISPR Repression Module

The following example is taken 1:1 from SEP 19 (Example 6). As before, the data model is somewhat simplified by the fact that there is only one link between Implementation and ComponentDefinition. In other words, design replaces the redundant use of prov:wasDerivedFrom and built.

CRISPR circuit sample

Example 5: Two sets of overlapping constructs that implement the components of a CRISPR-based circuit are co-transfected to implement two different versions of the circuit.

Note: Arguably, the wasDerivedFrom links (shown in gray color) from the CRISPR construct "mixture" or circuits on the right to the individual genes on the left can be removed. The same information is already provided by the ModuleDefinition. Alternatively, the sub-implementations should be linked together through a more expressive gene assembly "prov:Activity" activity (as outlined in section 2.2.2).

4 Backwards Compatibility

This proposal does not affect backwards compatibility.

5 Discussion

5.1 Key differences to SEP 19

SEP 19 relies heavily on prov-O provenance ontology terms which are freely mixed with SBOL terms. SEP 20 instead tries to establish a clear separation of concerns: SBOL terms are used to describe the core information of What we are looking at. Prov-O terms are used to add an optional layer of information about How something was created. In practice, this means the proper SBOL terms model, design, attachment are used to link Experimental Results, Implementation, Component/ModuleDefinition, and Model (the "What").

SEP 19 gives no less than three different ways of answering the "What are we looking at" question: (1) wasDerivedFrom, (2) built, (3) generatedBy + Activity. None of these redundant links is mandatory and it is not clear which one is preferred.

[1] wasDerivedFrom has the serious issue that is also being used for (i) computational version tracking, and (ii) tracking of clonal relationships between samples and, moreover, (iii) it could to be used for pointing to structural building blocks (Implementations) used during construction.

[2] Built seems to be the most obvious and straightforward choice but it is NOT fitting either because it is explicitely reserved for "actual" structure or "actual function". Upon creation of a batch or clone in the lab, the actual physical structure is not yet known (functional behavior may never be known, is a can of worms to even specify, and is irrelevant for the question of what something is). As we MUST NOT create an Implementation record that is not pointing to its content -- after all I am holding a physical sample in my hands and have to say what the heck it is -- this only leaves us with "design".

[3] design according to SEP 19 is formulated using a prov-O record comprising around 8 different classes and fields. The link may furthermore be indirect (going through "parent" Implementations) which means it needs to be inferred by iterating over a potential chain of Implementation instances. This chain may easily become interupted as there is no guarantee or mechanism to ensure that all parent Implementations are in the same SBOL document or even accessible elsewhere.

SEP 20 solves these issues with a direct and mandatory field design. All use of provenance ontology terms and constructs is made optional.

As a side effect, this leads to a massive reduction of prov-O overhead and reference redundancy. In SEP 19, the expression of the design link via a prov-O Activity comprises a massive overhead of, in many cases redundant, fields and classes (8 or 9 interlinked entities, likely translating to 20 to 40 lines of XML code) only so that additional meta information could theoretically be added on top of that. SEP 20 replaces this by a single pointer. This will be sufficient for most use cases. If, and only if, additional meta information is available, prov-O Activities can be layered on top of that.

The high number of redundant references in the SEP 19 prov-O model (see the Figure in Example 1 above) also requires a very high number of new validation rules that verify that two different but equivalent prov-O pointers indeed reference the same Implementation or ComponentDefinition object. Realistically, we have to assume SBOL documents will not always meet all validation rules. SEP 19 therefore creates an abundance of possibilities for serious bugs where software tools receive conflicting references.

5.2 Conflicts with versioning

The use of Implementation.wasDerivedFrom for strain ancestry (section 2.2.3) creates a potential conflict with versioning: wasDerivedFrom is currently also used to link different (computational) versions of SBOL records. (This problem is even more acute in SEP 19 where wasDerivedFrom is heavily used for non-versioning related links.) This issue requires further discussion. One possible alternative to wasDerivedFrom would be the use of generatedBy pointing to an Activity. The Activity could then be annotated with a OBI term describing the exact experiment and relation used to derive one clone from another.

Further alternatives:

  • Define a new "sbol:ancestor" (or similar) term to be associated with a prov:Usage record within the activity.

  • Invent a custom SBOL term for physical ancestry of batches and strains.

5.3 design link from Assembly Activity to ComponentDefinition

As mostly philosophical difference between this proposal and SEP 19 is that SEP 19 considers the Implementation to be "generated" or "derived" from the design. In my view, a given Implementation either implements a design or it does not. I would argue that there should not be any wasDerivedFrom link between Implementation and design. One could easily argue that a given implementation has been primarily generated not from a design but from the physical building blocks that it was assembled from. However, this argument does not rule out custom "usage" links from activity records to ComponentDefinition.

5.4 Implementation.status field

SEP 16 suggested the introduction of an Implementation.status field pointing to one of a fixed set of possible values: "validated_correct", "validated_incorrect", "not_validated", "validated_ambiguous". This field was explicitly only supposed to deal with sequence validation without touching the more complicated issue whether or not a given clone in fact "works" as expected.

The field may indeed be required if we insist that design always points to the intended design and build is only used if there are known deviations from this design. In this case it becomes difficult to distinguish between a clone that has been shown to be exactly as specified in the design and another clone for which this is simply not yet known. Also, clones that are shown to be incorrect (e.g. fail sequencing) could be more easily flagged without always attaching the underlying sequencing information.

On the other hand, this kind of information can only be a (albeit useful) shortcut and almost always would benefit from additional meta information (how was a batch validated, how was the result evaluated and by whom?). For this reason, and in order to minimize differences with SEP 19, it has not been included here.

6 Relation to Other SEPs

This proposal is a merge of SEP 16 and SEP 19 after discussion on the Github issue tracker, and on the ["Design-Build-Test" thread]: https://groups.google.com/forum/#!topic/sbol-dev/AnpwJP2_f5A and during a conference call in 11/2017.

SEP 20 relies on SEP 18 (definition of "Attachment").

SEP 20 is meant to replace SEP 19.

References

Copyright

CC0
To the extent possible under law, SBOL developers has waived all copyright and related or neighboring rights to SEP 020. This work is published from: United States.