diff --git a/.github/workflows/tests.yml b/.github/workflows/tests.yml index c51c25bf..31f83610 100644 --- a/.github/workflows/tests.yml +++ b/.github/workflows/tests.yml @@ -20,7 +20,7 @@ jobs: - name: Install dependencies run: | python -m pip install --upgrade pip setuptools - pip install -r .requirements.txt + pip install --pre -r .requirements.txt - name: Test with pytest run: | diff --git a/.requirements.txt b/.requirements.txt index 65e0c4fb..c9d4cb68 100644 --- a/.requirements.txt +++ b/.requirements.txt @@ -1,7 +1,8 @@ pytest -python-jsonschema-objects>=0.3,<=0.3.10 +python-jsonschema-objects>=0.4.0 jsonschema==3.2.0 ipython pyyaml -ga4gh.gks.metaschema>=0.1.1 -sphinx ~= 3.5 \ No newline at end of file +ga4gh.gks.metaschema==0.2.0rc4 +sphinx ~= 4.5 +sphinx-rtd-theme ~= 1.2 \ No newline at end of file diff --git a/docs/source/appendices/design_decisions.rst b/docs/source/appendices/design_decisions.rst index c781c046..b42a04d5 100644 --- a/docs/source/appendices/design_decisions.rst +++ b/docs/source/appendices/design_decisions.rst @@ -32,11 +32,11 @@ Allele Rather than Variant The most primitive sequence assertion in VRS is the :ref:`Allele` entity. Colloquially, the words "allele" and "variant" have similar meanings and they are often used interchangeably. However, the VR -contributors believe that it is essential to distinguish the state of -the sequence from the change between states of a sequence. It is +contributors assert that it is essential to distinguish between the *state of* +a reference sequence from the *change from* a reference sequence. It is imperative that precise terms are used when modelling data. Therefore, -within VRS, Allele refers to a state and "variant" refers to the change -from one Allele to another. +within VRS, "allele" refers to a state of a reference sequence and "variant" refers to a change +from a reference sequence. The word "variant", which implies change, makes it awkward to refer to the (unchanged) reference allele. 
Some systems will use an HGVS-like @@ -45,45 +45,6 @@ when referring to an unchanged residue. In some cases, such "variants" are even associated with allele frequencies. Similarly, a predicted consequence is better associated with an allele than with a variant. -.. _should-normalize: - -Implementations should normalize -@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ - -VRS STRONGLY RECOMMENDS that Alleles be :ref:`normalized -` when generating :ref:`computed identifiers -`. The rationale for recommending, rather than -requiring, normalization is grounded in dual views of Allele objects -with distinct interpretations: - -* Allele as minimal representation of a change in sequence. In this - view, normalization is a process that makes the representation - minimal and unambiguous. - -* Allele as an assertion of state. In this view, it is reasonable to - want to assert state that may include (or be composed entirely of) - reference bases, for which the normalization process would alter the - intent. - -Although this rationale applies only to Alleles, it may have have -parallels with other VRS types. In addition, it is desirable for all -VRS types to be treated similarly. - -Furthermore, if normalization were required in order to generate -:ref:`computed-identifiers`, but did not apply to certain instances of -VRS Variation, implementations would likely require secondary -identifier mechanisms, which would undermine the intent of a global -computed identifier. - -The primary downside of not requiring normalization is that Variation -objects might be written in non-canonical forms, thereby creating -unintended degeneracy. - -Therefore, normalization of all VRS Variation classes is optional in -order to support the view of Allele as an assertion of state on a -sequence. - - .. _fully-justified: @@ -113,6 +74,55 @@ occurs in a low-complexity region, but rather describes the final and unambiguous state of the resultant sequence. +.. 
_should-normalize: + +Implementations should normalize Alleles +@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ + +VRS STRONGLY RECOMMENDS that Alleles be :ref:`normalized +` when generating :ref:`computed identifiers +` unless there is compelling reason to do +otherwise. Those reasons are the subject of this section. + +:ref:`Allele Normalization ` is the process of +comparing a span of reference sequence to a sequence state (often the +alternative sequence) and resolving that span to an unambiguous form. The fully-justified Allele normalization in VRS consists of two steps: trimming +and shuffling. In the trimming step, common flanking prefix and +suffix sequences are removed. For example, a CAG-to-CTG Allele would +be trimmed to merely A-to-T, with the position adjusted accordingly. +There are four cases of the resulting sequences: + + 1. The trimmed sequences are empty: The Allele refers to reference + state. + 2. The trimmed sequences are non-empty: The Allele is a substitution + (perhaps multi-residue). + 3. The reference sequence is empty: The Allele is a net insertion. + 4. The state sequence is empty: The Allele is a net deletion. + +When the Allele refers to a reference state (case 1), trimming would +reduce the variant to a null change. However, reduction to a null +state would make it impossible to refer to a specific span of +reference sequence. In order to permit users to refer to spans of +reference sequence, VRS does not require normalizing reference +agreement Alleles. + +The shuffling step applies only when the reference or the state +sequences are empty (cases 3 and 4). When these occur in the context +of repeating reference sequence that matches the inserted or deleted +sequence, the Allele may be shuffled left and right to identify the +fully-justified location of the variation. (See :ref:`normalization` +for details.) 
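As a rough illustration of the trimming step and the four cases listed above, the following Python sketch removes the shared prefix and suffix from a reference/state pair and labels the result. The names `trim_allele` and `classify_trimmed`, the inter-residue `start` argument, and the prefix-before-suffix order are assumptions made for this example and are not taken from the VRS specification. Shuffling is not shown.

```python
def trim_allele(start, ref, state):
    """Trim common flanking sequence from a reference/state pair.

    `start` is an inter-residue start coordinate (an assumption of this
    sketch). Returns (new_start, trimmed_ref, trimmed_state).
    """
    # Count residues shared at the beginning of both sequences.
    prefix = 0
    while prefix < len(ref) and prefix < len(state) and ref[prefix] == state[prefix]:
        prefix += 1
    ref, state, start = ref[prefix:], state[prefix:], start + prefix

    # Count residues shared at the end of what remains.
    suffix = 0
    while suffix < len(ref) and suffix < len(state) and ref[-1 - suffix] == state[-1 - suffix]:
        suffix += 1
    if suffix:
        ref, state = ref[:-suffix], state[:-suffix]
    return start, ref, state


def classify_trimmed(ref, state):
    """Map trimmed sequences onto the four cases described in the text."""
    if not ref and not state:
        return "reference agreement"  # case 1
    if ref and state:
        return "substitution"         # case 2
    if not ref:
        return "insertion"            # case 3
    return "deletion"                 # case 4


# The CAG-to-CTG example from the text trims to A-to-T, one position later.
start, ref, state = trim_allele(10, "CAG", "CTG")
print(start, ref, state, classify_trimmed(ref, state))
```

Only cases 3 and 4 would then be candidates for shuffling. Case 1 shows why trimming cannot be mandatory: the original span collapses to an empty interval.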
+ +In rare cases, data originators might have reason to associate an +annotation with a specific repeating unit in the context of repeated +sequence. In order to support this case, normalization is not +strictly required. + +Most users will normalize most Alleles. Normalization should be +skipped only when doing so would decrease the intended precision of an +Allele. + + .. _inter-residue-coordinates-design: Inter-residue Coordinates diff --git a/docs/source/appendices/future_plans.rst b/docs/source/appendices/future_plans.rst index d73d361c..bf81b2ea 100644 --- a/docs/source/appendices/future_plans.rst +++ b/docs/source/appendices/future_plans.rst @@ -96,129 +96,6 @@ Under consideration. See https://github.com/ga4gh/vrs/issues/28. t(9;22)(q34;q11) in BCR-ABL -.. _genotype: - -Genotype -######## - -The genetic state of an organism, whether complete (defined over the -whole genome) or incomplete (defined over a subset of the genome). - -**Computational definition** - -A list of Haplotypes. - -**Information model** - -.. list-table:: - :class: reece-wrap - :header-rows: 1 - :align: left - :widths: auto - - * - Field - - Type - - Limits - - Description - * - _id - - :ref:`CURIE` - - 0..1 - - Variation Id; MUST be unique within document - * - type - - string - - 1..1 - - Variation type; MUST be set to '**Genotype**' - * - completeness - - enum - - 1..1 - - Declaration of completeness of the Haplotype definition. - Values are: - - * UNKNOWN: Other Haplotypes may exist. - * PARTIAL: Other Haplotypes exist but are unspecified. - * COMPLETE: The Genotype declares a complete set of Haplotypes. - - * - members - - :ref:`Haplotype`\[] or :ref:`CURIE`\[] - - 0..* - - List of Haplotypes or Haplotype identifiers; length MUST agree - with ploidy of genomic region - - -**Implementation guidance** - -* Haplotypes in a Genotype MAY occur at different locations or on - different reference sequences. 
For example, an individual may have - haplotypes on two population-specific references. -* Haplotypes in a Genotype MAY contain differing numbers of Alleles or - Alleles at different Locations. - -**Notes** - -* The term "genotype" has two, related definitions in common use. The - narrower definition is a set of alleles observed at a single - location and with a ploidy of two, such as a pair of single residue - variants on an autosome. The broader, generalized definition is a - set of alleles at multiple locations and/or with ploidy other than - two.The VRS Genotype entity is based on this broader definition. -* The term "diplotype" is often used to refer to two haplotypes. The - VRS Genotype entity subsumes the conventional definition of - diplotype. Therefore, the VRS model does not include an explicit - entity for diplotypes. See :ref:`this note - ` for a - discussion. -* The VRS model makes no assumptions about ploidy of an organism or - individual. The number of Haplotypes in a Genotype is the observed - ploidy of the individual. -* In diploid organisms, there are typically two instances of each - autosomal chromosome, and therefore two instances of sequence at a - particular location. Thus, Genotypes will often list two - Haplotypes. In the case of haploid chromosomes or - haploinsufficiency, the Genotype consists of a single Haplotype. -* A consequence of the computational definition is that Haplotypes at - overlapping or adjacent intervals MUST NOT be included in the same - Genotype. However, two or more Alleles MAY always be rewritten as an - equivalent Allele with a common sequence and interval context. -* The rationale for permitting Genotypes with Haplotypes defined on - different reference sequences is to enable the accurate - representation of segments of DNA with the most appropriate - population-specific reference sequence. - -**Sources** - -SO: `Genotype (SO:0001027) -`__ -— A genotype is a variant genome, complete or incomplete. - -.. 
_genotypes-represent-haplotypes-with-arbitrary-ploidy: - -.. note:: Genotypes represent Haplotypes with arbitrary ploidy - The VRS defines Haplotypes as a list of Alleles, and Genotypes as - a list of Haplotypes. In essence, Haplotypes and Genotypes represent - two distinct dimensions of containment: Haplotypes represent the "in - phase" relationship of Alleles while Genotypes represents sets of - Haplotypes of arbitrary ploidy. - - There are two important consequences of these definitions: There is no - single-location Genotype. Users of SNP data will be familiar with - representations like rs7412 C/C, which indicates the diploid state at - a position. In the VRS, this is merely a special case of a - Genotype with two Haplotypes, each of which is defined with only one - Allele (the same Allele in this case). The VRS does not define a - diplotype type. A diplotype is a special case of a VRS Genotype - with exactly two Haplotypes. In practice, software data types that - assume a ploidy of 2 make it very difficult to represent haploid - states, copy number loss, and copy number gain, all of which occur - when representing human data. In addition, assuming ploidy=2 makes - software incompatible with organisms with other ploidy. The VRS - makes no assumptions about "normal" ploidy. - - In other words, the VRS does not represent single-position - Genotypes or diplotypes because both concepts are subsumed by the - Allele, Haplotype, and Genotypes entities. - - - .. _GitHub issue: https://github.com/ga4gh/vrs/issues .. 
_genetic variation: https://en.wikipedia.org/wiki/Genetic_variation diff --git a/docs/source/impl-guide/computed_identifiers.rst b/docs/source/impl-guide/computed_identifiers.rst index 2fb8662b..59897461 100644 --- a/docs/source/impl-guide/computed_identifiers.rst +++ b/docs/source/impl-guide/computed_identifiers.rst @@ -119,9 +119,7 @@ If the object is an instance of a VRS class, implementations MUST: * ensure that objects are referenced with identifiers in the ``ga4gh`` namespace * replace each nested :term:`identifiable object` with their - corresponding *digests*. (Note: Attributes of some objects, such - as :ref:`CopyNumber`, permit a mix of identifiable and - non-identifiable values.) + corresponding *digests*. * order arrays of digests and ids by Unicode Character Set values * filter out fields that start with underscore (e.g., `_id`) * filter out fields with null values @@ -193,7 +191,7 @@ Truncated Digest (sha512t24u) The sha512t24u truncated digest algorithm [Hart2020]_ computes an ASCII digest from binary data. The method uses two well-established standard algorithms, the `SHA-512`_ hash function, which generates a binary -digest from binary data, and `Base64`_ URL encoding, which encodes +digest from binary data, and a URL-safe variant of `Base64`_ encoding, which encodes binary data using printable characters. 
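A minimal Python sketch of the two building blocks named above, using only the standard library. It assumes the SHA-512 digest is truncated to its first 24 bytes before URL-safe Base64 encoding, as the name sha512t24u suggests. The function name and signature are illustrative.

```python
import base64
import hashlib


def sha512t24u(blob: bytes) -> str:
    """Return a 32-character ASCII digest for the given binary data."""
    # SHA-512 yields a 64-byte binary digest.
    digest = hashlib.sha512(blob).digest()
    # Keep the first 24 bytes (assumed truncation length), then encode with
    # the URL-safe Base64 alphabet ('-' and '_' instead of '+' and '/').
    return base64.urlsafe_b64encode(digest[:24]).decode("ascii")


print(sha512t24u(b""))  # z4PhNX7vuL3xVChQ1m2AB9Yg5AULVxXc
```

Because 24 bytes is a multiple of 3, the Base64 output never needs `=` padding and is always 32 characters long.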
Computing the sha512t24u truncated digest for binary data consists of diff --git a/docs/source/releases/1.3.rst b/docs/source/releases/1.3.rst index b8d59f3a..61d2bf66 100644 --- a/docs/source/releases/1.3.rst +++ b/docs/source/releases/1.3.rst @@ -15,15 +15,15 @@ Major Changes ############# * :ref:`CopyNumberChange` introduced for relative copy number calls - * :ref:`CopyNumberCount` replaces `CopyNumber` - * :ref:`Genotype` introduced for describing genotypes - * :ref:`ComposedSequenceExpression` introduced for composing expressions - from multiple other sequence expressions + * :ref:`CopyNumberCount` replaces `CopyNumber (v1.2) `_ + * :ref:`Genotype` introduced as a new systemic variation concept + * :ref:`ComposedSequenceExpression` introduced for composing expressions from multiple other sequence expressions Minor Changes ############# - * Clarifying updates for :ref:`Allele normalization guidance <>` + * Clarifying updates for :ref:`Allele normalization guidance + ` * :ref:`Haplotype` allele member minimum was revised from 1 to 2 * Updated metaschema processor version * Introduced ordered / unordered attribute in array declarations diff --git a/docs/source/releases/index.rst b/docs/source/releases/index.rst index 194bfcdc..0f0af271 100644 --- a/docs/source/releases/index.rst +++ b/docs/source/releases/index.rst @@ -23,6 +23,7 @@ Releases :maxdepth: 2 :includehidden: + 1.3.rst 1.2.rst 1.1.rst 1.0.rst diff --git a/docs/source/terms_and_model.rst b/docs/source/terms_and_model.rst index 0e51b09d..653142d8 100644 --- a/docs/source/terms_and_model.rst +++ b/docs/source/terms_and_model.rst @@ -267,11 +267,8 @@ genetic markers that tend to be transmitted together. * The locations of Alleles within the Haplotype MUST be interpreted independently. Alleles that create a net insertion or deletion of sequence MUST NOT change the location of "downstream" Alleles. -* The `members` attribute is required and MUST contain at least one - Allele. 
-* Haplotypes with one Allele are intended to be distinct entities from - the Allele by itself. See discussion on :ref:`equivalence`. - +* The `members` attribute is required and MUST contain at least two + Alleles. **Sources** @@ -372,13 +369,13 @@ Systemic Variation .. include:: defs/SystemicVariation.rst .. _CopyNumber: +.. _CopyNumberCount: -CopyNumber -$$$$$$$$$$ +CopyNumberCount +$$$$$$$$$$$$$$$ -*Copy Number Variation* captures the copies of a molecule within a -genome, and can be used to express concepts such as amplification -and copy loss. Copy Number Variation has conflated meanings in the +*Copy Number Count* captures the integral copies of a molecule within a +genome. Copy Number Count has conflated meanings in the genomics community, and can mean either (or both) the notion of copy number *in a genome* or copy number *on a molecule*. VRS separates the concerns of these two types of statements; this concept is a type @@ -386,7 +383,7 @@ of :ref:`SystemicVariation` and so describes the number of copies in a genome. The related :ref:`MolecularVariation` concept can be expressed as an :ref:`Allele` with a :ref:`RepeatedSequenceExpression`. -.. include:: defs/CopyNumber.rst +.. include:: defs/CopyNumberCount.rst **Examples** @@ -404,9 +401,123 @@ Two, three, or four total copies of BRCA1: "gene_id": "ncbigene:348", "type": "Gene" }, - "type": "CopyNumber" + "type": "CopyNumberCount" } +.. _CopyNumberChange: + +CopyNumberChange +$$$$$$$$$$$$$$$$ + +*Copy Number Change* captures a categorization of copies +of a molecule within a system, relative to a baseline. These types +of Variation are common outputs from CNV callers, particularly in the +somatic domain where integral :ref:`CopyNumberCount` are difficult to +estimate and less useful in practice than relative statements. Somatic CNV +callers typically express changes as relative statements, and many HGVS +expressions submitted to express copy number variation are interpreted to be +relative copy changes. 
+ +.. include:: defs/CopyNumberChange.rst + +**Examples** + +Low-level copy gain of BRCA1: + +.. parsed-literal:: + + { + "copy_change": "efo:0030071", # low-level gain + "subject": { + "gene_id": "ncbigene:348", # BRCA1 gene + "type": "Gene" + }, + "type": "CopyNumberChange" + } + +.. _genotype: + +Genotype +$$$$$$$$ + +A *genotype* is a representation of the variants present at a given genomic locus, and may be referred +to either by individual nucleotide representations (e.g. GT representation in VCF files) or symbolically +(e.g. A/B/O blood type reporting). To support these use cases, VRS genotypes enable representation of +genotypes using either :ref:`Allele` objects (as commonly done in VCF records) or larger :ref:`Haplotype` +objects (which would otherwise be represented using symbolic shorthand). + +.. include:: defs/Genotype.rst + +**Implementation guidance** + +* Haplotypes or Alleles in :ref:`GenotypeMember` objects MAY occur at different locations or on + different reference sequences. For example, an individual may have haplotypes on two + population-specific references. + +**Notes** + +* The term "genotype" has two, related definitions in common use. The + narrower definition is a set of alleles observed at a single + location and often with a ploidy of two, such as a pair of single residue + variants on an autosome. The broader, generalized definition is a + set of alleles at multiple locations and/or with ploidy other than + two. VRS Genotype entity is based on this broader definition. +* The term "diplotype" is often used to refer to two in-trans haplotypes at a locus. + VRS Genotype entity subsumes the conventional definition of diplotype, though + it describes no explicit in-trans phase relationship. Therefore, + VRS does not include an explicit entity for diplotypes. See :ref:`this note + ` for a discussion. +* VRS makes no assumptions about ploidy of an organism or individual nor any + polysomy affecting a locus. 
The `genotype.count` attribute explicitly captures the total + count of molecules associated with a genomic locus represented by the Genotype. +* In diploid organisms, there are typically two instances of each autosomal chromosome, + and therefore two instances of sequence at a particular locus. Thus, Genotypes will + often list two GenotypeMembers each based on a distinct Haplotype or Allele. In the case + of haploid chromosomes or haploinsufficiency, the Genotype consists of a single GenotypeMember. +* A specific (heterozygous) diplotype SHOULD be represented as a Genotype of two GenotypeMember + instances each containing a constituent :ref:`Haplotype`. A homozygous diplotype SHOULD be + represented as a Genotype of one constituent GenotypeMember (with `GenotypeMember.count=2`). +* A consequence of the computational definition is that in-cis Haplotypes at overlapping or + adjacent intervals MUST be merged into a single Haplotype for the same Genotype. +* A `GenotypeMember.variation` value MUST be unique among Genotype Members within a Genotype. + When more than one Genotype Member would have the same `variation` value (e.g. in the case + of a homozygous variant), this would be represented as a Genotype Value with a corresponding + `count` (i.e. for a diploid homozygous variant, `GenotypeMember.count = 2`). +* The rationale for permitting Genotypes with Haplotypes defined on different reference + sequences is to enable the accurate representation of segments of DNA with the most + appropriate population-specific reference sequence. +* Deletion of sequence at locus would be represented by the presence of Alleles of deleted + sequence, not absence of Alleles; therefore Genotypes MAY NOT have count < 1. + +**Sources** + +SO: `Genotype (SO:0001027) +`__ +— A genotype is a variant genome, complete or incomplete. + +.. _genotypes-represent-haplotypes-with-arbitrary-ploidy: + +.. 
note:: + VRS defines Genotypes using a list of GenotypeMembers defined by + Haplotypes or Alleles. In essence, Haplotypes and Genotypes represent + two distinct dimensions of containment: Haplotypes represent the "in + phase" relationship of Alleles while Genotypes represent sets of + Haplotypes of arbitrary ploidy. + + There are two important consequences of these definitions: There is no + single-location Genotype. Users of SNP data will be familiar with + representations like rs7412 C/C, which indicates the diploid state at + a position. In VRS, this is merely a special case of a + Genotype with one GenotypeMember, defined by a single Allele with + two copies. VRS does not define a diplotype class. A diplotype + is a special case of a VRS Genotype with count = 2. In practice, software + data types that assume a ploidy of 2 make it very difficult to represent haploid + states, copy number loss, and copy number gain, all of which occur + when representing human data. In addition, inferred ploidy = 2 makes + software incompatible with organisms with other ploidy. VRS + requires explicit definition of the count of molecules associated with + a genomic locus using the `count` attribute, though this count may be inexact + (e.g. a :ref:`DefiniteRange` or :ref:`IndefiniteRange`). .. _UtilityVariation: @@ -919,6 +1030,55 @@ large-scale tandem duplications. "type": "RepeatedSequenceExpression" } +.. _ComposedSequenceExpression: + +ComposedSequenceExpression +########################## + +*Composed Sequence* is a class of sequence expression composed of other sequence expression +types. It is useful, for example, when representing multiple repeating subunits that occur +in tandem, such as in the description of *PABPN1* alleles in the diagnosis of +oculopharyngeal muscular dystrophy (OPMD). + +.. include:: defs/ComposedSequenceExpression.rst + +**Examples** + +.. 
parsed-literal:: + + { + "type": "Allele", + "location": { + "type": "SequenceLocation", + "sequence_id": "ga4gh:SQ.sH4gymNtL5nxNdTE3evfxzZa4dg3fqDz", + "interval": { + "type": "SequenceInterval", + "start": { "type": "Number", "value": 3 }, + "end": { "type": "Number", "value": 33 } + } + }, + "state": { + "type": "ComposedSequenceExpression", + "components": [ + { + "type": "RepeatedSequenceExpression", + "seq_expr": { "type": "LiteralSequenceExpression", "sequence": "GCG" }, + "count": { "type": "Number", "value": 11 } + }, + { + "type": "RepeatedSequenceExpression", + "seq_expr": { "type": "LiteralSequenceExpression", "sequence": "GCA" }, + "count": { "type": "Number", "value": 3 } + }, + { + "type": "RepeatedSequenceExpression", + "seq_expr": { "type": "LiteralSequenceExpression", "sequence": "GCG" }, + "count": { "type": "Number", "value": 1 } + } + ] + } + } + .. _Feature: Feature @@ -1045,6 +1205,13 @@ This value is equivalent to the concept of "equal to or greater than "value": 22 } +.. _genotypemember: + +GenotypeMember +############## + +.. include:: defs/GenotypeMember.rst + Primitives @@@@@@@@@@ @@ -1154,7 +1321,7 @@ derived from the IUPAC one-letter nucleic acid and amino acid codes. to define an :ref:`Allele`. A Sequence that replaces another Sequence is called a "replacement sequence". * In some contexts outside VRS, "reference sequence" may refer - to a member of set of sequences that comprise a genome assembly. In the VRS + to a member of set of sequences that comprise a genome assembly. In VRS specification, any sequence may be a "reference sequence", including those in a genome assembly. 
* For the purposes of representing sequence variation, it is not diff --git a/schema/Makefile b/schema/Makefile index 8a0c601c..e163066b 100644 --- a/schema/Makefile +++ b/schema/Makefile @@ -6,13 +6,13 @@ JSYAMLS:=vrs.yaml JSONS:=${JSYAMLS:.yaml=.json} -all: vrs.json defs +all: ${JSONS} defs -vrs.json: vrs.yaml +%.json: %.yaml jsy2js.py <$< >$@ -vrs.yaml: vrs-source.yaml - source2jsy.py <$< >$@ +%.yaml: %-source.yaml + source2jsy.py $< >$@ defs: rm -rf defs diff --git a/schema/defs/vrs/ComposedSequenceExpression.rst b/schema/defs/vrs/ComposedSequenceExpression.rst new file mode 100644 index 00000000..09eade67 --- /dev/null +++ b/schema/defs/vrs/ComposedSequenceExpression.rst @@ -0,0 +1,24 @@ +**Computational Definition** + +An expression of a sequence composed from multiple other :ref:`Sequence Expressions` objects. MUST have at least one component that is not a :ref:`LiteralSequenceExpression`. CANNOT be composed from nested composed sequence expressions. + +**Information Model** + +.. list-table:: + :class: clean-wrap + :header-rows: 1 + :align: left + :widths: auto + + * - Field + - Type + - Limits + - Description + * - type + - string + - 0..1 + - MUST be "ComposedSequenceExpression" + * - components + - :ref:`LiteralSequenceExpression` | :ref:`RepeatedSequenceExpression` | :ref:`DerivedSequenceExpression` + - 2..m + - An ordered list of :ref:`SequenceExpression` components comprising the expression. diff --git a/schema/defs/vrs/CopyNumberChange.rst b/schema/defs/vrs/CopyNumberChange.rst new file mode 100644 index 00000000..64adb97a --- /dev/null +++ b/schema/defs/vrs/CopyNumberChange.rst @@ -0,0 +1,34 @@ +**Computational Definition** + +An assessment of the copy number of a :ref:`Location` or a :ref:`Feature` within a system (e.g. genome, cell, etc.) relative to a baseline ploidy. + +**Information Model** + +Some CopyNumberChange attributes are inherited from :ref:`Variation`. + +.. 
list-table:: + :class: clean-wrap + :header-rows: 1 + :align: left + :widths: auto + + * - Field + - Type + - Limits + - Description + * - _id + - :ref:`CURIE` + - 0..1 + - Variation Id. MUST be unique within document. + * - type + - string + - 1..1 + - MUST be "CopyNumberChange" + * - subject + - :ref:`Location` | :ref:`CURIE` | :ref:`Feature` + - 1..1 + - A location for which the number of systemic copies is described. + * - copy_change + - string + - 1..1 + - MUST be one of "efo:0030069" (complete genomic loss), "efo:0020073" (high-level loss), "efo:0030068" (low-level loss), "efo:0030067" (loss), "efo:0030064" (regional base ploidy), "efo:0030070" (gain), "efo:0030071" (low-level gain), "efo:0030072" (high-level gain). diff --git a/schema/defs/vrs/CopyNumber.rst b/schema/defs/vrs/CopyNumberCount.rst similarity index 56% rename from schema/defs/vrs/CopyNumber.rst rename to schema/defs/vrs/CopyNumberCount.rst index 17a483cb..ea8c6f1f 100644 --- a/schema/defs/vrs/CopyNumber.rst +++ b/schema/defs/vrs/CopyNumberCount.rst @@ -1,10 +1,10 @@ **Computational Definition** -The absolute count of discrete copies of a :ref:`MolecularVariation`, :ref:`Feature`, :ref:`SequenceExpression`, or a :ref:`CURIE` reference within a system (e.g. genome, cell, etc.). +The absolute count of discrete copies of a :ref:`Location` or :ref:`Feature`, within a system (e.g. genome, cell, etc.). **Information Model** -Some CopyNumber attributes are inherited from :ref:`Variation`. +Some CopyNumberCount attributes are inherited from :ref:`Variation`. .. list-table:: :class: clean-wrap @@ -23,11 +23,11 @@ Some CopyNumber attributes are inherited from :ref:`Variation`. 
* - type - string - 1..1 - - MUST be "CopyNumber" + - MUST be "CopyNumberCount" * - subject - - :ref:`MolecularVariation` | :ref:`Feature` | :ref:`SequenceExpression` | :ref:`CURIE` + - :ref:`Location` | :ref:`CURIE` | :ref:`Feature` - 1..1 - - Subject of the Copy Number object + - A location for which the number of systemic copies is described. * - copies - :ref:`Number` | :ref:`IndefiniteRange` | :ref:`DefiniteRange` - 1..1 diff --git a/schema/defs/vrs/CytobandInterval.rst b/schema/defs/vrs/CytobandInterval.rst index 1cd1a7d1..460de65e 100644 --- a/schema/defs/vrs/CytobandInterval.rst +++ b/schema/defs/vrs/CytobandInterval.rst @@ -25,4 +25,4 @@ A contiguous span on a chromosome defined by cytoband features. The span include * - end - :ref:`HumanCytoband` - 1..1 - - The start cytoband region. MUST specify a region nearer the terminal end (telomere) of the chromosome q-arm than `start`. + - The end cytoband region. MUST specify a region nearer the terminal end (telomere) of the chromosome q-arm than `start`. diff --git a/schema/defs/vrs/Genotype.rst b/schema/defs/vrs/Genotype.rst new file mode 100644 index 00000000..d599a862 --- /dev/null +++ b/schema/defs/vrs/Genotype.rst @@ -0,0 +1,34 @@ +**Computational Definition** + +A quantified set of :ref:`MolecularVariation` associated with a genomic locus. + +**Information Model** + +Some Genotype attributes are inherited from :ref:`Variation`. + +.. list-table:: + :class: clean-wrap + :header-rows: 1 + :align: left + :widths: auto + + * - Field + - Type + - Limits + - Description + * - _id + - :ref:`CURIE` + - 0..1 + - Variation Id. MUST be unique within document. + * - type + - string + - 1..1 + - MUST be "Genotype" + * - members + - :ref:`GenotypeMember` + - 1..m + - Each GenotypeMember in `members` describes a :ref:`MolecularVariation` and the count of that variation at the locus. 
+ * - count + - :ref:`Number` | :ref:`IndefiniteRange` | :ref:`DefiniteRange` + - 1..1 + - The total number of copies of all :ref:`MolecularVariation` at this locus, MUST be greater than or equal to the sum of :ref:`GenotypeMember` copy counts and MUST be greater than or equal to 1. If greater than the total of GenotypeMember counts, this field describes additional :ref:`MolecularVariation` that exist but are not explicitly described. diff --git a/schema/defs/vrs/GenotypeMember.rst b/schema/defs/vrs/GenotypeMember.rst new file mode 100644 index 00000000..39775064 --- /dev/null +++ b/schema/defs/vrs/GenotypeMember.rst @@ -0,0 +1,28 @@ +**Computational Definition** + +A class for expressing the count of a specific :ref:`MolecularVariation` present *in-trans* at a genomic locus represented by a :ref:`Genotype`. + +**Information Model** + +.. list-table:: + :class: clean-wrap + :header-rows: 1 + :align: left + :widths: auto + + * - Field + - Type + - Limits + - Description + * - type + - string + - 1..1 + - MUST be "GenotypeMember". + * - count + - :ref:`Number` | :ref:`IndefiniteRange` | :ref:`DefiniteRange` + - 1..1 + - The number of copies of the `variation` at a :ref:`Genotype` locus. + * - variation + - :ref:`Allele` | :ref:`Haplotype` + - 1..1 + - A :ref:`MolecularVariation` at a :ref:`Genotype` locus. diff --git a/schema/defs/vrs/Haplotype.rst b/schema/defs/vrs/Haplotype.rst index c36e22f4..6202690a 100644 --- a/schema/defs/vrs/Haplotype.rst +++ b/schema/defs/vrs/Haplotype.rst @@ -26,5 +26,5 @@ Some Haplotype attributes are inherited from :ref:`Variation`. - MUST be "Haplotype" * - members - :ref:`Allele` | :ref:`CURIE` - - 1..m + - 2..m - List of Alleles, or references to Alleles, that comprise this Haplotype. 
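The count constraints in the Genotype and GenotypeMember tables above can be checked mechanically. The sketch below is illustrative only: it assumes every `count` is a plain `Number` object (ranges are not handled), it treats objects as Python dicts, and the function name `genotype_count_problems` is hypothetical.

```python
def genotype_count_problems(genotype):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    members = genotype.get("members", [])
    if not members:
        problems.append("members must contain at least one GenotypeMember")

    # Assumes Number counts; IndefiniteRange/DefiniteRange need extra logic.
    total = genotype["count"]["value"]
    member_sum = sum(m["count"]["value"] for m in members)

    if total < 1:
        problems.append("count must be >= 1")
    if total < member_sum:
        problems.append("count must be >= the sum of member counts")
    return problems


# A diploid homozygous genotype: one member carried in two copies.
homozygous = {
    "type": "Genotype",
    "count": {"type": "Number", "value": 2},
    "members": [
        {
            "type": "GenotypeMember",
            "count": {"type": "Number", "value": 2},
            # Placeholder Allele; location and state omitted in this sketch.
            "variation": {"type": "Allele"},
        }
    ],
}

print(genotype_count_problems(homozygous))
```

A Genotype whose `count` exceeds the member total passes these checks, matching the table's statement that the surplus describes variation that exists but is not explicitly listed.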
diff --git a/schema/ga4gh.yaml b/schema/ga4gh.yaml index 054d8d24..7cb574b8 100644 --- a/schema/ga4gh.yaml +++ b/schema/ga4gh.yaml @@ -23,12 +23,11 @@ identifiers: Allele: VA VariationSet: VS Text: VT - # Genotype: VG + Genotype: GT Haplotype: VH - CopyNumber: VCN - + CopyNumberCount: CN + CopyNumberChange: CX SequenceLocation: VSL ChromosomeLocation: VCL - regexp: '^ga4gh:(?P[^.]+)\.(?P.+)$' diff --git a/schema/vrs-source.yaml b/schema/vrs-source.yaml index a090dc4e..1841fdfa 100644 --- a/schema/vrs-source.yaml +++ b/schema/vrs-source.yaml @@ -11,6 +11,7 @@ $schema: "http://json-schema.org/draft-07/schema" title: "GA4GH-VRS-Definitions" type: object +strict: true definitions: # VRS definitions are presented top-down. Everything rolls up to @@ -43,6 +44,7 @@ definitions: propertyName: type MolecularVariation: + inherits: Variation description: >- A :ref:`variation` on a contiguous molecule. oneOf: @@ -52,6 +54,7 @@ definitions: propertyName: type UtilityVariation: + inherits: Variation description: >- A collection of :ref:`Variation` subclasses that cannot be constrained to a specific class of biological variation, but @@ -63,11 +66,14 @@ definitions: propertyName: type SystemicVariation: + inherits: Variation description: >- A Variation of multiple molecules in the context of a system, e.g. a genome, sample, or homologous chromosomes. oneOf: - - $ref: "#/definitions/CopyNumber" + - $ref: "#/definitions/CopyNumberCount" + - $ref: "#/definitions/CopyNumberChange" + - $ref: "#/definitions/Genotype" discriminator: propertyName: type @@ -80,9 +86,9 @@ definitions: # Molecular Variation Allele: + inherits: MolecularVariation description: >- The state of a molecule at a :ref:`Location`. - additionalProperties: false type: object properties: type: @@ -108,9 +114,9 @@ definitions: required: [ "location", "state" ] Haplotype: + inherits: MolecularVariation description: >- A set of non-overlapping :ref:`Allele` members that co-occur on the same molecule. 
- additionalProperties: false type: "object" properties: type: @@ -121,8 +127,9 @@ definitions: MUST be "Haplotype" members: type: array - minItems: 1 + minItems: 2 uniqueItems: true + ordered: false items: oneOf: - $ref: "#/definitions/Allele" @@ -136,9 +143,9 @@ definitions: # UtilityVariation Text: + inherits: UtilityVariation description: >- A free-text definition of variation. - additionalProperties: false type: object properties: type: @@ -154,10 +161,10 @@ definitions: required: [ "definition" ] VariationSet: + inherits: UtilityVariation description: >- An unconstrained set of Variation members. type: object - additionalProperties: false properties: type: type: string @@ -167,6 +174,7 @@ definitions: members: type: array uniqueItems: true + ordered: false items: oneOf: - $ref: "#/definitions/CURIE" @@ -180,28 +188,26 @@ definitions: # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - # SystemicVariation - CopyNumber: - additionalProperties: false + CopyNumberCount: + inherits: SystemicVariation type: object description: >- - The absolute count of discrete copies of a :ref:`MolecularVariation`, - :ref:`Feature`, :ref:`SequenceExpression`, or a :ref:`CURIE` reference + The absolute count of discrete copies of a :ref:`Location` or :ref:`Feature`, within a system (e.g. genome, cell, etc.). properties: type: type: string - const: "CopyNumber" - default: "CopyNumber" + const: "CopyNumberCount" + default: "CopyNumberCount" description: >- - MUST be "CopyNumber" + MUST be "CopyNumberCount" subject: oneOf: - - $ref: "#/definitions/MolecularVariation" - - $ref: "#/definitions/Feature" - - $ref: "#/definitions/SequenceExpression" + - $ref: "#/definitions/Location" - $ref: "#/definitions/CURIE" + - $ref: "#/definitions/Feature" description: >- - Subject of the Copy Number object + A location for which the number of systemic copies is described. 
copies: oneOf: - $ref: "#/definitions/Number" @@ -209,42 +215,74 @@ definitions: - $ref: "#/definitions/DefiniteRange" description: >- The integral number of copies of the subject in a system - allOf: - - if: - properties: - copies: - $ref: "#/definitions/Number" - then: - properties: - copies: - properties: - value: - minimum: 0 - - if: - properties: - copies: - $ref: "#/definitions/IndefiniteRange" - then: - properties: - copies: - properties: - value: - minimum: 0 - - if: - properties: - copies: - $ref: "#/definitions/DefiniteRange" - then: - properties: - copies: - properties: - min: - minimum: 0 - max: - minimum: 0 - required: [ "subject", "copies" ] + CopyNumberChange: + inherits: SystemicVariation + type: object + maturity: draft + description: >- + An assessment of the copy number of a :ref:`Location` or a :ref:`Feature` within a system (e.g. genome, cell, + etc.) relative to a baseline ploidy. + properties: + type: + type: string + const: "CopyNumberChange" + default: "CopyNumberChange" + description: >- + MUST be "CopyNumberChange" + subject: + oneOf: + - $ref: "#/definitions/Location" + - $ref: "#/definitions/CURIE" + - $ref: "#/definitions/Feature" + description: >- + A location for which the number of systemic copies is described. + copy_change: + type: string + enum: [ "efo:0030069", "efo:0020073", "efo:0030068", "efo:0030067", "efo:0030064", "efo:0030070", "efo:0030071", "efo:0030072" ] + description: >- + MUST be one of "efo:0030069" (complete genomic loss), "efo:0020073" (high-level loss), + "efo:0030068" (low-level loss), "efo:0030067" (loss), "efo:0030064" (regional base ploidy), + "efo:0030070" (gain), "efo:0030071" (low-level gain), "efo:0030072" (high-level gain). + required: [ "subject", "copy_change" ] + + Genotype: + inherits: SystemicVariation + description: >- + A quantified set of :ref:`MolecularVariation` associated with a genomic locus. 
+ type: object + properties: + type: + type: string + const: "Genotype" + default: "Genotype" + description: >- + MUST be "Genotype" + members: + type: array + uniqueItems: true + minItems: 1 + ordered: false + items: + $ref: "#/definitions/GenotypeMember" + description: >- + Each GenotypeMember in `members` describes a :ref:`MolecularVariation` + and the count of that variation at the locus. + count: + oneOf: + - $ref: "#/definitions/Number" + - $ref: "#/definitions/IndefiniteRange" + - $ref: "#/definitions/DefiniteRange" + description: >- + The total number of copies of all :ref:`MolecularVariation` at this locus, + MUST be greater than or equal to the sum of :ref:`GenotypeMember` copy counts + and MUST be greater than or equal to 1. + If greater than the total of GenotypeMember counts, this field describes + additional :ref:`MolecularVariation` that exist but are not + explicitly described. + required: [ "members", "count" ] + # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - # Locations @@ -268,7 +306,7 @@ definitions: propertyName: type ChromosomeLocation: - additionalProperties: false + inherits: Location description: >- A Location on a chromosome defined by a species and chromosome name. type: object @@ -297,7 +335,7 @@ definitions: required: [ "species_id", "chr", "interval" ] SequenceLocation: - additionalProperties: false + inherits: Location description: >- A :ref:`Location` defined by an interval on a referenced :ref:`Sequence`. type: object @@ -335,7 +373,6 @@ definitions: always represented by contiguous spans using interbase coordinates or coordinate ranges. type: object - additionalProperties: false properties: type: type: string @@ -360,71 +397,6 @@ definitions: The end coordinate or range of the interval. The minimum value of this coordinate or range is 0. MUST represent a coordinate or range greater than the value of `start`. 
- allOf: - - if: - properties: - start: - $ref: "#/definitions/Number" - then: - properties: - start: - properties: - value: - minimum: 0 - - if: - properties: - start: - $ref: "#/definitions/IndefiniteRange" - then: - properties: - start: - properties: - value: - minimum: 0 - - if: - properties: - start: - $ref: "#/definitions/DefiniteRange" - then: - properties: - start: - properties: - min: - minimum: 0 - max: - minimum: 0 - - if: - properties: - end: - $ref: "#/definitions/Number" - then: - properties: - end: - properties: - value: - minimum: 0 - - if: - properties: - end: - $ref: "#/definitions/IndefiniteRange" - then: - properties: - end: - properties: - value: - minimum: 0 - - if: - properties: - end: - $ref: "#/definitions/DefiniteRange" - then: - properties: - end: - properties: - min: - minimum: 0 - max: - minimum: 0 required: [ "type", "start", "end" ] # SimpleInterval has been moved to DEPRECATED section at bottom. @@ -435,7 +407,6 @@ definitions: The span includes the constituent regions described by the start and end cytobands, as well as any intervening regions. type: object - additionalProperties: false properties: type: type: string @@ -450,7 +421,7 @@ definitions: end: $ref: "#/definitions/HumanCytoband" description: >- - The start cytoband region. MUST specify a region nearer the + The end cytoband region. MUST specify a region nearer the terminal end (telomere) of the chromosome q-arm than `start`. example: type: CytobandInterval @@ -469,6 +440,7 @@ definitions: - $ref: "#/definitions/LiteralSequenceExpression" - $ref: "#/definitions/DerivedSequenceExpression" - $ref: "#/definitions/RepeatedSequenceExpression" + - $ref: "#/definitions/ComposedSequenceExpression" discriminator: propertyName: type heritable_properties: @@ -479,10 +451,10 @@ definitions: heritable_required: ["type"] LiteralSequenceExpression: + inherits: SequenceExpression description: >- An explicit expression of a Sequence. 
type: object - additionalProperties: false properties: type: type: string @@ -495,6 +467,7 @@ definitions: required: [ "sequence" ] DerivedSequenceExpression: + inherits: SequenceExpression description: >- An approximate expression of a sequence that is derived from a referenced sequence location. Use of this class @@ -503,7 +476,6 @@ definitions: large regions in contexts where the use of an approximate sequence is inconsequential. type: object - additionalProperties: false properties: type: type: string @@ -522,9 +494,9 @@ definitions: required: [ "location", "reverse_complement" ] RepeatedSequenceExpression: + inherits: SequenceExpression description: >- An expression of a sequence comprised of a tandem repeating subsequence. - additionalProperties: false type: object properties: type: @@ -545,41 +517,71 @@ definitions: - $ref: "#/definitions/DefiniteRange" description: >- The count of repeated units, as an integer or inclusive range - allOf: - - if: - properties: - count: - $ref: "#/definitions/Number" - then: - properties: - count: - properties: - value: - minimum: 0 - - if: - properties: - count: - $ref: "#/definitions/IndefiniteRange" - then: - properties: - count: - properties: - value: - minimum: 0 - - if: - properties: - count: - $ref: "#/definitions/DefiniteRange" - then: - properties: - count: - properties: - min: - minimum: 0 - max: - minimum: 0 required: [ "seq_expr", "count" ] + ComposedSequenceExpression: + description: >- + An expression of a sequence composed from multiple other + :ref:`Sequence Expressions` + objects. MUST have at least one component that is not a + ref:`LiteralSequenceExpression`. CANNOT be composed from + nested composed sequence expressions. 
+ additionalProperties: false + type: object + properties: + type: + type: string + const: "ComposedSequenceExpression" + default: "ComposedSequenceExpression" + description: MUST be "ComposedSequenceExpression" + components: + type: array + uniqueItems: true + minItems: 2 + ordered: true + items: + oneOf: + - $ref: "#/definitions/LiteralSequenceExpression" + - $ref: "#/definitions/RepeatedSequenceExpression" + - $ref: "#/definitions/DerivedSequenceExpression" + contains: + oneOf: + - $ref: "#/definitions/RepeatedSequenceExpression" + - $ref: "#/definitions/DerivedSequenceExpression" + description: >- + An ordered list of :ref:`SequenceExpression` components + comprising the expression. + required: [ "components" ] + + # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + # Nested Classes + + GenotypeMember: + description: >- + A class for expressing the count of a specific :ref:`MolecularVariation` present + *in-trans* at a genomic locus represented by a :ref:`Genotype`. + type: object + properties: + type: + type: string + const: "GenotypeMember" + default: "GenotypeMember" + description: MUST be "GenotypeMember". + count: + oneOf: + - $ref: "#/definitions/Number" + - $ref: "#/definitions/IndefiniteRange" + - $ref: "#/definitions/DefiniteRange" + description: >- + The number of copies of the `variation` at a :ref:`Genotype` locus. + variation: + oneOf: + - $ref: "#/definitions/Allele" + - $ref: "#/definitions/Haplotype" + description: >- + A :ref:`MolecularVariation` at a :ref:`Genotype` locus. + required: [ "type", "count", "variation" ] + # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - # Feature @@ -600,12 +602,12 @@ definitions: heritable_required: [ "type" ] Gene: + inherits: Feature description: >- A reference to a Gene as defined by an authority. For human genes, the use of `hgnc <https://registry.identifiers.org/registry/hgnc>`_ as the gene authority is RECOMMENDED. 
type: object - additionalProperties: false properties: type: type: string @@ -625,7 +627,6 @@ definitions: description: >- A simple integer value as a VRS class. type: object - additionalProperties: false properties: type: type: string @@ -641,7 +642,6 @@ definitions: description: >- A bounded, inclusive range of numbers. type: object - additionalProperties: false properties: type: type: string @@ -663,7 +663,6 @@ definitions: '>=' are all numbers greater than and including `value`, '<=' are all numbers less than and including `value`. type: object - additionalProperties: false properties: type: type: string @@ -688,7 +687,6 @@ definitions: # ============================================================================= CURIE: - additionalProperties: false description: >- A `W3C Compact URI <https://www.w3.org/TR/curie/>`_ formatted string. A CURIE string has the structure ``prefix``:``reference``, as defined by @@ -698,7 +696,6 @@ definitions: example: "ensembl:ENSG00000139618" HumanCytoband: - additionalProperties: false description: >- A character string representing cytobands derived from the *International System for Human Cytogenomic Nomenclature* (ISCN) @@ -708,7 +705,6 @@ definitions: example: "q22.3" Residue: - additionalProperties: false description: >- A character representing a specific residue (i.e., molecular species) or groupings of these ("ambiguity codes"), using `one-letter IUPAC @@ -718,7 +714,6 @@ definitions: pattern: '[A-Z*\-]' Sequence: - additionalProperties: false description: >- A character string of :ref:`Residues ` that represents a biological sequence using the conventional sequence order (5’-to-3’ for @@ -739,7 +734,6 @@ definitions: to use for representing "ref-alt" style variation, including SNVs, MNVs, del, ins, and delins. This class is deprecated. Use :ref:`LiteralSequenceExpression` instead. - additionalProperties: false type: object properties: type: @@ -762,7 +756,6 @@ definitions: always represented by contiguous spans using interbase coordinates. 
This class is deprecated. Use SequenceInterval instead. - additionalProperties: false type: object properties: type: diff --git a/schema/vrs.json b/schema/vrs.json index 6dae8429..e9c99691 100644 --- a/schema/vrs.json +++ b/schema/vrs.json @@ -7,13 +7,25 @@ "description": "A representation of the state of one or more biomolecules.", "oneOf": [ { - "$ref": "#/definitions/MolecularVariation" + "$ref": "#/definitions/Allele" + }, + { + "$ref": "#/definitions/CopyNumberChange" + }, + { + "$ref": "#/definitions/CopyNumberCount" + }, + { + "$ref": "#/definitions/Genotype" + }, + { + "$ref": "#/definitions/Haplotype" }, { - "$ref": "#/definitions/SystemicVariation" + "$ref": "#/definitions/Text" }, { - "$ref": "#/definitions/UtilityVariation" + "$ref": "#/definitions/VariationSet" } ], "discriminator": { @@ -52,7 +64,13 @@ "description": "A Variation of multiple molecules in the context of a system, e.g. a genome, sample, or homologous chromosomes.", "oneOf": [ { - "$ref": "#/definitions/CopyNumber" + "$ref": "#/definitions/CopyNumberChange" + }, + { + "$ref": "#/definitions/CopyNumberCount" + }, + { + "$ref": "#/definitions/Genotype" } ], "discriminator": { @@ -61,7 +79,6 @@ }, "Allele": { "description": "The state of a molecule at a Location.", - "additionalProperties": false, "type": "object", "properties": { "_id": { @@ -80,7 +97,10 @@ "$ref": "#/definitions/CURIE" }, { - "$ref": "#/definitions/Location" + "$ref": "#/definitions/ChromosomeLocation" + }, + { + "$ref": "#/definitions/SequenceLocation" } ], "description": "Where Allele is located" @@ -88,7 +108,16 @@ "state": { "oneOf": [ { - "$ref": "#/definitions/SequenceExpression" + "$ref": "#/definitions/ComposedSequenceExpression" + }, + { + "$ref": "#/definitions/DerivedSequenceExpression" + }, + { + "$ref": "#/definitions/LiteralSequenceExpression" + }, + { + "$ref": "#/definitions/RepeatedSequenceExpression" }, { "$ref": "#/definitions/SequenceState" @@ -106,11 +135,11 @@ "location", "state", "type" - ] + ], + 
"additionalProperties": false }, "Haplotype": { "description": "A set of non-overlapping Allele members that co-occur on the same molecule.", - "additionalProperties": false, "type": "object", "properties": { "_id": { @@ -125,8 +154,9 @@ }, "members": { "type": "array", - "minItems": 1, + "minItems": 2, "uniqueItems": true, + "ordered": false, "items": { "oneOf": [ { @@ -143,11 +173,11 @@ "required": [ "members", "type" - ] + ], + "additionalProperties": false }, "Text": { "description": "A free-text definition of variation.", - "additionalProperties": false, "type": "object", "properties": { "_id": { @@ -168,12 +198,12 @@ "required": [ "definition", "type" - ] + ], + "additionalProperties": false }, "VariationSet": { "description": "An unconstrained set of Variation members.", "type": "object", - "additionalProperties": false, "properties": { "_id": { "$ref": "#/definitions/CURIE", @@ -188,13 +218,32 @@ "members": { "type": "array", "uniqueItems": true, + "ordered": false, "items": { "oneOf": [ + { + "$ref": "#/definitions/Allele" + }, { "$ref": "#/definitions/CURIE" }, { - "$ref": "#/definitions/Variation" + "$ref": "#/definitions/CopyNumberChange" + }, + { + "$ref": "#/definitions/CopyNumberCount" + }, + { + "$ref": "#/definitions/Genotype" + }, + { + "$ref": "#/definitions/Haplotype" + }, + { + "$ref": "#/definitions/Text" + }, + { + "$ref": "#/definitions/VariationSet" } ] }, @@ -204,12 +253,12 @@ "required": [ "members", "type" - ] + ], + "additionalProperties": false }, - "CopyNumber": { - "additionalProperties": false, + "CopyNumberCount": { "type": "object", - "description": "The absolute count of discrete copies of a MolecularVariation, Feature, SequenceExpression, or a CURIE reference within a system (e.g. genome, cell, etc.).", + "description": "The absolute count of discrete copies of a Location or Feature, within a system (e.g. 
genome, cell, etc.).", "properties": { "_id": { "$ref": "#/definitions/CURIE", @@ -217,112 +266,148 @@ }, "type": { "type": "string", - "const": "CopyNumber", - "default": "CopyNumber", - "description": "MUST be \"CopyNumber\"" + "const": "CopyNumberCount", + "default": "CopyNumberCount", + "description": "MUST be \"CopyNumberCount\"" }, "subject": { "oneOf": [ { - "$ref": "#/definitions/MolecularVariation" + "$ref": "#/definitions/CURIE" }, { - "$ref": "#/definitions/Feature" + "$ref": "#/definitions/ChromosomeLocation" }, { - "$ref": "#/definitions/SequenceExpression" + "$ref": "#/definitions/Gene" }, { - "$ref": "#/definitions/CURIE" + "$ref": "#/definitions/SequenceLocation" } ], - "description": "Subject of the Copy Number object" + "description": "A location for which the number of systemic copies is described." }, "copies": { "oneOf": [ { - "$ref": "#/definitions/Number" + "$ref": "#/definitions/DefiniteRange" }, { "$ref": "#/definitions/IndefiniteRange" }, { - "$ref": "#/definitions/DefiniteRange" + "$ref": "#/definitions/Number" } ], "description": "The integral number of copies of the subject in a system" } }, - "allOf": [ - { - "if": { - "properties": { - "copies": { - "$ref": "#/definitions/Number" - } - } - }, - "then": { - "properties": { - "copies": { - "properties": { - "value": { - "minimum": 0 - } - } - } - } - } + "required": [ + "copies", + "subject", + "type" + ], + "additionalProperties": false + }, + "CopyNumberChange": { + "type": "object", + "maturity": "draft", + "description": "An assessment of the copy number of a Location or a Feature within a system (e.g. genome, cell, etc.) relative to a baseline ploidy.", + "properties": { + "_id": { + "$ref": "#/definitions/CURIE", + "description": "Variation Id. MUST be unique within document." 
}, - { - "if": { - "properties": { - "copies": { - "$ref": "#/definitions/IndefiniteRange" - } - } - }, - "then": { - "properties": { - "copies": { - "properties": { - "value": { - "minimum": 0 - } - } - } - } - } + "type": { + "type": "string", + "const": "CopyNumberChange", + "default": "CopyNumberChange", + "description": "MUST be \"CopyNumberChange\"" }, - { - "if": { - "properties": { - "copies": { - "$ref": "#/definitions/DefiniteRange" - } + "subject": { + "oneOf": [ + { + "$ref": "#/definitions/CURIE" + }, + { + "$ref": "#/definitions/ChromosomeLocation" + }, + { + "$ref": "#/definitions/Gene" + }, + { + "$ref": "#/definitions/SequenceLocation" } + ], + "description": "A location for which the number of systemic copies is described." + }, + "copy_change": { + "type": "string", + "enum": [ + "efo:0030069", + "efo:0020073", + "efo:0030068", + "efo:0030067", + "efo:0030064", + "efo:0030070", + "efo:0030071", + "efo:0030072" + ], + "description": "MUST be one of \"efo:0030069\" (complete genomic loss), \"efo:0020073\" (high-level loss), \"efo:0030068\" (low-level loss), \"efo:0030067\" (loss), \"efo:0030064\" (regional base ploidy), \"efo:0030070\" (gain), \"efo:0030071\" (low-level gain), \"efo:0030072\" (high-level gain)." + } + }, + "required": [ + "copy_change", + "subject", + "type" + ], + "additionalProperties": false + }, + "Genotype": { + "description": "A quantified set of MolecularVariation associated with a genomic locus.", + "type": "object", + "properties": { + "_id": { + "$ref": "#/definitions/CURIE", + "description": "Variation Id. MUST be unique within document." 
+ }, + "type": { + "type": "string", + "const": "Genotype", + "default": "Genotype", + "description": "MUST be \"Genotype\"" + }, + "members": { + "type": "array", + "uniqueItems": true, + "minItems": 1, + "ordered": false, + "items": { + "$ref": "#/definitions/GenotypeMember" }, - "then": { - "properties": { - "copies": { - "properties": { - "min": { - "minimum": 0 - }, - "max": { - "minimum": 0 - } - } - } + "description": "Each GenotypeMember in `members` describes a MolecularVariation and the count of that variation at the locus." + }, + "count": { + "oneOf": [ + { + "$ref": "#/definitions/DefiniteRange" + }, + { + "$ref": "#/definitions/IndefiniteRange" + }, + { + "$ref": "#/definitions/Number" } - } + ], + "description": "The total number of copies of all MolecularVariation at this locus, MUST be greater than or equal to the sum of GenotypeMember copy counts and MUST be greater than or equal to 1. If greater than the total of GenotypeMember counts, this field describes additional MolecularVariation that exist but are not explicitly described." } - ], + }, "required": [ - "copies", - "subject", + "count", + "members", "type" - ] + ], + "additionalProperties": false }, "Location": { "description": "A contiguous segment of a biological sequence.", @@ -339,7 +424,6 @@ } }, "ChromosomeLocation": { - "additionalProperties": false, "description": "A Location on a chromosome defined by a species and chromosome name.", "type": "object", "properties": { @@ -372,10 +456,10 @@ "interval", "species_id", "type" - ] + ], + "additionalProperties": false }, "SequenceLocation": { - "additionalProperties": false, "description": "A Location defined by an interval on a referenced Sequence.", "type": "object", "properties": { @@ -409,12 +493,12 @@ "interval", "sequence_id", "type" - ] + ], + "additionalProperties": false }, "SequenceInterval": { "description": "A SequenceInterval represents a span on a Sequence. 
Positions are always represented by contiguous spans using interbase coordinates or coordinate ranges.", "type": "object", - "additionalProperties": false, "properties": { "type": { "type": "string", @@ -425,13 +509,13 @@ "start": { "oneOf": [ { - "$ref": "#/definitions/Number" + "$ref": "#/definitions/DefiniteRange" }, { "$ref": "#/definitions/IndefiniteRange" }, { - "$ref": "#/definitions/DefiniteRange" + "$ref": "#/definitions/Number" } ], "description": "The start coordinate or range of the interval. The minimum value of this coordinate or range is 0. MUST represent a coordinate or range less than the value of `end`." @@ -439,156 +523,28 @@ "end": { "oneOf": [ { - "$ref": "#/definitions/Number" + "$ref": "#/definitions/DefiniteRange" }, { "$ref": "#/definitions/IndefiniteRange" }, { - "$ref": "#/definitions/DefiniteRange" + "$ref": "#/definitions/Number" } ], "description": "The end coordinate or range of the interval. The minimum value of this coordinate or range is 0. MUST represent a coordinate or range greater than the value of `start`." 
} }, - "allOf": [ - { - "if": { - "properties": { - "start": { - "$ref": "#/definitions/Number" - } - } - }, - "then": { - "properties": { - "start": { - "properties": { - "value": { - "minimum": 0 - } - } - } - } - } - }, - { - "if": { - "properties": { - "start": { - "$ref": "#/definitions/IndefiniteRange" - } - } - }, - "then": { - "properties": { - "start": { - "properties": { - "value": { - "minimum": 0 - } - } - } - } - } - }, - { - "if": { - "properties": { - "start": { - "$ref": "#/definitions/DefiniteRange" - } - } - }, - "then": { - "properties": { - "start": { - "properties": { - "min": { - "minimum": 0 - }, - "max": { - "minimum": 0 - } - } - } - } - } - }, - { - "if": { - "properties": { - "end": { - "$ref": "#/definitions/Number" - } - } - }, - "then": { - "properties": { - "end": { - "properties": { - "value": { - "minimum": 0 - } - } - } - } - } - }, - { - "if": { - "properties": { - "end": { - "$ref": "#/definitions/IndefiniteRange" - } - } - }, - "then": { - "properties": { - "end": { - "properties": { - "value": { - "minimum": 0 - } - } - } - } - } - }, - { - "if": { - "properties": { - "end": { - "$ref": "#/definitions/DefiniteRange" - } - } - }, - "then": { - "properties": { - "end": { - "properties": { - "min": { - "minimum": 0 - }, - "max": { - "minimum": 0 - } - } - } - } - } - } - ], "required": [ "end", "start", "type" - ] + ], + "additionalProperties": false }, "CytobandInterval": { "description": "A contiguous span on a chromosome defined by cytoband features. The span includes the constituent regions described by the start and end cytobands, as well as any intervening regions.", "type": "object", - "additionalProperties": false, "properties": { "type": { "type": "string", @@ -602,7 +558,7 @@ }, "end": { "$ref": "#/definitions/HumanCytoband", - "description": "The start cytoband region. MUST specify a region nearer the terminal end (telomere) of the chromosome q-arm than `start`." + "description": "The end cytoband region. 
MUST specify a region nearer the terminal end (telomere) of the chromosome q-arm than `start`." } }, "example": { @@ -614,17 +570,21 @@ "end", "start", "type" - ] + ], + "additionalProperties": false }, "SequenceExpression": { "description": "An expression describing a Sequence.", "oneOf": [ { - "$ref": "#/definitions/LiteralSequenceExpression" + "$ref": "#/definitions/ComposedSequenceExpression" }, { "$ref": "#/definitions/DerivedSequenceExpression" }, + { + "$ref": "#/definitions/LiteralSequenceExpression" + }, { "$ref": "#/definitions/RepeatedSequenceExpression" } @@ -636,7 +596,6 @@ "LiteralSequenceExpression": { "description": "An explicit expression of a Sequence.", "type": "object", - "additionalProperties": false, "properties": { "type": { "type": "string", @@ -652,12 +611,12 @@ "required": [ "sequence", "type" - ] + ], + "additionalProperties": false }, "DerivedSequenceExpression": { "description": "An approximate expression of a sequence that is derived from a referenced sequence location. 
Use of this class indicates that the derived sequence is *approximately equivalent* to the reference indicated, and is typically used for describing large regions in contexts where the use of an approximate sequence is inconsequential.", "type": "object", - "additionalProperties": false, "properties": { "type": { "type": "string", @@ -678,11 +637,11 @@ "location", "reverse_complement", "type" - ] + ], + "additionalProperties": false }, "RepeatedSequenceExpression": { "description": "An expression of a sequence comprised of a tandem repeating subsequence.", - "additionalProperties": false, "type": "object", "properties": { "type": { @@ -694,10 +653,10 @@ "seq_expr": { "oneOf": [ { - "$ref": "#/definitions/LiteralSequenceExpression" + "$ref": "#/definitions/DerivedSequenceExpression" }, { - "$ref": "#/definitions/DerivedSequenceExpression" + "$ref": "#/definitions/LiteralSequenceExpression" } ], "description": "An expression of the repeating subsequence" @@ -705,88 +664,113 @@ "count": { "oneOf": [ { - "$ref": "#/definitions/Number" + "$ref": "#/definitions/DefiniteRange" }, { "$ref": "#/definitions/IndefiniteRange" }, { - "$ref": "#/definitions/DefiniteRange" + "$ref": "#/definitions/Number" } ], "description": "The count of repeated units, as an integer or inclusive range" } }, - "allOf": [ - { - "if": { - "properties": { - "count": { - "$ref": "#/definitions/Number" - } - } - }, - "then": { - "properties": { - "count": { - "properties": { - "value": { - "minimum": 0 - } - } - } - } - } + "required": [ + "count", + "seq_expr", + "type" + ], + "additionalProperties": false + }, + "ComposedSequenceExpression": { + "description": "An expression of a sequence composed from multiple other Sequence Expressions objects. MUST have at least one component that is not a ref:`LiteralSequenceExpression`. 
CANNOT be composed from nested composed sequence expressions.", + "additionalProperties": false, + "type": "object", + "properties": { + "type": { + "type": "string", + "const": "ComposedSequenceExpression", + "default": "ComposedSequenceExpression", + "description": "MUST be \"ComposedSequenceExpression\"" }, - { - "if": { - "properties": { - "count": { - "$ref": "#/definitions/IndefiniteRange" + "components": { + "type": "array", + "uniqueItems": true, + "minItems": 2, + "ordered": true, + "items": { + "oneOf": [ + { + "$ref": "#/definitions/DerivedSequenceExpression" + }, + { + "$ref": "#/definitions/LiteralSequenceExpression" + }, + { + "$ref": "#/definitions/RepeatedSequenceExpression" } - } + ] }, - "then": { - "properties": { - "count": { - "properties": { - "value": { - "minimum": 0 - } - } + "contains": { + "oneOf": [ + { + "$ref": "#/definitions/RepeatedSequenceExpression" + }, + { + "$ref": "#/definitions/DerivedSequenceExpression" } - } - } + ] + }, + "description": "An ordered list of SequenceExpression components comprising the expression." + } + }, + "required": [ + "components" + ] + }, + "GenotypeMember": { + "description": "A class for expressing the count of a specific MolecularVariation present *in-trans* at a genomic locus represented by a Genotype.", + "type": "object", + "properties": { + "type": { + "type": "string", + "const": "GenotypeMember", + "default": "GenotypeMember", + "description": "MUST be \"GenotypeMember\"." }, - { - "if": { - "properties": { - "count": { - "$ref": "#/definitions/DefiniteRange" - } + "count": { + "oneOf": [ + { + "$ref": "#/definitions/DefiniteRange" + }, + { + "$ref": "#/definitions/IndefiniteRange" + }, + { + "$ref": "#/definitions/Number" } - }, - "then": { - "properties": { - "count": { - "properties": { - "min": { - "minimum": 0 - }, - "max": { - "minimum": 0 - } - } - } + ], + "description": "The number of copies of the `variation` at a Genotype locus." 
+ }, + "variation": { + "oneOf": [ + { + "$ref": "#/definitions/Allele" + }, + { + "$ref": "#/definitions/Haplotype" } - } + ], + "description": "A MolecularVariation at a Genotype locus." } - ], + }, "required": [ "count", - "seq_expr", - "type" - ] + "type", + "variation" + ], + "additionalProperties": false }, "Feature": { "description": "A named entity that can be mapped to a Location. Genes, protein domains, exons, and chromosomes are some examples of common biological entities that may be Features.", @@ -802,7 +786,6 @@ "Gene": { "description": "A reference to a Gene as defined by an authority. For human genes, the use of [hgnc](https://registry.identifiers.org/registry/hgnc) as the gene authority is RECOMMENDED.", "type": "object", - "additionalProperties": false, "properties": { "type": { "type": "string", @@ -818,12 +801,12 @@ "required": [ "gene_id", "type" - ] + ], + "additionalProperties": false }, "Number": { "description": "A simple integer value as a VRS class.", "type": "object", - "additionalProperties": false, "properties": { "type": { "type": "string", @@ -839,12 +822,12 @@ "required": [ "type", "value" - ] + ], + "additionalProperties": false }, "DefiniteRange": { "description": "A bounded, inclusive range of numbers.", "type": "object", - "additionalProperties": false, "properties": { "type": { "type": "string", @@ -865,12 +848,12 @@ "max", "min", "type" - ] + ], + "additionalProperties": false }, "IndefiniteRange": { "description": "A half-bounded range of numbers represented as a number bound and associated comparator. 
The bound operator is interpreted as follows: '>=' are all numbers greater than and including `value`, '<=' are all numbers less than and including `value`.", "type": "object", - "additionalProperties": false, "properties": { "type": { "type": "string", @@ -895,30 +878,27 @@ "comparator", "type", "value" - ] + ], + "additionalProperties": false }, "CURIE": { - "additionalProperties": false, "description": "A [W3C Compact URI](https://www.w3.org/TR/curie/) formatted string. A CURIE string has the structure ``prefix``:``reference``, as defined by the W3C syntax.", "type": "string", "pattern": "^\\w[^:]*:.+$", "example": "ensembl:ENSG00000139618" }, "HumanCytoband": { - "additionalProperties": false, "description": "A character string representing cytobands derived from the *International System for Human Cytogenomic Nomenclature* (ISCN) [guidelines](http://doi.org/10.1159/isbn.978-3-318-06861-0).", "type": "string", "pattern": "^cen|[pq](ter|([1-9][0-9]*(\\.[1-9][0-9]*)?))$", "example": "q22.3" }, "Residue": { - "additionalProperties": false, "description": "A character representing a specific residue (i.e., molecular species) or groupings of these (\"ambiguity codes\"), using [one-letter IUPAC abbreviations](https://en.wikipedia.org/wiki/International_Union_of_Pure_and_Applied_Chemistry#Amino_acid_and_nucleotide_base_codes) for nucleic acids and amino acids.", "type": "string", "pattern": "[A-Z*\\-]" }, "Sequence": { - "additionalProperties": false, "description": "A character string of Residues that represents a biological sequence using the conventional sequence order (5\u2019-to-3\u2019 for nucleic acid sequences, and amino-to-carboxyl for amino acid sequences). IUPAC ambiguity codes are permitted in Sequences.", "type": "string", "pattern": "^[A-Z*\\-]*$" @@ -926,7 +906,6 @@ "SequenceState": { "deprecated": true, "description": "DEPRECATED. A Sequence as a State. 
This is the State class to use for representing \"ref-alt\" style variation, including SNVs, MNVs, del, ins, and delins. This class is deprecated. Use LiteralSequenceExpression instead.", - "additionalProperties": false, "type": "object", "properties": { "type": { @@ -947,12 +926,12 @@ "required": [ "sequence", "type" - ] + ], + "additionalProperties": false }, "SimpleInterval": { "deprecated": true, "description": "DEPRECATED: A SimpleInterval represents a span of sequence. Positions are always represented by contiguous spans using interbase coordinates. This class is deprecated. Use SequenceInterval instead.", - "additionalProperties": false, "type": "object", "properties": { "type": { @@ -979,7 +958,8 @@ "end", "start", "type" - ] + ], + "additionalProperties": false } } } \ No newline at end of file diff --git a/schema/vrs.yaml b/schema/vrs.yaml index 583a62db..e84f369e 100644 --- a/schema/vrs.yaml +++ b/schema/vrs.yaml @@ -5,9 +5,13 @@ definitions: Variation: description: A representation of the state of one or more biomolecules. oneOf: - - $ref: '#/definitions/MolecularVariation' - - $ref: '#/definitions/SystemicVariation' - - $ref: '#/definitions/UtilityVariation' + - $ref: '#/definitions/Allele' + - $ref: '#/definitions/CopyNumberChange' + - $ref: '#/definitions/CopyNumberCount' + - $ref: '#/definitions/Genotype' + - $ref: '#/definitions/Haplotype' + - $ref: '#/definitions/Text' + - $ref: '#/definitions/VariationSet' discriminator: propertyName: type MolecularVariation: @@ -30,15 +34,16 @@ definitions: description: A Variation of multiple molecules in the context of a system, e.g. a genome, sample, or homologous chromosomes. oneOf: - - $ref: '#/definitions/CopyNumber' + - $ref: '#/definitions/CopyNumberChange' + - $ref: '#/definitions/CopyNumberCount' + - $ref: '#/definitions/Genotype' discriminator: propertyName: type Allele: description: The state of a molecule at a Location. 
- additionalProperties: false type: object properties: - _id: &id001 + _id: $ref: '#/definitions/CURIE' description: Variation Id. MUST be unique within document. type: @@ -49,11 +54,15 @@ definitions: location: oneOf: - $ref: '#/definitions/CURIE' - - $ref: '#/definitions/Location' + - $ref: '#/definitions/ChromosomeLocation' + - $ref: '#/definitions/SequenceLocation' description: Where Allele is located state: oneOf: - - $ref: '#/definitions/SequenceExpression' + - $ref: '#/definitions/ComposedSequenceExpression' + - $ref: '#/definitions/DerivedSequenceExpression' + - $ref: '#/definitions/LiteralSequenceExpression' + - $ref: '#/definitions/RepeatedSequenceExpression' - $ref: '#/definitions/SequenceState' description: An expression of the sequence state deprecated: @@ -62,13 +71,15 @@ definitions: - location - state - type + additionalProperties: false Haplotype: description: A set of non-overlapping Allele members that co-occur on the same molecule. - additionalProperties: false type: object properties: - _id: *id001 + _id: + $ref: '#/definitions/CURIE' + description: Variation Id. MUST be unique within document. type: type: string const: Haplotype @@ -76,8 +87,9 @@ definitions: description: MUST be "Haplotype" members: type: array - minItems: 1 + minItems: 2 uniqueItems: true + ordered: false items: oneOf: - $ref: '#/definitions/Allele' @@ -87,12 +99,14 @@ definitions: required: - members - type + additionalProperties: false Text: description: A free-text definition of variation. - additionalProperties: false type: object properties: - _id: *id001 + _id: + $ref: '#/definitions/CURIE' + description: Variation Id. MUST be unique within document. type: type: string const: Text @@ -105,12 +119,14 @@ definitions: required: - definition - type + additionalProperties: false VariationSet: description: An unconstrained set of Variation members. 
type: object - additionalProperties: false properties: - _id: *id001 + _id: + $ref: '#/definitions/CURIE' + description: Variation Id. MUST be unique within document. type: type: string const: VariationSet @@ -119,78 +135,132 @@ definitions: members: type: array uniqueItems: true + ordered: false items: oneOf: + - $ref: '#/definitions/Allele' - $ref: '#/definitions/CURIE' - - $ref: '#/definitions/Variation' + - $ref: '#/definitions/CopyNumberChange' + - $ref: '#/definitions/CopyNumberCount' + - $ref: '#/definitions/Genotype' + - $ref: '#/definitions/Haplotype' + - $ref: '#/definitions/Text' + - $ref: '#/definitions/VariationSet' description: List of Variation objects or identifiers. Attribute is required, but MAY be empty. required: - members - type - CopyNumber: additionalProperties: false + CopyNumberCount: type: object - description: The absolute count of discrete copies of a MolecularVariation, Feature, - SequenceExpression, or a CURIE reference within a system (e.g. genome, cell, - etc.). + description: The absolute count of discrete copies of a Location or Feature, within + a system (e.g. genome, cell, etc.). properties: - _id: *id001 + _id: + $ref: '#/definitions/CURIE' + description: Variation Id. MUST be unique within document. type: type: string - const: CopyNumber - default: CopyNumber - description: MUST be "CopyNumber" + const: CopyNumberCount + default: CopyNumberCount + description: MUST be "CopyNumberCount" subject: oneOf: - - $ref: '#/definitions/MolecularVariation' - - $ref: '#/definitions/Feature' - - $ref: '#/definitions/SequenceExpression' - $ref: '#/definitions/CURIE' - description: Subject of the Copy Number object + - $ref: '#/definitions/ChromosomeLocation' + - $ref: '#/definitions/Gene' + - $ref: '#/definitions/SequenceLocation' + description: A location for which the number of systemic copies is described. 
copies: oneOf: - - $ref: '#/definitions/Number' - - $ref: '#/definitions/IndefiniteRange' - $ref: '#/definitions/DefiniteRange' + - $ref: '#/definitions/IndefiniteRange' + - $ref: '#/definitions/Number' description: The integral number of copies of the subject in a system - allOf: - - if: - properties: - copies: - $ref: '#/definitions/Number' - then: - properties: - copies: - properties: - value: - minimum: 0 - - if: - properties: - copies: - $ref: '#/definitions/IndefiniteRange' - then: - properties: - copies: - properties: - value: - minimum: 0 - - if: - properties: - copies: - $ref: '#/definitions/DefiniteRange' - then: - properties: - copies: - properties: - min: - minimum: 0 - max: - minimum: 0 required: - copies - subject - type + additionalProperties: false + CopyNumberChange: + type: object + maturity: draft + description: An assessment of the copy number of a Location or a Feature within + a system (e.g. genome, cell, etc.) relative to a baseline ploidy. + properties: + _id: + $ref: '#/definitions/CURIE' + description: Variation Id. MUST be unique within document. + type: + type: string + const: CopyNumberChange + default: CopyNumberChange + description: MUST be "CopyNumberChange" + subject: + oneOf: + - $ref: '#/definitions/CURIE' + - $ref: '#/definitions/ChromosomeLocation' + - $ref: '#/definitions/Gene' + - $ref: '#/definitions/SequenceLocation' + description: A location for which the number of systemic copies is described. + copy_change: + type: string + enum: + - efo:0030069 + - efo:0020073 + - efo:0030068 + - efo:0030067 + - efo:0030064 + - efo:0030070 + - efo:0030071 + - efo:0030072 + description: MUST be one of "efo:0030069" (complete genomic loss), "efo:0020073" + (high-level loss), "efo:0030068" (low-level loss), "efo:0030067" (loss), + "efo:0030064" (regional base ploidy), "efo:0030070" (gain), "efo:0030071" + (low-level gain), "efo:0030072" (high-level gain). 
+ required: + - copy_change + - subject + - type + additionalProperties: false + Genotype: + description: A quantified set of MolecularVariation associated with a genomic + locus. + type: object + properties: + _id: + $ref: '#/definitions/CURIE' + description: Variation Id. MUST be unique within document. + type: + type: string + const: Genotype + default: Genotype + description: MUST be "Genotype" + members: + type: array + uniqueItems: true + minItems: 1 + ordered: false + items: + $ref: '#/definitions/GenotypeMember' + description: Each GenotypeMember in `members` describes a MolecularVariation + and the count of that variation at the locus. + count: + oneOf: + - $ref: '#/definitions/DefiniteRange' + - $ref: '#/definitions/IndefiniteRange' + - $ref: '#/definitions/Number' + description: The total number of copies of all MolecularVariation at this + locus, MUST be greater than or equal to the sum of GenotypeMember copy counts + and MUST be greater than or equal to 1. If greater than the total of GenotypeMember + counts, this field describes additional MolecularVariation that exist but + are not explicitly described. + required: + - count + - members + - type + additionalProperties: false Location: description: A contiguous segment of a biological sequence. oneOf: @@ -199,11 +269,10 @@ definitions: discriminator: propertyName: type ChromosomeLocation: - additionalProperties: false description: A Location on a chromosome defined by a species and chromosome name. type: object properties: - _id: &id002 + _id: $ref: '#/definitions/CURIE' description: Location Id. MUST be unique within document. type: @@ -228,12 +297,14 @@ definitions: - interval - species_id - type - SequenceLocation: additionalProperties: false + SequenceLocation: description: A Location defined by an interval on a referenced Sequence. type: object properties: - _id: *id002 + _id: + $ref: '#/definitions/CURIE' + description: Location Id. MUST be unique within document. 
type: type: string const: SequenceLocation @@ -251,12 +322,12 @@ definitions: - interval - sequence_id - type + additionalProperties: false SequenceInterval: description: A SequenceInterval represents a span on a Sequence. Positions are always represented by contiguous spans using interbase coordinates or coordinate ranges. type: object - additionalProperties: false properties: type: type: string @@ -265,95 +336,30 @@ definitions: description: MUST be "SequenceInterval" start: oneOf: - - $ref: '#/definitions/Number' - - $ref: '#/definitions/IndefiniteRange' - $ref: '#/definitions/DefiniteRange' + - $ref: '#/definitions/IndefiniteRange' + - $ref: '#/definitions/Number' description: The start coordinate or range of the interval. The minimum value of this coordinate or range is 0. MUST represent a coordinate or range less than the value of `end`. end: oneOf: - - $ref: '#/definitions/Number' - - $ref: '#/definitions/IndefiniteRange' - $ref: '#/definitions/DefiniteRange' + - $ref: '#/definitions/IndefiniteRange' + - $ref: '#/definitions/Number' description: The end coordinate or range of the interval. The minimum value of this coordinate or range is 0. MUST represent a coordinate or range greater than the value of `start`. 
- allOf: - - if: - properties: - start: - $ref: '#/definitions/Number' - then: - properties: - start: - properties: - value: - minimum: 0 - - if: - properties: - start: - $ref: '#/definitions/IndefiniteRange' - then: - properties: - start: - properties: - value: - minimum: 0 - - if: - properties: - start: - $ref: '#/definitions/DefiniteRange' - then: - properties: - start: - properties: - min: - minimum: 0 - max: - minimum: 0 - - if: - properties: - end: - $ref: '#/definitions/Number' - then: - properties: - end: - properties: - value: - minimum: 0 - - if: - properties: - end: - $ref: '#/definitions/IndefiniteRange' - then: - properties: - end: - properties: - value: - minimum: 0 - - if: - properties: - end: - $ref: '#/definitions/DefiniteRange' - then: - properties: - end: - properties: - min: - minimum: 0 - max: - minimum: 0 required: - end - start - type + additionalProperties: false CytobandInterval: description: A contiguous span on a chromosome defined by cytoband features. The span includes the constituent regions described by the start and end cytobands, as well as any intervening regions. type: object - additionalProperties: false properties: type: type: string @@ -366,7 +372,7 @@ definitions: end (telomere) of the chromosome p-arm than `end`. end: $ref: '#/definitions/HumanCytoband' - description: The start cytoband region. MUST specify a region nearer the terminal + description: The end cytoband region. MUST specify a region nearer the terminal end (telomere) of the chromosome q-arm than `start`. example: type: CytobandInterval @@ -376,18 +382,19 @@ definitions: - end - start - type + additionalProperties: false SequenceExpression: description: An expression describing a Sequence. 
oneOf: - - $ref: '#/definitions/LiteralSequenceExpression' + - $ref: '#/definitions/ComposedSequenceExpression' - $ref: '#/definitions/DerivedSequenceExpression' + - $ref: '#/definitions/LiteralSequenceExpression' - $ref: '#/definitions/RepeatedSequenceExpression' discriminator: propertyName: type LiteralSequenceExpression: description: An explicit expression of a Sequence. type: object - additionalProperties: false properties: type: type: string @@ -400,6 +407,7 @@ definitions: required: - sequence - type + additionalProperties: false DerivedSequenceExpression: description: An approximate expression of a sequence that is derived from a referenced sequence location. Use of this class indicates that the derived sequence is @@ -407,7 +415,6 @@ definitions: for describing large regions in contexts where the use of an approximate sequence is inconsequential. type: object - additionalProperties: false properties: type: type: string @@ -425,9 +432,9 @@ definitions: - location - reverse_complement - type + additionalProperties: false RepeatedSequenceExpression: description: An expression of a sequence comprised of a tandem repeating subsequence. 
- additionalProperties: false type: object properties: type: @@ -437,52 +444,76 @@ definitions: description: MUST be "RepeatedSequenceExpression" seq_expr: oneOf: - - $ref: '#/definitions/LiteralSequenceExpression' - $ref: '#/definitions/DerivedSequenceExpression' + - $ref: '#/definitions/LiteralSequenceExpression' description: An expression of the repeating subsequence count: oneOf: - - $ref: '#/definitions/Number' - - $ref: '#/definitions/IndefiniteRange' - $ref: '#/definitions/DefiniteRange' + - $ref: '#/definitions/IndefiniteRange' + - $ref: '#/definitions/Number' description: The count of repeated units, as an integer or inclusive range - allOf: - - if: - properties: - count: - $ref: '#/definitions/Number' - then: - properties: - count: - properties: - value: - minimum: 0 - - if: - properties: - count: - $ref: '#/definitions/IndefiniteRange' - then: - properties: - count: - properties: - value: - minimum: 0 - - if: - properties: - count: - $ref: '#/definitions/DefiniteRange' - then: - properties: - count: - properties: - min: - minimum: 0 - max: - minimum: 0 required: - count - seq_expr - type + additionalProperties: false + ComposedSequenceExpression: + description: An expression of a sequence composed from multiple other Sequence + Expressions objects. MUST have at least one component that is not a ref:`LiteralSequenceExpression`. + CANNOT be composed from nested composed sequence expressions. 
+ additionalProperties: false + type: object + properties: + type: + type: string + const: ComposedSequenceExpression + default: ComposedSequenceExpression + description: MUST be "ComposedSequenceExpression" + components: + type: array + uniqueItems: true + minItems: 2 + ordered: true + items: + oneOf: + - $ref: '#/definitions/DerivedSequenceExpression' + - $ref: '#/definitions/LiteralSequenceExpression' + - $ref: '#/definitions/RepeatedSequenceExpression' + contains: + oneOf: + - $ref: '#/definitions/RepeatedSequenceExpression' + - $ref: '#/definitions/DerivedSequenceExpression' + description: An ordered list of SequenceExpression components comprising + the expression. + required: + - components + GenotypeMember: + description: A class for expressing the count of a specific MolecularVariation + present *in-trans* at a genomic locus represented by a Genotype. + type: object + properties: + type: + type: string + const: GenotypeMember + default: GenotypeMember + description: MUST be "GenotypeMember". + count: + oneOf: + - $ref: '#/definitions/DefiniteRange' + - $ref: '#/definitions/IndefiniteRange' + - $ref: '#/definitions/Number' + description: The number of copies of the `variation` at a Genotype locus. + variation: + oneOf: + - $ref: '#/definitions/Allele' + - $ref: '#/definitions/Haplotype' + description: A MolecularVariation at a Genotype locus. + required: + - count + - type + - variation + additionalProperties: false Feature: description: A named entity that can be mapped to a Location. Genes, protein domains, exons, and chromosomes are some examples of common biological entities that @@ -496,7 +527,6 @@ definitions: the use of [hgnc](https://registry.identifiers.org/registry/hgnc) as the gene authority is RECOMMENDED. type: object - additionalProperties: false properties: type: type: string @@ -509,10 +539,10 @@ definitions: required: - gene_id - type + additionalProperties: false Number: description: A simple integer value as a VRS class. 
type: object - additionalProperties: false properties: type: type: string @@ -525,10 +555,10 @@ definitions: required: - type - value + additionalProperties: false DefiniteRange: description: A bounded, inclusive range of numbers. type: object - additionalProperties: false properties: type: type: string @@ -545,13 +575,13 @@ definitions: - max - min - type + additionalProperties: false IndefiniteRange: description: 'A half-bounded range of numbers represented as a number bound and associated comparator. The bound operator is interpreted as follows: ''>='' are all numbers greater than and including `value`, ''<='' are all numbers less than and including `value`.' type: object - additionalProperties: false properties: type: type: string @@ -572,8 +602,8 @@ definitions: - comparator - type - value - CURIE: additionalProperties: false + CURIE: description: A [W3C Compact URI](https://www.w3.org/TR/curie/) formatted string. A CURIE string has the structure ``prefix``:``reference``, as defined by the W3C syntax. @@ -581,21 +611,18 @@ definitions: pattern: ^\w[^:]*:.+$ example: ensembl:ENSG00000139618 HumanCytoband: - additionalProperties: false description: A character string representing cytobands derived from the *International System for Human Cytogenomic Nomenclature* (ISCN) [guidelines](http://doi.org/10.1159/isbn.978-3-318-06861-0). type: string pattern: ^cen|[pq](ter|([1-9][0-9]*(\.[1-9][0-9]*)?))$ example: q22.3 Residue: - additionalProperties: false description: A character representing a specific residue (i.e., molecular species) or groupings of these ("ambiguity codes"), using [one-letter IUPAC abbreviations](https://en.wikipedia.org/wiki/International_Union_of_Pure_and_Applied_Chemistry#Amino_acid_and_nucleotide_base_codes) for nucleic acids and amino acids. 
type: string pattern: '[A-Z*\-]' Sequence: - additionalProperties: false description: "A character string of Residues that represents a biological sequence\ \ using the conventional sequence order (5\u2019-to-3\u2019 for nucleic acid\ \ sequences, and amino-to-carboxyl for amino acid sequences). IUPAC ambiguity\ @@ -607,7 +634,6 @@ definitions: description: DEPRECATED. A Sequence as a State. This is the State class to use for representing "ref-alt" style variation, including SNVs, MNVs, del, ins, and delins. This class is deprecated. Use LiteralSequenceExpression instead. - additionalProperties: false type: object properties: type: @@ -624,12 +650,12 @@ definitions: required: - sequence - type + additionalProperties: false SimpleInterval: deprecated: true description: 'DEPRECATED: A SimpleInterval represents a span of sequence. Positions are always represented by contiguous spans using interbase coordinates. This class is deprecated. Use SequenceInterval instead.' - additionalProperties: false type: object properties: type: @@ -651,3 +677,4 @@ definitions: - end - start - type + additionalProperties: false diff --git a/tests/config.py b/tests/config.py index 385f3123..18443c0b 100644 --- a/tests/config.py +++ b/tests/config.py @@ -4,4 +4,5 @@ schema_dir = root_dir / "schema" vrs_yaml_path = schema_dir / "vrs-source.yaml" vrs_json_path = schema_dir / "vrs.json" +vrs_merged_yaml_path = schema_dir / "merged.yaml" diff --git a/tests/test_basic.py b/tests/test_basic.py index fea4e985..75efe0f1 100644 --- a/tests/test_basic.py +++ b/tests/test_basic.py @@ -1,15 +1,15 @@ import json -import python_jsonschema_objects as pjs import yaml +import python_jsonschema_objects as pjs from schema.helpers import pjs_filter from ga4gh.gks.metaschema.tools.source_proc import YamlSchemaProcessor +from jsonschema import validate, RefResolver -from config import vrs_json_path, vrs_yaml_path +from config import vrs_json_path, vrs_yaml_path, root_dir # Are the yaml and json parsable and do 
they match? -y = yaml.load(open(vrs_yaml_path), Loader=yaml.SafeLoader) -p = YamlSchemaProcessor(y) +p = YamlSchemaProcessor(vrs_yaml_path) j = json.load(open(vrs_json_path)) @@ -19,5 +19,18 @@ def test_json_yaml_match(): # Can pjs handle this schema? def test_pjs_smoke(): - ob = pjs.ObjectBuilder(pjs_filter(y)) + ob = pjs.ObjectBuilder(pjs_filter(j)) assert ob.build_classes() # no exception => okay + + +def test_schema_validation(): + """Test that examples in validation/models.yaml are valid""" + resolver = RefResolver.from_schema(j, store={"definitions": j}) + schema_definitions = j["definitions"] + validation_models = root_dir / "validation" / "models.yaml" + validation_tests = yaml.load(open(validation_models), Loader=yaml.SafeLoader) + for cls, tests in validation_tests.items(): + for t in tests: + validate(instance=t["in"], + schema=schema_definitions[cls], + resolver=resolver) diff --git a/validation/models.yaml b/validation/models.yaml index 54ec9253..98695c55 100644 --- a/validation/models.yaml +++ b/validation/models.yaml @@ -1,19 +1,19 @@ Number: - - + - in: type: Number value: 55 out: ga4gh_serialize: '{"type":"Number","value":55}' Gene: - - + - in: gene_id: ncbigene:384 type: Gene out: ga4gh_serialize: '{"gene_id":"ncbigene:384","type":"Gene"}' SimpleInterval: - - + - in: end: 44908822 start: 44908821 @@ -169,6 +169,38 @@ RepeatedSequenceExpression: type: RepeatedSequenceExpression out: ga4gh_serialize: '{"count":{"comparator":">=","type":"IndefiniteRange","value":6},"seq_expr":{"location":"QrRSuBj-VScAGV_gEdxNgsnh41jYH1Kg","reverse_complement":false,"type":"DerivedSequenceExpression"},"type":"RepeatedSequenceExpression"}' +ComposedSequenceExpression: + - name: "Composed Sequence Expression w/ order 1" + in: + components: + - type: LiteralSequenceExpression + sequence: CGC + - type: RepeatedSequenceExpression + seq_expr: + type: LiteralSequenceExpression + sequence: CGA + count: + type: Number + value: 3 + type: ComposedSequenceExpression + out: + 
ga4gh_serialize: '{"components":[{"sequence":"CGC","type":"LiteralSequenceExpression"},{"count":{"type":"Number","value":3},"seq_expr":{"sequence":"CGA","type":"LiteralSequenceExpression"},"type":"RepeatedSequenceExpression"}],"type":"ComposedSequenceExpression"}' +ComposedSequenceExpression: + - name: "Composed Sequence Expression w/ order 2" + in: + components: + - type: RepeatedSequenceExpression + seq_expr: + type: LiteralSequenceExpression + sequence: CGA + count: + type: Number + value: 3 + - type: LiteralSequenceExpression + sequence: CGC + type: ComposedSequenceExpression + out: + ga4gh_serialize: '{"components":[{"count":{"type":"Number","value":3},"seq_expr":{"sequence":"CGA","type":"LiteralSequenceExpression"},"type":"RepeatedSequenceExpression"},{"sequence":"CGC","type":"LiteralSequenceExpression"}],"type":"ComposedSequenceExpression"}' Allele: - name: "rs7412@GRCh38>T w/SequenceState" in: @@ -213,6 +245,68 @@ Allele: ga4gh_digest: CxiA_hvYbkD8Vqwjhx5AYuyul4mtlkpD ga4gh_identify: ga4gh:VA.CxiA_hvYbkD8Vqwjhx5AYuyul4mtlkpD ga4gh_serialize: '{"location":"QrRSuBj-VScAGV_gEdxNgsnh41jYH1Kg","state":{"sequence":"T","type":"LiteralSequenceExpression"},"type":"Allele"}' +Allele: + - name: "Allele w/ Composed Sequence Expression w/ order 1" + in: + location: + interval: + end: + type: Number + value: 44908822 + start: + type: Number + value: 44908821 + type: SequenceInterval + sequence_id: ga4gh:SQ.IIB53T8CNeJJdUqzn9V_JnRtQadwWCbl + type: SequenceLocation + state: + components: + - type: LiteralSequenceExpression + sequence: CGC + - type: RepeatedSequenceExpression + seq_expr: + type: LiteralSequenceExpression + sequence: CGA + count: + type: Number + value: 3 + type: ComposedSequenceExpression + type: Allele + out: + ga4gh_digest: obWIAB54mfRE2HAwQiIzKZeIx0REPG-8 + ga4gh_identify: ga4gh:VA.obWIAB54mfRE2HAwQiIzKZeIx0REPG-8 + ga4gh_serialize: 
'{"location":"QrRSuBj-VScAGV_gEdxNgsnh41jYH1Kg","state":{"components":[{"sequence":"CGC","type":"LiteralSequenceExpression"},{"count":{"type":"Number","value":3},"seq_expr":{"sequence":"CGA","type":"LiteralSequenceExpression"},"type":"RepeatedSequenceExpression"}],"type":"ComposedSequenceExpression"},"type":"Allele"}' +Allele: + - name: "Allele w/ Composed Sequence Expression w/ order 2" + in: + location: + interval: + end: + type: Number + value: 44908822 + start: + type: Number + value: 44908821 + type: SequenceInterval + sequence_id: ga4gh:SQ.IIB53T8CNeJJdUqzn9V_JnRtQadwWCbl + type: SequenceLocation + state: + components: + - type: RepeatedSequenceExpression + seq_expr: + type: LiteralSequenceExpression + sequence: CGA + count: + type: Number + value: 3 + - type: LiteralSequenceExpression + sequence: CGC + type: ComposedSequenceExpression + type: Allele + out: + ga4gh_digest: KDrbvmR-Y2dccsgckQnpEsQuLMq4p10d + ga4gh_identify: ga4gh:VA.KDrbvmR-Y2dccsgckQnpEsQuLMq4p10d + ga4gh_serialize: '{"location":"QrRSuBj-VScAGV_gEdxNgsnh41jYH1Kg","state":{"components":[{"count":{"type":"Number","value":3},"seq_expr":{"sequence":"CGA","type":"LiteralSequenceExpression"},"type":"RepeatedSequenceExpression"},{"sequence":"CGC","type":"LiteralSequenceExpression"}],"type":"ComposedSequenceExpression"},"type":"Allele"}' Haplotype: - name: "APOE1 on GRCh38, inline" in: @@ -262,7 +356,7 @@ Haplotype: ga4gh_digest: i8owCOBHIlRCPtcw_WzRFNTunwJRy99- ga4gh_identify: ga4gh:VH.i8owCOBHIlRCPtcw_WzRFNTunwJRy99- ga4gh_serialize: '{"members":["-kUJh47Pu24Y3Wdsk1rXEDKsXWNY-68x","Z_rYRxpUvwqCLsCBO3YLl70o2uf9_Op1"],"type":"Haplotype"}' -CopyNumber: +CopyNumberCount: - name: ">=3 copies APOE" in: copies: @@ -270,13 +364,41 @@ CopyNumber: type: IndefiniteRange value: 3 subject: - gene_id: ncbigene:384 - type: Gene - type: CopyNumber + sequence_id: ga4gh:SQ.IIB53T8CNeJJdUqzn9V_JnRtQadwWCbl + interval: + end: + type: Number + value: 44909393 + start: + type: Number + value: 44905795 + type: 
SequenceInterval + type: SequenceLocation + type: CopyNumberCount + out: + ga4gh_digest: salZa9yW-GduRxsRFwIGCQvi_YfpjeF4 + ga4gh_identify: ga4gh:CN.salZa9yW-GduRxsRFwIGCQvi_YfpjeF4 + ga4gh_serialize: '{"copies":{"comparator":">=","type":"IndefiniteRange","value":3},"subject":"oz3NEuhtbBep3yqu3wrhqfDKbLPK7vcE","type":"CopyNumberCount"}' +CopyNumberChange: + - name: "Low-level copy gain of BRCA1" + in: + copy_change: efo:0030071 + subject: + sequence_id: ga4gh:SQ.IIB53T8CNeJJdUqzn9V_JnRtQadwWCbl + interval: + end: + type: Number + value: 44909393 + start: + type: Number + value: 44905795 + type: SequenceInterval + type: SequenceLocation + type: CopyNumberChange out: - ga4gh_digest: xksSWn--_z28Qaj-Udlhot4OKqYGkywy - ga4gh_identify: ga4gh:VCN.xksSWn--_z28Qaj-Udlhot4OKqYGkywy - ga4gh_serialize: '{"copies":{"comparator":">=","type":"IndefiniteRange","value":3},"subject":{"gene_id":"ncbigene:384","type":"Gene"},"type":"CopyNumber"}' + ga4gh_digest: zRqNmX-TVTU5FOFxfR4y0jwBysw7ztPn + ga4gh_identify: ga4gh:CX.zRqNmX-TVTU5FOFxfR4y0jwBysw7ztPn + ga4gh_serialize: '{"copy_change":"efo:0030071","subject":"oz3NEuhtbBep3yqu3wrhqfDKbLPK7vcE","type":"CopyNumberChange"}' Text: - in: @@ -335,3 +457,139 @@ VariationSet: ga4gh_digest: QLQXSNSIFlqNYWmQbw-YkfmexPi4NeDE ga4gh_identify: ga4gh:VS.QLQXSNSIFlqNYWmQbw-YkfmexPi4NeDE ga4gh_serialize: '{"members":["-kUJh47Pu24Y3Wdsk1rXEDKsXWNY-68x","Z_rYRxpUvwqCLsCBO3YLl70o2uf9_Op1"],"type":"VariationSet"}' +GenotypeMember: + - name: "GenotypeMember w/ Allele" + in: + count: + value: 1 + type: Number + variation: + location: + interval: + end: + type: Number + value: 94842866 + start: + type: Number + value: 94842865 + type: SequenceInterval + sequence_id: ga4gh:SQ.ss8r_wB0-b9r44TQTMmVTI92884QvBiB + type: SequenceLocation + state: + sequence: G + type: LiteralSequenceExpression + type: Allele + type: GenotypeMember + out: + ga4gh_serialize: 
'{"count":{"type":"Number","value":1},"type":"GenotypeMember","variation":"geQCxa1Enel8UBUAQQ2-rbphDjIR-cq0"}' +GenotypeMember: + - name: "GenotypeMember w/ Haplotype" + in: + count: + value: 1 + type: Number + variation: + members: + - location: + interval: + end: + type: Number + value: 94761900 + start: + type: Number + value: 94761899 + type: SequenceInterval + sequence_id: ga4gh:SQ.ss8r_wB0-b9r44TQTMmVTI92884QvBiB + type: SequenceLocation + state: + sequence: T + type: LiteralSequenceExpression + type: Allele + - location: + interval: + end: + type: Number + value: 94842866 + start: + type: Number + value: 94842865 + type: SequenceInterval + sequence_id: ga4gh:SQ.ss8r_wB0-b9r44TQTMmVTI92884QvBiB + type: SequenceLocation + state: + sequence: G + type: LiteralSequenceExpression + type: Allele + type: Haplotype + type: GenotypeMember + out: + ga4gh_serialize: '{"count":{"type":"Number","value":1},"type":"GenotypeMember","variation":"Ow_uE0YaVWHIno4pQfdmYpWmlGPNtXQr"}' +Genotype: + - + in: + members: + - count: + value: 1 + type: Number + variation: + location: + interval: + end: + type: Number + value: 94842866 + start: + type: Number + value: 94842865 + type: SequenceInterval + sequence_id: ga4gh:SQ.ss8r_wB0-b9r44TQTMmVTI92884QvBiB + type: SequenceLocation + state: + sequence: G + type: LiteralSequenceExpression + type: Allele + type: GenotypeMember + - count: + value: 1 + type: Number + variation: + members: + - location: + interval: + end: + type: Number + value: 94761900 + start: + type: Number + value: 94761899 + type: SequenceInterval + sequence_id: ga4gh:SQ.ss8r_wB0-b9r44TQTMmVTI92884QvBiB + type: SequenceLocation + state: + sequence: T + type: LiteralSequenceExpression + type: Allele + - location: + interval: + end: + type: Number + value: 94842866 + start: + type: Number + value: 94842865 + type: SequenceInterval + sequence_id: ga4gh:SQ.ss8r_wB0-b9r44TQTMmVTI92884QvBiB + type: SequenceLocation + state: + sequence: G + type: LiteralSequenceExpression + 
type: Allele + type: Haplotype + type: GenotypeMember + count: + type: Number + value: 2 + type: Genotype + out: + ga4gh_digest: fz-TMM88G2hmK6cQ-JwrpVAr8d_3eTVq + ga4gh_identify: ga4gh:GT.fz-TMM88G2hmK6cQ-JwrpVAr8d_3eTVq + ga4gh_serialize: '{"count":{"type":"Number","value":2},"members":["EhA9scQ-F-n1eQdQOJYClDXq613IZLQm","oJg9piBqrJ-_t3PSLA21d4z8f4tJHKqI"],"type":"Genotype"}'
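
Note on the Genotype `count` rule: in the schema added by this diff, the requirement that `count` be at least 1 and at least the sum of the GenotypeMember counts appears only in the description text, so JSON Schema validation by itself would not catch a violation. The following is a minimal sketch of how a consumer might check it, not part of the change set. The helper names `number_value` and `genotype_count_is_consistent` are hypothetical, the `variation` values are elided placeholders, and only the `Number` case is handled (range types are deliberately left unchecked).

```python
def number_value(count):
    """Return the integer of a VRS Number object, or None for range types."""
    if count.get("type") == "Number":
        return count["value"]
    return None


def genotype_count_is_consistent(genotype):
    """Check the description-level rule on Genotype.count.

    Returns True or False when every count is a Number, and None when any
    count is a DefiniteRange or IndefiniteRange (not covered by this sketch).
    """
    total = number_value(genotype["count"])
    member_values = [number_value(m["count"]) for m in genotype["members"]]
    if total is None or any(v is None for v in member_values):
        return None
    return total >= 1 and total >= sum(member_values)


# Shaped like the Genotype example in validation/models.yaml:
# two members with count 1 each and a total count of 2.
# The variation objects are placeholders, not complete Alleles or Haplotypes.
genotype = {
    "type": "Genotype",
    "count": {"type": "Number", "value": 2},
    "members": [
        {
            "type": "GenotypeMember",
            "count": {"type": "Number", "value": 1},
            "variation": {"type": "Allele"},
        },
        {
            "type": "GenotypeMember",
            "count": {"type": "Number", "value": 1},
            "variation": {"type": "Haplotype"},
        },
    ],
}

print(genotype_count_is_consistent(genotype))
```

A total larger than the member sum is still consistent under the description, since the surplus stands for MolecularVariation that exists but is not explicitly described.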