Merge pull request #7 from ave-dcd/revisions
Revisions
afrubin authored Oct 12, 2023
2 parents b3dbff0 + 7db2ba1 commit 320d95b
Showing 11 changed files with 1,230 additions and 101 deletions.
3 changes: 2 additions & 1 deletion .requirements.txt
@@ -1,3 +1,4 @@
 pytest
 pyyaml
-jsonschema
+jsonschema
+ga4gh.gks.metaschema
339 changes: 339 additions & 0 deletions README.md

Large diffs are not rendered by default.

65 changes: 65 additions & 0 deletions concept_vocabulary.tsv
@@ -0,0 +1,65 @@
Domain Term exactMapping(s) Definition
LibraryDeliveryMethod adeno-associated virus transduction Library delivery using adeno-associated virus transduction
LibraryDeliveryMethod chemical or heat shock transformation Library delivery using chemical or heat shock transformation
LibraryDeliveryMethod chemical-based transfection Library delivery using chemical-based transfection
LibraryDeliveryMethod electroporation Library delivery using electroporation
LibraryDeliveryMethod lentivirus transduction Library delivery using lentivirus transduction
LibraryDeliveryMethod nucleofection Library delivery using nucleofection
LibraryGenerationMechanism base editor A base editor mechanism of CRISPR/Cas mediated variant library generation
LibraryGenerationMechanism prime editor A prime editor mechanism of CRISPR/Cas mediated variant library generation
LibraryGenerationMechanism nuclease A nuclease mechanism of CRISPR/Cas mediated variant library generation
LibraryGenerationSystem AsCas12a CRISPR/Cas mediated variant library generation by the AsCas12a system
LibraryGenerationSystem doped oligo synthesis Doped oligo synthesis mediated variant library generation
LibraryGenerationSystem error-prone PCR Error-prone Polymerase Chain Reaction (PCR) mediated variant library generation
LibraryGenerationSystem microarray synthesis Microarray synthesis mediated variant library generation
LibraryGenerationSystem nicking mutagenesis Nicking mutagenesis mediated variant library generation
LibraryGenerationSystem oligo pool synthesis Oligo pool synthesis mediated variant library generation
LibraryGenerationSystem oligo-directed mutagenic PCR Oligo-directed mutagenic Polymerase Chain Reaction (PCR) mediated variant library generation
LibraryGenerationSystem proprietary method A proprietary method for variant library generation
LibraryGenerationSystem RfsCas13d CRISPR/Cas mediated variant library generation by the RfsCas13d system
LibraryGenerationSystem SaCas9 CRISPR/Cas mediated variant library generation by the SaCas9 system
LibraryGenerationSystem site-directed mutagenesis Site-directed mutagenesis mediated variant library generation
LibraryGenerationSystem SpCas9 CRISPR/Cas mediated variant library generation by the SpCas9 system
LibraryIntegrationMechanism episomal delivery Library expression by episomal delivery
LibraryIntegrationMechanism extra-local construct insertion Library integration at a designated integration site, e.g. with Landing Pad
LibraryIntegrationMechanism native locus replacement Entire element replacement at the native locus (e.g. with integrases)
LibraryIntegrationMechanism plasmid (not integrated) Expression of gene products from a non-integrating plasmid
LibraryIntegrationMechanism random locus viral integration Integration of a virus into a random locus
LibraryIntegrationMechanism transfection of RNA Direct transfection of RNA
PhenotypicAssayDimensionality high-dimensional data Assay with inherent high-dimensional data
PhenotypicAssayDimensionality combined functional data Assay with multiple, combined functional readouts
PhenotypicAssayDimensionality single-dimensional data Assay with a single-dimensional readout
PhenotypicAssayMethod binding assay OBI:0001146 Phenotypic assay measuring binding (e.g. between two proteins)
PhenotypicAssayMethod bulk RNA-sequencing OBI:0003090 Phenotypic assay using bulk RNA-sequencing
PhenotypicAssayMethod cell proliferation assay OBI:0000891 Phenotypic assay measuring cell proliferation
PhenotypicAssayMethod systematic evolution of ligands by exponential enrichment assay OBI:0002161 Phenotypic assay measuring evolution of ligands by exponential enrichment
PhenotypicAssayMethod flow cytometry assay OBI:0000916 Phenotypic assay measuring fluorescence by flow cytometry
PhenotypicAssayMethod fluorescence in-situ hybridization (FISH) assay OBI:0003094 Phenotypic assay using fluorescence in-situ hybridization (FISH)
PhenotypicAssayMethod imaging mass cytometry assay OBI:0003096 Phenotypic assay using imaging mass cytometry
PhenotypicAssayMethod multiplexed fluorescent antibody imaging OBI:0003091 Phenotypic assay using multiplexed fluorescent antibody imaging
PhenotypicAssayMethod promoter activity detection by reporter gene assay OBI:0000913 Phenotypic assay measuring promoter activity using a reporter gene
PhenotypicAssayMethod single-cell imaging Phenotypic assay using single cell imaging
PhenotypicAssayMethod single-cell RNA sequencing assay OBI:0002631 Phenotypic assay using single-cell RNA sequencing
PhenotypicAssayMethod survival assessment assay OBI:0000699 Phenotypic assay using a survival assessment assay
PhenotypicAssayModelSystem bacteria Model system of bacteria (E. coli)
PhenotypicAssayModelSystem bacteriophage Model system of bacteriophage
PhenotypicAssayModelSystem immortalized human cells Model system of immortalized human cells (H. sapiens)
PhenotypicAssayModelSystem induced pluripotent stem cells from human female Model system of induced pluripotent stem cells from human female
PhenotypicAssayModelSystem induced pluripotent stem cells from human male Model system of induced pluripotent stem cells from human male
PhenotypicAssayModelSystem molecular display Model system of molecular display
PhenotypicAssayModelSystem murine primary cells Model system of mouse primary cells (M. musculus)
PhenotypicAssayModelSystem patient derived primary cells (e.g. T-cells, adipocytes) Model system of patient derived primary cells (e.g. T-cells, adipocytes)
PhenotypicAssayModelSystem yeast Model system of yeast (S. cerevisiae)
PhenotypicAssayProfilingStrategy barcode sequencing Library profiling strategy of sequencing of a barcode associated with the variant library
PhenotypicAssayProfilingStrategy direct sequencing Library profiling strategy of direct sequencing of the target variant library
PhenotypicAssayProfilingStrategy shotgun sequencing Library profiling strategy of shotgun sequencing
PhenotypicAssaySequencingMethod multi-segment Library sequencing method of sequencing of multiple segments using short or long reads
PhenotypicAssaySequencingMethod single-segment (long read) Library sequencing method of sequencing of a single segment using long reads (e.g. Oxford Nanopore or PacBio)
PhenotypicAssaySequencingMethod single-segment (short read) Library sequencing method of sequencing of a single segment using short reads (e.g. Illumina)
VariantLibrary base editor functionality A base editor mechanism of a CRISPR/Cas variant library generation method
VariantLibrary prime editor functionality A prime editor mechanism of a CRISPR/Cas variant library generation method
VariantLibrary wildtype nuclease functionality A wildtype nuclease mechanism of a CRISPR/Cas variant library generation method
VariantLibraryScope coding The protein-coding sequence of a gene
VariantLibraryScope intronic Intronic sequence in between exons of a gene
VariantLibraryScope non-coding, other Non-coding sequence corresponding to non-regulatory elements
VariantLibraryScope non-coding, regulatory Non-coding sequence corresponding to regulatory elements (e.g. enhancers or promoters)
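
A minimal sketch of how the vocabulary above might be loaded and grouped by domain. It assumes the file is tab-separated with the four header columns shown (`Domain`, `Term`, `exactMapping(s)`, `Definition`); the function name `load_vocabulary` and the inline sample are illustrative and not part of this commit.

```python
import csv
import io
from collections import defaultdict

# A few rows copied from concept_vocabulary.tsv; the tab delimiter is an assumption
SAMPLE_TSV = (
    "Domain\tTerm\texactMapping(s)\tDefinition\n"
    "LibraryDeliveryMethod\telectroporation\t\tLibrary delivery using electroporation\n"
    "PhenotypicAssayMethod\tbinding assay\tOBI:0001146\t"
    "Phenotypic assay measuring binding (e.g. between two proteins)\n"
    "PhenotypicAssayMethod\tflow cytometry assay\tOBI:0000916\t"
    "Phenotypic assay measuring fluorescence by flow cytometry\n"
)


def load_vocabulary(handle):
    """Group terms by domain, mapping each term to its exact mapping (None if empty)."""
    vocabulary = defaultdict(dict)
    for row in csv.DictReader(handle, delimiter="\t"):
        vocabulary[row["Domain"]][row["Term"]] = row["exactMapping(s)"] or None
    return dict(vocabulary)


vocabulary = load_vocabulary(io.StringIO(SAMPLE_TSV))
print(sorted(vocabulary["PhenotypicAssayMethod"]))
```

To read the real file, the same function could be passed `open("concept_vocabulary.tsv", newline="", encoding="utf-8")` instead of the in-memory sample.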
77 changes: 77 additions & 0 deletions examples/Findlay_2018.yml
@@ -0,0 +1,77 @@
title: BRCA1 Saturation Genome Editing
abstract: >-
Variants of uncertain significance fundamentally limit the clinical utility of genetic information. The challenge
they pose is epitomized by BRCA1, a tumour suppressor gene in which germline loss-of-function variants predispose
women to breast and ovarian cancer. Although BRCA1 has been sequenced in millions of women, the risk associated with
most newly observed variants cannot be definitively assigned. Here we use saturation genome editing to assay 96.5% of
all possible single-nucleotide variants (SNVs) in 13 exons that encode functionally critical domains of BRCA1.
Functional effects for nearly 4,000 SNVs are bimodally distributed and almost perfectly concordant with established
assessments of pathogenicity. Over 400 non-functional missense SNVs are identified, as well as around 300 SNVs that
disrupt expression. We predict that these results will be immediately useful for the clinical interpretation of BRCA1
variants, and that this approach can be extended to overcome the challenge of variants of uncertain significance in
additional clinically actionable genes.
document:
title: Accurate classification of BRCA1 variants with saturation genome editing
system:
Nature
date: "2018-09-12"
ref: https://doi.org/10.1038/s41586-018-0461-z
datasets:
- system: MaveDB
accession: urn:mavedb:00000097
ref: https://mavedb.org/#/experiment-sets/urn:mavedb:00000097
description: processed scores, including scores for each replicate of each exon
- system: GEO
accession: GSE117159
ref: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE117159
description: raw sequencing data
- system: website
accession: https://sge.gs.washington.edu/BRCA1/
ref: https://sge.gs.washington.edu/BRCA1/
description: processed scores and visualizations hosted by the investigators
variantLibrary:
scope:
type: coding
targetSequences:
- id: NM_007294.3
sequenceAlphabet: DNA
generationMethod:
type: endogenous locus library
system: SpCas9
mechanism: nuclease
description: Array-synthesized oligo pools (Agilent)
deliveryMethod:
type: other
description: Lipofection - TurboFectin
phenotypicAssay:
dimensionality:
type: combined functional data
replication:
type: biological
description: two biological replicates were performed
method:
type: survival assessment assay
method:
type: bulk RNA-sequencing
relevance:
- system: https://www.omim.org/
code: "604370"
label: BREAST-OVARIAN CANCER, FAMILIAL, SUSCEPTIBILITY TO, 1; BROVCA1
- system: https://www.omim.org/
code: "113705"
label: BRCA1 DNA REPAIR-ASSOCIATED PROTEIN; BRCA1
- system: https://mondo.monarchinitiative.org/
code: MONDO:0004984
label: basal-like breast carcinoma
- system: https://mondo.monarchinitiative.org/
code: MONDO:0011450
label: breast-ovarian cancer, familial, susceptibility to, 1
modelSystem:
type: immortalized human cells
description: HAP1
codings:
- system: https://www.ncbi.nlm.nih.gov/taxonomy
code: NCBI:txid9606
label: Homo sapiens
profilingStrategy: direct sequencing
sequencingReadType: multi-segment
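
The example above uses `type: other` for `deliveryMethod`, a value that does not appear among the `LibraryDeliveryMethod` terms in `concept_vocabulary.tsv`. The sketch below shows one way such values could be flagged for review. The field-to-domain mapping (`FIELD_DOMAINS`), the flattened `record`, and the function `unlisted_terms` are hypothetical; whether `other` is permitted by the schema is not shown in this commit view, so a flagged value is a prompt to check, not necessarily an error.

```python
# Terms copied from the concept_vocabulary.tsv rows in this commit (subset)
VOCABULARY = {
    "LibraryGenerationSystem": {"SpCas9", "SaCas9", "AsCas12a", "oligo pool synthesis"},
    "LibraryGenerationMechanism": {"base editor", "prime editor", "nuclease"},
    "LibraryDeliveryMethod": {
        "adeno-associated virus transduction",
        "chemical or heat shock transformation",
        "chemical-based transfection",
        "electroporation",
        "lentivirus transduction",
        "nucleofection",
    },
}

# Hypothetical mapping from a metadata field to a vocabulary domain
FIELD_DOMAINS = {
    ("generationMethod", "system"): "LibraryGenerationSystem",
    ("generationMethod", "mechanism"): "LibraryGenerationMechanism",
    ("deliveryMethod", "type"): "LibraryDeliveryMethod",
}

# Values as they appear in examples/Findlay_2018.yml (nesting simplified)
record = {
    "generationMethod": {"system": "SpCas9", "mechanism": "nuclease"},
    "deliveryMethod": {"type": "other"},
}


def unlisted_terms(record, field_domains, vocabulary):
    """Return (field, value) pairs whose value is not a term in the mapped domain."""
    missing = []
    for (section, key), domain in field_domains.items():
        value = record.get(section, {}).get(key)
        if value is not None and value not in vocabulary[domain]:
            missing.append((f"{section}.{key}", value))
    return missing


print(unlisted_terms(record, FIELD_DOMAINS, VOCABULARY))
# prints [('deliveryMethod.type', 'other')]
```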
100 changes: 100 additions & 0 deletions examples/Matreyek_2018.yml
@@ -0,0 +1,100 @@
title: PTEN VAMP-seq
abstract: >-
Determining the pathogenicity of genetic variants is a critical challenge, and functional assessment is often the
only option. Experimentally characterizing millions of possible missense variants in thousands of clinically
important genes requires generalizable, scalable assays. We describe variant abundance by massively parallel
sequencing (VAMP-seq), which measures the effects of thousands of missense variants of a protein on intracellular
abundance simultaneously. We apply VAMP-seq to quantify the abundance of 7,801 single-amino-acid variants of PTEN and
TPMT, proteins in which functional variants are clinically actionable. We identify 1,138 PTEN and 777 TPMT variants
that result in low protein abundance, and may be pathogenic or alter drug metabolism, respectively. We observe
selection for low-abundance PTEN variants in cancer, and show that p.Pro38Ser, which accounts for ~10% of PTEN
missense variants in melanoma, functions via a dominant-negative mechanism. Finally, we demonstrate that VAMP-seq is
applicable to other genes, highlighting its generalizability.
document:
title: >-
Multiplex assessment of protein variant abundance by massively parallel
sequencing
system:
Nature Genetics
date: "2018-05-21"
ref: https://doi.org/10.1038/s41588-018-0122-z
datasets:
- system: MaveDB
accession: urn:mavedb:00000013-a
ref: https://mavedb.org/#/experiments/urn:mavedb:00000013-a
description: processed scores, including scores for each replicate experiment
- system: BioProject
accession: PRJNA428380
ref: https://www.ncbi.nlm.nih.gov/bioproject/PRJNA428380
description: raw sequencing data
variantLibrary:
scope:
type: coding
targetSequences:
- sequence: "ATGACAGCCATCATCAAAGAGATCGTTAGCAGAAACAAAAGGAGATATCAAGAGGATGGA\
TTCGACTTAGACTTGACCTATATTTATCCAAACATTATTGCTATGGGATTTCCTGCAGAA\
AGACTTGAAGGCGTATACAGGAACAATATTGATGATGTAGTAAGGTTTTTGGATTCAAAG\
CATAAAAACCATTACAAGATATACAATCTTTGTGCTGAAAGACATTATGACACCGCCAAA\
TTTAATTGCAGAGTTGCACAATATCCTTTTGAAGACCATAACCCACCACAGCTAGAACTT\
ATCAAACCCTTTTGTGAAGATCTTGACCAATGGCTAAGTGAAGATGACAATCATGTTGCA\
GCAATTCACTGTAAAGCTGGAAAGGGACGAACTGGTGTAATGATATGTGCATATTTATTA\
CATCGGGGCAAATTTTTAAAGGCACAAGAGGCCCTAGATTTCTATGGGGAAGTAAGGACC\
AGAGACAAAAAGGGAGTAACTATTCCCAGTCAGAGGCGCTATGTGTATTATTATAGCTAC\
CTGTTAAAGAATCATCTGGATTATAGACCAGTGGCACTGTTGTTTCACAAGATGATGTTT\
GAAACTATTCCAATGTTCAGTGGCGGAACTTGCAATCCTCAGTTTGTGGTCTGCCAGCTA\
AAGGTGAAGATATATTCCTCCAATTCAGGACCCACACGACGGGAAGACAAGTTCATGTAC\
TTTGAGTTCCCTCAGCCGTTACCTGTGTGTGGTGATATCAAAGTAGAGTTCTTCCACAAA\
CAGAACAAGATGCTAAAAAAGGACAAAATGTTTCACTTTTGGGTAAATACATTCTTCATA\
CCAGGACCAGAGGAAACCTCAGAAAAAGTAGAAAATGGAAGTCTATGTGATCAAGAAATC\
GATAGCATTTGCAGTATAGAGCGTGCAGATAATGACAAGGAATATCTAGTACTTACTTTA\
ACAAAAAATGATCTTGACAAAGCAAATAAAGACAAAGCCAACCGATACTTTTCTCCAAAT\
TTTAAGGTGAAGCTGTACTTCACAAAAACAGTAGAGGAGCCGTCAAATCCAGAGGCTAGC\
AGTTCAACTTCTGTAACACCAGATGTTAGTGACAATGAACCTGATCATTATAGATATTCT\
GACACCACTGACTCTGATCCAGAGAATGAACCTTTTGATGAAGATCAGCATACACAAATT\
ACAAAAGTCTGA"
sequenceAlphabet: DNA
generationMethod:
type: in-vitro construct library
system: oligo-directed mutagenic PCR
integration: extra-local construct insertion
description: Integration using Tet-on landing pad system
deliveryMethod:
type: chemical or heat shock transformation
phenotypicAssay:
dimensionality:
type: single-dimensional data
replication:
type: biological and technical
description: 8 biological replicate experiments were performed from three
different transfections (4, 3, and 1 experimental replicate for these
transfections). Technical replicates were performed as part of QC, but
the technical replicates were collapsed and analyzed as one experiment
after passing.
method:
type: flow cytometry assay
description: VAMP-seq
relevance:
- system: https://www.omim.org/
code: "601728"
label: PHOSPHATASE AND TENSIN HOMOLOG; PTEN
- system: https://www.omim.org/
code: "158350"
label: COWDEN SYNDROME 1; CWS1
- system: https://mondo.monarchinitiative.org/
code: MONDO:0017623
label: PTEN hamartoma tumor syndrome
- system: https://mondo.monarchinitiative.org/
code: MONDO:0017623
label: Cowden syndrome 1
modelSystem:
type: immortalized human cells
description: HEK 293T TetBxb1BFP
codings:
- system: https://www.ebi.ac.uk/ols/ontologies/clo
code: CLO:0037372
label: HEK293T cell
- system: https://www.ncbi.nlm.nih.gov/taxonomy
code: NCBI:txid9606
label: Homo sapiens
profilingStrategy: barcode sequencing
sequencingReadType: single-segment (short read)
70 changes: 70 additions & 0 deletions examples/Seuma_2022.yml
@@ -0,0 +1,70 @@
title: Amyloid-Beta Deep Mutational Scan
abstract: >-
Multiplexed assays of variant effects (MAVEs) guide clinical variant interpretation and reveal disease mechanisms. To
date, MAVEs have focussed on a single mutation type–amino acid (AA) substitutions–despite the diversity of coding
variants that cause disease. Here we use Deep Indel Mutagenesis (DIM) to generate a comprehensive atlas of diverse
variant effects for a disease protein, the amyloid beta (Aβ) peptide that aggregates in Alzheimer's disease (AD) and
is mutated in familial AD (fAD). The atlas identifies known fAD mutations and reveals that many variants beyond
substitutions accelerate Aβ aggregation and are likely to be pathogenic. Truncations, substitutions, insertions,
single- and internal multi-AA deletions differ in their propensity to enhance or impair aggregation, but likely
pathogenic variants from all classes are highly enriched in the polar N-terminal region of Aβ. This comparative atlas
highlights the importance of including diverse mutation types in MAVEs and provides important mechanistic insights
into amyloid nucleation.
document:
title: >-
An atlas of amyloid aggregation: the impact of substitutions, insertions, deletions and truncations on amyloid beta
fibril nucleation.
system:
Nature Communications
date: "2022-11-18"
ref: https://doi.org/10.1038/s41467-022-34742-3
datasets:
- system: MaveDB
accession: urn:mavedb:00000113-a
ref: https://mavedb.org/#/experiments/urn:mavedb:00000113-a
description: processed scores
variantLibrary:
scope:
type: coding
targetSequences:
- id: ga4gh:SQ.upmBKNxvwSQPi9n6JMSkWimiVyhutErS
sha512t24u: upmBKNxvwSQPi9n6JMSkWimiVyhutErS
sequence: DAEFRHDSGYEVHHQKLVFFAEDVGSNKGAIIGLMVGGVVIA
sequenceAlphabet: protein
generationMethod:
type: in-vitro construct library
system: oligo pool synthesis
integration: plasmid (not integrated)
description: Oligo pool synthesis, Twist Bioscience
deliveryMethod:
type: chemical or heat shock transformation
phenotypicAssay:
dimensionality:
type: single-dimensional data
replication:
type: biological and technical
description: Three biological replicates (transformations) were performed and five technical replicate selections
were done for each. Sequencing was performed by combining six equimolar samples of each technical replicate.
method:
type: survival assessment assay
description: Survival assessment assay (growth in -adenine)
relevance:
- system: https://www.omim.org/
code: "104300"
label: ALZHEIMER DISEASE, FAMILIAL, 1; AD1
- system: https://www.omim.org/
code: "104760"
label: AMYLOID BETA A4 PRECURSOR PROTEIN; APP
- system: https://mondo.monarchinitiative.org/
code: MONDO:0004975
label: Alzheimer’s Disease
modelSystem:
type: yeast
description: Saccharomyces cerevisiae [psi-pin-]
(MATa ade1-14 his3 leu2-3,112 lys2 trp1 ura3-52)
codings:
- system: https://www.ncbi.nlm.nih.gov/taxonomy
code: NCBI:txid4932
label: Saccharomyces cerevisiae
profilingStrategy: direct sequencing
sequencingReadType: single-segment (short read)
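
The `sha512t24u` field and the `ga4gh:SQ.` identifier in the target sequence above look like GA4GH truncated digests. As a rough sketch, assuming the field follows that convention (SHA-512 of the sequence bytes, truncated to 24 bytes, base64url-encoded) and that the digest is taken over the ASCII sequence exactly as written, the recorded value could be checked like this. The variable names are illustrative.

```python
import base64
import hashlib


def sha512t24u(blob: bytes) -> str:
    """Truncated SHA-512 digest: first 24 bytes, base64url-encoded (32 characters)."""
    digest = hashlib.sha512(blob).digest()
    return base64.urlsafe_b64encode(digest[:24]).decode("ascii")


# Amyloid beta sequence and digest as given in examples/Seuma_2022.yml
ABETA = "DAEFRHDSGYEVHHQKLVFFAEDVGSNKGAIIGLMVGGVVIA"
recorded = "upmBKNxvwSQPi9n6JMSkWimiVyhutErS"

computed = sha512t24u(ABETA.encode("ascii"))
print(computed, computed == recorded)
```

If the two values differ, the mismatch may come from the assumptions above (for example, a different sequence normalization) and not from an error in the example file.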