The FAIR Genomes semantic metadata schema to power reuse of NGS data in research and healthcare. Version 1.3-SNAPSHOT, 2022-02-28. This model consists of 9 modules that contain 112 metadata elements and 85367 lookups in total (excluding null flavors).
Name | Description | Ontology | Nr. of elements |
---|---|---|---|
Study | A detailed examination, analysis, or critical inspection of one or multiple subjects designed to discover facts. | NCIT:C63536 | 9 |
Personal | Data, facts or figures about an individual; the set of relevant items would depend on the use case. | NCIT:C90492 | 14 |
Leaflet and consent form | A document explaining all the relevant information to assist an individual in understanding the expectations and risks in making a decision about a procedure. This document is presented to and signed by the individual or guardian. | NCIT:C16468 | 9 |
Individual consent | Consent given by a patient to a surgical or medical procedure or participation in a study, examination or analysis after achieving an understanding of the relevant medical facts and the risks involved. | NCIT:C16735 | 12 |
Clinical | Findings and circumstances relating to the examination and treatment of a patient. | NCIT:C25398 | 19 |
Material | A natural substance derived from living organisms such as cells, tissues, proteins, and DNA. | NCIT:C43376 | 17 |
Sample preparation | A sample preparation for a nucleic acids sequencing assay. | OBI:0001902 | 9 |
Sequencing | The determination of complete (typically nucleotide) sequences, including those of genomes (full genome sequencing, de novo sequencing and resequencing), amplicons and transcriptomes. | EDAM:topic_3168 | 12 |
Analysis | An analysis applies analytical (often computational) methods to existing data of a specific type to produce some desired output. | EDAM:operation_2945 | 11 |
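
As a quick consistency check, the per-module element counts in the overview table can be summed and compared with the total stated in the introduction. This is only a minimal sketch: the dictionary name is hypothetical and the numbers are copied by hand from the table above.

```python
# Element counts per module, copied from the overview table.
# MODULE_ELEMENT_COUNTS is a hypothetical name used for illustration only.
MODULE_ELEMENT_COUNTS = {
    "Study": 9,
    "Personal": 14,
    "Leaflet and consent form": 9,
    "Individual consent": 12,
    "Clinical": 19,
    "Material": 17,
    "Sample preparation": 9,
    "Sequencing": 12,
    "Analysis": 11,
}

module_count = len(MODULE_ELEMENT_COUNTS)
element_total = sum(MODULE_ELEMENT_COUNTS.values())

print(f"{module_count} modules, {element_total} metadata elements")
```
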
A detailed examination, analysis, or critical inspection of one or multiple subjects designed to discover facts. Ontology: NCIT:C63536.
Element | Description | Ontology | Values |
---|---|---|---|
Identifier | A unique proper name or character sequence that identifies this particular study. | OMIABIS:0000006 | UniqueID |
Name | A name that designates this study. | OMIABIS:0000037 | String |
Description | A statement or piece of writing that provides details on this study. | OMIABIS:0000036 | Text |
Inclusion criteria | The conditions which, if met, make a person eligible for participation in this study. | OBI:0500027 | InclusionCriteria lookup (14 choices of type) |
Principal investigator | The principal investigator or responsible person for this study. | OMIABIS:0000100 | String |
Contact information | An email address for the purpose of contacting the study contact person. | OMIABIS:0000035 | String |
Study design | A plan specification comprised of protocols (which may specify how and what kinds of data will be gathered) that are executed as part of this study. | OBI:0500000 | Text |
Start date | The date on which this study began. | NCIT:C69208 | Date |
Completion date | The date on which the concluding information for this study is completed. Usually, this is when the last subject has a final visit, or the main analysis has finished, or any other protocol-defined completion date. | NCIT:C142702 | Date |
Data, facts or figures about an individual; the set of relevant items would depend on the use case. Ontology: NCIT:C90492.
Element | Description | Ontology | Values |
---|---|---|---|
Personal identifier | A unique proper name or character sequence that identifies this particular person. | NCIT:C164337 | UniqueID |
Gender identity | A person's concept of self as being male and masculine or female and feminine, or ambivalent, based in part on physical characteristics, parental responses, and psychological and social pressures. It is the internal experience of gender role. For practical reasons the lookups are limited to first and second-level entries, but can be expanded when needed. Note that 'Gender at birth', 'Genotypic sex' and any (gender-related) hormone therapies in 'Medication' are usually medically more relevant than this term. | MESH:D005783 | GenderIdentity lookup (15 choices of type) |
Gender at birth | Assigned gender is one's gender which was assigned at birth, typically by a medical and/or legal organization, and then later registered with other organizations. Such a designation is typically based off of the superficial appearance of external genitalia present at birth. | GSSO:009418 | GenderAtBirth lookup (13 choices of type) |
Genotypic sex | A biological sex quality inhering in an individual based upon genotypic composition of sex chromosomes. | PATO:0020000 | GenotypicSex lookup (12 choices of type) |
Country of residence | Country of residence at enrollment. | NCIT:C171105 | Countries lookup (249 choices of type) |
Ancestry | Population category defined using ancestry informative markers (AIMs) based on genetic/genomic data. | NCIT:C176763 | Ancestry lookup (305 choices of type) |
Country of birth | The country that this person was born in. | GENEPIO:0001094 | Countries lookup (249 choices of type) |
Year of birth | The year in which this person was born. | NCIT:C83164 | Integer |
Inclusion status | An indicator that provides information on the current health status of this person. | NCIT:C166244 | InclusionStatus lookup (4 choices of type) |
Age at death | The age at which death occurred. | NCIT:C135383 | Integer |
Consanguinity | Information on whether the patient is a child from two family members who are second cousins or closer. | OMIT:0004546 | Boolean |
Primary affiliated institute | The most significant institute for medical consultation and/or study inclusion in context of the genetic disease of this person. | NCIT:C25412 | Institutes lookup (219 choices of type) |
Resources in other institutes | Material or data related to this person that is not captured by this system though known to be available in other institutes such as biobanks or hospitals. | NCIT:C19012 | Institutes lookup (219 choices of type) |
Participates in study | Reference to the study or studies in which this person participates. | RO:0000056 | Reference to instances of Study |
A document explaining all the relevant information to assist an individual in understanding the expectations and risks in making a decision about a procedure. This document is presented to and signed by the individual or guardian. Ontology: NCIT:C16468.
Element | Description | Ontology | Values |
---|---|---|---|
Leaflet title | A title or name given to the leaflet that belongs to this consent form. | DC:title | String |
Leaflet date | A point or period of time associated with the publication of this leaflet that belongs to this consent form. | DC:date | Date |
Leaflet version | The version, edition, or adaptation of this leaflet that belongs to this consent form. | DC:hasVersion | String |
Consent form identifier | A unique proper name or character sequence that identifies this particular leaflet and consent form combination used in signing individual consent. Using a DOI would be optimal. Using any resolvable URL is suboptimal but still preferable over using a plain text value. | DC:identifier | UniqueID |
Consent form title | A title or name given to this consent form. | DC:title | String |
Consent form accepted date | Date of acceptance of this consent form. | DC:dateAccepted | Date |
Consent form valid until | End date of the validity of this consent form. | DC:valid | Date |
Consent form creator | Indicates the authoritative body who brought this consent form into existence. | DC:creator | Institutes lookup (219 choices of type) |
Consent form version | The version, edition, or adaptation of this consent form. | DC:hasVersion | String |
Consent given by a patient to a surgical or medical procedure or participation in a study, examination or analysis after achieving an understanding of the relevant medical facts and the risks involved. Ontology: NCIT:C16735.
Element | Description | Ontology | Values |
---|---|---|---|
Individual consent identifier | A unique proper name or character sequence that identifies this particular signed individual consent. | ICO:0000044 | UniqueID |
Person consenting | Reference to the person (i.e. subject) to whom this individual consent applies. | IAO:0000136 | Reference to instances of Personal |
Consent form used | Reference to the informed consent form that was signed. Points to a particular instance of leaflet and consent form that usually exists as a record (i.e. a row) within the same database as this individual consent. | IAO:0000136 | Reference to instances of Leaflet and consent form |
Collected by | Indicates the institute who performed the collection act. | NCIT:C45262 | Institutes lookup (219 choices of type) |
Signing date | A date specification that designates when this individual consent form was signed. | ICO:0000036 | Date |
Valid from | Starting date of the validity of this individual consent. | DC:valid | Date |
Valid until | End date of the validity of this individual consent. | DC:valid | Date |
Represented by | An individual who is authorized under applicable State or local law to consent on behalf of a child or incapable person to general medical care including participation in clinical research. | NCIT:C142600 | RepresentedBy lookup (3 choices of type) |
Data use permissions | A data item that is used to indicate consent permissions for datasets and/or materials, and relates to the purposes for which datasets and/or material might be used. | DUO:0000001 | DataUsePermissions lookup (5 choices of type) |
Data use modifiers | Data use modifiers indicate additional conditions for use. For instance, a dataset is restricted to investigations into specific diseases or performed at specific geographical locations. | DUO:0000017 | DataUseModifiers lookup (23 choices of type) |
Data use specification | Further specification of applied data use permissions and modifiers. For example, a list of countries in case of geographic restrictions or a list of diseases when restricted to disease-specific research. | SIO:000090 | Text |
Allow recontacting | The procedure of recontacting the patient for specified reasons. This means the patient agrees to be re-identifiable under those circumstances. | NCIT:C25737 | Recontacting lookup (3 choices of type) |
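
The 'Valid from' and 'Valid until' elements suggest a simple validity check for an individual consent. The sketch below is an assumption about how a consuming system might interpret them: the class and function names are hypothetical, both bounds are treated as inclusive, and a missing bound is treated as open-ended. None of these choices are prescribed by the schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class IndividualConsent:
    """Hypothetical, minimal view of an 'Individual consent' record."""
    identifier: str
    valid_from: Optional[date] = None   # 'Valid from'
    valid_until: Optional[date] = None  # 'Valid until'


def is_valid_on(consent: IndividualConsent, day: date) -> bool:
    """Return True if `day` falls within the consent's validity period.

    Assumption: bounds are inclusive and a missing bound means 'unbounded'.
    """
    if consent.valid_from is not None and day < consent.valid_from:
        return False
    if consent.valid_until is not None and day > consent.valid_until:
        return False
    return True


consent = IndividualConsent(
    identifier="consent-001",  # made-up identifier
    valid_from=date(2021, 1, 1),
    valid_until=date(2022, 12, 31),
)
print(is_valid_on(consent, date(2022, 2, 28)))
print(is_valid_on(consent, date(2023, 1, 1)))
```
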
Findings and circumstances relating to the examination and treatment of a patient. Ontology: NCIT:C25398.
Element | Description | Ontology | Values |
---|---|---|---|
Clinical identifier | A unique proper name or character sequence that identifies this particular clinical examination. | NCIT:C87853 | UniqueID |
Belongs to person | Reference to the person whom this clinical information is about. | IAO:0000136 | Reference to instances of Personal |
Phenotype | The outward appearance of the individual. In medical context, these are often the symptoms caused by a disease. | NCIT:C16977 | Phenotypes lookup (15802 choices of type) |
Unobserved phenotype | Phenotypes or symptoms that were looked for but not observed, which may help in differential diagnosis or establish incomplete penetrance. | HL7:C0442737 | Phenotypes lookup (15802 choices of type) |
Phenotypic data available | Types of phenotypic data collected in a clinical setting that is potentially available upon request. | NCIT:C15783 | DCMITypes lookup (6 choices of type) |
Clinical diagnosis | A diagnosis made from a study of the signs and symptoms of a disease. | NCIT:C15607 | Diseases lookup (9700 choices of type) |
Molecular diagnosis gene | Gene affected by pathogenic variation that is causal for disease of the patient. | NCIT:C20826 | Genes lookup (19202 choices of type) |
Molecular diagnosis other | Causal variant in HGVS notation with optional classification or free text explaining any other molecular mechanisms involved. | NCIT:C20826 | Text |
Age at diagnosis | The age, measured from some defined time point (e.g. birth) at which a patient is diagnosed with a disease. | SNOMEDCT:423493009 | Integer |
Age at last screening | Age of the patient at the moment of the most recent screening. | NCIT:C81258 | Integer |
Medication | A drug product that contains one or more active and/or inactive ingredients used by the patient intended to treat, prevent or alleviate the symptoms of disease. Any hormone therapies, gender-related or otherwise, should also be recorded here. | NCIT:C459 | Drugs lookup (5632 choices of type) |
Drug regimen | The specific way a therapeutic drug is to be taken, including formulation, route of administration, dose, dosing interval, and treatment duration. | NCIT:C142516 | Text |
Family members affected | Family members related by descent rather than by marriage or law who were diagnosed with the same condition as the individual who is the primary focus of investigation (i.e. the proband). | HP:0032320 | FamilyMembers lookup (41 choices of type) |
Family members sequenced | Family members related by descent rather than by marriage or law who were also tested by next-generation sequencing. | NCIT:C79916 | FamilyMembers lookup (41 choices of type) |
Medical history | A record of a person's background regarding health, occurrence of disease events and surgical procedures. | NCIT:C18772 | MedicalHistory lookup (1154 choices of type) |
Age of onset | Age of onset of clinical manifestations related to the disease of the patient. | Orphanet:C023 | Integer |
First contact | First contact of the patient with a specialised center in context of disease or study inclusion. | LOINC:MTHU048806 | Date |
Functioning | Patient's classification of functioning i.e. disability profile according to International Classification of Functioning and Disability (ICF). | NCIT:C21007 | Text |
Material used in diagnosis | This diagnosis c.q. clinical examination is based on one or more sampled materials. | SIO:000641 | String |
A natural substance derived from living organisms such as cells, tissues, proteins, and DNA. Ontology: NCIT:C43376.
Element | Description | Ontology | Values |
---|---|---|---|
Material identifier | A unique proper name or character sequence that identifies this particular material. | NCIT:C93400 | UniqueID |
Collected from person | Reference to the person from whom this material was collected. | SIO:000244 | Reference to instances of Personal |
Belongs to diagnosis | Reference to a diagnosis c.q. clinical examination of which this material may be a part. There can be multiple diagnoses when a non-tumor material is reused as reference. | SIO:000068 | Reference to instances of Clinical |
Sampling timestamp | Date and time at which this material was collected. | EFO:0000689 | DateTime |
Registration timestamp | Date and time at which this material was listed or recorded officially, i.e. officially qualified or enrolled. | NCIT:C25646 | DateTime |
Sampling protocol | The procedure whereby this material was sampled for an analysis. | EFO:0005518 | Text |
Sampling protocol deviation | A variation from processes or procedures defined in the sampling protocol. Deviations usually do not preclude the overall evaluability of subject data for either efficacy or safety, and are often acknowledged and accepted in advance by the sponsor. | NCIT:C50996 | String |
Reason for sampling protocol deviation | The rationale for why a deviation from the sampling protocol has occurred. | NCIT:C93529 | String |
Biospecimen type | The type of material taken from a biological entity for testing, diagnostic, propagation, treatment or research purposes. | NCIT:C70713 | BiospecimenTypes lookup (403 choices of type) |
Anatomical source | Biological entity that constitutes the structural organization of an individual member of a biological species from which this material was taken. | NCIT:C103264 | AnatomicalSources lookup (13827 choices of type) |
Pathological state | The pathological state of the tissue from which this material was derived. | NCIT:C28257 | PathologicalState lookup (4 choices of type) |
Storage conditions | The conditions under which this biological material was stored. | NCIT:C96145 | StorageConditions lookup (26 choices of type) |
Expiration date | The date beyond which this material is no longer regarded as fit for use. | NCIT:C164516 | Date |
Percentage tumor cells | The percentage of tumor cells compared to total cells present in this material. | NCIT:C127771 | Decimal |
Physical location | A place on the Earth where this material is located, by its name or by its geographical location. This definition is intentionally vague to allow reuse locally (e.g. which freezer), for contacting (e.g. which institute), broadly for logistical or legal reasons (e.g. city, country or continent). | GAZ:00000448 | String |
Analyses performed | Reports the existence of any analyses performed on this material other than genomics (e.g. transcriptomics, metabolomics, proteomics). | IAO:0000702 | AnalysesPerformed lookup (20 choices of type) |
Derived from | Indicate if this material was produced from or related to another. | NCIT:C28355 | String |
A sample preparation for a nucleic acids sequencing assay. Ontology: OBI:0001902.
Element | Description | Ontology | Values |
---|---|---|---|
Sampleprep identifier | A unique proper name or character sequence that identifies this particular sample preparation. | NCIT:C132299 | UniqueID |
Belongs to material | Reference to the source material from which this sample was prepared. | NCIT:C25683 | Reference to instances of Material |
Input amount | Amount of input material in nanogram (ng). | AFRL:0000010 | Integer |
Library preparation kit | Pre-filled, ready-to-use reagent cartridges intended to improve chemistry, cluster density and read length as well as improve quality (Q) scores for this sample. Reagent components are encoded to interact with the sequencing system to validate compatibility with user-defined applications. | GENEPIO:0000085 | NGSKits lookup (619 choices of type) |
PCR free | Indicates whether a polymerase chain reaction (PCR) was used to prepare this sample. PCR is a method for amplifying a DNA base sequence using multiple rounds of heat denaturation of the DNA and annealing of oligonucleotide primers complementary to flanking regions in the presence of a heat-stable polymerase. | NCIT:C17003 | Boolean |
Target enrichment kit | Indicates which target enrichment kit was used to prepare this sample. Target enrichment is a pre-sequencing DNA preparation step where DNA sequences are either directly amplified (amplicon or multiplex PCR-based) or captured (hybrid capture-based) in order to only focus on specific regions of a genome or DNA sample. | NCIT:C154307 | NGSKits lookup (619 choices of type) |
UMIs present | Indicates whether any unique molecular identifiers (UMIs) are present. A UMI barcode is a short nucleotide sequence that is used to identify reads originating from an individual mRNA molecule. | EFO:0010199 | Boolean |
Intended insert size | In paired-end sequencing, the DNA between the adapter sequences is the insert. The length of this sequence is known as the insert size, not to be confused with the inner distance between reads. So, fragment length equals read adapter length (2x) plus insert size, and insert size equals read length (2x) plus inner distance. | FG:0000001 | Integer |
Intended read length | The number of nucleotides intended to be ordered from each side of a nucleic acid fragment obtained after the completion of a sequencing process. | NCIT:C153362 | Integer |
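
The two relations stated in the 'Intended insert size' description can be written out as arithmetic. The following is a minimal sketch: the function names and the example numbers are made up for illustration, and all lengths are assumed to be in nucleotides.

```python
def insert_size(read_length: int, inner_distance: int) -> int:
    """Insert size = 2 x read length + inner distance."""
    return 2 * read_length + inner_distance


def fragment_length(adapter_length: int, insert: int) -> int:
    """Fragment length = 2 x read adapter length + insert size."""
    return 2 * adapter_length + insert


# Illustrative values only, not taken from any real kit or run.
insert = insert_size(read_length=150, inner_distance=50)
fragment = fragment_length(adapter_length=60, insert=insert)
print(insert, fragment)
```

With these made-up values the insert is 350 nucleotides and the fragment is 470 nucleotides.
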
The determination of complete (typically nucleotide) sequences, including those of genomes (full genome sequencing, de novo sequencing and resequencing), amplicons and transcriptomes. Ontology: EDAM:topic_3168.
Element | Description | Ontology | Values |
---|---|---|---|
Sequencing identifier | A unique proper name or character sequence that identifies this particular nucleic acid sequencing assay. | NCIT:C171337 | UniqueID |
Belongs to sample preparation | Reference to the prepared sample, i.e. the source that was sequenced. | NCIT:C25683 | Reference to instances of Sample preparation |
Sequencing date | Date on which this sequencing assay was performed. | GENEPIO:0000069 | Date |
Sequencing platform | The used sequencing platform (i.e. brand, name of a company that produces sequencer equipment). | GENEPIO:0000071 | SequencingPlatform lookup (7 choices of type) |
Sequencing instrument model | The product name and model number of the manufacturer's genomic (DNA) sequencer that was used. | GENEPIO:0001921 | SequencingInstrumentModels lookup (45 choices of type) |
Sequencing method | Method used to determine the order of bases in a nucleic acid sequence. | FIX:0000704 | SequencingMethods lookup (35 choices of type) |
Median read depth | The median number of times a particular locus (site, nucleotide, amplicon, region) was sequenced. | NCIT:C155320 | Integer |
Observed read length | The number of nucleotides successfully ordered from each side of a nucleic acid fragment obtained after the completion of a sequencing process. | NCIT:C153362 | Integer |
Observed insert size | In paired-end sequencing, the DNA between the adapter sequences is the insert. The length of this sequence is known as the insert size, not to be confused with the inner distance between reads. So, fragment length equals read adapter length (2x) plus insert size, and insert size equals read length (2x) plus inner distance. | FG:0000002 | Integer |
Percentage Q30 | Percentage of reads with a Phred quality score over 30, which indicates less than a 1/1000 chance that the base was called incorrectly. | GENEPIO:0000089 | Decimal |
Percentage TR20 | Percentage of the target sequence on which 20 or more unique reads were successfully mapped. | FG:0000003 | Decimal |
Other quality metrics | Other NGS quality control metrics, including but not limited to (i) sequencer metrics such as yield, error rate, density (K/mm2), cluster PF (%) and phas/prephas (%), (ii) alignment metrics such as QM insert size, GC content, QM duplicated reads (%), QM error rate, uniformity/evenness of coverage and maternal cell contamination, and (iii) variant call metrics such as number of SNVs/CNVs/SVs called, number of missense/nonsense variants, common variants (%), unique variants (%), gender match and trio inheritance check. | EDAM:data_3914 | Text |
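
The 'Percentage Q30' and 'Percentage TR20' elements are both simple threshold percentages. The sketch below illustrates one possible way to compute them. The function names and input lists are hypothetical. It follows the wording of the descriptions above (a score strictly over 30, a depth of 20 or more), although some tools count a score of exactly 30 as well. The Phred relation used is the standard Q = -10 log10(P).

```python
from typing import Sequence


def phred_to_error_probability(q: float) -> float:
    """Convert a Phred quality score Q to an error probability P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def percentage_q30(quality_scores: Sequence[float]) -> float:
    """Percentage of scores strictly over 30 (assumption: one score per read)."""
    if not quality_scores:
        raise ValueError("no quality scores given")
    passing = sum(1 for q in quality_scores if q > 30)
    return 100 * passing / len(quality_scores)


def percentage_tr20(unique_read_depths: Sequence[int]) -> float:
    """Percentage of target positions covered by 20 or more unique reads."""
    if not unique_read_depths:
        raise ValueError("no target positions given")
    covered = sum(1 for depth in unique_read_depths if depth >= 20)
    return 100 * covered / len(unique_read_depths)


# Made-up inputs for illustration.
print(phred_to_error_probability(30))
print(percentage_q30([20, 31, 35, 40]))
print(percentage_tr20([0, 19, 20, 100]))
```
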
An analysis applies analytical (often computational) methods to existing data of a specific type to produce some desired output. Ontology: EDAM:operation_2945.
Element | Description | Ontology | Values |
---|---|---|---|
Analysis identifier | A unique proper name or character sequence that identifies this particular analysis. | AFR:0001979 | UniqueID |
Belongs to sequencing | Reference to the sequencing that was performed, i.e. the source on which this analysis was based. | NCIT:C25683 | Reference to instances of Sequencing |
Physical data location | A place on the Earth where the data is located, by its name or by its geographical location. This definition is intentionally vague to allow reuse locally (e.g. which computer), for contacting (e.g. which institute), broadly for logistical or legal reasons (e.g. city, country or continent). | GAZ:00000448 | String |
Abstract data location | The file location of the data, or a copy of the data, on an electronically accessible device for preservation (either in plain-text or encrypted format). | NCIT:C142494 | String |
Data formats stored | Which data file formats (i.e. defined ways or layouts of representing and structuring data in a computer file, blob, string, message, or elsewhere) are stored and potentially available. | NCIT:C142494 | DataFormats lookup (582 choices of type) |
Algorithms used | Any used problem-solving procedures implemented in software to be executed by a computer. | NCIT:C16275 | Text |
Reference genome used | The specific build of the human genome used as reference for sequence alignment and variant calling. | EDAM:data_2340 | GenomeAccessions lookup (29 choices of type) |
Bioinformatic protocol used | A human-readable collection of information about how a scientific experiment or analysis was carried out that results in a specific set of data or results used for further analysis or to test a specific hypothesis. | EDAM:data_2531 | Text |
Bioinformatic protocol deviation | A variation from processes or procedures defined in the bioinformatic protocol. Deviations usually do not preclude the overall evaluability of subject data for either efficacy or safety, and are often acknowledged and accepted in advance by the sponsor. | NCIT:C50996 | String |
Reason for bioinformatic protocol deviation | The rationale for why a deviation from the bioinformatic protocol has occurred. | NCIT:C93529 | String |
WGS guideline followed | Any followed systematic statement of policy rules or principles. Guidelines may be developed by government agencies at any level, institutions, professional societies, governing boards, or by convening expert panels. | NCIT:C17564 | String |
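
The 'Reference to instances of ...' elements link the modules into a chain: an analysis belongs to a sequencing, which belongs to a sample preparation, which belongs to a material, which was collected from a person. A minimal sketch of following that chain is shown below. The class and attribute names are hypothetical renderings of the element names, only the identifier and reference elements are modelled, and each reference is assumed to point to exactly one instance, which the schema itself does not guarantee.

```python
from dataclasses import dataclass


@dataclass
class Personal:
    personal_identifier: str


@dataclass
class Material:
    material_identifier: str
    collected_from_person: Personal


@dataclass
class SamplePreparation:
    sampleprep_identifier: str
    belongs_to_material: Material


@dataclass
class Sequencing:
    sequencing_identifier: str
    belongs_to_sample_preparation: SamplePreparation


@dataclass
class Analysis:
    analysis_identifier: str
    belongs_to_sequencing: Sequencing


def person_for_analysis(analysis: Analysis) -> Personal:
    """Walk the references from an analysis back to the person."""
    sequencing = analysis.belongs_to_sequencing
    sample_preparation = sequencing.belongs_to_sample_preparation
    material = sample_preparation.belongs_to_material
    return material.collected_from_person


# Made-up identifiers for illustration.
person = Personal("person-1")
material = Material("material-1", person)
sample_preparation = SamplePreparation("sampleprep-1", material)
sequencing = Sequencing("sequencing-1", sample_preparation)
analysis = Analysis("analysis-1", sequencing)

print(person_for_analysis(analysis).personal_identifier)
```
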
Each lookup is supplemented with so-called 'null flavors' from HL7. These can be used to indicate precisely why a particular value could not be entered into the system, providing substantially more insight than simply leaving a field empty.
Value | Description | Ontology |
---|---|---|
NoInformation | The value is exceptional (missing, omitted, incomplete, improper). No information as to the reason for being an exceptional value is provided. This is the most general exceptional value. It is also the default exceptional value. | HL7:NI |
Invalid | The value as represented in the instance is not a member of the set of permitted data values in the constrained value domain of a variable. | HL7:INV |
Derived | An actual value may exist, but it must be derived from the provided information (usually an EXPR generic data type extension will be used to convey the derivation expression). | HL7:DER |
Other | The actual value is not a member of the set of permitted data values in the constrained value domain of a variable (e.g., concept not provided by required code system). | HL7:OTH |
Negative infinity | Negative infinity of numbers. | HL7:NINF |
Positive infinity | Positive infinity of numbers. | HL7:PINF |
Un-encoded | The actual value has not yet been encoded within the approved value domain. | HL7:UNC |
Masked | There is information on this item available but it has not been provided by the sender due to security, privacy or other reasons. There may be an alternate mechanism for gaining access to this information. | HL7:MSK |
Not applicable | Known to have no proper value (e.g., last menstrual period for a male). | HL7:NA |
Unknown | A proper value is applicable, but not known. | HL7:UNK |
Asked but unknown | Information was sought but not found (e.g., patient was asked but didn't know). | HL7:ASKU |
Temporarily unavailable | Information is not available at this time but it is expected that it will be available later. | HL7:NAV |
Not asked | This information has not been sought (e.g., patient was not asked). | HL7:NASK |
Not available | Information is not available at this time (with no expectation regarding whether it will or will not be available in the future). | HL7:NAVU |
Sufficient quantity | The specific quantity is not known, but is known to be non-zero and is not specified because it makes up the bulk of the material. e.g. 'Add 10mg of ingredient X, 50mg of ingredient Y, and sufficient quantity of water to 100mL.' The null flavor would be used to express the quantity of water. | HL7:QS |
Trace | The content is greater than zero, but too small to be quantified. | HL7:TRC |
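
One way a consuming application might model this is to let a lookup field hold either a regular lookup value or a null flavor. The sketch below is an assumption, not part of the schema: the enum, type alias, function name, and example lookup value are all hypothetical, while the codes are copied from the table above.

```python
from enum import Enum
from typing import Union


class NullFlavor(Enum):
    """HL7 null flavors, with codes as listed in the table above."""
    NO_INFORMATION = "HL7:NI"
    INVALID = "HL7:INV"
    DERIVED = "HL7:DER"
    OTHER = "HL7:OTH"
    NEGATIVE_INFINITY = "HL7:NINF"
    POSITIVE_INFINITY = "HL7:PINF"
    UN_ENCODED = "HL7:UNC"
    MASKED = "HL7:MSK"
    NOT_APPLICABLE = "HL7:NA"
    UNKNOWN = "HL7:UNK"
    ASKED_BUT_UNKNOWN = "HL7:ASKU"
    TEMPORARILY_UNAVAILABLE = "HL7:NAV"
    NOT_ASKED = "HL7:NASK"
    NOT_AVAILABLE = "HL7:NAVU"
    SUFFICIENT_QUANTITY = "HL7:QS"
    TRACE = "HL7:TRC"


# A lookup field holds either a lookup value (here simply a string)
# or a null flavor explaining why no value could be entered.
LookupValue = Union[str, NullFlavor]


def is_null(value: LookupValue) -> bool:
    """Return True if the field carries a null flavor instead of a value."""
    return isinstance(value, NullFlavor)


answered: LookupValue = "some-lookup-value"  # placeholder, not a real lookup entry
unanswered: LookupValue = NullFlavor.ASKED_BUT_UNKNOWN

print(is_null(answered), is_null(unanswered), unanswered.value)
```

Keeping the null flavor as a distinct type, rather than an empty string or `None`, preserves the reason why a value is missing.
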