A collection of metadata files used by the Gene Ontology Consortium (GOC).

In general we follow this pattern:

  • metadata source in YAML
  • schema for each file also specified in YAML
  • metadata can be edited via the GitHub web interface, followed by a pull request
  • Travis-CI checks each file against its schema (see ../.travis.yml); if the check passes, the pull request can be merged
  • Jenkins jobs publish the metadata files to the go-public S3 bucket (TODO: permanent URL)
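The real validator and schema format are whatever ../.travis.yml and the YAML schema files define. As a rough illustration of what "checks file against schema" means, here is a minimal Python sketch. It assumes the YAML file has already been parsed into a list of dicts, and SCHEMA is a hypothetical hand-written stand-in for the actual schema, loosely based on the users.yaml fields described below.

```python
# Minimal sketch of the kind of check the CI step performs.
# ASSUMPTIONS: the YAML file is already parsed into a list of dicts, and
# SCHEMA is a hypothetical stand-in for the real YAML schema files.
from typing import Any

# Hypothetical schema for users.yaml: field name -> (expected type, required?)
SCHEMA: dict[str, tuple[type, bool]] = {
    "nickname": (str, True),
    "uri": (str, False),
    "xref": (str, False),
    "organization": (str, False),
    "groups": (list, False),
    "accounts": (dict, False),
    "authorizations": (dict, False),
}


def check_entries(entries: list[dict[str, Any]]) -> list[str]:
    """Return human-readable errors; an empty list means the file passes."""
    errors: list[str] = []
    for i, entry in enumerate(entries):
        for field, (expected, required) in SCHEMA.items():
            if field not in entry:
                if required:
                    errors.append(f"entry {i}: missing required field '{field}'")
                continue
            if not isinstance(entry[field], expected):
                errors.append(f"entry {i}: field '{field}' should be {expected.__name__}")
    return errors


# Hypothetical entries: the second lacks a nickname and has a malformed
# groups field, so it produces two errors.
sample = [
    {"nickname": "Example User", "groups": ["https://example.org/group"]},
    {"uri": "GOC:xyz", "groups": "not-a-list"},
]
for problem in check_entries(sample):
    print(problem)
```

In a CI job, a non-empty error list would be turned into a non-zero exit status so that the pull request check fails and the change is not merged.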

users.yaml

Content:

Each entry holds metadata about a single user. This drives much of the behavior of Noctua and TermGenie (TG), such as who can do what, and is also used for provenance. We want to track all contributions made to any GO content (ontology, annotations or models), so we need a way to uniquely identify users across their different aliases and accounts.
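One way to picture "uniquely identifying users through their different aliases and accounts" is an index from every alias to one canonical identifier. The sketch below is hypothetical: it assumes users.yaml is already parsed into a list of dicts, uses the uri field as the canonical id (falling back to xref), and namespaces account usernames as "<account type>:<username>". The account type "github" and the placeholder ORCID are invented for illustration.

```python
# Sketch: map every known alias of a user to a single canonical identifier.
# ASSUMPTIONS: users.yaml is already parsed into a list of dicts; the
# canonical id is 'uri' (falling back to 'xref'); account usernames are
# namespaced as "<account type>:<username>". Sample data is hypothetical.
from typing import Any, Optional


def build_alias_index(users: list[dict[str, Any]]) -> dict[str, str]:
    """Return {alias: canonical id}, rejecting aliases claimed by two users."""
    index: dict[str, str] = {}
    for user in users:
        canonical = user.get("uri") or user.get("xref")
        if canonical is None:
            continue  # no stable identifier, so nothing to attribute to
        aliases = [canonical]
        if "xref" in user:
            aliases.append(user["xref"])
        for account_type, username in user.get("accounts", {}).items():
            aliases.append(f"{account_type}:{username}")
        for alias in aliases:
            owner = index.setdefault(alias, canonical)
            if owner != canonical:
                raise ValueError(f"alias {alias!r} claimed by {owner} and {canonical}")
    return index


def resolve(index: dict[str, str], alias: str) -> Optional[str]:
    """Return the canonical id for an alias, or None if it is unknown."""
    return index.get(alias)


users = [
    {
        "nickname": "Example User",
        "uri": "https://orcid.org/0000-0000-0000-0000",  # placeholder ORCID
        "xref": "GOC:xyz",
        "accounts": {"github": "example-user"},  # hypothetical account type
    }
]
index = build_alias_index(users)
print(resolve(index, "github:example-user"))
```

Raising on a clashing alias reflects the UNIQUE constraints on uri and xref: an alias that points at two people would make provenance ambiguous.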

Note: for historical reasons, some entries in users.yaml are actually transient groups of users. These will be migrated to groups.yaml; the main blocker is that TG reads users.yaml but not groups.yaml.

Fields:

  • nickname (REQUIRED) - typically first plus last name (not actually a nickname in the usual sense)
  • uri (RECOMMENDED, UNIQUE) - a Uniform Resource Identifier (URI) or Compact URI (CURIE) that uniquely identifies a person.
    • Typically an ORCID http URL
    • If no ORCID is available, a GOC Compact URI is used, e.g. GOC:cjm
    • Noctua uses this field for auto-assigning dc:creator to instances
  • xref (OPTIONAL, UNIQUE) - a Compact URI that uniquely identifies the person, e.g. GOC:cjm
    • this is partly historical; the ontology definition xrefs field uses these
    • TermGenie uses this as a lookup for ontology definition xrefs
  • organization (RECOMMENDED) - the primary organization to which a person belongs
    • although a person may be involved in more than one, typically their GO role will be through one
    • this field is primarily for informational purposes
  • groups (ZERO TO MANY) - the groups a person belongs to (see below for more on groups)
    • Noctua uses this information to allow a person to attribute annotations to their groups via pav:providedBy
  • accounts (DICT) - a dictionary mapping account type to username
    • Noctua uses this information for login/authentication
    • TermGenie uses this information for login/authentication
  • authorizations (DICT)
    • Noctua uses this information for authorization (determining whether your account is allowed to edit)
    • TermGenie uses this information for authorization (determining whether your account is allowed to edit)
  • email-md5 (DEPRECATED)
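The UNIQUE and RECOMMENDED annotations on these fields suggest two simple checks. Here is a hedged Python sketch. It assumes users.yaml is already parsed into a list of dicts, and the sample entries are hypothetical. Uniqueness is checked per field across entries, so one user having the same value in uri and xref (both GOC:aaa below) is not a conflict.

```python
# Sketch of two checks suggested by the field annotations above: values of
# UNIQUE fields must not repeat across entries, and RECOMMENDED fields
# should be present. ASSUMPTION: users.yaml is already parsed into a list
# of dicts; the sample entries are hypothetical.
from collections import Counter
from typing import Any

UNIQUE_FIELDS = ("uri", "xref")
RECOMMENDED_FIELDS = ("uri", "organization")


def duplicate_values(users: list[dict[str, Any]]) -> dict[str, list[str]]:
    """For each UNIQUE field, list values that occur in more than one entry."""
    report: dict[str, list[str]] = {}
    for field in UNIQUE_FIELDS:
        counts = Counter(u[field] for u in users if field in u)
        dupes = sorted(value for value, n in counts.items() if n > 1)
        if dupes:
            report[field] = dupes
    return report


def missing_recommended(users: list[dict[str, Any]]) -> dict[int, list[str]]:
    """Map entry index -> RECOMMENDED fields it lacks (warnings, not errors)."""
    report: dict[int, list[str]] = {}
    for i, user in enumerate(users):
        missing = [f for f in RECOMMENDED_FIELDS if f not in user]
        if missing:
            report[i] = missing
    return report


users = [
    {"nickname": "User A", "uri": "GOC:aaa", "xref": "GOC:aaa"},
    {"nickname": "User B", "uri": "GOC:aaa", "organization": "Example Org"},
]
print(duplicate_values(users))
print(missing_recommended(users))
```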

Tracking contributions to GO:

In the GO graphstore, we typically have triples of the form:

<instance> dc:author <user-uri>
<instance> dc:contributor <user-uri>

These are auto-generated by Noctua.
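To make the dc:contributor pattern concrete, here is a hypothetical sketch (not Noctua's code) that emits one triple per distinct (instance, user) pair from a sequence of edits. It writes triples in the same informal notation used above rather than strict N-Triples, and assumes each edit arrives as an (instance IRI, parsed users.yaml entry) pair. The IRIs and placeholder ORCIDs are invented.

```python
# Sketch: one dc:contributor triple per distinct (instance, user) pair,
# in the same informal notation as above (not strict N-Triples).
# ASSUMPTIONS: edits arrive as (instance IRI, parsed users.yaml entry)
# pairs; IRIs and users below are hypothetical. This is not Noctua's code.
from typing import Any


def contributor_triples(edits: list[tuple[str, dict[str, Any]]]) -> list[str]:
    seen: set[tuple[str, str]] = set()
    triples: list[str] = []
    for instance, user in edits:
        uri = user.get("uri")
        if uri is None:
            # Without a uri there is nothing stable to record as provenance.
            raise ValueError(f"user {user.get('nickname')!r} has no uri")
        if (instance, uri) not in seen:
            seen.add((instance, uri))
            triples.append(f"<{instance}> dc:contributor <{uri}>")
    return triples


user_a = {"nickname": "User A", "uri": "https://orcid.org/0000-0000-0000-0001"}
user_b = {"nickname": "User B", "uri": "https://orcid.org/0000-0000-0000-0002"}
edits = [
    ("http://example.org/i1", user_a),
    ("http://example.org/i1", user_a),  # repeat edit: no duplicate triple
    ("http://example.org/i1", user_b),
]
for triple in contributor_triples(edits):
    print(triple)
```

This is also why the uri field matters for provenance: a user without one cannot be recorded as a contributor in this scheme.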

Additionally, where provenance is added directly in the ontology, the information is stored as a dbxref "axiom annotation" on top of the association between the term URI and the definition string. See section 5.6 of the obo-syntax spec for full details.
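In OBO flat files, a definition is written as a quoted string followed by a bracketed dbxref list, e.g. `def: "text" [GOC:xyz, PMID:0000000]`. The sketch below pulls those xrefs out and maps curator xrefs back to users.yaml entries through their xref field. It is a simplification under stated assumptions: it handles only plain comma-separated xrefs (no xref descriptions or trailing modifiers), leaves escaped quotes in the text unescaped, and uses hypothetical sample data.

```python
# Sketch: pull the dbxrefs off an OBO 'def:' line and map curator xrefs back
# to users.yaml entries via their 'xref' field.
# ASSUMPTIONS: only the simple form  def: "text" [XREF, XREF]  is handled
# (no xref descriptions or trailing modifiers); sample data is hypothetical.
import re
from typing import Any, Optional

# Quoted text (allowing backslash escapes), then a bracketed xref list.
DEF_LINE = re.compile(r'^def:\s*"((?:[^"\\]|\\.)*)"\s*\[([^\]]*)\]')


def definition_xrefs(line: str) -> Optional[tuple[str, list[str]]]:
    """Return (definition text, xrefs), or None if the line is not a def."""
    match = DEF_LINE.match(line)
    if match is None:
        return None
    text, xref_list = match.groups()
    xrefs = [x.strip() for x in xref_list.split(",") if x.strip()]
    return text, xrefs


def curators(xrefs: list[str], users: list[dict[str, Any]]) -> list[str]:
    """Nicknames of users whose 'xref' appears among the definition xrefs."""
    by_xref = {u["xref"]: u for u in users if "xref" in u}
    return [by_xref[x]["nickname"] for x in xrefs if x in by_xref]


users = [{"nickname": "Example Curator", "xref": "GOC:xyz"}]
parsed = definition_xrefs('def: "An example definition." [GOC:xyz, PMID:0000000]')
if parsed is not None:
    text, xrefs = parsed
    print(text, curators(xrefs, users))
```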

groups.yaml

Groups encompass organizations, projects, working groups, content meetings, grants, etc. We call them groups because they typically consist of groups of users. Some groups may be transient; others may be permanent institutions.

Fields:

  • id (REQUIRED, UNIQUE) - a URI uniquely identifying the group. Typically the official URL.
  • label (REQUIRED, UNIQUE) - e.g. university name, grant name. Should be unique, but this is not currently checked
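Since label uniqueness is not currently checked, here is one possible check, sketched in Python. It assumes groups.yaml is already parsed into a list of dicts. Comparing labels after trimming whitespace and case-folding is an assumed normalization rule, chosen so that near-duplicates like "Example Lab" and "example lab " are caught. The sample groups are hypothetical.

```python
# Sketch of checks for the two required group fields: 'id' and 'label' must
# be present and must not repeat. Labels are compared after trimming
# whitespace and case-folding (an assumed normalization rule).
# ASSUMPTION: groups.yaml is already parsed into a list of dicts; the sample
# groups are hypothetical.
from collections import defaultdict
from typing import Any


def group_problems(groups: list[dict[str, Any]]) -> list[str]:
    problems: list[str] = []
    for field in ("id", "label"):
        seen: dict[str, list[int]] = defaultdict(list)
        for i, group in enumerate(groups):
            if field not in group:
                problems.append(f"group {i}: missing required field '{field}'")
                continue
            value = group[field]
            key = value.strip().casefold() if field == "label" else value
            seen[key].append(i)
        for key, indices in seen.items():
            if len(indices) > 1:
                problems.append(f"duplicate {field} {key!r} in groups {indices}")
    return problems


groups = [
    {"id": "https://example.org/lab-a", "label": "Example Lab"},
    {"id": "https://example.org/lab-b", "label": "example lab "},
    {"label": "Example Grant"},
]
for problem in group_problems(groups):
    print(problem)
```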

TODO: each group should have a point of contact (POC), and that POC should have an entry in users.yaml

Tracking contributions to GO:

In the GO graphstore, we typically have triples of the form:

<instance> pav:providedBy <group-uri>

These are added by Noctua. Note that the user must select one or more group roles.
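Combining this with the groups field in users.yaml, a hypothetical sketch of the attribution step might look like the following. The rule that a user may only select groups listed in their own groups field, and the choice of error types, are illustrative assumptions rather than Noctua's actual implementation. The IRIs are invented.

```python
# Sketch: emit pav:providedBy triples for the groups a user selected, but
# only if the user belongs to each of them (per their 'groups' field).
# ASSUMPTIONS: this membership rule and the error types are illustrative,
# not Noctua's actual implementation; IRIs below are hypothetical.
from typing import Any


def provided_by_triples(instance: str, user: dict[str, Any], selected: list[str]) -> list[str]:
    if not selected:
        raise ValueError("at least one group must be selected")
    member_of = set(user.get("groups", []))
    outside = [g for g in selected if g not in member_of]
    if outside:
        raise PermissionError(f"{user.get('nickname')!r} is not a member of {outside}")
    return [f"<{instance}> pav:providedBy <{g}>" for g in selected]


user = {
    "nickname": "Example User",
    "groups": ["https://example.org/lab", "https://example.org/grant"],
}
chosen = ["https://example.org/lab", "https://example.org/grant"]
for triple in provided_by_triples("http://example.org/i1", user, chosen):
    print(triple)
```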

db-xrefs.yaml

datasets

See the datasets/ directory for more details.