Data model evolution planning #115

aclayton555 · 2024-07-08T18:49:48Z

To be performed in July 2024. Outcomes of this ticket should be a relatively comprehensive plan and related tickets to coordinate work for August 2024 and onward.

Cover:

SCOPE: what needs to be done (e.g. data model changes to enhance linkages)
OUTPUTS: expected outputs (e.g. RFCs and deployment schedule)
DEPENDENCIES AND IMPACT: what downstream dependencies or required changes exist, if any (e.g. data model changes that impact syncing scripts, especially since Verena will be OOO).
RESOURCES: how and who will perform this work.

Relates to #97 and #56

Bankso · 2024-08-30T18:23:40Z

High level summary of work completed for this:

Model and Individual v0 schemas created
Biospecimen data model updated/expanded
GeoMx model expanded/updated and split into level-specific folders, including the GeoMx config schema
Study and FileView schemas created and implemented
DUO code attribute added and integrated (not yet in use, currently planning for implementation with GovInn)
Dataset Sharing Plan schema created and implemented
10X Visium model adapted and implemented
Implementation of the <component>_id and <component> Key reference system
Separated attributes and valid values from shared into model-specific folders
Initial provisioning of folders and CSVs for sequencing model

See additional info added in #116
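
As an illustration of the `<component>_id` and `<component> Key` reference system listed above, here is a minimal sketch of a reference check between two components. It assumes that a `<component> Key` attribute on one record holds the `<component>_id` value of the record it points to; that reading, the record contents, and the helper name `find_dangling_keys` are hypothetical and not taken from the data model itself.

```python
from typing import Dict, List

# Hypothetical example rows; only the "<component>_id" / "<component> Key"
# naming pattern comes from the summary above.
studies: List[Dict[str, str]] = [
    {"Study_id": "study-001"},
]
biospecimens: List[Dict[str, str]] = [
    {"Biospecimen_id": "bsp-001", "Study Key": "study-001"},
    {"Biospecimen_id": "bsp-002", "Study Key": "study-999"},  # no such study
]


def find_dangling_keys(
    children: List[Dict[str, str]],
    parents: List[Dict[str, str]],
    parent_component: str,
) -> List[Dict[str, str]]:
    """Return child rows whose '<parent> Key' matches no '<parent>_id'."""
    id_attr = f"{parent_component}_id"    # e.g. "Study_id"
    key_attr = f"{parent_component} Key"  # e.g. "Study Key"
    known_ids = {row[id_attr] for row in parents}
    return [row for row in children if row.get(key_attr) not in known_ids]


dangling = find_dangling_keys(biospecimens, studies, "Study")
print([row["Biospecimen_id"] for row in dangling])
```

A check like this could run before submission to catch records that reference a component that was never registered.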

aclayton555 · 2024-08-30T19:48:11Z

24-7/8 close-out: have made a lot of changes in this refactor, and really want another set of eyes on this to make sure this makes sense. Okay to wait until October for a deep dive.

Priority however is to at least close out the CDS attribute mapping. Want to have this complete by site visit.

Set up meeting with Aditi, Orion, Aditya asap to push through this. Maybe bring Jess in.

aclayton555 · 2024-09-04T16:32:16Z

24-9: Working session scheduled for Sept 11 at 9am PT.

Outcome of that meeting will be CDS mapping (priority for site visit). Toward end of that discussion, think about timelines/phased approach for releases (may not want to do this all at once since there will be some major changes).

aclayton555 · 2024-09-11T17:02:39Z

Notes from Sept 11 working session:

The [Component]_id attribute is used both for Upsert and as a unique primary/foreign key identifier
Have a shared attribute table that defines the set of shared attributes across components - would be nice to have this in HTAN, in addition to the modularized approach
"Study" schema is intended to be flexible, but brings in CDS template attributes. These are flagged under "Source"
Model bifurcates into "resource" type schemas (e.g. dataset, grant) and the newer experimental information type schemas.
"Models" refers to experimental model systems (e.g. Zebrafish, cell lines), whereas "Individuals" refers to human participants. For "Models," this is a proposal for a schema driven by use cases that Orion has encountered (@Bankso to consolidate documenting these). For "Individuals," this pulls in a lot of the CDS attributes
Suggested addition to "Models" (re: "Model Method"); see: [Models] Option for "Model Method" to capture protocol or publication #143
Redundancy in attribute naming (i.e. component prefixes on attributes in each schema) is intentional, because we don't know to what level contributors will want to annotate resources. However, there are opportunities to reduce redundancies in things like the Shared attributes, but note that mappings are established to maintain harmonization of these terms (e.g. Assay Type) across different schemas.
Biospecimen (parent vs. child) - @Bankso wants to think about and document how to do this, as we have already encountered issues with this in light sheet microscopy. Jess notes that AMP-AIM is also thinking about this. HTAN has outlined this in their ID provenance structure: https://docs.humantumoratlas.org/data_model/identifiers/
New "FileView" schema incorporates DUO codes (this attribute is also shared with Dataset and Study). Ongoing conversations with GovInn about inferred annotations from DUO codes (i.e. a given DUO code will imply specific access restrictions). Also thinking about how to leverage the FileView level to capture longitudinal data and time course information, as represented in different contributed files.
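
To make the first note above concrete, here is a minimal sketch of an upsert keyed on `[Component]_id`: a record whose ID already exists updates the stored row, and a record with a new ID is inserted. The in-memory dictionary, the `upsert` helper, and the `Dataset Name` attribute are hypothetical stand-ins for illustration; the actual upsert behavior of the target table may differ.

```python
from typing import Dict, List

Row = Dict[str, str]


def upsert(table: Dict[str, Row], records: List[Row], component: str) -> Dict[str, Row]:
    """Update rows whose '<component>_id' already exists; insert the rest."""
    id_attr = f"{component}_id"  # e.g. "Dataset_id"
    for record in records:
        key = record[id_attr]
        # Merge new values over any existing row that has the same ID.
        table[key] = {**table.get(key, {}), **record}
    return table


# Hypothetical records; "Dataset Name" is an illustrative attribute only.
table: Dict[str, Row] = {}
upsert(table, [{"Dataset_id": "ds-001", "Dataset Name": "Pilot"}], "Dataset")
upsert(
    table,
    [
        {"Dataset_id": "ds-001", "Dataset Name": "Pilot v2"},  # updates ds-001
        {"Dataset_id": "ds-002", "Dataset Name": "Second"},    # inserts ds-002
    ],
    "Dataset",
)
print(sorted(table))
```

Because the same attribute doubles as the primary key, resubmitting a corrected record under this sketch replaces values in place rather than creating a duplicate row.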

Next steps:

Overall, CDS attribute mapping on track.
Welcome team input into the overall design and complexity. Want to have this as a solid foundation to build on. RFC process and direct contributor engagement will be critical to help inform required vs non-required.
In our documentation, want to make clear (and provide examples!) how contributors should expect to engage with templates (i.e. some vs. all). The ongoing data sharing pilots can provide examples for this - see [Q4 2024] [Contributor-Facing Documentation] Schema Updates + Clarity on how to engage with schema + examples #142

aclayton555 · 2024-09-30T17:43:13Z

24-9 Close-out: This work will continue on into the next sprint. On track, but check in mid sprint.

aclayton555 · 2024-10-31T16:28:41Z

24-10: Okay to close. Next phase of work to continue in testing and in docs: #142

aclayton555 assigned Bankso and aclayton555 Jul 8, 2024

This was referenced Jul 8, 2024

Strategy for defining connections between different types of metadata #56

Closed

RFC schedule and deployment plan #97

Open

Integration/mapping to existing CDS templates #116

Closed

aclayton555 mentioned this issue Aug 30, 2024

Data model content updates to support GH docs #67

Open

aclayton555 added the priority-high label Sep 4, 2024

This was referenced Sep 11, 2024

[Q4 2024] [Contributor-Facing Documentation] Schema Updates + Clarity on how to engage with schema + examples #142

Open

[Models] Option for "Model Method" to capture protocol or publication #143

Open

aclayton555 closed this as completed Oct 31, 2024

aclayton555 mentioned this issue Oct 31, 2024

Data model updates integration testing #151

Open
