Data model content updates to support GH docs #67

Open
Bankso opened this issue Feb 5, 2024 · 8 comments

@Bankso
Contributor

Bankso commented Feb 5, 2024

Relative to #49 and #66

Draft of the MC2 data model dictionary, using GitHub pages deployment, is here: https://mc2-center.github.io/data-models/

Potential actions that could improve documentation quality (should determine necessity/priority for the following):

  • review all Component and Attribute entries and add descriptions where missing/incomplete
  • review Valid Values and add descriptions, ontology references
  • implement hierarchy for Valid Values
@vpchung vpchung added the documentation Improvements or additions to documentation label Feb 6, 2024
@aclayton555

In 24-3, at least the first two bullets are readily doable. The third bullet might be more difficult, so we need to scope it further and see how far we can get.

@Bankso
Contributor Author

Bankso commented Apr 12, 2024

Currently reviewing components, attributes, and valid values

For hierarchy/structure, I did some preliminary analysis with GPT and ontology scoping, documented here:
https://docs.google.com/document/d/1Vs-X4laTfih2YpoouF0njCCSQmcC4AIpsmgmCdl5b9c/edit?usp=sharing

Summary: it seems doable, but it will be a lot of work. To minimize the effort required, I'll source structure from existing ontologies and devise mappings where needed.

In terms of implementation, I think defining pairwise (child, parent) relationships will be sufficient, since each mapping carries the information forward to the next level. A generic example would be:

Take five terms: RNA-seq, scRNA-seq, ATAC-seq, scATAC-seq, WGS
Highest level group: Genomic technique
Possible second level groups: bulk, single-cell, transcriptomics, epigenomics, RNA, DNA (lots of options, is the point)

Terms would be organized in a CSV with the column names:
Technique (should replace assay), Parent, [all other info captured]

Then relationships are easy to define and structure is easily inferred, using Genomic --> bulk, single-cell --> RNA-seq, scRNA-seq, ATAC-seq, scATAC-seq, WGS

Technique, Parent
Genomic, None
Bulk, Genomic
Single-cell, Genomic
RNA-seq, Bulk
ATAC-seq, Bulk
WGS, Bulk
scRNA-seq, Single-cell
scATAC-seq, Single-cell
.
.
.
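
As a rough sketch of how the structure could be inferred from those pairs, something like the following Python might work. The function names `load_parents`, `lineage`, and `children_of` are hypothetical and not part of the data model repo, and treating the literal string "None" as the root marker is an assumption based on the example rows above.

```python
import csv
import io
from collections import defaultdict

# Rows copied from the example above; "None" is assumed to mark the top-level term.
PAIRS_CSV = """Technique, Parent
Genomic, None
Bulk, Genomic
Single-cell, Genomic
RNA-seq, Bulk
ATAC-seq, Bulk
WGS, Bulk
scRNA-seq, Single-cell
scATAC-seq, Single-cell
"""


def load_parents(text):
    """Map each term to its parent (None for the root)."""
    parents = {}
    # skipinitialspace drops the space that follows each comma in the rows above.
    for row in csv.DictReader(io.StringIO(text), skipinitialspace=True):
        parent = row["Parent"].strip()
        parents[row["Technique"].strip()] = None if parent == "None" else parent
    return parents


def lineage(term, parents):
    """Follow parent links upward, then return the path from root to term."""
    path = []
    seen = set()
    while term is not None:
        if term in seen:
            raise ValueError(f"cycle detected at {term!r}")
        seen.add(term)
        path.append(term)
        term = parents[term]
    return list(reversed(path))


def children_of(parents):
    """Invert the child -> parent map into parent -> [children]."""
    children = defaultdict(list)
    for term, parent in parents.items():
        if parent is not None:
            children[parent].append(term)
    return dict(children)


parents = load_parents(PAIRS_CSV)
print(" --> ".join(lineage("scRNA-seq", parents)))
# prints: Genomic --> Single-cell --> scRNA-seq
```

Only the pairs are stored; the full path for any term is rebuilt by walking the parent links, so adding a new intermediate level means editing a handful of rows rather than every descendant.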

@aclayton555

Suggest chatting with ANV to see how this was designed and implemented in NF.

@aclayton555

  • Continue working through building this out
  • For search purposes, FTS may not necessitate this hierarchy, so that use case can be deprioritized until we know if FTS is favourable.

@aclayton555

24-6: No updates this sprint. Carry into next sprint

@Bankso
Contributor Author

Bankso commented Jul 4, 2024

I will continue to collate valid value definitions for assays, tissues, and tumor types here: https://docs.google.com/spreadsheets/d/1YL8kDB_tdvGDYqDy4x8zlBauLPDxc24W4tLEArJh0kQ/edit?usp=sharing

In addition, many valid value sets are missing descriptions/definitions, like file formats, licenses, input/output formats, etc. The next step is to identify all value types that would benefit from this exercise and note them here.
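
One possible way to find those value sets is a small script that scans a data model CSV for entries with an empty description. This is only a sketch: the column names `Attribute`, `Description`, and `Parent`, the function name `missing_descriptions`, and the sample rows are all assumptions made for illustration, not taken from the actual MC2 data model files.

```python
import csv
import io

# Made-up sample rows for illustration only; not taken from the MC2 data model.
SAMPLE_CSV = """Attribute,Description,Parent
RNA-seq,placeholder description,Assay
WGS,,Assay
CSV,,File Format
"""


def missing_descriptions(text):
    """Group attributes that have an empty description by their parent value set."""
    missing = {}
    for row in csv.DictReader(io.StringIO(text)):
        if not (row["Description"] or "").strip():
            missing.setdefault(row["Parent"], []).append(row["Attribute"])
    return missing


report = missing_descriptions(SAMPLE_CSV)
for value_set, terms in report.items():
    print(f"{value_set}: {len(terms)} missing ({', '.join(terms)})")
```

Grouping by parent gives a per-value-set count, which could help decide which sets to prioritize.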

@aclayton555

24-7/8 close out: have new models to add (per #115)

@aclayton555

24-9: Secondary to site visit priorities.

Will require some work to add new components. There might be some room for automation to make pulling this information easier as the data model updates.
