Integration/mapping to existing CDS templates #116

Closed
aclayton555 opened this issue Jul 8, 2024 · 9 comments · Fixed by #147

@aclayton555

In HTAN, we have established an initial approach to mapping the HTAN data model to the latest CDS Sequencing and CDS Imaging templates. @aditigopalan has been critical to this process. Note that the HTAN approach was developed somewhat out of necessity and was rushed toward the end of HTAN phase 1.0. So while the HTAN approach works, there's an opportunity to build on it for MC2 and take more time to strategize on an optimal approach. What we do in MC2 may then inform how we proceed into HTAN phase 2.0.

It was suggested that we start with an initial show-and-tell meeting to cover what has been done in HTAN so far and plan what we do for MC2. This effort is expected to fall within the scope of work for #115.

Initial meeting scheduled for July 25. Thanks Aditi!

@aditigopalan
Contributor

For reference, here is the mapping file we use for HTAN.

@aditigopalan
Contributor

aditigopalan commented Jul 25, 2024

Start with the imaging templates, then sequencing:

  • Add required attributes to MC2 (Latest template here)
  • Compare HTAN and CDS attribute names
  • Modify mappings.yml to fit MC2 model
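For the "compare HTAN and CDS attribute names" step, a first pass could be automated along these lines. This is a minimal sketch, not the approach used for HTAN: it assumes each model's attribute names are available as plain lists of strings, and it treats two names as a match when they agree after lowercasing and dropping spaces, underscores, and punctuation. The attribute names in the example are illustrative placeholders, not actual HTAN or CDS attributes.

```python
import re


def normalize(name: str) -> str:
    """Lowercase and drop non-alphanumerics, so 'File Format' == 'file_format'."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def compare_attributes(htan: list[str], cds: list[str]) -> dict[str, list]:
    """Split attribute names into normalized matches and names unique to each model.

    If two names in the same model normalize identically, the later one wins.
    """
    htan_index = {normalize(a): a for a in htan}
    cds_index = {normalize(a): a for a in cds}
    shared = sorted(htan_index.keys() & cds_index.keys())
    return {
        "matched": [(htan_index[k], cds_index[k]) for k in shared],
        "htan_only": sorted(htan_index[k] for k in htan_index.keys() - cds_index.keys()),
        "cds_only": sorted(cds_index[k] for k in cds_index.keys() - htan_index.keys()),
    }


if __name__ == "__main__":
    # Placeholder names for illustration only.
    result = compare_attributes(
        ["File Format", "Imaging Assay Type", "HTAN Participant ID"],
        ["file_format", "imaging_assay_type", "participant_id"],
    )
    for key, names in result.items():
        print(key, names)
```

Anything left in `htan_only` or `cds_only` still needs a manual decision (a renamed attribute, a new CDS requirement, or an HTAN-specific field).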

@aditigopalan
Contributor

CDS files are on the mc2-cds branch.

@Bankso
Contributor

Bankso commented Aug 30, 2024

CDS imaging model integration is in progress here: https://github.com/mc2-center/data-models/tree/refactor-add-cds-imaging

Imaging attributes from the CDS model have been integrated into a set of Level 1–4 and channel schemas as our initial imaging data model (at this point it essentially recapitulates the HTAN model).

Additional attributes from CDS were integrated into the Biospecimen schema redesign and the unreleased Model and Individual schemas.

@Bankso
Contributor

Bankso commented Aug 30, 2024

Next steps here:

  • add all valid values to the model and update mappings.yml for the imaging, model, and individual schemas
  • add CDS sequencing attributes, valid values, and value mappings, and define sequencing schemas for levels 1–3/4
  • update documentation to reflect the new schemas and the module folder refactor
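The "value mappings" item could be exercised with a small translation step like the one below. The actual layout of mappings.yml is not shown in this thread, so the nested dict here (a `cds_attribute` name plus an optional `values` map) is a hypothetical stand-in, and the attribute and value names are placeholders. Unmapped attributes and values are reported rather than silently passed through, which is what a test of the mappings would want to surface.

```python
# Hypothetical in-memory form of an attribute/value mapping; the real
# mappings.yml layout may differ. Names and values are placeholders.
MAPPING = {
    "File Format": {
        "cds_attribute": "file_type",
        "values": {"OME-TIFF": "tiff", "BAM": "bam"},
    },
    # Free-text attribute: renamed, but values pass through unchanged.
    "Specimen Notes": {"cds_attribute": "sample_description"},
}


def to_cds(record: dict[str, str], mapping: dict) -> tuple[dict[str, str], list[str]]:
    """Translate one metadata record to CDS names/values, collecting anything unmapped."""
    out: dict[str, str] = {}
    problems: list[str] = []
    for attr, value in record.items():
        rule = mapping.get(attr)
        if rule is None:
            problems.append(f"no CDS attribute for {attr!r}")
            continue
        values = rule.get("values")
        if values is None:
            out[rule["cds_attribute"]] = value  # no controlled vocabulary
        elif value in values:
            out[rule["cds_attribute"]] = values[value]
        else:
            problems.append(f"no CDS value for {attr}={value!r}")
    return out, problems


if __name__ == "__main__":
    converted, issues = to_cds(
        {"File Format": "OME-TIFF", "Specimen Notes": "left lobe", "Stain": "H&E"},
        MAPPING,
    )
    print(converted)  # {'file_type': 'tiff', 'sample_description': 'left lobe'}
    print(issues)     # ["no CDS attribute for 'Stain'"]
```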

@aclayton555
Author

Will be closed out with #115

@aclayton555
Author

24-9 close-out: Terms have been added. Need to ensure valid values are mapped to the appropriate attributes. Proposing a pre-release version of the model to test mappings. Aim to revisit this at the mid-sprint check-in.

Question: Many of the CDS terms (especially for biospecimens) are very specific. Do we want to use these directly, or map to them from our preferred terms? For human data, use CDS terms in the model, but we can maintain our own internal mappings to preferred (and simplified) terms.
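One way to keep CDS terms in the model while exposing simplified ones is a many-to-one lookup from each CDS term to a preferred term, plus the reverse expansion for filtering. This is a sketch under assumptions: the `PREFERRED` table and its terms are illustrative, not actual CDS biospecimen values or an agreed MC2 mapping, and `preferred_term`/`expand` are hypothetical helper names.

```python
from collections import defaultdict

# Hypothetical internal table: specific CDS term -> simplified preferred term.
# Terms are illustrative, not actual CDS valid values or an agreed mapping.
PREFERRED = {
    "Primary Tumor": "Tumor",
    "Recurrent Tumor": "Tumor",
    "Metastatic Tumor": "Tumor",
    "Solid Tissue Normal": "Normal",
}

# Reverse index: preferred term -> every CDS term that collapses to it.
_EXPANSION: dict[str, list[str]] = defaultdict(list)
for _cds_term, _pref in PREFERRED.items():
    _EXPANSION[_pref].append(_cds_term)


def preferred_term(cds_term: str) -> str:
    """Simplified label for display; falls back to the CDS term if none is defined."""
    return PREFERRED.get(cds_term, cds_term)


def expand(preferred: str) -> list[str]:
    """CDS terms to match when a user filters on a preferred term."""
    return sorted(_EXPANSION.get(preferred, []))


if __name__ == "__main__":
    print(preferred_term("Recurrent Tumor"))  # Tumor
    print(expand("Tumor"))  # ['Metastatic Tumor', 'Primary Tumor', 'Recurrent Tumor']
```

Because the model itself stores only CDS terms, annotations stay CDS-ready, and the simplification lives entirely in the lookup table.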

Do we have datasets that will be going to CDS? Candidates are the zebrafish data and the NYU data. We will annotate these with the intent that they could go to CDS, but it will be up to the contributors to decide whether to submit them there.

Need to think about the interaction and data flow scenarios:

  1. data and metadata come to Synapse, then we facilitate transfer to CDS and surface metadata and indexing in CCKP
  2. data are contributed directly to CDS, and we then infer metadata from the CDS submission?

@aclayton555
Author

24-10: Terms have been integrated, but we are encountering issues with commas within valid values. Will fix so that the data model deploys.
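A quick pre-deploy check can catch the comma problem before the model is built. This sketch assumes, as the comment implies, that an attribute's valid values are stored as a single comma-delimited cell and split on commas when the model is parsed; the function names are hypothetical and the example values are illustrative. It shows how a value containing a comma splits into extra, bogus values, and refuses to serialize such values.

```python
def parse_valid_values(cell: str) -> list[str]:
    """Split a comma-delimited valid-values cell (assumed storage format)."""
    return [v.strip() for v in cell.split(",") if v.strip()]


def serialize_valid_values(values: list[str]) -> str:
    """Join valid values into one cell, refusing any value that would not round-trip."""
    bad = [v for v in values if "," in v]
    if bad:
        raise ValueError(f"valid values contain the delimiter: {bad}")
    return ", ".join(values)


if __name__ == "__main__":
    values = ["Adenocarcinoma, NOS", "Squamous cell carcinoma"]  # illustrative
    # Naive join + split turns one value into two bogus ones.
    print(parse_valid_values(", ".join(values)))
    # ['Adenocarcinoma', 'NOS', 'Squamous cell carcinoma']
    try:
        serialize_valid_values(values)
    except ValueError as err:
        print(err)
```

The check only flags the problem; whether to rename the values, quote them, or change the delimiter depends on what the model tooling accepts.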

By the end of October, aim to have a pre-release version of the model with the templates integrated, and conduct a group review in the 24-11/12 sprint.

@aclayton555
Author

24-10 close-out: The PR is probably okay to merge, but we should note that many changes are still needed, especially to the documentation (the tech writer can help update the docs on the data model explorer pages). Agreed to merge the PR so that we can start experimenting with the templates, and to backlog a ticket to update the docs.

With today's data model release, these changes would go out to our prod data model, but we could hold off on updating our DCA config if we don't want to make them live for external users yet.

This will include some table changes. Once merged, cross-check that nothing has broken.

Once conflicts have been resolved in PR, merge, and proceed to close this issue.

3 participants