data (rd3): add emx2 mapping script #60

Open. Wants to merge 4 commits into base: main

davidruvolo51 (Collaborator) commented Dec 17, 2024

The purpose of this PR is to add the mapping script for the EMX2 migration. It includes the steps to reshape the data, recode values to the new ontologies, and map EMX1 column names to their EMX2 equivalents.
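As a rough illustration of the renaming and recoding steps, here is a minimal Python sketch. It is not the script in this PR: the source column names (`subjectID`, `sex1`), the target names, and the recoded values are hypothetical placeholders chosen only to show the shape of the transformation.

```python
# Hypothetical EMX1 -> EMX2 column names; the real mappings live in the PR's script.
COLUMN_MAP = {"subjectID": "id", "sex1": "gender at birth"}

# Hypothetical recoding of EMX1 values to EMX2 ontology terms.
RECODE = {
    "gender at birth": {
        "F": "assigned female at birth",
        "M": "assigned male at birth",
    }
}


def map_record(record):
    """Rename known columns, then recode values that have an ontology mapping."""
    mapped = {}
    for old_name, value in record.items():
        new_name = COLUMN_MAP.get(old_name)
        if new_name is None:
            continue  # columns without a mapping are dropped
        # Values without a recode entry pass through unchanged.
        mapped[new_name] = RECODE.get(new_name, {}).get(value, value)
    return mapped
```

Keeping the column map and the recode table as plain dictionaries makes it easy to review the mappings separately from the reshaping logic.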

What are the main changes you made?

  • solverd_subjects to
    • Individuals
    • Pedigree: relationships need to be mapped and linked here
      • Pedigree members: we can accurately detect parent-patient relationships
    • Individual Observations
      • Clinical observations
      • Disease History
      • Phenotypic Features
  • solverd_samples to
    • Biosamples
  • solverd_labinfo to
    • Protocol parameters (this needs more planning)
  • solverd_samples to
    • Files: multiple file paths should be collapsed into a single comma-separated string
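The collapsing of file paths noted for the Files table could look roughly like the sketch below. The function name and the `sample` and `path` keys are assumptions made for illustration; the actual column names in solverd_samples may differ.

```python
from collections import defaultdict


def collapse_paths(rows, id_key="sample", path_key="path"):
    """Group file paths by ID and join each group into one comma-separated string."""
    grouped = defaultdict(list)
    for row in rows:
        path = row.get(path_key)
        # Skip empty paths and avoid repeating a path already seen for this ID.
        if path and path not in grouped[row[id_key]]:
            grouped[row[id_key]].append(path)
    return {key: ",".join(paths) for key, paths in grouped.items()}
```

One caveat with this design: a path that itself contains a comma cannot be split back out reliably, so it may be worth checking the source data for such paths before collapsing.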

To do

Questions and issues

  • Mismatch between ontology table names and dataset names: for some ontologies, the table name does not match the name of the corresponding CSV file. For example, "gender at birth" is the ontology table name, but the file is "assigned gender at birth". We need to make sure all table names and file names are in sync.
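One way to surface these mismatches before running the migration is a small check like the following. This is only a sketch: the function name is hypothetical, and it assumes the ontology table names and the CSV file names are available as plain lists and should match case-insensitively.

```python
from pathlib import Path


def find_name_mismatches(table_names, csv_files):
    """Return table names without a matching CSV file, and file names without a table."""
    tables = {name.strip().lower() for name in table_names}
    # Compare on the file name without its directory or extension.
    files = {Path(f).stem.strip().lower() for f in csv_files}
    return sorted(tables - files), sorted(files - tables)
```

For the example above, calling it with the table name "gender at birth" and the file "assigned gender at birth.csv" would report both names as unmatched.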

davidruvolo51 changed the title from "feat: init mapping script" to "data (rd3): add emx2 mapping script" on Dec 17, 2024