HGNC robot template #559

Merged
merged 1 commit into from
Jun 26, 2024

Conversation

joeflack4
Contributor

@joeflack4 joeflack4 commented Jun 6, 2024

Addresses sub-tasks in:

Related:

Overview

Updates mondo_genes.csv to be a proper ROBOT template and ties it into the pipeline for externally managed content.

Pre-merge checklist

Documentation

Was the documentation added/updated under docs/?

  • Yes
  • No, updates to the docs were not necessary after careful consideration

QC

Was the full pipeline run before submitting this PR using sh run.sh make build-mondo-ingest on this branch (after
docker pull obolibrary/odkfull:dev), and no errors occurred?

  • Yes
  • No, there are no functional (code-related) changes to the pipeline in the PR, so no re-run is necessary

Build PR:

New Packages

Were any new Python packages added?

Were any other non-Python packages added?

PR Review and Conversations Resolved

Has the PR been sufficiently reviewed by at least 1 team member of the Mondo Technical team and all threads resolved?

  • Yes

CC: @souzadevinicius Thought this would be a good one for you to review

@joeflack4 joeflack4 marked this pull request as draft June 6, 2024 21:01
@joeflack4 joeflack4 self-assigned this Jun 6, 2024
@joeflack4 joeflack4 added the omim and enhancement labels Jun 6, 2024
@joeflack4 joeflack4 changed the base branch from main to develop June 6, 2024 21:04
src/ontology/mondo-ingest.Makefile (resolved)
src/ontology/mondo-ingest.Makefile (outdated, resolved)
Contributor Author

Add: external/mondo_genes.robot.tsv

Pulls from latest release, e.g.:

src/ontology/mondo-ingest.Makefile (resolved)
@joeflack4 joeflack4 force-pushed the hgnc-template branch 3 times, most recently from 22ac9b0 to 3ea3ec6 Compare June 13, 2024 00:03
@joeflack4 joeflack4 marked this pull request as ready for review June 13, 2024 00:05
@joeflack4 joeflack4 added the hgnc label Jun 13, 2024
Member

@matentzn matentzn left a comment

Nice work!

@twhetzel
Contributor

twhetzel commented Jun 13, 2024

@matentzn In the file https://github.com/monarch-initiative/mondo-ingest/blob/hgnc-template/src/ontology/external/mondo_genes.robot.tsv for MONDO_0000208 I see the source is the OMIM record. This seems a better option than what is currently in Mondo as the source, e.g. MONDO:mim2gene_medgen.

However,

  • was this change discussed with curators?
  • is there a check in place, after this ROBOT template is merged into mondo-edit.obo, to make sure that no classes with a 'has material basis in germline mutation in' axiom still have the old source annotation (MONDO:mim2gene_medgen)?
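
A check along these lines could be sketched as follows. This is a minimal, hypothetical Python sketch, not an existing QC step in the Mondo pipeline. It assumes that mondo-edit.obo serializes these axioms as `relationship:` lines using the shorthand `has_material_basis_in_germline_mutation_in`, with the source in a trailing `{source="..."}` qualifier block. The function name and the sample stanza are made up for illustration.

```python
import re

# Assumed OBO shorthand for 'has material basis in germline mutation in'.
RELATION = "has_material_basis_in_germline_mutation_in"
OLD_SOURCE = "MONDO:mim2gene_medgen"


def find_old_gene_sources(obo_text: str) -> list:
    """Return (term_id, line) pairs for gene relationships still citing OLD_SOURCE."""
    hits = []
    term_id = None
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza ([Term], [Typedef], ...) starts: forget the previous id.
            term_id = None
        elif line.startswith("id: "):
            term_id = line[len("id: "):]
        elif line.startswith(f"relationship: {RELATION} "):
            # Qualifiers, if any, sit in a trailing {...} block.
            match = re.search(r"\{(.*)\}", line)
            if match and f'source="{OLD_SOURCE}"' in match.group(1):
                hits.append((term_id, line))
    return hits


# Hypothetical stanza for illustration only; the exact serialization is assumed.
SAMPLE = """[Term]
id: MONDO:0000208
relationship: has_material_basis_in_germline_mutation_in http://identifiers.org/hgnc/28403 {source="MONDO:mim2gene_medgen"} ! TRMT10A
"""

if __name__ == "__main__":
    for term, line in find_old_gene_sources(SAMPLE):
        print(term, "->", line)
```

An empty result would indicate that no gene relationship in the scanned text still carries the old source annotation.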

Contributor

@twhetzel twhetzel left a comment

A few comments from yesterday that I can now submit after restoring my internet.

src/ontology/mondo-ingest.Makefile (resolved)
src/ontology/mondo-ingest.Makefile (resolved)
src/ontology/mondo-ingest.Makefile (outdated, resolved)
@sabrinatoro
Contributor

MONDO:mim2gene_medgen

I do not know what this MONDO:mim2gene_medgen source refers to. It looks like a way to say that we are getting this information from the omim-gene file that somehow involves medgen?
@nicolevasilevsky do you remember anything about this?

I think it is ok to remove the MONDO:mim2gene_medgen sources and replace them with OMIM identifier.
However, it would not hurt to keep something like MONDO:mim2gene as a source to indicate that this annotation was made via a specific pipeline (similar to the "MONDO:MEDGE" source on the UMLS xref; see the image below for illustration).

We would therefore have the source be:

  • OMIM:1234
  • MONDO:OMIM2GENE

[Image: Screenshot 2024-06-13 at 1 03 39 PM]

@twhetzel
Contributor

MONDO:mim2gene_medgen is documented on the Entities page as "This indicates the gene relationship came from MedGen."
@joeflack4 can you remind me whether these mappings originally were from the MedGen mappings file?
@sabrinatoro do you want a different annotation used for the source still given the definition and pending Joe's answer for the question above?

@twhetzel twhetzel self-requested a review June 13, 2024 20:45
Contributor

@twhetzel twhetzel left a comment

I think regardless of which other source annotation should be added, an additional source annotation is needed in the ROBOT template.

@sabrinatoro
Contributor

@sabrinatoro do you want a different annotation used for the source still given the definition and pending Joe's answer for the question above?

I feel like I don't have enough information to give a clear answer, but I will try.
Where is the information coming from?

  • if we still get information from MedGen, then absolutely yes, we need to keep MONDO:mim2gene_medgen (and make sure to associate it with the correct annotations).
  • if we get the information directly from OMIM, then we can use something like "MONDO:OMIM" (or whatever source we have to say that something comes from OMIM; I don't have a strong opinion about what to name it, but I can make a name up).
  • if we get information from both, then we need to use both sources, associated to annotations appropriately.

My GUESS is that we were using the gene annotation from MedGen at one point, and MedGen got this gene-to-disease annotation from OMIM (i.e., the MONDO:mim2gene_medgen source). It makes sense that we would switch to getting this information directly from OMIM now.

, then there is no point in keeping it. I am assuming (again assuming, please someone confirm)

@twhetzel
Contributor

Looking at the Monarch omim repo, I see references to the OMIM API and this download OMIM page so I guess the data in this HGNC ROBOT template is only coming from OMIM. That's the first thing for @joeflack4 or @matentzn to confirm.

If the data in this HGNC ROBOT file is only from OMIM, then we can go with Sabrina's comment:
if we get the information directly from OMIM, then we can use something like "MONDO:OMIM" (or whatever source we have to say that something comes from OMIM; I don't have a strong opinion about what to name it, but I can make a name up).
(from Trish - MONDO:OMIM fits the pattern I see for GARD and NORD so +1 from me)

The last thing I do not know is where the data currently in Mondo with 'has material basis in germline mutation in' (see example below) came from and, more importantly, whether we need to do anything about it.
For example, MONDO:0000208 and 'has material basis in germline mutation in' some TRMT10A with source MONDO:mim2gene_medgen.
I don't know if this HGNC template data in this PR is in addition to or intended to replace the existing data and if both the data in this ROBOT template and the existing data are from the same source and therefore should have the same source annotation. @matentzn do you know the answer to this?

Contributor Author

@joeflack4 joeflack4 Jun 14, 2024

Duplicative sources?: MONDO:mim2gene_medgen

A / TLDR: I'm in favor of keeping as-is and merging for now if we need more time to think about this. But as we spend more time looking into this, I'm thinking we may drop MONDO:mim2gene_medgen in favor of MONDO:OMIM.


Man, sorry I could have provided some valuable information earlier, but I had some loose ends and wanted to collect my thoughts.

I decided to put this in a thread, if you guys don't mind responding further in this thread, to keep the related comments neatly together.


Some decisions / tasks


Some Q&A

1. Does the omim repo utilize any other sources, or just OMIM itself?

Trish:

Looking at the Monarch omim repo, I see references to the OMIM API and this download OMIM page so I guess the data in this HGNC ROBOT template is only coming from OMIM. That's the first thing for @joeflack4 or @matentzn to confirm.

Just OMIM itself! We're pulling data files just from the OMIM API, and that's it.

2. Where MONDO:mim2gene_medgen is coming from

Trish wrote:

@joeflack4 can you remind me whether these mappings originally were from the MedGen mappings file?

No, they're not coming from the medgen repo, or from ftp.ncbi.nlm.nih.gov/pub/medgen/MedGenIDMappings.txt.

I looked into it. They're being fetched in the mondo repo, from https://ftp.ncbi.nih.gov/gene/DATA/mim2gene_medgen. There is a mim2gene_medgen goal that Chris wrote in 2017 that fetches this, and it's also referenced in a Perl script he wrote a few years later, and in some other places; I'm not yet entirely sure how it's all put together in mondo.
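
For orientation, here is a minimal Python sketch of reading disease-to-gene pairs out of a mim2gene_medgen-style file. It is an illustration under stated assumptions, not the logic of the existing goal or the Perl script: the six-column, tab-separated layout (`#MIM number`, `GeneID`, `type`, `Source`, `MedGenCUI`, `Comment`) and the use of `-` for empty values are assumed, and the sample rows are invented placeholders rather than real records.

```python
import csv
import io

# Assumed column order of the file; not verified against the current download.
COLUMNS = ["mim_number", "gene_id", "type", "source", "medgen_cui", "comment"]


def phenotype_gene_pairs(text: str) -> list:
    """Return (OMIM CURIE, gene id) pairs for phenotype rows that name a gene."""
    pairs = []
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skip blank lines and the '#'-prefixed header line.
        if not row or row[0].startswith("#"):
            continue
        record = dict(zip(COLUMNS, row))
        # Keep only disease (phenotype) entries with a gene id present.
        if record.get("type") == "phenotype" and record.get("gene_id", "-") != "-":
            pairs.append((f"OMIM:{record['mim_number']}", record["gene_id"]))
    return pairs


# Invented placeholder rows, for illustration only.
SAMPLE = (
    "#MIM number\tGeneID\ttype\tSource\tMedGenCUI\tComment\n"
    "000001\t111\tphenotype\tSRC\tC0000001\t-\n"
    "000002\t222\tgene\t-\t-\t-\n"
    "000003\t-\tphenotype\t-\tC0000003\t-\n"
)

if __name__ == "__main__":
    print(phenotype_gene_pairs(SAMPLE))
```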


Comments

1. MONDO:OMIM usage

Trish:

If the data in this HGNC ROBOT file is only from OMIM... then we can use something like "MONDO:OMIM"

As stated above, it is. And currently, there is a column source_code in the ROBOT template set to >A oboInOwl:source, and all the values are MONDO:OMIM.
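
To make the template layout concrete, here is a minimal Python sketch that reads a ROBOT template TSV and tallies the values in one column, such as `source_code`. It relies on the ROBOT convention that the first row holds column headers and the second row holds template strings. The function name is hypothetical, and the inline sample reuses the header and first data row from this PR's diff; how the two `>A oboInOwl:source` template strings align with the last three columns is an assumption.

```python
import csv
import io
from collections import Counter


def count_template_values(tsv_text: str, column: str) -> Counter:
    """Tally the values of one column in a ROBOT template, skipping the template row."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    rows = list(reader)
    # rows[0] is the ROBOT template-string row (ID, SC ..., >A ...), not data.
    data_rows = rows[1:]
    return Counter(row[column] for row in data_rows)


# Header and first data row as shown in the diff; the column alignment of the
# template strings (and the empty last cell) is assumed.
SAMPLE = "\n".join([
    "\t".join(["mondo_id", "hgnc_id", "omim_disease_xref", "source_code", "omim_gene"]),
    "\t".join([
        "ID",
        "SC 'has material basis in germline mutation in' some %",
        ">A oboInOwl:source",
        ">A oboInOwl:source",
        "",
    ]),
    "\t".join([
        "http://purl.obolibrary.org/obo/MONDO_0000208",
        "https://identifiers.org/hgnc/28403",
        "OMIM:616033",
        "MONDO:OMIM",
        "https://omim.org/entry/616013",
    ]),
]) + "\n"

if __name__ == "__main__":
    print(count_template_values(SAMPLE, "source_code"))
```

Run against the full file, a tally with a single key would confirm that every row carries the same source value.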

2. Thoughts: Keep both OMIM & MedGen sources, or just 1?

A / TLDR: I lean towards keeping both for now, and merging this PR now. But we can investigate further and, if MedGen is just getting these from OMIM behind the scenes, then perhaps it is redundant and we should just fetch from OMIM as Trish is considering.

Further thoughts / responses

Trish:

I don't know if this HGNC template data in this PR is in addition to or intended to replace the existing data

That's for @matentzn to answer but my guess is that he either forgot about the mim2gene_medgen source, or maybe he thought this was intended to be in addition.

Trish:

if both the data in this ROBOT template and the existing data are from the same source and therefore should have the same source annotation. @matentzn do you know the answer to this?

We are fetching them from 2 different source files (one from OMIM, and one from MedGen), but as I contemplated above, it could be that MedGen sources this info from OMIM behind the scenes.

I was at first confused by @sabrinatoro's comment. I read her as saying that we should keep both, but then that we should keep just the new OMIM one. But I think that @sabrinatoro, @twhetzel, and I are mulling the same thing: that maybe MedGen is just sourcing these from OMIM, so we might be best off replacing the old MONDO:mim2gene_medgen source with the new MONDO:OMIM one.

Contributor Author

@joeflack4 joeflack4 Jun 14, 2024

@matentzn responded on Slack (thank you!).

Basically I think the 4 of us are in the same place on this. We will keep both sources for now, the one from OMIM (w/ MONDO:OMIM) and the one from MedGen (w/ MONDO:mim2gene_medgen), but we could consider removing the latter.


Where we left off:

Nico:

I think the missing piece in the thought process was that we decided to erase all existing gene links, and make OMIM (for now) the only source (even if once upon a time the mim2gene_medgen was a source). See you on Monday!

@@ -0,0 +1,3366 @@
mondo_id hgnc_id omim_disease_xref source_code omim_gene
ID SC 'has material basis in germline mutation in' some % >A oboInOwl:source >A oboInOwl:source
http://purl.obolibrary.org/obo/MONDO_0000208 https://identifiers.org/hgnc/28403 OMIM:616033 MONDO:OMIM https://omim.org/entry/616013
Contributor

I thought we decided yesterday to not include a column with MONDO:OMIM.

Contributor Author

Yep, already done, see:

Contributor Author

@joeflack4 joeflack4 Jun 20, 2024

Now these files are also up-to-date in this PR as well:

  • external/mondo-omim-genes.robot.tsv
  • external/mondo-omim-genes.robot.owl

Contributor

@twhetzel twhetzel left a comment

See comment inline

- Add: Goals for external/mondo_genes.robot.tsv
- Add: external/mondo_genes.robot.tsv
- Add: external/mondo_genes.robot.owl

General
- Update: Reorganized external/ goals a bit.
@joeflack4 joeflack4 merged commit 41ff1e2 into develop Jun 26, 2024
@joeflack4 joeflack4 deleted the hgnc-template branch June 26, 2024 03:35
Labels
enhancement, hgnc, omim
4 participants