HGNC robot template #559
Add: external/mondo_genes.robot.tsv
Pulls from latest release, e.g.:
22ac9b0 to 3ea3ec6
Nice work!
@matentzn In the file However,
A few comments from yesterday that I can now submit after restoring my internet.
I do not know what this MONDO:mim2gene_medgen source refers to. It looks like a way of saying that we are getting this information from the omim-gene file, which somehow involves MedGen. I think it is OK to remove the MONDO:mim2gene_medgen sources and replace them with the OMIM identifier. We would therefore have the source be:
MONDO:mim2gene_medgen is documented on the Entities page as "This indicates the gene relationship came from MedGen."
I think regardless of which other source annotation should be added, an additional source annotation is needed in the ROBOT template.
I feel like I don't have enough information to give a clear answer, but I will try.
My GUESS is that we were using the gene annotation from MedGen at one point, and MedGen got this gene-to-disease annotation from OMIM (i.e. the MONDO:mim2gene_medgen source). It makes sense that we would switch to getting this information directly from OMIM now. , then there is no point in keeping it. I am assuming (again assuming, please someone confirm)
Looking at the Monarch omim repo, I see references to the OMIM API and this download OMIM page so I guess the data in this HGNC ROBOT template is only coming from OMIM. That's the first thing for @joeflack4 or @matentzn to confirm. If the data in this HGNC ROBOT file is only from OMIM, then we can go with Sabrina's comment: The last thing that I do not know is where did the data (see example below) that is currently in Mondo with
Duplicative sources?: MONDO:mim2gene_medgen
A / TLDR: I'm in favor of keeping as-is and merging for now if we need more time to think about this. But as we spend more time looking into this, I'm thinking we may drop `MONDO:mim2gene_medgen` in favor of `MONDO:OMIM`.
Man, sorry I could have provided some valuable information earlier, but I had some loose ends and wanted to collect my thoughts.
I decided to put this in a thread, if you guys don't mind responding further in this thread, to keep the related comments neatly together.
Some decisions / tasks
- 1. @twhetzel (or @matentzn) decide if this ROBOT template should be changed in any way based on this
- 2. @twhetzel (or @matentzn) decide if the existing `MONDO:mim2gene_medgen` source annotation should now be removed or kept
Some Q&A
1. Does the `omim` repo utilize any other sources, or just OMIM itself?
Looking at the Monarch omim repo, I see references to the OMIM API and this download OMIM page so I guess the data in this HGNC ROBOT template is only coming from OMIM. That's the first thing for @joeflack4 or @matentzn to confirm.
Just OMIM itself! We're pulling data files just from the OMIM API, and that's it.
2. Where `MONDO:mim2gene_medgen` is coming from
Trish wrote:
@joeflack4 can you remind me whether these mappings originally were from the MedGen mappings file?
No, they're not coming from the `medgen` repo, or from ftp.ncbi.nlm.nih.gov/pub/medgen/MedGenIDMappings.txt.
I looked into it. They're being fetched in the `mondo` repo, from https://ftp.ncbi.nih.gov/gene/DATA/mim2gene_medgen. There is a `mim2gene_medgen` goal that Chris wrote in 2017 that fetches this, and it's also referenced in a Perl script he wrote a few years later, and some other places; I'm not yet entirely sure how it's all put together in `mondo`.
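For anyone who wants to inspect that file directly, here is a minimal sketch of reading it. It assumes the file is tab-separated with a `#`-prefixed header naming the columns `MIM number`, `GeneID`, `type`, `Source`, `MedGenCUI`, and `Comment`, and that `-` marks an empty value; please verify this against the downloaded file. The sample rows are hypothetical placeholders, not real OMIM or MedGen data, and `phenotype_gene_pairs` is an illustrative helper, not something from the `mondo` repo.

```python
import csv
import io

# Hypothetical sample rows in the ASSUMED mim2gene_medgen layout.
# The identifiers below are placeholders for illustration only.
SAMPLE = (
    "#MIM number\tGeneID\ttype\tSource\tMedGenCUI\tComment\n"
    "100001\t11\tgene\t-\t-\t-\n"
    "100002\t11\tphenotype\tGeneMap\tC0000001\t-\n"
    "100003\t-\tphenotype\t-\tC0000002\t-\n"
)


def phenotype_gene_pairs(text):
    """Return (phenotype MIM number, GeneID) pairs found in the file text."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    # Strip the leading '#' from the first header cell.
    header = [name.lstrip("#") for name in next(reader)]
    pairs = []
    for row in reader:
        record = dict(zip(header, row))
        # Keep only phenotype rows that actually name a gene.
        if record["type"] == "phenotype" and record["GeneID"] != "-":
            pairs.append((record["MIM number"], record["GeneID"]))
    return pairs


print(phenotype_gene_pairs(SAMPLE))  # [('100002', '11')]
```

To run this on the real data, the text downloaded from the URL above could be passed to `phenotype_gene_pairs` in place of `SAMPLE`.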
Comments
1. `MONDO:OMIM` usage
If the data in this HGNC ROBOT file is only from OMIM... then we can use something like "MONDO:OMIM"
As stated above, it is. And currently, there is a column `source_code` in the ROBOT template set to `>A oboInOwl:source`, and all the values are `MONDO:OMIM`.
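A quick way to confirm that claim on a given copy of the template is sketched below. It assumes the usual ROBOT template layout (header row, then a row of template strings, then data rows). The excerpt shows only two of the columns, and the second data row, including the `MONDO_0000000` ID, is invented so that the check has something to flag; `unexpected_sources` is a hypothetical helper name.

```python
import csv
import io

# Hypothetical two-column excerpt: header row, ROBOT template-string row,
# then data rows. The last row is invented for illustration.
TEMPLATE_TSV = (
    "mondo_id\tsource_code\n"
    "ID\t>A oboInOwl:source\n"
    "http://purl.obolibrary.org/obo/MONDO_0000208\tMONDO:OMIM\n"
    "http://purl.obolibrary.org/obo/MONDO_0000000\tMONDO:mim2gene_medgen\n"
)


def unexpected_sources(text, column="source_code", expected="MONDO:OMIM"):
    """List (mondo_id, value) for data rows whose source is not `expected`."""
    rows = list(csv.DictReader(io.StringIO(text), delimiter="\t"))
    # The first row after the header holds the ROBOT template strings.
    template_row, data_rows = rows[0], rows[1:]
    if template_row[column] != ">A oboInOwl:source":
        raise ValueError(f"unexpected template string for {column!r}")
    return [
        (row["mondo_id"], row[column])
        for row in data_rows
        if row[column] != expected
    ]


print(unexpected_sources(TEMPLATE_TSV))
```

An empty list would mean every data row carries `MONDO:OMIM`, which is what is described above.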
2. Thoughts: Keep both OMIM & MedGen sources, or just 1?
A / TLDR: I lean towards keeping both for now, and merging this PR now. But we can investigate further and, if MedGen is just getting these from OMIM behind the scenes, then perhaps it is redundant and we should just fetch from OMIM as Trish is considering.
Further thoughts / responses
Trish:
I don't know if this HGNC template data in this PR is in addition to or intended to replace the existing data
That's for @matentzn to answer but my guess is that he either forgot about the `mim2gene_medgen` source, or maybe he thought this was intended to be in addition.
Trish:
if both the data in this ROBOT template and the existing data are from the same source and therefore should have the same source annotation. @matentzn do you know the answer to this?
We are fetching them from 2 different source files (one from OMIM, and one from MedGen), but as I contemplated above, it could be that MedGen sources this info from OMIM behind the scenes.
I was at first confused by @sabrinatoro's comment. I read her as saying that we should keep both, but then keep just the new OMIM one. But I think that @sabrinatoro, @twhetzel, and I are mulling the same thing: that maybe MedGen is just sourcing these from OMIM, so we might be best to replace the old `MONDO:mim2gene_medgen` source with the new `MONDO:OMIM` one.
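One way to test the redundancy idea would be to compare the disease-gene pairs from the two sources. The sketch below assumes both sets have already been normalized to the same identifier scheme (which is not the case for the raw files), and every pair shown is a hypothetical placeholder. `compare_sources` is an illustrative name, not an existing function in any of the repos.

```python
def compare_sources(omim_pairs, medgen_pairs):
    """Split (disease, gene) pairs into shared and source-specific sets."""
    omim, medgen = set(omim_pairs), set(medgen_pairs)
    return {
        "shared": omim & medgen,
        "omim_only": omim - medgen,
        "medgen_only": medgen - omim,
    }


# Hypothetical, already-normalized pairs for illustration only.
OMIM_PAIRS = [("OMIM:100002", "HGNC:1"), ("OMIM:100004", "HGNC:2")]
MEDGEN_PAIRS = [("OMIM:100002", "HGNC:1"), ("OMIM:100005", "HGNC:3")]

result = compare_sources(OMIM_PAIRS, MEDGEN_PAIRS)
for name, pairs in result.items():
    print(name, sorted(pairs))
```

If `medgen_only` came back empty on the real data, that would support dropping the older source; a non-empty set would show which links exist only there.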
@matentzn responded on slack (thank you!).
Basically I think the 4 of us are in the same place on this. We will keep both sources for now, the one from OMIM (w/ `MONDO:OMIM`) and the one from MedGen (w/ `MONDO:mim2gene_medgen`), but we could consider removing the latter.
Where we left off:
Nico:
I think the missing piece in the thought process was that we decided to erase all existing gene links, and make OMIM (for now) the only source (even if once upon a time the mim2gene_medgen was a source). See you on Monday!
@@ -0,0 +1,3366 @@
mondo_id hgnc_id omim_disease_xref source_code omim_gene
ID SC 'has material basis in germline mutation in' some % >A oboInOwl:source >A oboInOwl:source
http://purl.obolibrary.org/obo/MONDO_0000208 https://identifiers.org/hgnc/28403 OMIM:616033 MONDO:OMIM https://omim.org/entry/616013
I thought we decided yesterday to not include a column with `MONDO:OMIM`.
Yep, already done, see:
- This comment about that thing being done: HGNC robot template omim#113 (comment)
- This comment about the need to update `mondo-ingest` (by that, I meant this PR): HGNC robot template omim#113 (comment)
- @twhetzel Review and approve
These files are now up-to-date in this PR as well:
external/mondo-omim-genes.robot.tsv
external/mondo-omim-genes.robot.owl
See comment inline
- Add: Goals for external/mondo_genes.robot.tsv
- Add: external/mondo_genes.robot.tsv
- Add: external/mondo_genes.robot.owl
General
- Update: Reorganized external/ goals a bit.
Addresses sub-tasks in:
Related:
Overview
Update `mondo_genes.csv` to be a proper ROBOT template, and tie it into the pipeline for externally managed content.
Pre-merge checklist
Documentation
Was the documentation added/updated under `docs/`?
QC
Was the full pipeline run before submitting this PR using `sh run.sh make build-mondo-ingest` on this branch (after `docker pull obolibrary/odkfull:dev`), and no errors occurred?
Build PR:
New Packages
Were any new Python packages added?
Were any other non-Python packages added?
PR Review and Conversations Resolved
Has the PR been sufficiently reviewed by at least 1 team member of the Mondo Technical team and all threads resolved?
CC: @souzadevinicius Thought this would be a good one for you to review