Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handling multiple config/*property-map*.sssom.tsvs #482

Closed
3 tasks done
joeflack4 opened this issue Mar 29, 2024 · 11 comments
Closed
3 tasks done

Handling multiple config/*property-map*.sssom.tsvs #482

joeflack4 opened this issue Mar 29, 2024 · 11 comments
Assignees
Labels

Comments

@joeflack4
Copy link
Contributor

joeflack4 commented Mar 29, 2024

Overview

There are multiple config/*property-map*.sssom.tsv files. De-proliferate these.

Sub-task list

  • 1. Wait on ICD11 Ingest #434 to be merged first, as that will introduce a new file
  • 2. Implement solution
  • 3. Update mondo-ingest.Makefile

Sub-task details

2. Implement solution

a. Single file (if allowable, though there may be some weird ROBOT issue prohibiting it)
b. Merge all mappings into 1 file, and have a second, empty file (to get around ROBOT issue)
c. Keep property-map-1 and property-map-2 as-is, and then add other ones, e.g.icd11foundation-property-map file into one of these. One disadvantage of that is this.

Additional details about solutions: #434 (comment)

@joeflack4 joeflack4 added the code quality House keeping label Mar 29, 2024
@joeflack4 joeflack4 self-assigned this Mar 29, 2024
@joeflack4
Copy link
Contributor Author

Context: this comment

@twhetzel I'm making this one high priority just because it is such a quick change to make.

@joeflack4 joeflack4 closed this as not planned Won't fix, can't repro, duplicate, stale Mar 29, 2024
@joeflack4 joeflack4 reopened this Mar 29, 2024
@joeflack4 joeflack4 changed the title Merge property map files Handling multiple config/*property-map*.sssom.tsvs Mar 29, 2024
@joeflack4 joeflack4 mentioned this issue Mar 29, 2024
9 tasks
@joeflack4
Copy link
Contributor Author

And also one advantage of splitting them up is that we can drop --allow-missing-entities true, and if the source changes and a mapping is no longer there, we'll know about it because it will error.

@joeflack4
Copy link
Contributor Author

There's also the possible problem that if we merge them into 1 file and leave the other one blank, then when ROBOT fixes this problem, it might get annoyed by the blank file? Maybe not.

@joeflack4
Copy link
Contributor Author

We've decided this issue is a bit of a distraction. We can always re-open it if needed, especially if ROBOT fixes the issue where 2 property maps / more than one are arbitrarily required.

@joeflack4 joeflack4 closed this as not planned Won't fix, can't repro, duplicate, stale Apr 12, 2024
@joeflack4
Copy link
Contributor Author

Nico wrote:

there cant be a single one due to some problems with ROBOT rename

@matentzn BTW, I couldn't find an issue for this in the robot GitHub.If there's not one there, should I make one?

@matentzn
Copy link
Member

If you like, the issue would go something like this:

Add --allow-duplicate-entities as option in ROBOT rename.

In many pipelines, especially ETL, we want to unify a few different properties like their:label, skos:prefLabel, wikidata:name to a single one, eg rdfs:label. This is currently not possible as the rename command fails hard multiple properties are mapped to a single one. The reverse is obviously bogus (renaming 1 property to many), but it would be nice if we could bypass this precaution with an optional flag, --allow-duplicate-entities, similar as to how we treat missing entities --allow-missing-entities.

@joeflack4
Copy link
Contributor Author

I may be following you, but let me check.

Cases:

  1. Many -> 1
  2. 1 -> Many

Are you saying that --allow-duplicate-entities is something that would enable (1) or (2)? It seems like it'd be for (2). But our problem with mondo-ingest involves (1). So wouldn't I need to create 2 GitHub issues, one for each of these cases?

Example files from mondo-ingest
It looks like for mondo-ingest, we only have case (1). We're trying to map multiple fields to oboInOwl:source and oboInOwl:hasExactSynonym. And that's why we split into two files.

property map TSV files

property-map-1.sssom.tsv:

subject_id	object_id
http://purl.org/dc/elements/1.1/source	oboInOwl:source
http://www.w3.org/2004/02/skos/core#prefLabel	http://www.w3.org/2000/01/rdf-schema#label
http://www.w3.org/2004/02/skos/core#altLabel	oboInOwl:hasExactSynonym

property-map-2.sssom.tsv:

subject_id	object_id
http://www.ebi.ac.uk/efo/definition	http://purl.obolibrary.org/obo/IAO_0000115
http://www.ebi.ac.uk/efo/reason_for_obsolescence	rdfs:comment
http://www.ebi.ac.uk/efo/alternative_term	oboInOwl:hasExactSynonym
http://purl.obolibrary.org/obo/ECO_0000218	oboInOwl:source

So for mondo-ingest, if (1) were to be addressed, then the following could be changed to a single line:

		rename --mappings config/property-map-1.sssom.tsv --allow-missing-entities true \
		rename --mappings config/property-map-2.sssom.tsv --allow-missing-entities true \

--->

		rename --mappings config/property-map.sssom.tsv --allow-missing-entities true \

@joeflack4 joeflack4 mentioned this issue Apr 18, 2024
9 tasks
@matentzn
Copy link
Member

Are you saying that --allow-duplicate-entities is something that would enable (1) or (2)? It seems like it'd be for (2). But our problem with mondo-ingest involves (1). So wouldn't I need to create 2 GitHub issues, one for each of these cases?

No it would only enable (1). Many different labels properties unify to a single rdfs:label property. The reverse makes no sense, so no need for a specific issue on this (Imagine you have rdfs:label and rename it to 3 different label properties - that would be duplicating information).

@joeflack4
Copy link
Contributor Author

joeflack4 commented Apr 18, 2024

Yeah that makes sense, I was just confused by what you'd written.

OK, so I can make an issue to enable (1) Many -> 1 situations, via --allow-duplicate-entities.
Edit: Created!:


I wonder why should Many -> 1 even be an issue? Why should we have to add such a flag? I guess I figured out why.

Like for example, in mondo-ingest, we need to convert http://purl.org/dc/elements/1.1/source and http://purl.obolibrary.org/obo/ECO_0000218 to oboInOwl:source. I wonder: Why would robot care about that?
I suppose it is because it expects that the source being modified/queried by the robot command would be a single, coherent ontology. If we're asking it to do a many -> 1 conversion, then that would indicate that there were 2 different properties in the ontology that were duplicative.
But for us, we're re-using these property map files across multiple use cases; something that robot isn't really considering by design.

@matentzn
Copy link
Member

Imagine you have an ontology that imports another ontology. The first one uses dcterms:contributor the other one oboInOwl:creator. Now before you create a merged release you want to unify both to dcelements:contributor. That's a use case for ROBoT! Thanks for making the issue!

@joeflack4
Copy link
Contributor Author

@matentzn hanks. That's not what I was pondering. I wondered why robot doesn't allow these duplicates by default but I think I pondered the reason.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants