
Converter exporters #8

Open
2 of 4 tasks
joeflack4 opened this issue Nov 9, 2023 · 4 comments
Assignees
Labels
enhancement New feature or request

Comments

@joeflack4
Contributor

joeflack4 commented Nov 9, 2023

Overview

mondo-ingest uses SemanticSQL, which requires prefix maps in CSV form. We should create our own function in mondolib (unless Charlie thinks it's a good idea for this functionality to go in the curies package) that can generate this CSV, so that we have one fewer place where prefix mappings have to be maintained.

Sub-tasks

Additional information

Context

Comments in the "synchronization: subclass axioms" PR: 1, 2

Possible design

I think we should maintain these static files:

  • metadata/mondo.sssom.config.yml
  • metadata/SOURCE.yml for each source

Then we should have a way to read these files in, instantiate a curies.Converter from them, and export from that Converter to whatever formats are needed.
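A minimal sketch of the merge step, assuming the YAML files have already been parsed into dicts (e.g. with PyYAML) and that each one keeps its prefixes under a `curie_map` key, the convention SSSOM metadata files use. Whether every SOURCE.yml follows that convention is an assumption, and `merge_prefix_maps` is a hypothetical helper, not part of curies or mondolib. It refuses conflicting entries instead of silently picking a winner, since a prefix bound to two URI prefixes (or vice versa) can't be exported as a flat one-to-one table.

```python
from typing import Dict, Iterable, Mapping


def merge_prefix_maps(prefix_maps: Iterable[Mapping[str, str]]) -> Dict[str, str]:
    """Merge several prefix -> URI prefix maps into one (hypothetical helper).

    Identical duplicate entries are allowed. A prefix bound to two different
    URI prefixes, or a URI prefix claimed by two different prefixes, raises
    ValueError so the conflict gets fixed in the source YAML files.
    """
    merged: Dict[str, str] = {}
    reverse: Dict[str, str] = {}  # URI prefix -> prefix, used to catch clashes
    for prefix_map in prefix_maps:
        for prefix, uri_prefix in prefix_map.items():
            if prefix in merged:
                if merged[prefix] != uri_prefix:
                    raise ValueError(
                        f"{prefix!r} maps to both {merged[prefix]!r} and {uri_prefix!r}"
                    )
                continue  # identical duplicate, nothing to add
            if uri_prefix in reverse:
                raise ValueError(
                    f"{uri_prefix!r} claimed by both {reverse[uri_prefix]!r} and {prefix!r}"
                )
            merged[prefix] = uri_prefix
            reverse[uri_prefix] = prefix
    return merged


# Hypothetical parsed contents of metadata/mondo.sssom.config.yml and one SOURCE.yml.
config = {"curie_map": {"MONDO": "http://purl.obolibrary.org/obo/MONDO_"}}
sources = [
    {
        "curie_map": {
            "DOID": "http://purl.obolibrary.org/obo/DOID_",
            "MONDO": "http://purl.obolibrary.org/obo/MONDO_",
        }
    },
]
merged = merge_prefix_maps([config["curie_map"]] + [s["curie_map"] for s in sources])
print(merged)
```

The merged dict could then be handed to `curies.Converter.from_prefix_map` to get the Converter that the exporters would work from.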

Related

@matentzn
Member

Fantastic idea! THANK YOU!

@joeflack4
Contributor Author

This topic came up when working on monarch-initiative/mondo-ingest#394.

Nico's suggestion on how to do this:

Start with an EPM (extended prefix map). Load it into a Converter. Then do .bimap(), which generates the plain, flat bimap that I can save as CSV.
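A rough, stdlib-only sketch of that pipeline, assuming the standard EPM JSON layout (a list of records with `prefix`, `uri_prefix`, and optional `prefix_synonyms` / `uri_prefix_synonyms`). `epm_to_bimap` stands in for the "load into a Converter, then take the bimap" step, which in the real pipeline would go through curies; both helper names are hypothetical. The `prefix,base` header matches the mondo-ingest prefixes.csv discussed later in this thread.

```python
import csv
import io
import json
from typing import Any, Dict, List, Mapping


def epm_to_bimap(records: List[Mapping[str, Any]]) -> Dict[str, str]:
    """Flatten EPM records to canonical prefix -> canonical URI prefix.

    prefix_synonyms and uri_prefix_synonyms are dropped, which is what
    makes the result a plain one-to-one map.
    """
    return {record["prefix"]: record["uri_prefix"] for record in records}


def bimap_to_csv(bimap: Mapping[str, str]) -> str:
    """Render a bimap as CSV with a prefix,base header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["prefix", "base"])
    for prefix in sorted(bimap):  # sorted for stable, diff-friendly output
        writer.writerow([prefix, bimap[prefix]])
    return buffer.getvalue()


epm_json = """
[
  {"prefix": "MONDO", "uri_prefix": "http://purl.obolibrary.org/obo/MONDO_"},
  {"prefix": "DOID", "uri_prefix": "http://purl.obolibrary.org/obo/DOID_",
   "prefix_synonyms": ["doid"]}
]
"""
csv_text = bimap_to_csv(epm_to_bimap(json.loads(epm_json)))
print(csv_text, end="")
```

Dropping synonyms is the key design point: the EPM keeps them, but the CSV only has room for one prefix and one base per row.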

@cthoyt

cthoyt commented Jan 19, 2024

@joeflack4 can you point me towards the code that saves these CSVs? Maybe it's worth having an I/O function upstream in curies (or updating SemanticSQL to use standard file formats like EPMs ;))

@joeflack4
Copy link
Contributor Author

@cthoyt I appreciate you chiming in! I wrote in the OP that "unless Charlie thinks it's a good idea for this functionality to go in the curies package", but I think we would actually prefer that.

I don't foresee SemanticSQL having the bandwidth to add EPMs now but I could be wrong.

Here's an example of a prefixes.csv compatible with SemanticSQL: https://github.com/monarch-initiative/mondo-ingest/blob/main/src/ontology/config/prefixes.csv
The two columns are prefix and base. In curies, base is called the URI prefix (uri_prefix), so I guess there are basically two options:
a. curies could have something like a .to_csv() that names the 2nd column uri_prefix. For mondo-ingest purposes, or for other SemanticSQL users, we'd use this method, but then have to rename the 2nd column header on our end.
b. curies' .to_csv() could take something like a format param with values such as standard and semanticsql, or just a boolean semanticsql param.

Actually, I just realized I'm not sure whether the headers (prefix and base) even matter to SemanticSQL. Perhaps it only cares about column order? I don't see anything about this CSV in the docs.
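Option (b) could look roughly like this. `prefix_map_to_csv`, `HEADERS`, and the `fmt` values are hypothetical names for illustration; curies is not claimed to provide this function. The two formats differ only in the name of the second column, and the prefix column is written first in both, so the output should work whether SemanticSQL reads the header names or only the column order.

```python
import csv
import io
from typing import Dict, Literal, Mapping, Tuple

# Hypothetical header sets for option (b); only the second column's name differs.
HEADERS: Dict[str, Tuple[str, str]] = {
    "standard": ("prefix", "uri_prefix"),
    "semanticsql": ("prefix", "base"),
}


def prefix_map_to_csv(
    prefix_map: Mapping[str, str],
    fmt: Literal["standard", "semanticsql"] = "standard",
) -> str:
    """Serialize a prefix -> URI prefix map as CSV in the requested flavor."""
    if fmt not in HEADERS:
        raise ValueError(f"unknown format: {fmt!r}")
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(HEADERS[fmt])
    # Prefix always comes first, so a reader that relies on column order
    # rather than header names gets the same rows in either format.
    for prefix in sorted(prefix_map):
        writer.writerow([prefix, prefix_map[prefix]])
    return buffer.getvalue()


print(prefix_map_to_csv({"MONDO": "http://purl.obolibrary.org/obo/MONDO_"}, fmt="semanticsql"), end="")
```

A string-valued `fmt` leaves room for more export formats later, which a boolean `semanticsql` flag would not.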

joeflack4 changed the title from "Converter -> prefixes.csv exporter" to "Converter exporters" on Mar 12, 2024
joeflack4 self-assigned this on Mar 12, 2024