Capitalization of detected symbols in "Alternative title(s)" and "Included title(s)" may be incorrect #129

Open
joeflack4 opened this issue Sep 5, 2024 · 0 comments · May be fixed by #153
Labels: bug

joeflack4 commented Sep 5, 2024

Overview

Some of the synonym capitalizations for "alternative" & "included" titles may be incorrect.

Breakdown

Context: While working on #128, #126, and #125, I also noticed a problem with how synonym capitalization is done for the "Alternative Title(s); symbol(s)" and "Included Title(s); symbols" fields that we get from mimTitles.txt.

We have this block of code:

        for exact_label in exact_labels:
            graph.add((omim_uri, oboInOwl.hasExactSynonym, Literal(label_cleaner.clean(exact_label, abbrev))))
        for label in other_labels:
            graph.add((omim_uri, oboInOwl.hasExactSynonym, Literal(label_cleaner.clean(label, abbrev))))
        for included_label in cleaned_inc_labels:
            graph.add((omim_uri, URIRef(INCLUDED_URI), Literal(label_cleaner.clean(included_label, abbrev))))

It uses .clean(), which applies various rules to determine the appropriate capitalization of "alternative titles" and "included titles". It takes a second, optional parameter, passed as abbrev above. I believe the important part is that it looks for symbols that also appear within the titles and makes sure those are capitalized.

The potential issue I see is that the abbrev passed in contains only the symbol(s) from "Preferred Title; symbol". It does not consider the "alternative symbols" or "included symbols". I think this is a bug, and that we want to capitalize those as well.

Suggestion

I believe that all of the symbols from all of the titles (preferred, alternative, included) should be passed into .clean() so that if any of those symbols appear within the text of any of those labels, they will be capitalized. I believe that this was the original intention.
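To make the suggestion concrete, here is a rough sketch of the idea. It is not the project's code: `split_titles_and_symbols` and `clean_label` are hypothetical stand-ins (the real `label_cleaner.clean()` applies more rules than this), the sample titles are made up, and the field layout (entries separated by `;;`, each one `TITLE; SYMBOL; ...`) is my assumption about mimTitles.txt.

```python
import re
from typing import Iterable, List, Set, Tuple


def split_titles_and_symbols(field: str) -> Tuple[List[str], Set[str]]:
    """Split one title field into titles and symbols.

    Assumed layout (not verified here): entries separated by ';;',
    each entry being 'TITLE; SYMBOL1; SYMBOL2'.
    """
    titles: List[str] = []
    symbols: Set[str] = set()
    for entry in field.split(";;"):
        parts = [p.strip() for p in entry.split(";") if p.strip()]
        if not parts:
            continue
        titles.append(parts[0])
        symbols.update(parts[1:])
    return titles, symbols


def clean_label(label: str, symbols: Iterable[str] = ()) -> str:
    """Hypothetical stand-in for label_cleaner.clean().

    Lowercases every alphanumeric token except those matching a known
    symbol, then capitalizes the first character of the label.
    """
    keep = {s.upper() for s in symbols}

    def fix(match: "re.Match[str]") -> str:
        token = match.group(0)
        return token.upper() if token.upper() in keep else token.lower()

    text = re.sub(r"[A-Za-z0-9]+", fix, label)
    return text[:1].upper() + text[1:]


# Made-up example row; none of these titles or symbols are real OMIM data.
preferred = "EXAMPLE SYNDROME; EXS"
alternative = "EXAMPLE DEFICIENCY; EXD;; EXD-RELATED SYNDROME"
included = "VARIANT TYPE; AVT"

_, pref_symbols = split_titles_and_symbols(preferred)
alt_titles, alt_symbols = split_titles_and_symbols(alternative)
_, inc_symbols = split_titles_and_symbols(included)

# Current behavior as I understand it: only the preferred symbol is passed.
current = clean_label(alt_titles[1], pref_symbols)  # "Exd-related syndrome"

# Suggested behavior: pass the symbols from all three fields.
all_symbols = pref_symbols | alt_symbols | inc_symbols
suggested = clean_label(alt_titles[1], all_symbols)  # "EXD-related syndrome"
```

In the real code, this would amount to building the combined symbol collection once per entry and passing it as `abbrev` in each of the three loops above, assuming `.clean()` accepts more than one symbol.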

@joeflack4 joeflack4 self-assigned this Sep 5, 2024
@joeflack4 joeflack4 changed the title Issues w/ alt & included titles capitalization Some alt & included titles capitalizations may be incorrect Sep 5, 2024
@joeflack4 joeflack4 added the bug Something isn't working label Sep 5, 2024
@joeflack4 joeflack4 changed the title Some alt & included titles capitalizations may be incorrect Some alt & included titles synonym capitalizations may be incorrect Sep 5, 2024
@joeflack4 joeflack4 changed the title Some alt & included titles synonym capitalizations may be incorrect Capitalization: "alt title synonyms" & "included annotation prop" - some may be incorrect Sep 5, 2024
@twhetzel twhetzel changed the title Capitalization: "alt title synonyms" & "included annotation prop" - some may be incorrect Capitalization of detected symbols in "Alternative title(s)" and "Included title(s)" may be incorrect Sep 5, 2024
@joeflack4 joeflack4 linked a pull request Sep 23, 2024 that will close this issue