Analyze clean_label() results (Roman numerals, etc) #163

Open
joeflack4 opened this issue Nov 6, 2024 · 3 comments
Assignees
Labels
analysis Not a feature or update to the core of the repository, but an ad hoc analysis.

Comments

@joeflack4
Contributor

Overview

In #142, Joe created some outputs that show labels before and after clean_label().

Does it actually do what we want?
When it changes numbers to Roman numerals, does that make things confusing in some cases? For example, does the result sometimes look like an acronym rather than a Roman numeral? E.g. "1A" changes to "IA".
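
One way to start on the second question is to scan the before/after pairs and flag tokens that contained a digit before cleaning but came out as all letters that are not purely Roman-numeral characters. The sketch below is only an illustration: `flag_ambiguous_tokens` is a hypothetical helper, not part of the pipeline, and it assumes the cleaned labels are upper case, that tokens line up one-to-one by whitespace, and that only I, V and X appear in the generated numerals. The sample labels are made up.

```python
import re

# A token made only of Roman-numeral letters, e.g. "II" or "IV".
# Assumption: only I, V and X are produced by the conversion.
ROMAN_ONLY = re.compile(r"^[IVX]+$")
PUNCT = ",.;:()"


def flag_ambiguous_tokens(before: str, after: str) -> list[tuple[str, str]]:
    """Return (before_token, after_token) pairs that may read as acronyms.

    A pair is flagged when the original token contained a digit and the
    cleaned token is all letters but not purely a Roman numeral.
    """
    b_tokens = [t.strip(PUNCT) for t in before.split()]
    a_tokens = [t.strip(PUNCT) for t in after.split()]
    if len(b_tokens) != len(a_tokens):
        # Tokens no longer align one-to-one; leave this pair for manual review.
        return []
    flagged = []
    for b, a in zip(b_tokens, a_tokens):
        if b == a:
            continue
        had_digit = any(ch.isdigit() for ch in b)
        if had_digit and a.isalpha() and not ROMAN_ONLY.match(a):
            flagged.append((b, a))
    return flagged


# Made-up labels for illustration only.
print(flag_ambiguous_tokens("EXAMPLE DISEASE, TYPE 1A", "EXAMPLE DISEASE, TYPE IA"))
print(flag_ambiguous_tokens("EXAMPLE DISEASE, TYPE 2", "EXAMPLE DISEASE, TYPE II"))
```

Anything flagged this way would still need a curator to decide whether it is actually confusing; the check only narrows down what to look at.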

Resources

@joeflack4 joeflack4 assigned joeflack4 and twhetzel and unassigned joeflack4 Nov 6, 2024
@joeflack4 joeflack4 added the analysis Not a feature or update to the core of the repository, but an ad hoc analysis. label Nov 6, 2024
@joeflack4
Contributor Author

@twhetzel I added this to your board, but FYI your board doesn't have an "urgency" field for "low".

@matentzn
Member

matentzn commented Nov 6, 2024

This may not be that urgent, but I would like to suggest that

  1. Numeric normalisation like this should be generalised and not be done ad hoc just for the OMIM ingest (no more scripts: ideally oak synonymiser or, if this is too much of a lift, mondolib)
  2. This needs to be really carefully reviewed by a domain expert curator

@joeflack4
Contributor Author

@matentzn I agree with that. I've never used synonymiser, but that sounds like a great idea! Just an FYI in case you were unaware: this isn't a review of something new, but of a part of the pipeline that's been around for about 3 years.


3 participants