Investigate ʽἑκάεργος and ̓Ολυμπιιάς #72
Comments
I'm looking at ʽἑκάεργος. This lemma is odd, and I have no idea why the lemmatizer would output a lemma beginning with a comma and not a letter. The original words in the texts (ἑκάεργον at Iliad 1.147, 474; Hom.Hymn 4.239) are not preceded by commas. I've tried to fix this in lemma.py by mapping ἑκάεργον to Ἑκάεργος (commit 7d9583c, which also fixes an error in the original beta code: a smooth instead of a rough breathing in three instances of "Far-shooting"):
*(eka/ergon
e(ka/ergon
e(ka/ergon
The mapping fixed *(eka/ergon, but not e(ka/ergon. I'm wondering if the disparity comes from the fact that lemma.py uses Unicode. Perhaps something goes wrong in converting between the beta code of the text and the Unicode of lemma.py, but I don't know.
Then I noticed another disparity between the lemmatization of words with the same beta code in the Hom. Hymns and the Iliad:
e(ka/ergos
e(ka/ergos
Here, the beta code is the same in both texts (e(ka/ergos), but Hom.Hymn 4.333 produces the lemma mapped in lemma.py (Ἑκάεργος), whereas Iliad 1.479 produces what I assume is the lemma from the backoff_lemmatizer (ἑκάεργος). So the original problem is still outstanding for the two instances in the Iliad, and I am not sure how to resolve it. The problem doesn't seem to be with the beta code or with lemma.py.
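
One possible cause of a mapping that matches in one place and not another, not confirmed anywhere in this thread, is that visually identical Greek strings can be built from different code points (for example ά as U+1F71 with oxia, as U+03AC with tonos, or as α plus a combining accent). The sketch below normalizes both the mapping key and the incoming token to NFC before lookup. `LEMMA_MAP` and `lookup` are hypothetical names for illustration and are not taken from lemma.py.

```python
import unicodedata

# Hypothetical mapping in the spirit of the one described above.
# The key is ἑκάεργον, written with escapes so its code points are unambiguous.
LEMMA_MAP = {
    unicodedata.normalize("NFC", "\u1f11\u03ba\u03ac\u03b5\u03c1\u03b3\u03bf\u03bd"): "Ἑκάεργος",
}

def lookup(token):
    """Return the mapped lemma for token, comparing in NFC form; None if unmapped."""
    return LEMMA_MAP.get(unicodedata.normalize("NFC", token))

# Three spellings of ἑκάεργον that render the same but differ as raw strings.
with_oxia = "\u1f11\u03ba\u1f71\u03b5\u03c1\u03b3\u03bf\u03bd"          # ά as U+1F71
with_tonos = "\u1f11\u03ba\u03ac\u03b5\u03c1\u03b3\u03bf\u03bd"         # ά as U+03AC
decomposed = "\u03b5\u0314\u03ba\u03b1\u0301\u03b5\u03c1\u03b3\u03bf\u03bd"  # base letters + combining marks

print(with_oxia == with_tonos)                      # raw comparison
print([lookup(t) for t in (with_oxia, with_tonos, decomposed)])
```

If the raw comparison is false while all three lookups succeed, a normalization mismatch is at least worth ruling out before looking elsewhere.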
Seems fixed to me in 7d9583c?
[Before and after screenshots not preserved.]
Both
These "words" appear at the top of the expectancy output (because they do not begin with a letter). They look like errors.
ʽἑκάεργος starts with U+02BD MODIFIER LETTER REVERSED COMMA, and ̓Ολυμπιιάς starts with an unattached U+0313 COMBINING COMMA ABOVE.
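
A quick way to find lemmas like these is to inspect the first code point of each one. The sketch below is a minimal check, assuming that a valid lemma should start with a cased letter (Unicode category Lu, Ll or Lt). `leading_non_letter` is a hypothetical helper, not part of the project. Note that `str.isalpha()` would not catch the first case, because U+02BD is itself classified as a modifier letter (category Lm).

```python
import unicodedata

def leading_non_letter(token):
    """Return the Unicode name of the first character if it is not a
    cased letter (category Lu, Ll or Lt); return None otherwise."""
    if not token:
        return None
    first = token[0]
    if unicodedata.category(first) in ("Lu", "Ll", "Lt"):
        return None
    return unicodedata.name(first, "UNKNOWN")

# The two suspicious lemmas from this issue, plus one well-formed lemma.
lemmas = [
    "\u02bd\u1f11κάεργος",   # leading U+02BD
    "\u0313Ολυμπιιάς",       # leading U+0313
    "Ἑκάεργος",
]
for lemma in lemmas:
    print(repr(lemma), leading_non_letter(lemma))
```

Running this over the full lemma list would show whether these two are the only entries with a stray leading mark.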