
Lemmas for adverbs and adjectives #20

Open
AngledLuffa opened this issue Dec 31, 2024 · 5 comments

@AngledLuffa
Contributor

AngledLuffa commented Dec 31, 2024

I was wondering why our lemmatizer performs so poorly on the LinES dev set, and I came across a few classes of differences between the standard used in EWT and GUM and the one used in LinES. One in particular concerns adjectives and adverbs. In EWT and GUM, adjectives and adverbs are lemmatized as words in their own right, whereas in LinES they are lemmatized as the verb (or, for adverbs, the adjective) they are derived from. A partial list from the dev set is:

merely
fishing
opening
suddenly
briefly
finally
desperately
abruptly
irreducibly
rapidly
carefully
attentively
perfectly
beginning
vaguely
actually
starched
modestly
shocked
gnarled
sugared
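A minimal sketch of how such cases could be listed from a CoNLL-U file, assuming the standard 10-column, tab-separated format. It flags ADV tokens ending in -ly and ADJ tokens ending in -ed or -ing whose lemma differs from the lowercased form. The helper names (`iter_tokens`, `find_derived_lemmas`) and the sample sentence are hypothetical illustrations, not part of any UD or LinES tooling.

```python
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    sent_id: str
    form: str
    lemma: str
    upos: str


def iter_tokens(conllu_text: str) -> Iterator[Token]:
    """Yield one Token per syntactic word in a CoNLL-U string."""
    sent_id = ""
    for line in conllu_text.splitlines():
        if line.startswith("# sent_id"):
            sent_id = line.partition("=")[2].strip()
            continue
        if not line or line.startswith("#"):
            continue  # blank sentence separators and other comments
        cols = line.split("\t")
        # Skip malformed lines, multiword-token ranges (1-2) and empty nodes (1.1).
        if len(cols) < 4 or not cols[0].isdigit():
            continue
        yield Token(sent_id, cols[1], cols[2], cols[3])


def find_derived_lemmas(conllu_text: str) -> list[Token]:
    """Flag -ly adverbs and -ed/-ing adjectives whose lemma is not the form itself."""
    flagged = []
    for tok in iter_tokens(conllu_text):
        form = tok.form.lower()
        derived = (tok.upos == "ADV" and form.endswith("ly")) or (
            tok.upos == "ADJ" and form.endswith(("ed", "ing"))
        )
        if derived and tok.lemma.lower() != form:
            flagged.append(tok)
    return flagged


# Hypothetical sentence annotated in the LinES style described above.
ROWS = [
    ["1", "She", "she", "PRON", "_", "_", "2", "nsubj", "_", "_"],
    ["2", "left", "leave", "VERB", "_", "_", "0", "root", "_", "_"],
    ["3", "abruptly", "abrupt", "ADV", "_", "_", "2", "advmod", "_", "_"],
    ["4", "shocked", "shock", "ADJ", "_", "_", "2", "xcomp", "_", "_"],
]
SAMPLE = "# sent_id = demo-1\n" + "\n".join("\t".join(r) for r in ROWS) + "\n"

for tok in find_derived_lemmas(SAMPLE):
    print(tok.sent_id, tok.form, tok.lemma, tok.upos)
# demo-1 abruptly abrupt ADV
# demo-1 shocked shock ADJ
```

The suffix test is deliberately crude: it only narrows the search to the classes raised in this issue, so the flagged tokens still need a human look.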

It would be great if we could unify this standard. I could send in a PR if that would help.

@AngledLuffa
Contributor Author

(dev, not test)

@LarsAhrenberg
Contributor

Will fix.

@LarsAhrenberg
Contributor

Adverbs ending in -ly have new lemmas.

@AngledLuffa
Contributor Author

Excellent, thank you!

I went looking for -est just to see what happened with that, and I saw that nearest and closest are tagged as ADV. When they refer to places, EWT generally labels them ADJ. For example,

LinES has:

15      the     the     DET     DEF     Definite=Def|PronType=Art       18      det     _       _
16      nearest near    ADV     SPL     _       17      advmod  _       _
17      parallel        parallel        ADJ     POS     Degree=Pos      18      amod    _       _
18      line    line    NOUN    SG-NOM  Number=Sing     13      conj    _       _
19      below   below   ADV     _       _       18      advmod  _       _
20      remains remain  VERB    PRES    Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin   0       root    _       _
21      constant        constant        ADJ     POS     Degree=Pos      20      xcomp   _       SpaceAfter=No

whereas EWT has

en_ewt-ud-train.conllu:# text = Dr. Fortier travelled another three kilometres by foot to seek help in the nearest settlement.
en_ewt-ud-train.conllu:14       nearest near    ADJ     JJS     Degree=Sup      15      amod    15:amod _
en_ewt-ud-train.conllu:# text = If sites next to you don't have what you want, contact your nearest comp.sources.unix archive, or the moderator.
en_ewt-ud-train.conllu:15       nearest near    ADJ     JJS     Degree=Sup      17      amod    17:amod _
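To see how widespread the tagging difference is, one could tally the UPOS tags assigned to particular forms in each treebank. The sketch below is a hypothetical helper (`upos_by_form`, `to_conllu` are illustrative names) run on the token rows quoted above; on a real check you would pass in the full contents of the LinES and EWT .conllu files instead.

```python
from collections import Counter


def upos_by_form(conllu_text: str, forms: set[str]) -> dict[str, Counter]:
    """Count the UPOS tags assigned to each of the given lowercased forms."""
    counts: dict[str, Counter] = {form: Counter() for form in forms}
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Skip malformed lines, multiword-token ranges and empty nodes.
        if len(cols) < 4 or not cols[0].isdigit():
            continue
        form = cols[1].lower()
        if form in counts:
            counts[form][cols[3]] += 1
    return counts


def to_conllu(rows: list[list[str]]) -> str:
    """Join column lists into tab-separated CoNLL-U token lines."""
    return "\n".join("\t".join(row) for row in rows) + "\n"


# Token rows copied from the LinES and EWT excerpts quoted above
# (the two EWT rows come from different sentences).
LINES_ROWS = [
    ["15", "the", "the", "DET", "DEF", "Definite=Def|PronType=Art", "18", "det", "_", "_"],
    ["16", "nearest", "near", "ADV", "SPL", "_", "17", "advmod", "_", "_"],
    ["17", "parallel", "parallel", "ADJ", "POS", "Degree=Pos", "18", "amod", "_", "_"],
]
EWT_ROWS = [
    ["14", "nearest", "near", "ADJ", "JJS", "Degree=Sup", "15", "amod", "15:amod", "_"],
    ["15", "nearest", "near", "ADJ", "JJS", "Degree=Sup", "17", "amod", "17:amod", "_"],
]

targets = {"nearest", "closest"}
lines_counts = upos_by_form(to_conllu(LINES_ROWS), targets)
ewt_counts = upos_by_form(to_conllu(EWT_ROWS), targets)
print(dict(lines_counts["nearest"]))  # {'ADV': 1}
print(dict(ewt_counts["nearest"]))    # {'ADJ': 2}
```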

@LarsAhrenberg
Contributor

Lemmas of ADVs ending in -ly and ADJs ending in -ed or -ing have been corrected.
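A correction of this kind could be approximated mechanically. The sketch below assumes the target convention is the EWT/GUM one described earlier, i.e. the lemma is the lowercased word form; the function names are hypothetical, and this is an illustration of the heuristic, not the script actually used to fix LinES.

```python
def relemmatize_line(line: str) -> str:
    """Set LEMMA to the lowercased FORM for -ly adverbs and -ed/-ing adjectives.

    Comment lines, blank lines, multiword ranges, empty nodes and all other
    tokens are returned unchanged.
    """
    if not line or line.startswith("#"):
        return line
    cols = line.split("\t")
    if len(cols) != 10 or not cols[0].isdigit():
        return line
    form, upos = cols[1].lower(), cols[3]
    if (upos == "ADV" and form.endswith("ly")) or (
        upos == "ADJ" and form.endswith(("ed", "ing"))
    ):
        cols[2] = form  # column 3 (index 2) is LEMMA
    return "\t".join(cols)


def relemmatize(conllu_text: str) -> str:
    """Apply relemmatize_line to every line, preserving line breaks."""
    return "\n".join(relemmatize_line(line) for line in conllu_text.split("\n"))


# Hypothetical input with LinES-style derived lemmas.
ROWS = [
    ["1", "Suddenly", "sudden", "ADV", "_", "_", "3", "advmod", "_", "_"],
    ["2", "he", "he", "PRON", "_", "_", "3", "nsubj", "_", "_"],
    ["3", "stopped", "stop", "VERB", "_", "_", "0", "root", "_", "_"],
    ["4", "shocked", "shock", "ADJ", "_", "_", "3", "xcomp", "_", "_"],
]
SAMPLE = "# sent_id = demo-2\n" + "\n".join("\t".join(r) for r in ROWS) + "\n"
print(relemmatize(SAMPLE))
```

Only the ADV and ADJ rows change; the VERB lemma "stop" is left alone, which is why the part-of-speech check matters as much as the suffix check.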
