Incorrect lemmatization for German at beginning of sentence #2465

Closed
karelin opened this issue Jun 19, 2018 · 3 comments
Labels
feat / lemmatizer (Feature: Rule-based and lookup lemmatization), lang / de (German language data and models)

Comments

karelin commented Jun 19, 2018

How to reproduce the behaviour

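import spacy

nlp = spacy.load('de')  # German model, loaded via the spaCy v2.0 shortcut link
doc = nlp('Gemeinsame Erklärung von Singapur: Was genau steht drin?')
# Lemma and coarse POS tag of the first (sentence-initial) token
print([(t.lemma_, t.pos_) for t in doc][0])
# Output: ('Gemeinsame', 'ADJ')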

BUG: the expected lemma is "gemeinsam". That lemma is produced correctly when the same word appears mid-sentence, e.g. in:

US-Präsident Donald Trump und Nordkoreas Machthaber Kim Jong Un haben bei ihrem Gipfel in Singapur eine gemeinsame Erklärung unterzeichnet.

Note that, in contrast with #2166, the POS tag is correct in both sentences; only the lemma of the capitalized, sentence-initial form is wrong.
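The symptom can be modelled without spaCy. This is a minimal sketch, assuming a plain case-sensitive lookup table whose keys are the lowercase adjective forms that appear mid-sentence. `LOOKUP` and `naive_lemma` are hypothetical names; this is a toy model of the failure, not spaCy's actual lemmatizer code.

```python
# Hypothetical toy lookup table (NOT spaCy's real data): adjective forms are
# stored in lowercase, as they appear mid-sentence.
LOOKUP = {
    "gemeinsame": "gemeinsam",
    "Erklärung": "Erklärung",
}


def naive_lemma(word: str) -> str:
    """Case-sensitive lookup; falls back to the surface form if the key is missing."""
    return LOOKUP.get(word, word)


# Mid-sentence (lowercase) form is found in the table.
print(naive_lemma("gemeinsame"))  # gemeinsam
# Sentence-initial capitalization misses the key, so the surface form comes back.
print(naive_lemma("Gemeinsame"))  # Gemeinsame
```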

Info about spaCy

  • spaCy version: 2.0.11
  • Platform: Windows-10-10.0.17134-SP0
  • Python version: 3.6.5
  • Models: de, en, xx
ines added the performance and lang / de (German language data and models) labels Jun 21, 2018
ines (Member) commented Jun 21, 2018

Thanks! This is likely related to case sensitivity, and possibly to an underlying bug in the lemmatizer (see #2368). Will investigate – we definitely want to sort this out for v2.1.x.
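If case sensitivity is the cause, one possible direction is a lookup that tries the exact form first and only then the lowercased form. This is a minimal sketch under that assumption: `LOOKUP` and `lemma_with_fallback` are hypothetical, illustrative names (not spaCy APIs), and the table is a toy in which German nouns keep their capitalization.

```python
# Hypothetical toy lookup table (NOT spaCy's real data). German nouns are
# stored capitalized; other word classes are stored in lowercase.
LOOKUP = {
    "gemeinsame": "gemeinsam",
    "Erklärung": "Erklärung",
    "Erklärungen": "Erklärung",
}


def lemma_with_fallback(word: str) -> str:
    """Try the exact form, then the lowercased form, else return the word itself."""
    if word in LOOKUP:
        return LOOKUP[word]
    lowered = word.lower()
    if lowered in LOOKUP:
        return LOOKUP[lowered]
    return word


for w in ["Gemeinsame", "gemeinsame", "Erklärungen", "Singapur"]:
    print(w, "->", lemma_with_fallback(w))
```

Trying the exact form first matters for German: capitalized noun entries such as "Erklärungen" are matched before any lowercasing, so their lemma keeps its capital letter. A real fix would probably also need the POS tag, since some lowercased forms are ambiguous (e.g. the noun "Essen" vs. the verb "essen").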

ines added the feat / lemmatizer (Feature: Rule-based and lookup lemmatization) label Jul 2, 2018
ines (Member) commented Jul 6, 2018

Merging this with #2486!

ines closed this as completed Jul 6, 2018
lock bot commented Aug 5, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked as resolved and limited conversation to collaborators Aug 5, 2018