I found that sentences containing a single number, e.g. "2", get an incorrect POS tag 'PUNCT' instead of the 'NUM' tag.
import spacy

nlp = spacy.load("en")
text = nlp("2")
for token in text:
    print(token.text, token.pos_)

>> (u'2', u'PUNCT')
Note that in other languages, such as Dutch or French, the tagger gives the correct tag. Furthermore, once the number is larger than 6, the POS tag switches to 'NUM'.
Versions:
Python version 2.7.14
Platform Linux-4.13.0-39-generic-x86_64-with-Ubuntu-17.10-artful
spaCy version 2.0.9
Merging this with #3052. We've now added a master thread for incorrect predictions and related reports – see the issue for more details.
In this case, it's also important to note that the part-of-speech tagger generally expects sentences, or at least more than one token, to predict the part-of-speech tag based on the context. So it makes sense that it struggles if there's no context around the word.
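For illustration (the exact predictions depend on the model), the same digit embedded in a sentence gives the tagger surrounding tokens to condition on:

import spacy

nlp = spacy.load("en")

# A lone digit gives the tagger no context to condition on.
print([(t.text, t.pos_) for t in nlp("2")])

# Embedded in a sentence, the surrounding tokens provide context,
# so the tagger is far more likely to predict NUM here.
print([(t.text, t.pos_) for t in nlp("I bought 2 apples.")])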
As a workaround, you could add a custom pipeline component that checks whether the token consists of digits (token.is_digit) and then automatically overrides token.tag_ with the respective fine-grained part-of-speech tag of the given language. If the fine-grained tag is changed, the coarse-grained tag (token.pos_) will adjust accordingly, based on the language's tag map.
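A minimal sketch of such a component, assuming the spaCy 2.x function-based add_pipe API and the English tag map, where the fine-grained tag "CD" resolves to the coarse-grained "NUM" (the component name force_num_tag is just an example):

import spacy

nlp = spacy.load("en")

def force_num_tag(doc):
    # Custom pipeline component: override the fine-grained tag for
    # purely numeric tokens. Setting token.tag_ to "CD" (cardinal
    # number) lets the English tag map resolve token.pos_ to "NUM".
    for token in doc:
        if token.is_digit:
            token.tag_ = "CD"
    return doc

# Run the component after the tagger so it can overwrite its predictions.
nlp.add_pipe(force_num_tag, after="tagger")

doc = nlp("2")
for token in doc:
    print(token.text, token.tag_, token.pos_)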