This repository has been archived by the owner on Dec 16, 2022. It is now read-only.

em-dash results in POS == '' from biaffine dependency parser #2116

Closed
lpricePARC opened this issue Nov 29, 2018 · 2 comments


@lpricePARC

When using the biaffine dependency parser, the em-dash often (but not always) comes back with pos == ''. This is a problem because I am using the parser to generate labels to turn my text into CoNLL-2003 NER format, and the blank spots in my columns are preventing the ner_crf_tagger from reading the files properly.

Example code:

from allennlp.predictors.predictor import Predictor

predictor = Predictor.from_path(
    "https://s3-us-west-2.amazonaws.com/allennlp/models/biaffine-dependency-parser-ptb-2018.08.23.tar.gz"
)
chunk_text = "16 People Print to Share Documents — or Do They?"
ps = predictor.predict(chunk_text)
# Index 6 is the em-dash token in this sentence.
em_pos = ps["pos"][6]

==> em_pos == ''

Preferred results:

I would prefer that em_pos be ":" or "." or even "UNKNOWN", so that I don't get a blank column. Right now, I've added a pos == '' test before saving to file, so there is a workaround and it's not urgent for me. But I suspect it's causing difficulties for the dependency parser too, given the wide range of values the em-dash is getting assigned.
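
The pos == '' workaround could look roughly like the sketch below. It is only an illustration: fill_blank_pos is a hypothetical helper name, the "UNKNOWN" placeholder is one of the options suggested above, and the tag list is a hand-written stand-in, not real parser output.

```python
def fill_blank_pos(pos_tags, placeholder="UNKNOWN"):
    """Return a copy of pos_tags with empty or whitespace-only tags replaced.

    fill_blank_pos is a hypothetical helper, not part of AllenNLP or spaCy.
    """
    return [tag if tag and tag.strip() else placeholder for tag in pos_tags]


# Hand-written stand-in for ps["pos"]; these values are illustrative only.
example_pos = ["CD", "NNS", "VBP", "", "."]

# Blank entries are replaced so every row has a non-empty POS column.
cleaned = fill_blank_pos(example_pos)
print(cleaned)
```

Applying this just before writing each row keeps the columns aligned, though it does not address why the tag is blank in the first place.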

System info:

Ubuntu 14.04 LTS
allennlp 0.6.1
Python 3.6.5

@DeNeutoy
Contributor

Hi! We actually use spaCy to predict the POS tags here. I think you might either 1) not have a spaCy model installed or 2) have an old model installed. Can you look at this issue and see if it helps you?

explosion/spaCy#1700
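
A quick way to check both possibilities is a small script along these lines. This is a sketch under assumptions: the model name en_core_web_sm is a guess (the thread does not say which spaCy model is in use), and spacy_tagger_report is a hypothetical helper, not part of spaCy or AllenNLP.

```python
def spacy_tagger_report(model_name="en_core_web_sm"):
    """Report the installed spaCy version and whether the model has a tagger.

    model_name is an assumption; substitute the model your setup uses.
    """
    report = {"spacy_version": None, "model_loaded": False, "has_tagger": False}

    try:
        import spacy
    except ImportError:
        # spaCy itself is not installed.
        return report
    report["spacy_version"] = spacy.__version__

    try:
        nlp = spacy.load(model_name)
    except OSError:
        # spaCy raises OSError when the named model cannot be found.
        return report
    report["model_loaded"] = True

    # POS tags come from the "tagger" pipeline component.
    report["has_tagger"] = "tagger" in nlp.pipe_names
    return report


print(spacy_tagger_report())
```

If model_loaded or has_tagger comes back False, installing or updating the spaCy model is the first thing to try.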

@lpricePARC
Author

Thank you. I have spaCy 2.0.11 installed. I will try 2.0.12.
