Spacy model version issues? #14

Open
agolo-alan-hogue opened this issue Nov 17, 2023 · 2 comments

agolo-alan-hogue commented Nov 17, 2023

Hi!

I am not sure, but I think there might be a model version issue with spaCy.

First, the easy part: the data directory variables in the setup are set incorrectly. I fixed those manually.

Then I get the error below. I am not using Docker: I have tried everything I can think of to get Docker running on my Mac with no luck, so that is currently not an option. Some searches suggest that similar problems arise from mismatched spaCy versions, but loading different model versions has not helped so far. These are the versions I have installed (output of pip freeze):

spacy==3.4.4
spacy-conll==3.3.0
spacy-legacy==3.0.12
spacy-loggers==1.0.5

The error:

python3 link_benchmark_entities.py Spacy -l spacy -b agolo-110823
2023-11-17 11:40:32 [INFO]: Loading config file configs/spacy.config.json for linker spacy.
2023-11-17 11:40:32 [INFO]: Initializing linker spacy with config parameters {'linker_name': 'Spacy', 'model_name': 'wikipedia', 'kb': 'wikipedia', 'experiment_description': 'Using a knowledge base and model derived from Wikipedia.'} ...
2023-11-17 11:40:35 [INFO]: Loading linker model...
Traceback (most recent call last):
  File "/Users/alan/repos/agolo/elevant/link_benchmark_entities.py", line 164, in <module>
    main(cmdl_args)
  File "/Users/alan/repos/agolo/elevant/link_benchmark_entities.py", line 40, in main
    linking_system = LinkingSystem(args.linker_name,
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alan/repos/agolo/elevant/src/linkers/linking_system.py", line 43, in __init__
    self._initialize_linker(linker_name, prediction_file, prediction_format)
  File "/Users/alan/repos/agolo/elevant/src/linkers/linking_system.py", line 97, in _initialize_linker
    self.linker = SpacyLinker(self.linker_config)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alan/repos/agolo/elevant/src/linkers/spacy_linker.py", line 23, in __init__
    self.model = EntityLinkerLoader.load_trained_linker(model_name, kb_name=kb_name)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alan/repos/agolo/elevant/src/helpers/entity_linker_loader.py", line 28, in load_trained_linker
    model.from_bytes(model_bytes)
  File "/opt/homebrew/lib/python3.11/site-packages/spacy/language.py", line 2202, in from_bytes
    util.from_bytes(bytes_data, deserializers, exclude)
  File "/opt/homebrew/lib/python3.11/site-packages/spacy/util.py", line 1302, in from_bytes
    return from_dict(srsly.msgpack_loads(bytes_data), setters, exclude)  # type: ignore[return-value]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/spacy/util.py", line 1324, in from_dict
    setter(msg[key])
  File "/opt/homebrew/lib/python3.11/site-packages/spacy/language.py", line 2191, in <lambda>
    deserializers["tokenizer"] = lambda b: self.tokenizer.from_bytes(  # type: ignore[union-attr]
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "spacy/tokenizer.pyx", line 838, in spacy.tokenizer.Tokenizer.from_bytes
  File "spacy/tokenizer.pyx", line 127, in spacy.tokenizer.Tokenizer.rules.__set__
  File "spacy/tokenizer.pyx", line 574, in spacy.tokenizer.Tokenizer._load_special_cases
  File "spacy/tokenizer.pyx", line 604, in spacy.tokenizer.Tokenizer.add_special_case
  File "spacy/tokenizer.pyx", line 592, in spacy.tokenizer.Tokenizer._validate_special_case
ValueError: [E1005] Unable to set attribute 'POS' in tokenizer exception for '  '. Tokenizer exceptions are only allowed to specify ORTH and NORM.
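
For reference, the E1005 check above rejects any tokenizer special case that sets attributes other than ORTH and NORM. Below is a minimal sketch of stripping the extra attributes from a rules dict before handing it to the tokenizer. It assumes the rules are available as a plain dict keyed by attribute name; the helper `strip_disallowed_attrs` and the sample data are hypothetical, and a real serialized model may use integer attribute IDs instead of strings.

```python
ALLOWED_ATTRS = {"ORTH", "NORM"}  # the only attributes the E1005 message permits


def strip_disallowed_attrs(rules):
    """Return a copy of the rules keeping only ORTH and NORM in each token spec."""
    cleaned = {}
    for text, specs in rules.items():
        cleaned[text] = [
            {attr: value for attr, value in spec.items() if attr in ALLOWED_ATTRS}
            for spec in specs
        ]
    return cleaned


# Hypothetical old-style exception for a double space, similar to the one
# named in the error message (assumed shape, not read from the actual model).
old_rules = {"  ": [{"ORTH": "  ", "POS": "SPACE", "TAG": "_SP"}]}
new_rules = strip_disallowed_attrs(old_rules)
```

This only addresses the tokenizer check. Whether the remaining components of a model serialized with an older spaCy would then load is a separate question.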

agolo-alan-hogue (Author) commented

Oh, I forgot to mention: before that I got the error below, which I fixed with the following change.

src/helpers/entity_linker_loader.py

pipeline = ['tagger', 'parser', 'ner', 'entity_linker']
for pipe_name in pipeline:
    # pipe = model.create_pipe(pipe_name)
    model.add_pipe(pipe_name)

Error (truncated):

ValueError: [E966] `nlp.add_pipe` now takes the string name of the registered component factory, not a callable component. Expected string, but got <spacy.pipeline.tagger.Tagger object at 0x2b30698b0> (name: 'None').

- If you created your component with `nlp.create_pipe('name')`: remove nlp.create_pipe and call `nlp.add_pipe('name')` instead.

- If you passed in a component like `TextCategorizer()`: call `nlp.add_pipe` with the string name instead, e.g. `nlp.add_pipe('textcat')`.

- If you're using a custom component: Add the decorator `@Language.component` (for function components) or `@Language.factory` (for class components / factories) to your custom component and assign it a name, e.g. `@Language.component('your_name')`. You can then run `nlp.add_pipe('your_name')` to add it to the pipeline.

flackbash (Member) commented

Thanks for reporting this. This is most likely a model version issue. I believe the models were still trained with spaCy 2.

We have not tested the spaCy linker recently since, frankly, the results were not very convincing.
I'll try to retrain the model, but the training was done by a colleague a few years ago, the code is still based on spaCy 2, and there seem to have been plenty of changes in spaCy 3 that affect it. I'll let you know if I make any progress.
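
A mismatch like this could be caught before deserialization by comparing major versions. The following is only a sketch under stated assumptions: the helper names are hypothetical, the installed version is taken from the pip freeze output above, and the training version is a placeholder because only "spaCy 2" is known, not the exact release.

```python
def major_version(version):
    """Return the major component of a version string such as '3.4.4'."""
    return int(version.split(".")[0])


def same_major(installed, trained_with):
    """True if both version strings share the same major version."""
    return major_version(installed) == major_version(trained_with)


installed = "3.4.4"     # from the pip freeze output in this issue
trained_with = "2.0.0"  # assumption: placeholder, only "spaCy 2" is known
compatible = same_major(installed, trained_with)
```

At runtime the installed version could be read with `importlib.metadata.version("spacy")` instead of being hard-coded.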

flackbash added a commit that referenced this issue Nov 30, 2023
the old code no longer ran with the new version.
However, there must be a problem with the new code as well, since the
training loss is always 0. So far I have been unable to figure out why.
I followed the instructions in the official spaCy entity linking
tutorial here:
https://github.com/explosion/projects/blob/v3/tutorials/nel_emerson/notebooks/notebook_video.ipynb

Relevant to #14