
[E892] Unknown function registry: 'llm_backends' #12987

Closed
rkatriel opened this issue Sep 18, 2023 · 18 comments
Labels: docs (Documentation and website) · feat/llm (Feature: LLMs, incl. spacy-llm) · usage (General spaCy usage)

Comments

rkatriel commented Sep 18, 2023

How to reproduce the behaviour

I'm getting an "Unknown function registry: 'llm_backends'" error (see the traceback below) when running the example provided in Matthew Honnibal's blog post "Against LLM maximalism" (https://explosion.ai/blog/against-llm-maximalism).

import spacy

nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")
nlp.add_pipe(
    "llm",
    config={
        "task": {
            "@llm_tasks": "spacy.NER.v1",
            "labels": "SAAS_PLATFORM,PROGRAMMING_LANGUAGE,OPEN_SOURCE_LIBRARY"
        },
        "backend": {
            "@llm_backends": "spacy.REST.v1",
            "api": "OpenAI",
            "config": {"model": "text-davinci-003"},
        },
    },
)

doc = nlp("There's no PyTorch bindings for Go. We just use Microsoft Cognitive Services.")
for ent in doc.ents:
    print(ent.text, ent.label_, ent.sent)

Here is the full traceback:

File "/Users/ron.katriel/PycharmProjects/NLP/spacy-llm-example.py", line 5, in
nlp.add_pipe(
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/spacy/language.py", line 786, in add_pipe
pipe_component = self.create_pipe(
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/spacy/language.py", line 679, in create_pipe
resolved = registry.resolve(cfg, validate=validate)
File "/Users/ron.katriel/PycharmProjects/Labs-Gen-AI/venv/lib/python3.10/site-packages/confection/init.py", line 756, in resolve
resolved, _ = cls._make(
File "/Users/ron.katriel/PycharmProjects/Labs-Gen-AI/venv/lib/python3.10/site-packages/confection/init.py", line 805, in _make
filled, _, resolved = cls._fill(
File "/Users/ron.katriel/PycharmProjects/Labs-Gen-AI/venv/lib/python3.10/site-packages/confection/init.py", line 860, in _fill
filled[key], validation[v_key], final[key] = cls._fill(
File "/Users/ron.katriel/PycharmProjects/Labs-Gen-AI/venv/lib/python3.10/site-packages/confection/init.py", line 859, in _fill
promise_schema = cls.make_promise_schema(value, resolve=resolve)
File "/Users/ron.katriel/PycharmProjects/Labs-Gen-AI/venv/lib/python3.10/site-packages/confection/init.py", line 1051, in make_promise_schema
func = cls.get(reg_name, func_name)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/spacy/util.py", line 128, in get
raise RegistryError(Errors.E892.format(name=registry_name, available=names))
catalogue.RegistryError: [E892] Unknown function registry: 'llm_backends'.

Available names: architectures, augmenters, batchers, callbacks, cli, datasets, displacy_colors, factories, initializers, languages, layers, lemmatizers, llm_misc, llm_models, llm_queries, llm_tasks, loggers, lookups, losses, misc, models, ops, optimizers, readers, schedules, scorers, tokenizers
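For reference, the registries and their contents can be inspected programmatically. A minimal sketch, assuming spacy-llm is installed (importing it is what registers the llm_* registries) and using spaCy's registry helpers:

import spacy
import spacy_llm  # noqa: F401 - importing registers the llm_* registries

# List every function registry spaCy knows about
print(sorted(spacy.registry.get_registry_names()))

# List the functions registered under one of them, e.g. llm_tasks
print(sorted(spacy.registry.llm_tasks.get_all().keys()))

Neither listing contains 'llm_backends', which is the registry the blog example refers to.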

Your Environment

  • spaCy version: 3.5.1
  • Platform: macOS-13.5.2-x86_64-i386-64bit
  • Python version: 3.10.4
rmitsch (Contributor) commented Oct 16, 2023

Sorry for not getting back to you earlier; this one fell through the cracks! The example in the blog is outdated, and the API looks a bit different now. We'll update the blog soon. The correct way to initialize this with spacy-llm >= 0.4.0 looks like this:

nlp.add_pipe(
    "llm",
    config={
        "task": {
            "@llm_tasks": "spacy.NER.v1",
            "labels": "SAAS_PLATFORM,PROGRAMMING_LANGUAGE,OPEN_SOURCE_LIBRARY"
        },
        "model": {"@llm_models": "spacy.Davinci.v2"},
    },
)
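Also make sure your OpenAI credentials are available; the OpenAI-backed models read them from the environment (OPENAI_API_KEY, plus OPENAI_API_ORG if you use one). For example, with a hypothetical placeholder key:

import os

# Placeholder key for illustration - substitute your own, or export it in the shell
os.environ["OPENAI_API_KEY"] = "sk-..."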

ines closed this as completed Oct 17, 2023
rkatriel (Author) commented Oct 17, 2023

@rmitsch Hi Raphael,

I tried your suggestion (after upgrading spaCy and spacy-llm to the latest versions, 3.7.2 and 0.6.2, respectively), but now I'm getting a config validation error. See the console trace below.

Thanks,
Ron

    nlp.add_pipe(
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/spacy/language.py", line 821, in add_pipe
    pipe_component = self.create_pipe(
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/spacy/language.py", line 709, in create_pipe
    resolved = registry.resolve(cfg, validate=validate)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/confection/__init__.py", line 756, in resolve
    resolved, _ = cls._make(
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/confection/__init__.py", line 805, in _make
    filled, _, resolved = cls._fill(
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/confection/__init__.py", line 860, in _fill
    filled[key], validation[v_key], final[key] = cls._fill(
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/confection/__init__.py", line 860, in _fill
    filled[key], validation[v_key], final[key] = cls._fill(
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/confection/__init__.py", line 926, in _fill
    raise ConfigValidationError(
confection.ConfigValidationError:

Config validation error
llm.model -> llm_models extra fields not permitted
{'llm_models': 'spacy.Davinci.v2', '@llm_models': 'spacy.GPT-3-5.v2', 'strict': True}

rmitsch (Contributor) commented Oct 17, 2023

Can you share the config you're using?

rkatriel (Author) commented
I don't have a config file. Below is the code I'm running; the parameters are passed in code, as recommended.

import spacy

nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")
nlp.add_pipe(
    "llm",
    config={
        "task": {
            "@llm_tasks": "spacy.NER.v1",
            "labels": "SAAS_PLATFORM,PROGRAMMING_LANGUAGE,OPEN_SOURCE_LIBRARY"
        },
        "model": {"llm_models": "spacy.Davinci.v2"},
    },
)

doc = nlp("There's no PyTorch bindings for Go. We just use Microsoft Cognitive Services.")
for ent in doc.ents:
    print(ent.text, ent.label_, ent.sent)

rmitsch (Contributor) commented Oct 17, 2023

Ah, I forgot an "@" in the example I gave above. Try again with this:

import spacy

nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")
nlp.add_pipe(
    "llm",
    config={
        "task": {
            "@llm_tasks": "spacy.NER.v1",
            "labels": "SAAS_PLATFORM,PROGRAMMING_LANGUAGE,OPEN_SOURCE_LIBRARY"
        },
        "model": {"@llm_models": "spacy.Davinci.v2"},
    },
)

doc = nlp("There's no PyTorch bindings for Go. We just use Microsoft Cognitive Services.")
for ent in doc.ents:
    print(ent.text, ent.label_, ent.sent)

rkatriel (Author) commented Oct 17, 2023

Thanks, that did the trick! But now I'm getting a connection error:

ConnectionError: API could not be reached after 34.596 seconds in total and attempting to connect 5 times. Check your network connection and the API's availability.
429 Too Many Requests

This is likely from OpenAI because my account is not a paid one.

Is there an open-source (e.g., Hugging Face) model that works with this setup? I tried running the script with 'spacy.OpenLLaMA.v1' and got the following error:

Config validation error
llm.model -> name field required
{'@llm_models': 'spacy.OpenLLaMA.v1'}

rmitsch (Contributor) commented Oct 18, 2023

The ConnectionError is usually from OpenAI rate-limiting you, yes. You could also increase the time between tries, but that's unsatisfying too.
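For reference, the REST-backed models expose the retry behaviour in the same config. A sketch along these lines (the parameter names max_tries, interval, and max_request_time follow the spacy-llm 0.6.x model signatures; double-check them against the docs for your installed version):

nlp.add_pipe(
    "llm",
    config={
        "task": {
            "@llm_tasks": "spacy.NER.v1",
            "labels": "SAAS_PLATFORM,PROGRAMMING_LANGUAGE,OPEN_SOURCE_LIBRARY"
        },
        "model": {
            "@llm_models": "spacy.GPT-3-5.v2",
            "max_tries": 10,          # attempts before giving up (your error shows 5)
            "interval": 5.0,          # seconds to wait between attempts
            "max_request_time": 120,  # overall time budget per request, in seconds
        },
    },
)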

Open-source models work the same way. Hugging Face models also come in several variants, and we don't select one by default (maybe we should). Anyway, have a look at the documentation to see which ones are available. You could go with the 3B one, e.g.:

import spacy

nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")
nlp.add_pipe(
    "llm",
    config={
        "task": {
            "@llm_tasks": "spacy.NER.v1",
            "labels": "SAAS_PLATFORM,PROGRAMMING_LANGUAGE,OPEN_SOURCE_LIBRARY"
        },
        "model": {
            "@llm_models": "spacy.OpenLLaMa.v2",
            "name": "open_llama_3b"
        },
    },
)

doc = nlp("There's no PyTorch bindings for Go. We just use Microsoft Cognitive Services.")
for ent in doc.ents:
    print(ent.text, ent.label_, ent.sent)

Note: OpenLLaMa is an older model, and the 3B variant is small, so you probably won't get amazing results out of it.

rkatriel (Author) commented Oct 18, 2023

Thanks Raphael, but that doesn't work. I get the following catalogue/registry error:

catalogue.RegistryError: [E893] Could not find function 'spacy.OpenLLaMa.v2' in function registry 'llm_models'. If you're using a custom function, make sure the code is available. If the function is provided by a third-party package, e.g. spacy-transformers, make sure the package is installed in your environment.

Changing to 'spacy.OpenLLaMa.v1', as implied in the rest of the error message below, does not help.

Available names: langchain.AI21.v1, langchain.AlephAlpha.v1, langchain.Anthropic.v1, langchain.Anyscale.v1, langchain.Aviary.v1, langchain.AzureOpenAI.v1, langchain.Banana.v1, langchain.Beam.v1, langchain.CTransformers.v1, langchain.CerebriumAI.v1, langchain.Cohere.v1, langchain.Databricks.v1, langchain.DeepInfra.v1, langchain.FakeListLLM.v1, langchain.ForefrontAI.v1, langchain.GPT4All.v1, langchain.GooglePalm.v1, langchain.GooseAI.v1, langchain.HuggingFaceEndpoint.v1, langchain.HuggingFaceHub.v1, langchain.HuggingFacePipeline.v1, langchain.HuggingFaceTextGenInference.v1, langchain.HumanInputLLM.v1, langchain.LlamaCpp.v1, langchain.Modal.v1, langchain.MosaicML.v1, langchain.NLPCloud.v1, langchain.OpenAI.v1, langchain.OpenLM.v1, langchain.Petals.v1, langchain.PipelineAI.v1, langchain.RWKV.v1, langchain.Replicate.v1, langchain.SagemakerEndpoint.v1, langchain.SelfHostedHuggingFaceLLM.v1, langchain.SelfHostedPipeline.v1, langchain.StochasticAI.v1, langchain.VertexAI.v1, langchain.Writer.v1, spacy.Ada.v1, spacy.Ada.v2, spacy.Azure.v1, spacy.Babbage.v1, spacy.Babbage.v2, spacy.Claude-1-0.v1, spacy.Claude-1-2.v1, spacy.Claude-1-3.v1, spacy.Claude-1.v1, spacy.Claude-2.v1, spacy.Claude-instant-1-1.v1, spacy.Claude-instant-1.v1, spacy.Code-Davinci.v1, spacy.Code-Davinci.v2, spacy.Command.v1, spacy.Curie.v1, spacy.Curie.v2, spacy.Davinci.v1, spacy.Davinci.v2, spacy.Dolly.v1, spacy.Falcon.v1, spacy.GPT-3-5.v1, spacy.GPT-3-5.v2, spacy.GPT-4.v1, spacy.GPT-4.v2, spacy.Llama2.v1, spacy.Mistral.v1, spacy.NoOp.v1, spacy.OpenLLaMA.v1, spacy.PaLM.v1, spacy.StableLM.v1, spacy.Text-Ada.v1, spacy.Text-Ada.v2, spacy.Text-Babbage.v1, spacy.Text-Babbage.v2, spacy.Text-Curie.v1, spacy.Text-Curie.v2, spacy.Text-Davinci.v1, spacy.Text-Davinci.v2

rmitsch (Contributor) commented Oct 18, 2023

A typo on my end, use spacy.OpenLLaMa.v1 instead of spacy.OpenLLaMa.v2.

rkatriel (Author) commented
Already tried that, as mentioned above. Same type of error:

catalogue.RegistryError: [E893] Could not find function 'spacy.OpenLLaMa.v1' in function registry 'llm_models'. If you're using a custom function, make sure the code is available. If the function is provided by a third-party package, e.g. spacy-transformers, make sure the package is installed in your environment.

rmitsch (Contributor) commented Oct 19, 2023

Argh, these different Llama casings always get me. So the correct spelling is spacy.OpenLLaMA.v1, not spacy.OpenLLaMa.v1 (notice that the last "a" is uppercase). Apologies for not double-checking.
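To avoid eyeballing casings in the future, you can also search the registry programmatically; a small sketch, assuming spacy-llm is importable:

import spacy
import spacy_llm  # noqa: F401 - importing registers the llm_models registry

# Case-insensitive search for LLaMA-flavoured model names
print([name for name in spacy.registry.llm_models.get_all() if "llama" in name.lower()])
# Should include 'spacy.OpenLLaMA.v1' (alongside e.g. 'spacy.Llama2.v1', 'langchain.LlamaCpp.v1')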

rkatriel (Author) commented Oct 19, 2023

Thanks, Raphael! That did the trick, though after fixing it I got a new error:

Tokenizer class LlamaTokenizer does not exist or is not currently imported.

It turns out this is a known issue and is solved by uninstalling/reinstalling the transformers library.

So now we're past loading the model, but not out of the woods. I'm getting the following error when calling spaCy's nlp() function with the query shown in the code above (see the full traceback below):

RuntimeError: Placeholder storage has not been allocated on MPS device!

(I thought this could be an issue with Intel vs. Apple silicon, but I'm getting the same error on a MacBook with an M2 chip.)

Any thoughts on how to resolve this?

Ron

Traceback (most recent call last):
  File "/Users/ron.katriel/PycharmProjects/Transformer/test-spacy-llm.py", line 19, in <module>
    doc = nlp("There's no PyTorch bindings for Go. We just use Microsoft Cognitive Services.")
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/spacy/language.py", line 1054, in __call__
    error_handler(name, proc, [doc], e)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/spacy/util.py", line 1704, in raise_error
    raise e
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/spacy/language.py", line 1049, in __call__
    doc = proc(doc, **component_cfg.get(name, {}))  # type: ignore[call-arg]
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/spacy_llm/pipeline/llm.py", line 156, in __call__
    docs = self._process_docs([doc])
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/spacy_llm/pipeline/llm.py", line 210, in _process_docs
    responses_iters = tee(self._model(prompts_iters[0]), n_iters)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/spacy_llm/models/hf/openllama.py", line 55, in __call__
    return [
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/spacy_llm/models/hf/openllama.py", line 57, in <listcomp>
    self._model.generate(input_ids=tii, **self._config_run)[
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/transformers/generation/utils.py", line 1606, in generate
    return self.greedy_search(
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/transformers/generation/utils.py", line 2454, in greedy_search
    outputs = self(
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1038, in forward
    outputs = self.model(
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 875, in forward
    inputs_embeds = self.embed_tokens(input_ids)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 162, in forward
    return F.embedding(
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/torch/nn/functional.py", line 2210, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Placeholder storage has not been allocated on MPS device!

rmitsch (Contributor) commented Oct 30, 2023

Huh, that's odd. You're getting this error when running exactly this snippet?
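One thing that might be worth trying in the meantime: the Hugging Face models pass config_init through to Transformers' from_pretrained(), so pinning the model to the CPU could sidestep the MPS placement issue. An untested sketch (device_map handling depends on your transformers/accelerate versions):

nlp.add_pipe(
    "llm",
    config={
        "task": {
            "@llm_tasks": "spacy.NER.v1",
            "labels": "SAAS_PLATFORM,PROGRAMMING_LANGUAGE,OPEN_SOURCE_LIBRARY"
        },
        "model": {
            "@llm_models": "spacy.OpenLLaMA.v1",
            "name": "open_llama_3b",
            # config_init is forwarded to transformers' from_pretrained();
            # forcing CPU here is a guess at avoiding the MPS allocation error
            "config_init": {"device_map": "cpu"},
        },
    },
)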

rkatriel (Author) commented
Correct, except with spacy.OpenLLaMA.v1 instead of spacy.OpenLLaMa.v1, as you suggested above.

rmitsch (Contributor) commented Oct 31, 2023

Which machine are you running this on? We'd like to try to replicate it.

rmitsch (Contributor) commented Oct 31, 2023

Also, I'd appreciate it if you opened a new issue for this problem. It might be useful for other users 🙏

rkatriel (Author) commented Oct 31, 2023

Done! The new issue is "Spacy-LLM fails with storage not allocated on MPS device" #13096.

github-actions bot commented Dec 1, 2023

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

github-actions bot locked as resolved and limited conversation to collaborators Dec 1, 2023