
batch support #726

Closed
thistleknot opened this issue Sep 17, 2023 · 2 comments
thistleknot commented Sep 17, 2023

import openai

#openai.api_key = "sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" # can be anything
#openai.api_base = "http://127.0.0.1:8000/v1" # uncomment both lines to target the llama-cpp-python server

prompts = [
    "The quick brown fox jumps",
    "Jack and Jill went up the hill",
    "What is the meaning of life?",
    "Tell me a story.",
]

openai.Completion.create(
    model="text-davinci-003", # currently can be anything
    prompt=prompts,
    max_tokens=256,
)

This works with OpenAI's API endpoint, but fails when pointed at llama-cpp-python's API endpoint:

>>> openai.Completion.create(
...     model="text-davinci-003", # currently can be anything
...     prompt=prompts[0],
...     max_tokens=256,
... )

<OpenAIObject text_completion id=cmpl-c499f5b0-ca80-42ca-bf65-a94248c4e8f9 at 0x7ff1d463e430> JSON: {
  "id": "cmpl-c499f5b0-ca80-42ca-bf65-a94248c4e8f9",
  "object": "text_completion",
  "created": 1694965488,
  "model": "text-davinci-003",
  "choices": [
    {
      "text": " over the lazy dog. Begriffe werden oft verwendet, um eine bestimmteFunktion oder Eigenschaft zu beschreiben. But when you're learning a new language, it can be hard to keep track of all these different terms and their meanings. So what does it all mean? Let's break it down:\nQuick - This adjective means moving or acting quickly. For example, \"The quick rabbit ran across the field.\"\nBrown - In English, this adjective is used to describe something that has a brown color. For example, \"The brown dog wagged its tail with excitement.\"\nFox - This noun refers to a small, sly animal known for its cunning and stealth. For example, \"The fox sneaked into the henhouse to find an easy meal.\"\nJumps - This verb means to jump or leap in a sudden and energetic manner. For example, \"The kangaroo jumps over the obstacles in its path.\"\nLazy - This adjective means not very active or energetic. For example, \"The lazy cat slept for most of the day.\"\nDog - This noun refers to a domestic",
      "index": 0,
      "logprobs": null,
      "finish_reason": "length"
    }
  ],
  "usage": {
    "prompt_tokens": 8,
    "completion_tokens": 256,
    "total_tokens": 264
  }
}
>>>
>>> openai.Completion.create(
...     model="text-davinci-003", # currently can be anything
...     prompt=prompts,
...     max_tokens=256,
... )
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/data/ubuntu_22.04_sandbox/root/miniconda3/envs/textgen/lib/python3.10/site-packages/openai/api_resources/completion.py", line 25, in create
    return super().create(*args, **kwargs)
  File "/data/ubuntu_22.04_sandbox/root/miniconda3/envs/textgen/lib/python3.10/site-packages/openai/api_resources/abstract/engine_api_resource.py", line 153, in create
    response, _, api_key = requestor.request(
  File "/data/ubuntu_22.04_sandbox/root/miniconda3/envs/textgen/lib/python3.10/site-packages/openai/api_requestor.py", line 298, in request
    resp, got_stream = self._interpret_response(result, stream)
  File "/data/ubuntu_22.04_sandbox/root/miniconda3/envs/textgen/lib/python3.10/site-packages/openai/api_requestor.py", line 700, in _interpret_response
    self._interpret_response_line(
  File "/data/ubuntu_22.04_sandbox/root/miniconda3/envs/textgen/lib/python3.10/site-packages/openai/api_requestor.py", line 765, in _interpret_response_line
    raise self.handle_error_response(
openai.error.APIError:  {"error":{"message":"","type":"internal_server_error","param":null,"code":null}} 500 {'error': {'message': '', 'type': 'internal_server_error', 'param': None, 'code': None}} {'date': 'Sun, 17 Sep 2023 15:45:21 GMT', 'server': 'uvicorn', 'content-length': '80', 'content-type': 'application/json', 'x-request-id': '0ef74538f50442859a7a4dabf904959a'}

I can send a single prompt, but not a list of prompts.

thistleknot (Author) commented

When using the llm object directly, I don't see that the completion functions support batching either; it errors out on encode:
```
output = llm.create_completion(prompts, max_tokens=256, stop=["Q:", "\n"], echo=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/data/ubuntu_22.04_sandbox/root/miniconda3/envs/textgen/lib/python3.10/site-packages/llama_cpp/llama.py", line 1404, in create_completion
    completion: Completion = next(completion_or_chunks)  # type: ignore
  File "/data/ubuntu_22.04_sandbox/root/miniconda3/envs/textgen/lib/python3.10/site-packages/llama_cpp/llama.py", line 901, in _create_completion
    prompt_tokens: List[int] = self.tokenize(prompt.encode("utf-8")) if prompt != "" else [self.token_bos()]
AttributeError: 'list' object has no attribute 'encode'
```

However, this same function in OpenAI's own API does allow batching (and if you attempt it against the llama-cpp-python server's OpenAI-compatible endpoint, it errors out with the 500 shown above).

abetlen (Owner) commented Nov 21, 2023

Just cleaning up old issues and closing this in favour of #771

@abetlen closed this as completed Nov 21, 2023
@abetlen reopened this Nov 21, 2023
@abetlen closed this as not planned Nov 21, 2023