
batch support #726

Closed
thistleknot opened this issue Sep 17, 2023 · 2 comments
thistleknot commented Sep 17, 2023

import openai

#openai.api_key = "sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" # can be anything
#openai.api_base = "http://127.0.0.1:8000/v1" # uncomment both lines to target the llama-cpp-python server

prompts = [
    "The quick brown fox jumps",
    "Jack and Jill went up the hill",
    "What is the meaning of life?",
    "Tell me a story.",
]

openai.Completion.create(
    model="text-davinci-003", # currently can be anything
    prompt=prompts,
    max_tokens=256,
)

This works with OpenAI's API endpoint, but fails when pointed at llama-cpp-python's API endpoint:

>>> openai.Completion.create(
...     model="text-davinci-003", # currently can be anything
...     prompt=prompts[0],
...     max_tokens=256,
... )

<OpenAIObject text_completion id=cmpl-c499f5b0-ca80-42ca-bf65-a94248c4e8f9 at 0x7ff1d463e430> JSON: {
  "id": "cmpl-c499f5b0-ca80-42ca-bf65-a94248c4e8f9",
  "object": "text_completion",
  "created": 1694965488,
  "model": "text-davinci-003",
  "choices": [
    {
      "text": " over the lazy dog. Begriffe werden oft verwendet, um eine bestimmteFunktion oder Eigenschaft zu beschreiben. But when you're learning a new language, it can be hard to keep track of all these different terms and their meanings. So what does it all mean? Let's break it down:\nQuick - This adjective means moving or acting quickly. For example, \"The quick rabbit ran across the field.\"\nBrown - In English, this adjective is used to describe something that has a brown color. For example, \"The brown dog wagged its tail with excitement.\"\nFox - This noun refers to a small, sly animal known for its cunning and stealth. For example, \"The fox sneaked into the henhouse to find an easy meal.\"\nJumps - This verb means to jump or leap in a sudden and energetic manner. For example, \"The kangaroo jumps over the obstacles in its path.\"\nLazy - This adjective means not very active or energetic. For example, \"The lazy cat slept for most of the day.\"\nDog - This noun refers to a domestic",
      "index": 0,
      "logprobs": null,
      "finish_reason": "length"
    }
  ],
  "usage": {
    "prompt_tokens": 8,
    "completion_tokens": 256,
    "total_tokens": 264
  }
}
>>>
>>> openai.Completion.create(
...     model="text-davinci-003", # currently can be anything
...     prompt=prompts,
...     max_tokens=256,
... )
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/data/ubuntu_22.04_sandbox/root/miniconda3/envs/textgen/lib/python3.10/site-packages/openai/api_resources/completion.py", line 25, in create
    return super().create(*args, **kwargs)
  File "/data/ubuntu_22.04_sandbox/root/miniconda3/envs/textgen/lib/python3.10/site-packages/openai/api_resources/abstract/engine_api_resource.py", line 153, in create
    response, _, api_key = requestor.request(
  File "/data/ubuntu_22.04_sandbox/root/miniconda3/envs/textgen/lib/python3.10/site-packages/openai/api_requestor.py", line 298, in request
    resp, got_stream = self._interpret_response(result, stream)
  File "/data/ubuntu_22.04_sandbox/root/miniconda3/envs/textgen/lib/python3.10/site-packages/openai/api_requestor.py", line 700, in _interpret_response
    self._interpret_response_line(
  File "/data/ubuntu_22.04_sandbox/root/miniconda3/envs/textgen/lib/python3.10/site-packages/openai/api_requestor.py", line 765, in _interpret_response_line
    raise self.handle_error_response(
openai.error.APIError:  {"error":{"message":"","type":"internal_server_error","param":null,"code":null}} 500 {'error': {'message': '', 'type': 'internal_server_error', 'param': None, 'code': None}} {'date': 'Sun, 17 Sep 2023 15:45:21 GMT', 'server': 'uvicorn', 'content-length': '80', 'content-type': 'application/json', 'x-request-id': '0ef74538f50442859a7a4dabf904959a'}

I can send a single prompt, but not a list of prompts.

thistleknot (Author) commented

When using the llm object directly, I don't see that the completion functions support batching either; it errors out on encode:
```
output = llm.create_completion(prompts, max_tokens=256, stop=["Q:", "\n"], echo=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/data/ubuntu_22.04_sandbox/root/miniconda3/envs/textgen/lib/python3.10/site-packages/llama_cpp/llama.py", line 1404, in create_completion
    completion: Completion = next(completion_or_chunks)  # type: ignore
  File "/data/ubuntu_22.04_sandbox/root/miniconda3/envs/textgen/lib/python3.10/site-packages/llama_cpp/llama.py", line 901, in _create_completion
    prompt_tokens: List[int] = self.tokenize(prompt.encode("utf-8")) if prompt != "" else [self.token_bos()]
AttributeError: 'list' object has no attribute 'encode'
```

However, this same function in OpenAI's own API does allow batching (and if you attempt it against the llama-cpp-python server's OpenAI-compatible endpoint, it errors out with the 500 shown above).

abetlen (Owner) commented Nov 21, 2023

Just cleaning up old issues and closing this in favour of #771

@abetlen closed this as completed Nov 21, 2023
@abetlen reopened this Nov 21, 2023
@abetlen closed this as not planned Nov 21, 2023