Hello everybody,
I'm new to llama-cpp-python, and I'm trying to work out how to collect statistics such as speed in tokens/second (for both prompt processing and generation) after a call to llm.create_completion(...). I can measure the elapsed times myself with time.perf_counter(), and, at least in streaming mode, count the generated tokens by counting the chunks yielded by create_completion(), but I don't see how to get the number of tokens in the prompt.
Inspecting the source code of Llama.py, I found that in one case (at line 1718) _create_completion() yields a dict containing an item with key "usage", whose value is a dictionary holding the lengths of prompt_tokens[] and completion_tokens[] plus the sum of the two. But it's not clear to me how to make that yield() path run, or why "usage" is absent from the other calls to yield() (at least in the last chunk yielded: that's what the ollama-python library does, so it shouldn't be difficult).
Is there any other way to find out the number of tokens in the prompt?