-
```cpp
struct llama_context {
    // decode output (2-dimensional array: [n_outputs][n_vocab])
    size_t  logits_size = 0; // capacity (of floats) for logits
    float * logits      = nullptr;
    // ...
```

I've been trying to understand this `logits` buffer. Also, what is the relationship between `llama_batch.logits` and `llama_context.logits`?
-
`llama_context.logits` is allocated in `llama_output_reserve` (llama.cpp/src/llama.cpp, line 15976 at commit daa9623).

Yes, it's just a buffer of floats. Each "output" has `n_vocab` logits, and they are stored contiguously. `llama_batch.logits` is a user-facing API which allows choosing which outputs to calculate: it is an array of `bool`, and `llama_context.logits` will only contain the logits for the tokens corresponding to each truthy value in `llama_batch.logits`.

Some PRs which affected how the logits are handled are #6122 (which added some indirection with …). Let me know if I should clarify further.
-
Incidentally, how does one retrieve the number of outputs? Something like:

```cpp
LLAMA_API int32_t llama_n_outputs(llama_context * ctx) {
    return ctx->n_outputs;
}
```