-
```cpp
struct llama_context {
    // decode output (2-dimensional array: [n_outputs][n_vocab])
    size_t  logits_size = 0; // capacity (of floats) for logits
    float * logits      = nullptr;
    // ...
```

I've been trying to understand this `logits` buffer. Also, what is the relationship between `llama_batch.logits` and `llama_context.logits`?
-
`llama_context.logits` is allocated in `llama_output_reserve` (llama.cpp/src/llama.cpp, line 15976 at commit daa9623).

Yes, it's just a buffer of floats. Each "output" has `n_vocab` logits, and they are stored contiguously. `llama_batch.logits` is a user-facing API which allows choosing which outputs to calculate: it is an array of `bool`, and `llama_context.logits` will only contain the logits for the tokens corresponding to each truthy value in `llama_batch.logits`.

Some PRs which affected how the logits are handled are #6122 (which added some indirection with …). Let me know if I should clarify further.
-
Incidentally, how does one retrieve the number of outputs? Something like:

```cpp
LLAMA_API int32_t llama_n_outputs(llama_context * ctx) {
    return ctx->n_outputs;
}
```