Set embeddings as an input to LLM models #9984
-
Greetings, I am seeking assistance with the llama.cpp C++ API: I want to set the input to my LLM as embeddings rather than as a string or a vector of tokens. While exploring the examples in the repository (https://github.com/ggerganov/llama.cpp/tree/master/examples/simple), I found it straightforward to run any model with a text prompt, but I have been unable to locate an example that uses embeddings as input. I would appreciate guidance, advice, or code examples from anyone who has faced a similar challenge, and links to any existing GitHub projects that offer this functionality. Thank you in advance. Here is roughly what I have and what I want:
Existing code: string as input
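For reference, a minimal sketch of that flow, loosely following the simple example; the model path and prompt are placeholders, and the exact signatures (e.g. of `llama_tokenize` and `llama_batch_get_one`) vary between llama.cpp versions, so check them against your llama.h:

```cpp
#include "llama.h"
#include <string>
#include <vector>

int main() {
    llama_backend_init();

    llama_model   * model = llama_load_model_from_file("model.gguf", llama_model_default_params());
    llama_context * ctx   = llama_new_context_with_model(model, llama_context_default_params());

    // tokenize the text prompt
    std::string prompt = "Hello";
    std::vector<llama_token> tokens(prompt.size() + 16);
    const int n_tokens = llama_tokenize(model, prompt.c_str(), (int) prompt.size(),
                                        tokens.data(), (int) tokens.size(),
                                        /*add_special=*/true, /*parse_special=*/false);
    tokens.resize(n_tokens);

    // evaluate the whole prompt as one single-sequence batch of token ids
    llama_batch batch = llama_batch_get_one(tokens.data(), n_tokens);
    llama_decode(ctx, batch);

    // ... sample the next token, feed it back, repeat ...

    llama_free(ctx);
    llama_free_model(model);
    llama_backend_free();
}
```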
Embeddings as input! Can you help with this?

```cpp
// processing the created batch and next generated tokens
```
-
Roughly:

```cpp
llama_batch batch = llama_batch_init(tokens_count, 768, 1);

float embeddings[768 * tokens_count]; // input data

memcpy(batch.embd, embeddings, 768 * tokens_count * sizeof(float));

// initialize the rest of the batch as usual (minus the tokens)
```
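To make that last comment concrete, here is a hedged sketch of what "the rest of the batch" might look like for a single sequence, continuing the snippet above; `ctx`, `tokens_count`, and `embeddings` are assumed to exist already, and the field semantics should be double-checked against the `llama_batch` declaration in your llama.h:

```cpp
llama_batch batch = llama_batch_init(tokens_count, 768, 1);
memcpy(batch.embd, embeddings, 768 * tokens_count * sizeof(float));

batch.n_tokens = tokens_count;          // llama_batch_init leaves this at 0
for (int i = 0; i < tokens_count; i++) {
    batch.pos[i]       = i;             // position of entry i in the sequence
    batch.n_seq_id[i]  = 1;             // entry i belongs to exactly one sequence ...
    batch.seq_id[i][0] = 0;             // ... namely sequence 0
    batch.logits[i]    = 0;             // no logits needed for prompt entries
}
batch.logits[tokens_count - 1] = 1;     // request logits only for the last entry

if (llama_decode(ctx, batch) != 0) {
    fprintf(stderr, "llama_decode failed\n");
}
llama_batch_free(batch);
```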
-
Yes, you have to initialize the rest of the fields of `llama_batch`. Unfortunately that requires a bit more time to explain, and I don't think there are any good and simple examples that show how to do it. However, for single-sequence use, instead of using `llama_batch_init`, you should also be able to construct the batch directly, as sketched below. This should work in a similar way to `llama_batch_get_one`, and you don't have to figure the values …
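A plausible sketch of that direct construction, assuming the `llama_batch` field order declared in llama.h (`n_tokens`, `token`, `embd`, `pos`, `n_seq_id`, `seq_id`, `logits`); leaving the trailing fields null lets `llama_decode` fill in default positions and sequence ids, just as it does for batches from `llama_batch_get_one`:

```cpp
llama_batch batch = {
    /*n_tokens =*/ tokens_count,
    /*token    =*/ nullptr,      // no token ids: the input comes from embd
    /*embd     =*/ embeddings,   // tokens_count * n_embd floats
    /*pos      =*/ nullptr,
    /*n_seq_id =*/ nullptr,
    /*seq_id   =*/ nullptr,
    /*logits   =*/ nullptr,
};
```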