Set embeddings as an input to LLM models #9984
-
Greetings, I am seeking assistance with the llama.cpp C++ API: I want to set the input to my LLM as embeddings rather than as a string or a vector of tokens. While exploring the examples in the repository (https://github.com/ggerganov/llama.cpp/tree/master/examples/simple), I found it straightforward to run any model with a text prompt, but I have been unable to locate an example that uses embeddings as input. I would appreciate guidance, advice, or code examples from anyone who has faced a similar challenge, and links to any existing GitHub projects that offer this functionality. Thank you in advance. Here is roughly what I have and what I want:
Existing code: string as input
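For reference, a minimal sketch of that flow, loosely following the simple example; the model path and prompt are placeholders, and the exact signatures (e.g. of `llama_tokenize` and `llama_batch_get_one`) vary between llama.cpp versions, so check them against your llama.h:

```cpp
#include "llama.h"
#include <string>
#include <vector>

int main() {
    llama_backend_init();

    llama_model   * model = llama_load_model_from_file("model.gguf", llama_model_default_params());
    llama_context * ctx   = llama_new_context_with_model(model, llama_context_default_params());

    // tokenize the text prompt
    std::string prompt = "Hello";
    std::vector<llama_token> tokens(prompt.size() + 16);
    const int n_tokens = llama_tokenize(model, prompt.c_str(), (int) prompt.size(),
                                        tokens.data(), (int) tokens.size(),
                                        /*add_special=*/true, /*parse_special=*/false);
    tokens.resize(n_tokens);

    // evaluate the whole prompt as one single-sequence batch of token ids
    llama_batch batch = llama_batch_get_one(tokens.data(), n_tokens);
    llama_decode(ctx, batch);

    // ... sample the next token, feed it back, repeat ...

    llama_free(ctx);
    llama_free_model(model);
    llama_backend_free();
}
```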
Embeddings as input! Can you help with this?

```cpp
// processing the created batch and next generated tokens
```
-
Roughly:

```cpp
llama_batch batch = llama_batch_init(tokens_count, 768, 1);

float embeddings[768 * tokens_count]; // input data

memcpy(batch.embd, embeddings, 768 * tokens_count * sizeof(float));

// initialize the rest of the batch as usual (minus the tokens)
```
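To make that last comment concrete, here is a hedged sketch of what "the rest of the batch" might look like for a single sequence, continuing the snippet above; `ctx`, `tokens_count`, and `embeddings` are assumed to exist already, and the field semantics should be double-checked against the `llama_batch` declaration in your llama.h:

```cpp
llama_batch batch = llama_batch_init(tokens_count, 768, 1);
memcpy(batch.embd, embeddings, 768 * tokens_count * sizeof(float));

batch.n_tokens = tokens_count;          // llama_batch_init leaves this at 0
for (int i = 0; i < tokens_count; i++) {
    batch.pos[i]       = i;             // position of entry i in the sequence
    batch.n_seq_id[i]  = 1;             // entry i belongs to exactly one sequence ...
    batch.seq_id[i][0] = 0;             // ... namely sequence 0
    batch.logits[i]    = 0;             // no logits needed for prompt entries
}
batch.logits[tokens_count - 1] = 1;     // request logits only for the last entry

if (llama_decode(ctx, batch) != 0) {
    fprintf(stderr, "llama_decode failed\n");
}
llama_batch_free(batch);
```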
-
Yes, you have to initialize the rest of the fields of `llama_batch`. Unfortunately that requires a bit more time to explain, and I don't think there are any good and simple examples that show how to do it. However, for single-sequence use, instead of using `llama_batch_init`, you should also be able to construct the batch directly, as sketched below. This should work in a similar way to `llama_batch_get_one`, and you don't have to figure the values …
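A plausible sketch of that direct construction, assuming the `llama_batch` field order declared in llama.h (`n_tokens`, `token`, `embd`, `pos`, `n_seq_id`, `seq_id`, `logits`); leaving the trailing fields null lets `llama_decode` fill in default positions and sequence ids, just as it does for batches from `llama_batch_get_one`:

```cpp
llama_batch batch = {
    /*n_tokens =*/ tokens_count,
    /*token    =*/ nullptr,      // no token ids: the input comes from embd
    /*embd     =*/ embeddings,   // tokens_count * n_embd floats
    /*pos      =*/ nullptr,
    /*n_seq_id =*/ nullptr,
    /*seq_id   =*/ nullptr,
    /*logits   =*/ nullptr,
};
```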