llama: add Embeddings for llama #245

danbev · 2023-12-06T15:24:34Z

This commit adds the ability to generate embeddings using llama.

The motivation for this is to be able to use llama for embeddings in combination with a vector store, like Qdrant.

This commit also adds an example that demonstrates how to use the llm-chain-llama crate for generating embeddings and then use the Qdrant vector store for storing and searching for similar documents.
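The store-then-search flow described above can be sketched with plain Rust. This is a minimal illustration under stated assumptions, not the llm-chain or Qdrant API. `InMemoryStore` is a hypothetical stand-in for a vector store such as Qdrant, the embeddings are hand-written vectors standing in for what the llama model would produce, and ranking uses cosine similarity (a common choice for this kind of search, assumed here).

```rust
/// Hypothetical in-memory stand-in for a vector store such as Qdrant.
/// In the real example the embeddings come from the llama model; here they
/// are plain Vec<f32> values supplied by the caller.
struct InMemoryStore {
    docs: Vec<(String, Vec<f32>)>,
}

impl InMemoryStore {
    fn new() -> Self {
        Self { docs: Vec::new() }
    }

    /// Store a document together with its embedding; the index serves as its ID.
    fn add(&mut self, text: &str, embedding: Vec<f32>) -> usize {
        self.docs.push((text.to_string(), embedding));
        self.docs.len() - 1
    }

    /// Return up to `k` documents ranked by cosine similarity to `query`.
    fn search(&self, query: &[f32], k: usize) -> Vec<(&str, f32)> {
        let mut scored: Vec<(&str, f32)> = self
            .docs
            .iter()
            .map(|(text, emb)| (text.as_str(), cosine_similarity(query, emb)))
            .collect();
        // Highest similarity first; inputs are assumed to contain no NaN values.
        scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
        scored.truncate(k);
        scored
    }
}

/// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0.0 if either vector is zero.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "embeddings must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}
```

A real vector store replaces the linear scan with an index, but the contract is the same: documents go in with their embeddings, and a query embedding returns the closest stored documents.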

Example of running `similarity_search_llama`:

```
env LLM_CHAIN_MODEL=~/work/ai/llama.cpp/models/llama-2-7b-chat.Q4_0.gguf cargo r --release --example similarity_search_llama
    Finished release [optimized] target(s) in 0.14s
     Running `/home/danielbevenius/work/ai/llm-chain/target/release/examples/similarity_search_llama`
llama_model_loader: loaded meta data with 19 key-value pairs and 291 tensors from
 ...
llama_new_context_with_model: compute buffer total size = 159.07 MiB
Documents stored under IDs: ["14081b4a-6690-4731-b64e-4058450fa428", "ee0687be-952c-4d30-845e-73807ffe74f1", "fad54a60-51db-4494-bcb8-8c80a12a151c"]
Retrieved stored documents: [Document { page_content: "Sound for the concert was engineered by sound engineer Bill Hanley. \"It worked very well\", he says of the event. \"I built special speaker columns on the hills and had 16 loudspeaker arrays in a square platform going up to the hill on 70-foot [21 m] towers. We set it up for 150,000 to 200,000 people. Of course, 500,000 showed up.\"[48] ALTEC designed marine plywood cabinets that weighed half a ton apiece and stood 6 feet (1.8 m) tall, almost 4 feet (1.2 m) deep, and 3 feet (0.91 m) wide. Each of these enclosures carried four 15-inch (380 mm) JBL D140 loudspeakers. The tweeters consisted of 4×2-Cell & 2×10-Cell Altec Horns. Behind the stage were three transformers providing 2,000 amperes of current to power the amplification setup.[49][page needed] For many years this system was collectively referred to as the Woodstock Bins.[50] The live performances were captured on two 8-track Scully recorders in a tractor trailer back stage by Edwin Kramer and Lee Osbourne on 1-inch Scotch recording tape at 15 ips, then mixed at the Record Plant studio in New York.[51]", metadata: Some(EmptyMetadata) }]
```


This commit adds a call to `llama_kv_cache_clear` for each call to `run_model`. This is needed because the same sequence id is currently used for every call to `run_model`, so tokens from a previous call can remain in the KV cache. The attention mechanism can then use tokens from a previous decode call, which can cause the model to generate incorrect output. Signed-off-by: Daniel Bevenius
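The stale-cache problem described in this commit can be illustrated with a toy model. This is a hypothetical simulation, not llama.cpp or llm-chain code: `FakeContext`, its methods, and the `clear_cache` flag are invented for illustration, and the "KV cache" is modelled simply as the list of tokens that a decode step on a single, reused sequence id can attend to.

```rust
/// Hypothetical stand-in for a llama.cpp context. The "KV cache" is modelled
/// as the list of tokens that later decode steps on the same sequence can see.
struct FakeContext {
    kv_cache: Vec<u32>,
}

impl FakeContext {
    fn new() -> Self {
        Self { kv_cache: Vec::new() }
    }

    /// Plays the role of `llama_kv_cache_clear`: drops every cached token.
    fn kv_cache_clear(&mut self) {
        self.kv_cache.clear();
    }

    /// Models a decode call on a fixed sequence id: new tokens are appended to
    /// whatever is already cached, and the returned slice is everything the
    /// attention step would see.
    fn decode(&mut self, tokens: &[u32]) -> &[u32] {
        self.kv_cache.extend_from_slice(tokens);
        &self.kv_cache
    }
}

/// Hypothetical `run_model`: without clearing, a second prompt would also
/// "see" the tokens left behind by the first prompt.
fn run_model(ctx: &mut FakeContext, prompt: &[u32], clear_cache: bool) -> Vec<u32> {
    if clear_cache {
        ctx.kv_cache_clear();
    }
    ctx.decode(prompt).to_vec()
}
```

With `clear_cache` set to `false`, a second call returns the first prompt's tokens followed by its own; with it set to `true`, only the current prompt is visible, which is the behaviour the commit aims for.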

Juzov · 2023-12-17T23:10:13Z

lgtm @williamhogman thoughts?

williamhogman · 2023-12-17T23:12:34Z

Yeah let's merge

danbev added 2 commits December 15, 2023 11:19

danbev force-pushed the embeddings-plus-updated-llama.cpp branch from b27be31 to 5333d50 December 15, 2023 11:34

danbev marked this pull request as ready for review December 15, 2023 12:09

danbev changed the title ~~llama: add Embeddings for llama (wip)~~ llama: add Embeddings for llama Dec 15, 2023

Juzov approved these changes Dec 17, 2023

Merge branch 'main' into embeddings-plus-updated-llama.cpp

0e29158

williamhogman merged commit e6e02fb into sobelio:main Dec 17, 2023
5 checks passed
