llama: add Embeddings for llama #245

Merged
merged 3 commits into from
Dec 17, 2023

Conversation

danbev
Contributor

@danbev danbev commented Dec 6, 2023

This commit adds the ability to generate embeddings using llama.

The motivation for this is to be able to use llama for embeddings in combination with a vector store, like Qdrant.

This commit also adds an example that demonstrates how to use the llm-chain-llama crate for generating embeddings and then use the Qdrant vector store for storing and searching for similar documents.
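
To illustrate what "searching for similar documents" means once embeddings exist, here is a minimal in-memory sketch. It is not the llm-chain-llama or Qdrant API: the function names `cosine_similarity` and `top_k` and the tiny 3-dimensional vectors are hypothetical stand-ins, and cosine similarity is assumed as the distance metric (a vector store may be configured with a different one).

```rust
/// Cosine similarity between two equal-length vectors.
/// Returns 0.0 if either vector has zero norm.
/// (Hypothetical helper for illustration; not part of llm-chain.)
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "embedding dimensions must match");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Returns the indices of the `k` stored vectors most similar to `query`,
/// ordered from most to least similar.
fn top_k(query: &[f32], stored: &[Vec<f32>], k: usize) -> Vec<usize> {
    let mut scored: Vec<(usize, f32)> = stored
        .iter()
        .enumerate()
        .map(|(i, v)| (i, cosine_similarity(query, v)))
        .collect();
    // Sort by descending similarity; total_cmp gives a total order on f32.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.into_iter().take(k).map(|(i, _)| i).collect()
}

fn main() {
    // Made-up "document embeddings"; real llama embeddings have thousands
    // of dimensions.
    let stored = vec![
        vec![1.0, 0.0, 0.0],
        vec![0.0, 1.0, 0.0],
        vec![0.7, 0.7, 0.0],
    ];
    // Made-up "query embedding".
    let query = [0.9, 0.1, 0.0];
    let hits = top_k(&query, &stored, 2);
    println!("closest document indices: {:?}", hits);
}
```

In the actual example the embeddings come from the model and the nearest-neighbour search is delegated to Qdrant; the sketch only shows the ranking idea behind that search.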


Example of running similarity_search_llama:

env LLM_CHAIN_MODEL=~/work/ai/llama.cpp/models/llama-2-7b-chat.Q4_0.gguf cargo r --release --example similarity_search_llama
    Finished release [optimized] target(s) in 0.14s
     Running `/home/danielbevenius/work/ai/llm-chain/target/release/examples/similarity_search_llama`
llama_model_loader: loaded meta data with 19 key-value pairs and 291 tensors from
 ...
llama_new_context_with_model: compute buffer total size = 159.07 MiB
Documents stored under IDs: ["14081b4a-6690-4731-b64e-4058450fa428", "ee0687be-952c-4d30-845e-73807ffe74f1", "fad54a60-51db-4494-bcb8-8c80a12a151c"]
Retrieved stored documents: [Document { page_content: "Sound for the concert was engineered by sound engineer Bill Hanley. \"It worked very well\", he says of the event. \"I built special speaker columns on the hills and had 16 loudspeaker arrays in a square platform going up to the hill on 70-foot [21 m] towers. We set it up for 150,000 to 200,000 people. Of course, 500,000 showed up.\"[48] ALTEC designed marine plywood cabinets that weighed half a ton apiece and stood 6 feet (1.8 m) tall, almost 4 feet (1.2 m) deep, and 3 feet (0.91 m) wide. Each of these enclosures carried four 15-inch (380 mm) JBL D140 loudspeakers. The tweeters consisted of 4×2-Cell & 2×10-Cell Altec Horns. Behind the stage were three transformers providing 2,000 amperes of current to power the amplification setup.[49][page needed] For many years this system was collectively referred to as the Woodstock Bins.[50] The live performances were captured on two 8-track Scully recorders in a tractor trailer back stage by Edwin Kramer and Lee Osbourne on 1-inch Scotch recording tape at 15 ips, then mixed at the Record Plant studio in New York.[51]", metadata: Some(EmptyMetadata) }]

This commit adds the ability to generate embeddings using llama.

The motivation for this is to be able to use llama for embeddings in
combination with a vector store, like Qdrant.

This commit also adds an example that demonstrates how to use the
llm-chain-llama crate for generating embeddings and then use
the Qdrant vector store for storing and searching for similar
documents.

Signed-off-by: Daniel Bevenius <[email protected]>
This commit adds a call to `llama_kv_cache_clear` for each call to
`run_model`. The same sequence id is currently used for every call to
`run_model`, so tokens from a previous call can remain in the cache.
The attention mechanism can then attend to tokens from a previous
decode call, which can cause the model to generate incorrect output.

Signed-off-by: Daniel Bevenius <[email protected]>
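
The stale-cache problem described in the commit message above can be sketched with a toy model. This is not llama.cpp's KV cache and not the code in this PR: `ToyKvCache` and `run_model_sketch` are hypothetical names, and the "cache" here simply stores token ids per sequence id, which is assumed to be enough to show why reusing one sequence id without clearing leaks earlier tokens into later calls.

```rust
use std::collections::HashMap;

/// Toy stand-in for a KV cache: token ids remembered per sequence id.
/// (Hypothetical type for illustration only.)
struct ToyKvCache {
    cells: HashMap<i32, Vec<u32>>,
}

impl ToyKvCache {
    fn new() -> Self {
        ToyKvCache { cells: HashMap::new() }
    }

    /// Drops every cached entry, analogous in spirit to clearing the
    /// whole KV cache before a new, unrelated run.
    fn clear(&mut self) {
        self.cells.clear();
    }

    /// "Decodes" `tokens` for `seq_id` and returns every token that
    /// attention would be able to see for that sequence afterwards.
    fn decode(&mut self, seq_id: i32, tokens: &[u32]) -> Vec<u32> {
        let entry = self.cells.entry(seq_id).or_default();
        entry.extend_from_slice(tokens);
        entry.clone()
    }
}

/// Hypothetical sketch of a run that always uses sequence id 0.
/// With `clear_first == false`, tokens from earlier runs stay visible.
fn run_model_sketch(cache: &mut ToyKvCache, tokens: &[u32], clear_first: bool) -> Vec<u32> {
    if clear_first {
        cache.clear();
    }
    cache.decode(0, tokens)
}

fn main() {
    // Without clearing: the second run still sees tokens 1 and 2.
    let mut stale = ToyKvCache::new();
    run_model_sketch(&mut stale, &[1, 2], false);
    let visible = run_model_sketch(&mut stale, &[3], false);
    println!("without clear: {:?}", visible);

    // With clearing: the second run only sees its own tokens.
    let mut fresh = ToyKvCache::new();
    run_model_sketch(&mut fresh, &[1, 2], true);
    let visible = run_model_sketch(&mut fresh, &[3], true);
    println!("with clear:    {:?}", visible);
}
```

An alternative design would be to give each run its own sequence id; the commit message describes clearing the cache instead, which is what the sketch mirrors.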
@danbev danbev force-pushed the embeddings-plus-updated-llama.cpp branch from b27be31 to 5333d50 December 15, 2023 11:34
@danbev danbev marked this pull request as ready for review December 15, 2023 12:09
@danbev danbev changed the title llama: add Embeddings for llama (wip) llama: add Embeddings for llama Dec 15, 2023
@Juzov
Collaborator

Juzov commented Dec 17, 2023

lgtm @williamhogman thoughts?

@williamhogman
Contributor

Yeah let's merge

@williamhogman williamhogman merged commit e6e02fb into sobelio:main Dec 17, 2023
5 checks passed