How would you implement RAG / Document chat? #36

flatsiedatsie · 2024-05-14T22:01:55Z

In your readme you mention:

Maybe doing a full RAG-in-browser example using tinyllama?

I've been looking into a way to let users 'chat with their documents', a popular concept. Specifically, I was looking into a 'fully local PDF chatbot'. It seems... complicated.

So I was wondering: if one wanted to implement this feature using Wllama, what are the 'components' of such a solution?

Would it be something like...

Wllama's embedding feature turns text chunks into vector objects?
Those could then be stored in Voy?
magic
magic
The user gets an answer to their question, e.g. "The sun is 2948520 degrees, which I found on page 16"?

What would the steps actually be?

ngxson · 2024-05-16T09:50:36Z

A classic RAG system consists of a vector database + a generative model. With wllama, this can be achieved with:

An embedding model. However, I still haven't found a good one.
For the database, we can use Voy as you mentioned, or HNSW, which has a pure JS implementation (we don't need much performance for this part, since our database is relatively small anyway).
A good generative model that does not hallucinate. This is very important and requires a specific model, for example Llama3-ChatQA-1.5-8B by nvidia. These models are generally "dumb" or "stupid", but they are safe because they don't make up information if it is not found in the retrieved text.
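
Since the database is small, the retrieval step can be illustrated without any library at all. The following is a minimal sketch of a brute-force cosine-similarity search, not the Voy or HNSW API; `createIndex`, `add`, and `search` are hypothetical names, and the 3-dimensional vectors are toy stand-ins for real embeddings.

```javascript
// Brute-force nearest-neighbour search over a small in-memory index.
// createIndex/add/search are hypothetical names, not Voy or HNSW APIs.

// Dot product; equals cosine similarity when both vectors have unit length.
function dot(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// Scale a vector to unit length so dot() can serve as cosine similarity.
function normalize(v) {
  const norm = Math.sqrt(dot(v, v)) || 1;
  return v.map(x => x / norm);
}

function createIndex() {
  const items = []; // each entry: { vector, text, page }
  return {
    add(vector, text, page) {
      items.push({ vector: normalize(vector), text, page });
    },
    // Return the k chunks most similar to the query vector, best first.
    search(queryVector, k = 3) {
      const q = normalize(queryVector);
      return items
        .map(item => ({ text: item.text, page: item.page, score: dot(q, item.vector) }))
        .sort((a, b) => b.score - a.score)
        .slice(0, k);
    },
  };
}

// Toy data: made-up vectors and chunk texts, for illustration only.
const index = createIndex();
index.add([1, 0, 0], 'Chunk about the sun.', 16);
index.add([0, 1, 0], 'Chunk about something unrelated.', 3);
index.add([0.9, 0.1, 0], 'Chunk about stars.', 17);

const hits = index.search([1, 0.05, 0], 2);
console.log(hits.map(h => `${h.text} (page ${h.page})`));
```

A linear scan like this costs one dot product per stored chunk per query, which is usually acceptable for a single document; an approximate index only starts to pay off with far more vectors.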

Another idea, which is only possible if your document is short and predefined, is to construct a session and reuse it later (via sessionSave and sessionLoad). This is useful in my case, for example: if the chatbot exists purely to introduce a specific website, we don't need a vector database or embeddings at all. The downside is that this is not practical for any other use.

felladrin · 2024-05-20T09:04:04Z

For a small embedding model good for this case, I can recommend this one:
sentence-transformers/multi-qa-MiniLM-L6-cos-v1 (GGUF)

flatsiedatsie · 2024-05-21T12:09:44Z

Getting there...

Currently using Transformers.js because I could find easy-to-copy examples:

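// Assumes the @xenova/transformers package; this runs inside a web worker.
import { pipeline } from '@xenova/transformers';

// Load the feature-extraction pipeline and forward download progress to the main thread.
extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2', {
	quantized: false, // load the unquantized weights
	progress_callback: data => {
		self.postMessage({
			type: 'embedding_progress',
			data
		});
	}
});

// Mean-pool the token embeddings and normalize them: one vector per input text.
embeddings = await extractor(texts, { pooling: 'mean', normalize: true });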

I've also seen mention of this model for embedding: nomic-ai/nomic-embed-text-v1. But for now... it works.
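
The `texts` array passed to the extractor has to come from splitting the document into chunks first. Here is a rough sketch of one way to do that while remembering which page each chunk came from, so an answer can later cite a page. `chunkPages`, `maxWords`, and `overlap` are hypothetical names and defaults, and the input is assumed to be an array holding one plain-text string per page.

```javascript
// Split each page's text into overlapping word windows, keeping the page number.
// chunkPages, maxWords and overlap are hypothetical names and defaults.
function chunkPages(pages, maxWords = 100, overlap = 20) {
  if (overlap >= maxWords) throw new Error('overlap must be smaller than maxWords');
  const chunks = [];
  pages.forEach((pageText, pageIndex) => {
    const words = pageText.split(/\s+/).filter(Boolean);
    for (let start = 0; start < words.length; start += maxWords - overlap) {
      chunks.push({
        page: pageIndex + 1, // 1-based page number, for citations
        text: words.slice(start, start + maxWords).join(' '),
      });
      if (start + maxWords >= words.length) break; // this window reached the page end
    }
  });
  return chunks;
}

// Tiny example: windows of 4 words that overlap by 1 word.
const pages = ['one two three four five six seven', 'eight nine'];
const chunks = chunkPages(pages, 4, 1);
const texts = chunks.map(c => c.text); // the strings that would be embedded
console.log(chunks);
```

The overlap keeps a sentence that straddles a window boundary at least partly intact in both chunks; chunks never cross page boundaries here, which keeps the page attribution unambiguous.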

Next: get an LLM to summarize the chunks.
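
For that step, the retrieved chunks need to be packed into a prompt. A minimal sketch, assuming each retrieved chunk is an object with `text` and `page` fields; `buildAnswerPrompt` and the prompt wording are hypothetical and would need tuning for whichever model is used.

```javascript
// Pack retrieved chunks and the user's question into a single prompt string.
// buildAnswerPrompt and the { text, page } chunk shape are hypothetical.
function buildAnswerPrompt(question, retrieved) {
  const context = retrieved
    .map((chunk, i) => `[${i + 1}] (page ${chunk.page}) ${chunk.text}`)
    .join('\n');
  return [
    'Answer the question using only the excerpts below.',
    'If the answer is not in the excerpts, say that you do not know.',
    'Mention the page number of the excerpt you used.',
    '',
    'Excerpts:',
    context,
    '',
    `Question: ${question}`,
    'Answer:',
  ].join('\n');
}

// Placeholder chunk texts, for illustration only.
const answerPrompt = buildAnswerPrompt('How hot is the sun?', [
  { page: 16, text: 'First retrieved chunk.' },
  { page: 17, text: 'Second retrieved chunk.' },
]);
console.log(answerPrompt);
```

Telling the model to admit when the excerpts lack the answer is a cheap guard against made-up information, although how well it is obeyed depends on the model.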

ngxson · 2024-05-21T12:16:31Z

Ah nice. I tried nomic-embed-text before, but it didn't work very well. Maybe that's because I used the Albert Einstein wiki page as the example, which is a very hard one.

Maybe you can give it a try?

Some questions that I tried but no success:

Does he play guitar?
Does he have a child?
How many wives does he have?

flatsiedatsie · 2024-05-27T07:09:14Z

Some questions that I tried but no success:
Does he play guitar?

Did you let the LLM re-formulate the prompt first? In my project I just added a step that does that: it looks at the conversation history first and rewrites the user's prompt to be explicit. So "he" becomes "Albert Einstein". It seems to work.
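
A rewriting step like this could look roughly as follows. This is only a sketch of building the rewrite prompt, not the actual code from the project; `buildRewritePrompt`, the `{ role, content }` turn shape, and the instruction wording are all assumptions. The resulting string would be sent to the LLM, and its reply would then be embedded instead of the raw question.

```javascript
// Build a prompt asking the LLM to rewrite a follow-up question so it stands alone.
// buildRewritePrompt and the { role, content } turn shape are hypothetical.
function buildRewritePrompt(history, question, maxTurns = 4) {
  const recent = history
    .slice(-maxTurns) // only the latest turns, to keep the prompt short
    .map(turn => `${turn.role}: ${turn.content}`)
    .join('\n');
  return [
    'Rewrite the final question so it can be understood without the conversation.',
    'Replace pronouns such as "he" or "it" with the names they refer to.',
    'Return only the rewritten question.',
    '',
    'Conversation:',
    recent,
    '',
    `Final question: ${question}`,
    'Rewritten question:',
  ].join('\n');
}

const history = [
  { role: 'user', content: 'Tell me about Albert Einstein.' },
  { role: 'assistant', content: 'Albert Einstein was a theoretical physicist.' },
];
const rewritePrompt = buildRewritePrompt(history, 'Does he play guitar?');
console.log(rewritePrompt);
```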

In fact it's all now working. Although the answer in this case seems almost too good to be solely based on the retrieved chunks..

ngxson pinned this issue May 21, 2024

flatsiedatsie closed this as completed May 27, 2024

flatsiedatsie mentioned this issue May 31, 2024

Model request: add support for a small RAG model mlc-ai/web-llm#445
