How would you implement RAG / Document chat? #36

Closed
flatsiedatsie opened this issue May 14, 2024 · 5 comments

Comments

@flatsiedatsie
Contributor

flatsiedatsie commented May 14, 2024

In your readme you mention:

> Maybe doing a full RAG-in-browser example using tinyllama?

I've been looking into a way to let users 'chat with their documents', a popular concept. Specifically, I was looking at 'Fully local PDF chatbot'. It seems… complicated.

So I was wondering: if one wanted to implement this feature using Wllama, what are the 'components' of such a solution?

Would it be something like...

  • Wllama's embedding feature turns text chunks into vector objects?
  • Those could then be stored in Voy?
  • magic
  • magic
  • The user gets an answer to their question, e.g. "The sun is 2948520 degrees, which I found on page 16"?

What would the steps actually be?

@ngxson
Owner

ngxson commented May 16, 2024

A classic RAG system consists of a vector database + a generative model. With wllama, this can be achieved with the following (a rough sketch follows the list):

  • An embedding model. However, I still couldn't find a good embedding model.
  • For the database, we can use Voy as you mentioned, or HNSW, which has a pure JS implementation (we don't need too much performance for this part; our database is relatively small anyway)
  • A good generative model that does not hallucinate. This is very important and requires using a specific model, for example Llama3-ChatQA-1.5-8B by NVIDIA. These models are generally "dumb", but they are safe because they don't make up information that isn't found in the retrieved context.
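
Very roughly, the flow could look like the sketch below. This is only an illustration: the CONFIG_PATHS and GGUF URLs are placeholders, the brute-force cosine search stands in for Voy/HNSW, and the wllama calls (loadModelFromUrl, createEmbedding, createCompletion) are how I would expect to wire it up.

```js
import { Wllama } from '@wllama/wllama';

// Placeholders: point these at the wllama wasm config and your GGUF files.
const CONFIG_PATHS = { /* see the wllama README */ };
const EMBEDDING_MODEL_URL = 'https://example.com/embedding-model.gguf';
const GENERATIVE_MODEL_URL = 'https://example.com/generative-model.gguf';

// Two instances: one for the embedding model, one for the generative model.
const embedder = new Wllama(CONFIG_PATHS);
const generator = new Wllama(CONFIG_PATHS);
await embedder.loadModelFromUrl(EMBEDDING_MODEL_URL);
await generator.loadModelFromUrl(GENERATIVE_MODEL_URL);

// 1. Index: embed every chunk once. A plain array + cosine similarity is
//    enough for small documents; swap in Voy or an HNSW index if it grows.
async function buildIndex(chunks) {
  const index = [];
  for (const text of chunks) {
    index.push({ text, vector: await embedder.createEmbedding(text) });
  }
  return index;
}

function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// 2. Query: embed the question, take the top-k chunks, and ask the generative
//    model to answer using only those chunks.
async function answer(index, question, topK = 3) {
  const q = await embedder.createEmbedding(question);
  const context = index
    .map((e) => ({ ...e, score: cosine(q, e.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((e) => e.text)
    .join('\n---\n');

  const prompt = `Answer the question using only the context below. If the answer is not there, say so.\n\nContext:\n${context}\n\nQuestion: ${question}\nAnswer:`;
  return await generator.createCompletion(prompt, { nPredict: 256 });
}
```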

Another idea, only possible if your document is short and predefined, is to construct a session and reuse it later (via sessionSave and sessionLoad). This is useful in my case, for example: if the chatbot is purely there to introduce a specific website, we don't even need a vector database or embeddings at all. The downside is that this is not practical for other use cases.
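
A very rough sketch of that idea (the sessionSave/sessionLoad signatures below are assumptions, only the method names are given above; `wllama` and `siteDescription` are assumed to exist):

```js
// Process the short, predefined document once...
await wllama.createCompletion(
  `You are a helper for this website. Everything you need to know:\n${siteDescription}`,
  { nPredict: 1 }
);
// ...save the session (exact arguments / return value are assumptions)...
const session = await wllama.sessionSave();
// ...persist it (e.g. in IndexedDB), and on a later visit restore it so the
// document never has to be processed again:
await wllama.sessionLoad(session);
```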

@felladrin
Contributor

For a small embedding model that works well for this case, I can recommend this one:
sentence-transformers/multi-qa-MiniLM-L6-cos-v1 (GGUF)
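
Loading it with wllama for the embedding side could look roughly like this (the URL is a placeholder for wherever the GGUF build is hosted, and `wllama` is assumed to be an initialized instance):

```js
// Placeholder URL: point at the actual GGUF build of multi-qa-MiniLM-L6-cos-v1.
await wllama.loadModelFromUrl('https://example.com/multi-qa-MiniLM-L6-cos-v1.Q8_0.gguf');
const vector = await wllama.createEmbedding('How hot is the sun?');
console.log(vector.length); // 384 dimensions for the MiniLM-L6 family
```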

@flatsiedatsie
Contributor Author

flatsiedatsie commented May 21, 2024

Getting there...

[screenshot: 2024-05-21 14:01]

Currently using Transformers.js because I could find easy-to-copy examples:

// Transformers.js feature-extraction pipeline used for the chunk embeddings
import { pipeline } from '@xenova/transformers';

extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2', {
  quantized: false,
  // forward download/progress events from the worker to the main thread
  progress_callback: data => {
    self.postMessage({
      type: 'embedding_progress',
      data
    });
  }
});

// `texts` is the array of document chunks; mean-pooled, normalized vectors come back
embeddings = await extractor(texts, { pooling: 'mean', normalize: true });

I've also seen mention of this model for embedding: nomic-ai/nomic-embed-text-v1. But for now… it works.

Next: get an LLM to summarize the chunks.
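
In case it's useful, the retrieval + summarization step could look roughly like this (a sketch building on the snippet above: `texts`, `extractor` and `embeddings` are the same as there, `cos_sim` is the helper Transformers.js exports, and the final call assumes a wllama instance with a generative model loaded):

```js
import { cos_sim } from '@xenova/transformers';

// Plain arrays from the Tensor returned by the extractor above.
const chunkVectors = embeddings.tolist();

async function topChunks(question, k = 3) {
  const q = (await extractor(question, { pooling: 'mean', normalize: true })).tolist()[0];
  return texts
    .map((text, i) => ({ text, score: cos_sim(q, chunkVectors[i]) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}

// Hand the best chunks to the generative model.
const question = 'How hot is the sun?';
const chunks = await topChunks(question);
const prompt =
  'Summarize the passages below and answer the question.\n\n' +
  chunks.map((c, i) => `[${i + 1}] ${c.text}`).join('\n\n') +
  `\n\nQuestion: ${question}`;
const reply = await wllama.createCompletion(prompt, { nPredict: 200 });
```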

@ngxson
Owner

ngxson commented May 21, 2024

Ah nice. I tried nomic-embed-text before, but it didn't work very well. Maybe that's because I used the Albert Einstein wiki page as the example, which is a very hard one.

Maybe you can give it a try?

Some questions that I tried but no success:

  • Does he play guitar?
  • Does he have a child?
  • How many wives does he have?

@ngxson ngxson pinned this issue May 21, 2024
@flatsiedatsie
Contributor Author

> Some questions that I tried but no success:
> Does he play guitar?

Did you let the LLM reformulate the prompt first? In my project I added a step that looks at the conversation history and rewrites the user's prompt to be explicit, so "he" becomes "Albert Einstein". It seems to work.
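
Roughly like this (a sketch; the prompt wording is illustrative and `wllama` stands in for whichever model does the rewriting):

```js
// Turn the latest question into a standalone one before embedding it,
// so that "he" becomes "Albert Einstein", etc.
async function rewriteQuery(historyText, question) {
  const prompt =
    'Rewrite the last question so it can be understood without the ' +
    'conversation. Keep it short and explicit.\n\n' +
    `Conversation:\n${historyText}\n\n` +
    `Last question: ${question}\n\nStandalone question:`;
  return (await wllama.createCompletion(prompt, { nPredict: 64 })).trim();
}
```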

In fact it's all now working, although the answer in this case seems almost too good to be based solely on the retrieved chunks…

[screenshot: 2024-05-27 08:40]
