Develop a Retrieval-Augmented Generation (RAG) based AI system capable of answering questions about yourself.
My resume is under the data/ folder (you can keep any number of PDF files under data/, either personal or work-related).
The objective is to create a simple RAG agent that answers questions based on that data and an LLM.
Short steps (a skeleton sketch follows this list) -
- load the personal data and split it into chunks (pages)
- for each chunk
  - get embeddings from an LLM model
  - index it in a vector DB
- take the user query via Streamlit
- retrieve context, i.e. the relevant entries from the vector DB, given the query
- use context + query to create a prompt
- pass the prompt to an LLM model
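The steps above map onto a small class. A rough outline of that shape (only `RAGSystem` and `answer_query` are named in this repo; `build_index` and everything else here are illustrative assumptions):

```python
# Illustrative outline only; the actual app.py may be organized differently.
class RAGSystem:
    def __init__(self, data_dir: str = "data/"):
        self.data_dir = data_dir
        self.db = None  # vector store, filled in by build_index()

    def build_index(self) -> None:
        """Load PDFs, split into chunks, embed each chunk, index in the vector DB."""
        ...

    def answer_query(self, query: str) -> str:
        """Retrieve relevant chunks, build a prompt from context + query, call the LLM."""
        ...
```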
- Clone the repository.
- Create a conda environment and activate it.
- Install the required dependencies by running the following command:
  pip install -r requirements.txt
- Install Ollama to run LLM models locally.
- Keep the data under the data/ folder.
- To invoke LLM models, we need to first download the model and then run Ollama locally:
  ollama pull llama3
  ollama serve
- Run the following command:
  streamlit run app.py
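Optionally, before launching the app, you can check that the Ollama server is reachable and that llama3 answers a trivial prompt. A minimal sketch, assuming the langchain_community Ollama integration is installed (it is not necessarily listed in requirements.txt):

```python
# Quick smoke test: confirm the locally served llama3 model responds.
# Assumes `ollama serve` is already running.
from langchain_community.llms import Ollama

llm = Ollama(model="llama3")
print(llm.invoke("Reply with the single word: ready"))
```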
- initialize the RAGSystem class
  - load the data into a usable format
    - load the PDFs as a list of pages
    - pages are further split into chunks for better representation
      - change chunk_size and chunk_overlap according to choice
      - other splitters provided by langchain are also available
    - set a unique id for each chunk (see the loading/splitting sketch after this list)
  - initialize a vector DB where we will store the chunks and their respective embeddings
    - we use chromadb
    - embeddings are from the llama3 model
  - add chunks to the vector DB (see the indexing sketch after this list)
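A minimal sketch of the loading and splitting step, assuming langchain_community's PyPDFDirectoryLoader and a RecursiveCharacterTextSplitter (the exact loader, chunk sizes, and id scheme used in app.py may differ):

```python
# Load every PDF under data/ as pages, split pages into overlapping chunks,
# and give each chunk a deterministic id. Parameter values are illustrative.
from langchain_community.document_loaders import PyPDFDirectoryLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

def load_and_split(data_dir: str = "data/"):
    pages = PyPDFDirectoryLoader(data_dir).load()  # one Document per page
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=800,     # characters per chunk; tune to taste
        chunk_overlap=80,   # overlap keeps context across chunk boundaries
    )
    chunks = splitter.split_documents(pages)

    # Unique id per chunk, e.g. "data/resume.pdf:2:5" = page 2, 5th chunk on that page.
    ids, counts = [], {}
    for chunk in chunks:
        key = f"{chunk.metadata.get('source')}:{chunk.metadata.get('page')}"
        counts[key] = counts.get(key, 0) + 1
        ids.append(f"{key}:{counts[key] - 1}")
    return chunks, ids
```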
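And a corresponding sketch for embedding and indexing those chunks in Chroma with llama3 embeddings served by Ollama (the collection name and persistence directory are assumptions):

```python
# Embed chunks with the Ollama-served llama3 model and store them in a
# persistent Chroma collection. Names and paths here are illustrative.
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma

def build_vectordb(chunks, ids, persist_dir: str = "chroma_db"):
    embeddings = OllamaEmbeddings(model="llama3")
    db = Chroma(
        collection_name="personal_docs",
        embedding_function=embeddings,
        persist_directory=persist_dir,
    )
    # Explicit ids make it easy to avoid indexing the same chunk twice on re-runs.
    db.add_documents(chunks, ids=ids)
    return db
```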
- user inputs a query
- this calls answer_query, which
  - retrieves context from the vector DB based on the query
    - [improvement] instead of just using whatever entries the vector DB returns, we can use a reranker to reorder the entries w.r.t. our query for better context (see the reranking sketch after this list)
    - langchain provides lots of retrievers
  - creates a prompt using context + query
  - invokes the LLM (llama3) with the prompt
  - formats and sends back the response (see the answer_query sketch after this list)
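A minimal sketch of what answer_query could look like, reusing the Chroma store and Ollama LLM from the earlier sketches (the actual prompt template in app.py may differ):

```python
# Retrieve the top-k chunks, stuff them into a prompt, and ask llama3.
from langchain_community.llms import Ollama

PROMPT_TEMPLATE = """Answer the question using only the context below.

Context:
{context}

Question: {question}
"""

def answer_query(db, query: str, k: int = 5) -> str:
    docs = db.similarity_search(query, k=k)  # relevant chunks from the vector DB
    context = "\n\n---\n\n".join(d.page_content for d in docs)
    prompt = PROMPT_TEMPLATE.format(context=context, question=query)
    response = Ollama(model="llama3").invoke(prompt)
    return response.strip()
```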
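One way to implement the reranking improvement is a cross-encoder that scores each retrieved chunk against the query. A sketch assuming the sentence-transformers package, an over-fetch of candidates, and an illustrative model name:

```python
# Over-fetch candidates from the vector DB, then keep the best-scoring ones
# according to a query/passage cross-encoder.
from sentence_transformers import CrossEncoder

def retrieve_reranked(db, query: str, fetch_k: int = 20, top_k: int = 5):
    candidates = db.similarity_search(query, k=fetch_k)
    scorer = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
    scores = scorer.predict([(query, doc.page_content) for doc in candidates])
    ranked = sorted(zip(scores, candidates), key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in ranked[:top_k]]
```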
- evaluation: we can have an LLM judge our response, given an evaluation prompt such as -
  eval_prompt = """
  Expected Response: {expected_response}
  Actual Response: {actual_response}
  ---
  On a scale of 1 to 5, with 5 being identical, how well does the actual response match the expected response?
  """