A simple Gradio interface for semantic search across multiple PDF documents. It combines BM25 with vector embeddings to find relevant documents: the script builds a FAISS index over the corpus of uploaded documents, uses BM25 to retrieve the top candidate results, then reranks them by cosine similarity to the search query.
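The two-stage retrieval described above can be sketched in plain Python. This is a minimal illustration, not the code in main.py: the function names (`bm25_scores`, `toy_embed`, `hybrid_search`) are hypothetical, the bag-of-words `toy_embed` is only a stand-in for a real embedding model, and the FAISS index is left out so the example runs with the standard library alone.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (assumption: real code may differ).
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    # Score every document against the query with the Okapi BM25 formula.
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    if n == 0:
        return []
    avgdl = sum(len(t) for t in tokenized) / n or 1.0
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def toy_embed(text):
    # Hypothetical stand-in for an embedding model: sparse term counts.
    return Counter(tokenize(text))


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as Counters.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def hybrid_search(query, docs, top_k=3):
    # Stage 1: keep the top_k documents by BM25 score.
    scores = bm25_scores(query, docs)
    candidates = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Stage 2: rerank those candidates by cosine similarity to the query.
    q_vec = toy_embed(query)
    reranked = sorted(candidates, key=lambda i: cosine(q_vec, toy_embed(docs[i])), reverse=True)
    return [docs[i] for i in reranked]


if __name__ == "__main__":
    corpus = [
        "the cat sat on the mat",
        "vector search with faiss index",
        "bm25 ranks documents by term frequency",
    ]
    print(hybrid_search("faiss vector index", corpus, top_k=2))
```

In a real setup the reranking stage would compare dense embeddings, for example ones stored in the FAISS index, rather than term counts. The point here is only the order of operations: a cheap lexical filter first, a similarity rerank second.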
Create a virtual environment, then activate it:
python3 -m venv venv
Unix/macOS:
source venv/bin/activate
Windows:
venv\Scripts\activate
If this is your first time running the app, or the package dependencies have changed, install all dependencies with this command:
pip install -r requirements.txt
Run the app with the command below. For automatic reloading when the Python script changes, Gradio's reload mode is started with gradio main.py instead of python main.py.
python main.py