feat: add reranking (#20)

superlinear-ai · Oct 7, 2024 · 0b4b919
1 parent 81de1ff
commit 0b4b919
Showing 17 changed files with 471 additions and 148 deletions.
diff --git a/.cruft.json b/.cruft.json
@@ -1,6 +1,6 @@
 {
   "template": "https://github.com/superlinear-ai/poetry-cookiecutter",
-  "commit": "a969f1d182ec39d7d27ccb1116cf60ba736adcfa",
+  "commit": "b7f2fb0f123aae0a01d2ab015db31f52d2d8cc21",
   "checkout": null,
   "context": {
     "cookiecutter": {
@@ -26,4 +26,4 @@
     }
   },
   "directory": null
-}
+}
diff --git a/.devcontainer/devcontainer.json b/.devcontainer/devcontainer.json
@@ -38,7 +38,9 @@
                     100
                 ],
                 "files.autoSave": "onFocusChange",
-                "jupyter.kernels.excludePythonEnvironments": ["/usr/local/bin/python"],
+                "jupyter.kernels.excludePythonEnvironments": [
+                    "/usr/local/bin/python"
+                ],
                 "mypy-type-checker.importStrategy": "fromEnvironment",
                 "mypy-type-checker.preferDaemon": true,
                 "notebook.codeActionsOnSave": {
@@ -50,7 +52,7 @@
                 "python.terminal.activateEnvironment": false,
                 "python.testing.pytestEnabled": true,
                 "ruff.importStrategy": "fromEnvironment",
-                "ruff.logLevel": "warn",
+                "ruff.logLevel": "warning",
                 "terminal.integrated.defaultProfile.linux": "zsh",
                 "terminal.integrated.profiles.linux": {
                     "zsh": {

diff --git a/.gitignore b/.gitignore
@@ -19,6 +19,9 @@ data/
 # dotenv
 .env
 
+# Rerankers
+.*_cache/
+
 # Hypothesis
 .hypothesis/
 

diff --git a/Dockerfile b/Dockerfile
@@ -70,6 +70,7 @@ RUN --mount=type=cache,target=/var/cache/apt/ \
     sh -c "$(curl -fsSL https://starship.rs/install.sh)" -- "--yes" && \
     usermod --shell /usr/bin/zsh user && \
     echo 'user ALL=(root) NOPASSWD:ALL' > /etc/sudoers.d/user && chmod 0440 /etc/sudoers.d/user
+RUN git config --system --add safe.directory '*'
 USER user
 
 # Install the development Python dependencies in the virtual environment.

diff --git a/README.md b/README.md
@@ -1,22 +1,36 @@
-[![Open in Dev Containers](https://img.shields.io/static/v1?label=Dev%20Containers&message=Open&color=blue&logo=visualstudiocode)](https://vscode.dev/redirect?url=vscode://ms-vscode-remote.remote-containers/cloneInVolume?url=https://github.com/superlinear-ai/raglite) [![Open in GitHub Codespaces](https://img.shields.io/static/v1?label=GitHub%20Codespaces&message=Open&color=blue&logo=github)](https://github.com/codespaces/new?hide_repo_select=true&ref=main&repo=812973394&skip_quickstart=true)
+[![Open in Dev Containers](https://img.shields.io/static/v1?label=Dev%20Containers&message=Open&color=blue&logo=visualstudiocode)](https://vscode.dev/redirect?url=vscode://ms-vscode-remote.remote-containers/cloneInVolume?url=https://github.com/superlinear-ai/raglite) [![Open in GitHub Codespaces](https://img.shields.io/static/v1?label=GitHub%20Codespaces&message=Open&color=blue&logo=github)](https://github.com/codespaces/new/superlinear-ai/raglite)
 
 # 🥤 RAGLite
 
 RAGLite is a Python package for Retrieval-Augmented Generation (RAG) with PostgreSQL or SQLite.
 
 ## Features
 
-1. ❤️ Only lightweight and permissive open source dependencies (e.g., no [PyTorch](https://github.com/pytorch/pytorch), [LangChain](https://github.com/langchain-ai/langchain), or [PyMuPDF](https://github.com/pymupdf/PyMuPDF))
-2. 🧠 Choose any LLM provider with [LiteLLM](https://github.com/BerriAI/litellm), including local [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) models
-3. 💾 Either [PostgreSQL](https://github.com/postgres/postgres) or [SQLite](https://github.com/sqlite/sqlite) as a keyword & vector search database
-4. 🚀 Acceleration with Metal on macOS, and CUDA on Linux and Windows
-5. 📖 PDF to Markdown conversion on top of [pdftext](https://github.com/VikParuchuri/pdftext) and [pypdfium2](https://github.com/pypdfium2-team/pypdfium2)
-6. 🧬 Multi-vector chunk embedding with [late chunking](https://weaviate.io/blog/late-chunking) and [contextual chunk headings](https://d-star.ai/solving-the-out-of-context-chunk-problem-for-rag)
-7. ✂️ Optimal [level 4 semantic chunking](https://medium.com/@anuragmishra_27746/five-levels-of-chunking-strategies-in-rag-notes-from-gregs-video-7b735895694d) by solving a [binary integer programming problem](https://en.wikipedia.org/wiki/Integer_programming)
-8. 🌀 Optimal [closed-form linear query adapter](src/raglite/_query_adapter.py) by solving an [orthogonal Procrustes problem](https://en.wikipedia.org/wiki/Orthogonal_Procrustes_problem)
-9. 🔍 [Hybrid search](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) that combines the database's built-in keyword search ([tsvector](https://www.postgresql.org/docs/current/datatype-textsearch.html) in PostgreSQL, [FTS5](https://www.sqlite.org/fts5.html) in SQLite) with their native vector search extensions ([pgvector](https://github.com/pgvector/pgvector) in PostgreSQL, [sqlite-vec](https://github.com/asg017/sqlite-vec) in SQLite)
-10. ✍️ Optional: conversion of any input document to Markdown with [Pandoc](https://github.com/jgm/pandoc)
-11. ✅ Optional: evaluation of retrieval and generation performance with [Ragas](https://github.com/explodinggradients/ragas)
+##### Configurable
+
+- 🧠 Choose any LLM provider with [LiteLLM](https://github.com/BerriAI/litellm), including local [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) models
+- 💾 Choose either [PostgreSQL](https://github.com/postgres/postgres) or [SQLite](https://github.com/sqlite/sqlite) as a keyword & vector search database
+- 🥇 Choose any reranker with [rerankers](https://github.com/AnswerDotAI/rerankers), including multi-lingual [FlashRank](https://github.com/PrithivirajDamodaran/FlashRank) as the default
+
+##### Fast and permissive
+
+- ❤️ Only lightweight and permissive open source dependencies (e.g., no [PyTorch](https://github.com/pytorch/pytorch) or [LangChain](https://github.com/langchain-ai/langchain))
+- 🚀 Acceleration with Metal on macOS, and CUDA on Linux and Windows
+
+##### Unhobbled
+
+- 📖 PDF to Markdown conversion on top of [pdftext](https://github.com/VikParuchuri/pdftext) and [pypdfium2](https://github.com/pypdfium2-team/pypdfium2)
+- 🧬 Multi-vector chunk embedding with [late chunking](https://weaviate.io/blog/late-chunking) and [contextual chunk headings](https://d-star.ai/solving-the-out-of-context-chunk-problem-for-rag)
+- ✂️ Optimal [level 4 semantic chunking](https://medium.com/@anuragmishra_27746/five-levels-of-chunking-strategies-in-rag-notes-from-gregs-video-7b735895694d) by solving a [binary integer programming problem](https://en.wikipedia.org/wiki/Integer_programming)
+- 🔍 [Hybrid search](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) with the database's native keyword & vector search ([tsvector](https://www.postgresql.org/docs/current/datatype-textsearch.html)+[pgvector](https://github.com/pgvector/pgvector), [FTS5](https://www.sqlite.org/fts5.html)+[sqlite-vec](https://github.com/asg017/sqlite-vec)[^1])
+- 🌀 Optimal [closed-form linear query adapter](src/raglite/_query_adapter.py) by solving an [orthogonal Procrustes problem](https://en.wikipedia.org/wiki/Orthogonal_Procrustes_problem)
+
+##### Extensible
+
+- ✍️ Optional conversion of any input document to Markdown with [Pandoc](https://github.com/jgm/pandoc)
+- ✅ Optional evaluation of retrieval and generation performance with [Ragas](https://github.com/explodinggradients/ragas)
+
+[^1]: We use [PyNNDescent](https://github.com/lmcinnes/pynndescent) until [sqlite-vec](https://github.com/asg017/sqlite-vec) is more mature.
 
 ## Installing
 
@@ -57,10 +71,10 @@ pip install raglite[ragas]
 ### 1. Configuring RAGLite
 
 > [!TIP]
-> 🧠 RAGLite extends [LiteLLM](https://github.com/BerriAI/litellm) with support for [llama.cpp](https://github.com/ggerganov/llama.cpp) models using [llama-cpp-python](https://github.com/abetlen/llama-cpp-python). To select a llama.cpp model (e.g., from [bartowski's collection](https://huggingface.co/collections/bartowski/recent-highlights-65cf8e08f8ab7fc669d7b5bd)), use a model identifier of the form `"llama-cpp-python/<hugging_face_repo_id>/<filename>@<n_ctx>"`, where `n_ctx` is an optional parameter that specifies the context size of the model.
+> 🧠 RAGLite extends [LiteLLM](https://github.com/BerriAI/litellm) with support for [llama.cpp](https://github.com/ggerganov/llama.cpp) models using [llama-cpp-python](https://github.com/abetlen/llama-cpp-python). To select a llama.cpp model (e.g., from [bartowski's collection](https://huggingface.co/bartowski)), use a model identifier of the form `"llama-cpp-python/<hugging_face_repo_id>/<filename>@<n_ctx>"`, where `n_ctx` is an optional parameter that specifies the context size of the model.
 
 > [!TIP]
-> 💾 You can create a PostgreSQL database for free in a few clicks at [neon.tech](https://neon.tech) (not sponsored).
+> 💾 You can create a PostgreSQL database in a few clicks at [neon.tech](https://neon.tech).
 
 First, configure RAGLite with your preferred PostgreSQL or SQLite database and [any LLM supported by LiteLLM](https://docs.litellm.ai/docs/providers/openai):
 
@@ -82,6 +96,27 @@ my_config = RAGLiteConfig(
 )
 ```
 
+You can also configure [any reranker supported by rerankers](https://github.com/AnswerDotAI/rerankers):
+
+```python
+from rerankers import Reranker
+
+# Example remote API-based reranker:
+my_config = RAGLiteConfig(
+    db_url="postgresql://my_username:my_password@my_host:5432/my_database",
+    reranker=Reranker("cohere", lang="en", api_key=COHERE_API_KEY)
+)
+
+# Example local cross-encoder reranker per language (this is the default):
+my_config = RAGLiteConfig(
+    db_url="sqlite:///raglite.sqlite",
+    reranker=(
+        ("en", Reranker("ms-marco-MiniLM-L-12-v2", model_type="flashrank")),  # English
+        ("other", Reranker("ms-marco-MultiBERT-L-12", model_type="flashrank")),  # Other languages
+    )
+)
+```
+
 ### 2. Inserting documents
 
 > [!TIP]
@@ -100,24 +135,38 @@ insert_document(Path("Special Relativity.pdf"), config=my_config)
 
 ### 3. Searching and Retrieval-Augmented Generation (RAG)
 
-Now, you can search for chunks with keyword search, vector search, or a hybrid of the two. You can also answer questions with RAG and the search method of your choice (`hybrid` is the default):
+Now, you can search for chunks with vector search, keyword search, or a hybrid of the two. You can also rerank the search results with the configured reranker. And you can use any search method of your choice (`hybrid_search` is the default) together with reranking to answer questions with RAG:
 
 ```python
 # Search for chunks:
 from raglite import hybrid_search, keyword_search, vector_search
 
 prompt = "How is intelligence measured?"
-results_vector = vector_search(prompt, num_results=5, config=my_config)
-results_keyword = keyword_search(prompt, num_results=5, config=my_config)
-results_hybrid = hybrid_search(prompt, num_results=5, config=my_config)
+chunk_ids_vector, _ = vector_search(prompt, num_results=20, config=my_config)
+chunk_ids_keyword, _ = keyword_search(prompt, num_results=20, config=my_config)
+chunk_ids_hybrid, _ = hybrid_search(prompt, num_results=20, config=my_config)
+
+# Retrieve chunks:
+from raglite import retrieve_chunks
+
+chunks_hybrid = retrieve_chunks(chunk_ids_hybrid, config=my_config)
+
+# Rerank chunks:
+from raglite import rerank
+
+chunks_reranked = rerank(prompt, chunks_hybrid, config=my_config)
 
 # Answer questions with RAG:
 from raglite import rag
 
 prompt = "What does it mean for two events to be simultaneous?"
-stream = rag(prompt, search=hybrid_search, config=my_config)
+stream = rag(prompt, config=my_config)
 for update in stream:
     print(update, end="")
+
+# You can also pass a search method or search results directly:
+stream = rag(prompt, search=hybrid_search, config=my_config)
+stream = rag(prompt, search=chunks_reranked, config=my_config)
 ```
 
 ### 4. Computing and using an optimal query adapter
@@ -129,7 +178,7 @@ RAGLite can compute and apply an [optimal closed-form query adapter](src/raglite
 from raglite import insert_evals, update_query_adapter
 
 insert_evals(num_evals=100, config=my_config)
-update_query_adapter(config=my_config)
+update_query_adapter(config=my_config)  # From here, simply call vector_search to use the query adapter.
 ```
 
 ### 5. Evaluation of retrieval and generation
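
Note on the hybrid search referenced in the README changes above: the feature list links to the reciprocal rank fusion (RRF) paper. As a rough illustration of how two ranked lists of chunk ids could be fused that way before reranking, here is a minimal, self-contained sketch. It is not RAGLite's implementation: the function name `reciprocal_rank_fusion`, the constant `k=60` (the value used in the RRF paper), and the sample chunk ids are assumptions made for illustration only.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids into a single list, best first.

    Hypothetical helper for illustration; not part of the raglite API.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Each list contributes 1 / (k + rank) for every id it contains.
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Sort by descending fused score; break ties by id for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Made-up ids standing in for vector and keyword search results.
vector_ids = ["c3", "c1", "c7"]
keyword_ids = ["c1", "c9", "c3"]
fused = reciprocal_rank_fusion([vector_ids, keyword_ids])
print(fused)
```

Ids that appear near the top of both lists accumulate the highest fused score, which is why a fused list like this is a plausible input for a subsequent reranking step such as the one this commit adds.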