Commit

feat: add reranking (#20)
lsorber authored Oct 7, 2024
1 parent 81de1ff commit 0b4b919
Showing 17 changed files with 471 additions and 148 deletions.
4 changes: 2 additions & 2 deletions .cruft.json
@@ -1,6 +1,6 @@
{
"template": "https://github.com/superlinear-ai/poetry-cookiecutter",
"commit": "a969f1d182ec39d7d27ccb1116cf60ba736adcfa",
"commit": "b7f2fb0f123aae0a01d2ab015db31f52d2d8cc21",
"checkout": null,
"context": {
"cookiecutter": {
@@ -26,4 +26,4 @@
}
},
"directory": null
}
}
6 changes: 4 additions & 2 deletions .devcontainer/devcontainer.json
@@ -38,7 +38,9 @@
100
],
"files.autoSave": "onFocusChange",
"jupyter.kernels.excludePythonEnvironments": ["/usr/local/bin/python"],
"jupyter.kernels.excludePythonEnvironments": [
"/usr/local/bin/python"
],
"mypy-type-checker.importStrategy": "fromEnvironment",
"mypy-type-checker.preferDaemon": true,
"notebook.codeActionsOnSave": {
@@ -50,7 +52,7 @@
"python.terminal.activateEnvironment": false,
"python.testing.pytestEnabled": true,
"ruff.importStrategy": "fromEnvironment",
"ruff.logLevel": "warn",
"ruff.logLevel": "warning",
"terminal.integrated.defaultProfile.linux": "zsh",
"terminal.integrated.profiles.linux": {
"zsh": {
3 changes: 3 additions & 0 deletions .gitignore
@@ -19,6 +19,9 @@ data/
# dotenv
.env

# Rerankers
.*_cache/

# Hypothesis
.hypothesis/

1 change: 1 addition & 0 deletions Dockerfile
@@ -70,6 +70,7 @@ RUN --mount=type=cache,target=/var/cache/apt/ \
sh -c "$(curl -fsSL https://starship.rs/install.sh)" -- "--yes" && \
usermod --shell /usr/bin/zsh user && \
echo 'user ALL=(root) NOPASSWD:ALL' > /etc/sudoers.d/user && chmod 0440 /etc/sudoers.d/user
RUN git config --system --add safe.directory '*'
USER user

# Install the development Python dependencies in the virtual environment.
89 changes: 69 additions & 20 deletions README.md
@@ -1,22 +1,36 @@
[![Open in Dev Containers](https://img.shields.io/static/v1?label=Dev%20Containers&message=Open&color=blue&logo=visualstudiocode)](https://vscode.dev/redirect?url=vscode://ms-vscode-remote.remote-containers/cloneInVolume?url=https://github.com/superlinear-ai/raglite) [![Open in GitHub Codespaces](https://img.shields.io/static/v1?label=GitHub%20Codespaces&message=Open&color=blue&logo=github)](https://github.com/codespaces/new?hide_repo_select=true&ref=main&repo=812973394&skip_quickstart=true)
[![Open in Dev Containers](https://img.shields.io/static/v1?label=Dev%20Containers&message=Open&color=blue&logo=visualstudiocode)](https://vscode.dev/redirect?url=vscode://ms-vscode-remote.remote-containers/cloneInVolume?url=https://github.com/superlinear-ai/raglite) [![Open in GitHub Codespaces](https://img.shields.io/static/v1?label=GitHub%20Codespaces&message=Open&color=blue&logo=github)](https://github.com/codespaces/new/superlinear-ai/raglite)

# 🥤 RAGLite

RAGLite is a Python package for Retrieval-Augmented Generation (RAG) with PostgreSQL or SQLite.

## Features

1. ❤️ Only lightweight and permissive open source dependencies (e.g., no [PyTorch](https://github.com/pytorch/pytorch), [LangChain](https://github.com/langchain-ai/langchain), or [PyMuPDF](https://github.com/pymupdf/PyMuPDF))
2. 🧠 Choose any LLM provider with [LiteLLM](https://github.com/BerriAI/litellm), including local [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) models
3. 💾 Either [PostgreSQL](https://github.com/postgres/postgres) or [SQLite](https://github.com/sqlite/sqlite) as a keyword & vector search database
4. 🚀 Acceleration with Metal on macOS, and CUDA on Linux and Windows
5. 📖 PDF to Markdown conversion on top of [pdftext](https://github.com/VikParuchuri/pdftext) and [pypdfium2](https://github.com/pypdfium2-team/pypdfium2)
6. 🧬 Multi-vector chunk embedding with [late chunking](https://weaviate.io/blog/late-chunking) and [contextual chunk headings](https://d-star.ai/solving-the-out-of-context-chunk-problem-for-rag)
7. ✂️ Optimal [level 4 semantic chunking](https://medium.com/@anuragmishra_27746/five-levels-of-chunking-strategies-in-rag-notes-from-gregs-video-7b735895694d) by solving a [binary integer programming problem](https://en.wikipedia.org/wiki/Integer_programming)
8. 🌀 Optimal [closed-form linear query adapter](src/raglite/_query_adapter.py) by solving an [orthogonal Procrustes problem](https://en.wikipedia.org/wiki/Orthogonal_Procrustes_problem)
9. 🔍 [Hybrid search](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) that combines the database's built-in keyword search ([tsvector](https://www.postgresql.org/docs/current/datatype-textsearch.html) in PostgreSQL, [FTS5](https://www.sqlite.org/fts5.html) in SQLite) with their native vector search extensions ([pgvector](https://github.com/pgvector/pgvector) in PostgreSQL, [sqlite-vec](https://github.com/asg017/sqlite-vec) in SQLite)
10. ✍️ Optional: conversion of any input document to Markdown with [Pandoc](https://github.com/jgm/pandoc)
11. ✅ Optional: evaluation of retrieval and generation performance with [Ragas](https://github.com/explodinggradients/ragas)
##### Configurable

- 🧠 Choose any LLM provider with [LiteLLM](https://github.com/BerriAI/litellm), including local [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) models
- 💾 Choose either [PostgreSQL](https://github.com/postgres/postgres) or [SQLite](https://github.com/sqlite/sqlite) as a keyword & vector search database
- 🥇 Choose any reranker with [rerankers](https://github.com/AnswerDotAI/rerankers), including multi-lingual [FlashRank](https://github.com/PrithivirajDamodaran/FlashRank) as the default

##### Fast and permissive

- ❤️ Only lightweight and permissive open source dependencies (e.g., no [PyTorch](https://github.com/pytorch/pytorch) or [LangChain](https://github.com/langchain-ai/langchain))
- 🚀 Acceleration with Metal on macOS, and CUDA on Linux and Windows

##### Unhobbled

- 📖 PDF to Markdown conversion on top of [pdftext](https://github.com/VikParuchuri/pdftext) and [pypdfium2](https://github.com/pypdfium2-team/pypdfium2)
- 🧬 Multi-vector chunk embedding with [late chunking](https://weaviate.io/blog/late-chunking) and [contextual chunk headings](https://d-star.ai/solving-the-out-of-context-chunk-problem-for-rag)
- ✂️ Optimal [level 4 semantic chunking](https://medium.com/@anuragmishra_27746/five-levels-of-chunking-strategies-in-rag-notes-from-gregs-video-7b735895694d) by solving a [binary integer programming problem](https://en.wikipedia.org/wiki/Integer_programming)
- 🔍 [Hybrid search](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) with the database's native keyword & vector search ([tsvector](https://www.postgresql.org/docs/current/datatype-textsearch.html)+[pgvector](https://github.com/pgvector/pgvector), [FTS5](https://www.sqlite.org/fts5.html)+[sqlite-vec](https://github.com/asg017/sqlite-vec)[^1])
- 🌀 Optimal [closed-form linear query adapter](src/raglite/_query_adapter.py) by solving an [orthogonal Procrustes problem](https://en.wikipedia.org/wiki/Orthogonal_Procrustes_problem)
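
The query adapter bullet above refers to the orthogonal Procrustes problem. As a hedged illustration of that problem only, and not of RAGLite's adapter (which lives in `src/raglite/_query_adapter.py` and may set the problem up differently), the classic closed-form solution finds the orthogonal matrix W that minimizes ||QW - T||_F by taking the SVD QᵀT = UΣVᵀ and setting W = UVᵀ. The matrices `Q` and `T` below are hypothetical random data, and the sketch assumes NumPy is installed.

```python
import numpy as np


def orthogonal_procrustes(Q: np.ndarray, T: np.ndarray) -> np.ndarray:
    """Return the orthogonal W that minimizes ||Q @ W - T||_F (classic closed form)."""
    # SVD of the cross-covariance matrix Q^T T.
    U, _, Vt = np.linalg.svd(Q.T @ T)
    # The optimal orthogonal map is U V^T.
    return U @ Vt


# Hypothetical data: 200 "query" embeddings of dimension 8, and "target" embeddings
# that are a rotated copy of the queries plus a little noise.
rng = np.random.default_rng(42)
Q = rng.normal(size=(200, 8))
R_true, _ = np.linalg.qr(rng.normal(size=(8, 8)))  # A random orthogonal matrix.
T = Q @ R_true + 0.01 * rng.normal(size=(200, 8))

W = orthogonal_procrustes(Q, T)
print(np.allclose(W.T @ W, np.eye(8)))  # Checks that W is orthogonal.
print(np.linalg.norm(W - R_true))  # Distance between the estimate and the true rotation.
```

Because W is constrained to be orthogonal, it can only rotate or reflect the query embeddings, never rescale them, which is what makes this single-SVD closed form possible.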

##### Extensible

- ✍️ Optional conversion of any input document to Markdown with [Pandoc](https://github.com/jgm/pandoc)
- ✅ Optional evaluation of retrieval and generation performance with [Ragas](https://github.com/explodinggradients/ragas)

[^1]: We use [PyNNDescent](https://github.com/lmcinnes/pynndescent) until [sqlite-vec](https://github.com/asg017/sqlite-vec) is more mature.
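
The hybrid search bullet under Unhobbled links to the reciprocal rank fusion (RRF) paper. Below is a minimal sketch of RRF over ranked lists of chunk IDs. It assumes each search method returns chunk IDs ordered from most to least relevant and uses k = 60, the constant suggested in that paper. It illustrates the technique and is not necessarily RAGLite's exact implementation or weighting; the chunk IDs are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of chunk IDs with reciprocal rank fusion (RRF).

    Each chunk receives sum(1 / (k + rank)) over the lists it appears in,
    where rank is 1-based. Chunks are returned by descending fused score.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; sorted() is stable, so ties keep first-seen order.
    return sorted(scores, key=scores.__getitem__, reverse=True)


keyword_ids = ["c3", "c1", "c7"]  # Hypothetical keyword search results.
vector_ids = ["c1", "c2", "c3"]  # Hypothetical vector search results.
print(reciprocal_rank_fusion([keyword_ids, vector_ids]))  # ['c1', 'c3', 'c2', 'c7']
```

Chunks ranked highly by both methods (here `c1` and `c3`) rise to the top, while chunks found by only one method are ordered by that single contribution.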

## Installing

@@ -57,10 +71,10 @@ pip install raglite[ragas]
### 1. Configuring RAGLite

> [!TIP]
> 🧠 RAGLite extends [LiteLLM](https://github.com/BerriAI/litellm) with support for [llama.cpp](https://github.com/ggerganov/llama.cpp) models using [llama-cpp-python](https://github.com/abetlen/llama-cpp-python). To select a llama.cpp model (e.g., from [bartowski's collection](https://huggingface.co/collections/bartowski/recent-highlights-65cf8e08f8ab7fc669d7b5bd)), use a model identifier of the form `"llama-cpp-python/<hugging_face_repo_id>/<filename>@<n_ctx>"`, where `n_ctx` is an optional parameter that specifies the context size of the model.
> 🧠 RAGLite extends [LiteLLM](https://github.com/BerriAI/litellm) with support for [llama.cpp](https://github.com/ggerganov/llama.cpp) models using [llama-cpp-python](https://github.com/abetlen/llama-cpp-python). To select a llama.cpp model (e.g., from [bartowski's collection](https://huggingface.co/bartowski)), use a model identifier of the form `"llama-cpp-python/<hugging_face_repo_id>/<filename>@<n_ctx>"`, where `n_ctx` is an optional parameter that specifies the context size of the model.

> [!TIP]
> 💾 You can create a PostgreSQL database for free in a few clicks at [neon.tech](https://neon.tech) (not sponsored).
> 💾 You can create a PostgreSQL database in a few clicks at [neon.tech](https://neon.tech).

First, configure RAGLite with your preferred PostgreSQL or SQLite database and [any LLM supported by LiteLLM](https://docs.litellm.ai/docs/providers/openai):

@@ -82,6 +96,27 @@ my_config = RAGLiteConfig(
)
```

You can also configure [any reranker supported by rerankers](https://github.com/AnswerDotAI/rerankers):

```python
from raglite import RAGLiteConfig
from rerankers import Reranker

# Example remote API-based reranker (replace COHERE_API_KEY with your Cohere API key):
my_config = RAGLiteConfig(
    db_url="postgresql://my_username:my_password@my_host:5432/my_database",
    reranker=Reranker("cohere", lang="en", api_key=COHERE_API_KEY)
)

# Example local cross-encoder reranker per language (this is the default):
my_config = RAGLiteConfig(
    db_url="sqlite:///raglite.sqlite",
    reranker=(
        ("en", Reranker("ms-marco-MiniLM-L-12-v2", model_type="flashrank")),  # English
        ("other", Reranker("ms-marco-MultiBERT-L-12", model_type="flashrank")),  # Other languages
    )
)
```
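
The llama.cpp model identifier described in the first tip of section 1 has the form `"llama-cpp-python/<hugging_face_repo_id>/<filename>@<n_ctx>"`, where the `@<n_ctx>` suffix is optional. The sketch below shows one way to split such an identifier into its parts. The `LlamaCppModel` type and `parse_llama_cpp_model` helper are hypothetical, not part of RAGLite's API; the sketch assumes the Hugging Face repo ID has the form `owner/name`, and the example identifier is illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LlamaCppModel:
    """Hypothetical container for the parts of a llama-cpp-python model identifier."""

    repo_id: str  # Hugging Face repo ID, assumed to be "owner/name".
    filename: str  # GGUF filename (or filename pattern) inside the repo.
    n_ctx: Optional[int]  # Context size, or None when "@<n_ctx>" is omitted.


def parse_llama_cpp_model(identifier: str) -> LlamaCppModel:
    """Split "llama-cpp-python/<repo_id>/<filename>@<n_ctx>" into its parts (illustrative sketch)."""
    prefix = "llama-cpp-python/"
    if not identifier.startswith(prefix):
        raise ValueError(f"Not a llama-cpp-python identifier: {identifier!r}")
    rest = identifier[len(prefix):]
    # Split off the optional "@<n_ctx>" suffix from the right.
    if "@" in rest:
        path, n_ctx_str = rest.rsplit("@", 1)
        n_ctx: Optional[int] = int(n_ctx_str)
    else:
        path, n_ctx = rest, None
    # The first two path components form the repo ID; everything after them is the filename.
    owner, name, filename = path.split("/", 2)
    return LlamaCppModel(repo_id=f"{owner}/{name}", filename=filename, n_ctx=n_ctx)


model = parse_llama_cpp_model(
    "llama-cpp-python/bartowski/Meta-Llama-3.1-8B-Instruct-GGUF/*Q4_K_M.gguf@8192"
)
print(model)
```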

### 2. Inserting documents

> [!TIP]
@@ -100,24 +135,38 @@ insert_document(Path("Special Relativity.pdf"), config=my_config)

### 3. Searching and Retrieval-Augmented Generation (RAG)

Now, you can search for chunks with keyword search, vector search, or a hybrid of the two. You can also answer questions with RAG and the search method of your choice (`hybrid` is the default):
Now, you can search for chunks with vector search, keyword search, or a hybrid of the two. You can also rerank the search results with the configured reranker, and answer questions with RAG using any search method of your choice (`hybrid_search` is the default) together with reranking:

```python
# Search for chunks:
from raglite import hybrid_search, keyword_search, vector_search

prompt = "How is intelligence measured?"
results_vector = vector_search(prompt, num_results=5, config=my_config)
results_keyword = keyword_search(prompt, num_results=5, config=my_config)
results_hybrid = hybrid_search(prompt, num_results=5, config=my_config)
chunk_ids_vector, _ = vector_search(prompt, num_results=20, config=my_config)
chunk_ids_keyword, _ = keyword_search(prompt, num_results=20, config=my_config)
chunk_ids_hybrid, _ = hybrid_search(prompt, num_results=20, config=my_config)

# Retrieve chunks:
from raglite import retrieve_chunks

chunks_hybrid = retrieve_chunks(chunk_ids_hybrid, config=my_config)

# Rerank chunks:
from raglite import rerank

chunks_reranked = rerank(prompt, chunks_hybrid, config=my_config)

# Answer questions with RAG:
from raglite import rag

prompt = "What does it mean for two events to be simultaneous?"
stream = rag(prompt, search=hybrid_search, config=my_config)
stream = rag(prompt, config=my_config)
for update in stream:
    print(update, end="")

# You can also pass a search method or search results directly:
stream = rag(prompt, search=hybrid_search, config=my_config)
stream = rag(prompt, search=chunks_reranked, config=my_config)
```
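
Conceptually, reranking scores every (prompt, chunk) pair and reorders the retrieved chunks by that score; a cross-encoder such as the default FlashRank models reads the prompt and the chunk together to produce the score. The sketch below shows only the reorder step, with a toy term-overlap score standing in for a real reranker model. `overlap_score` and `rerank_sketch` are hypothetical helpers, not RAGLite's `rerank` or the rerankers API, and the chunks are made up.

```python
import re


def overlap_score(query: str, passage: str) -> float:
    """Toy relevance score: fraction of query terms that also occur in the passage."""
    query_terms = set(re.findall(r"\w+", query.lower()))
    passage_terms = set(re.findall(r"\w+", passage.lower()))
    return len(query_terms & passage_terms) / max(len(query_terms), 1)


def rerank_sketch(query: str, chunks: list[str]) -> list[str]:
    """Reorder chunks by descending relevance to the query (ties keep their input order)."""
    return sorted(chunks, key=lambda chunk: overlap_score(query, chunk), reverse=True)


chunks = [
    "The speed of light is the same for all observers.",
    "Two events are simultaneous if they happen at the same time in a given frame.",
    "Intelligence is often measured with standardized tests.",
]
prompt = "What does it mean for two events to be simultaneous?"
for chunk in rerank_sketch(prompt, chunks):
    print(chunk)
```

Swapping `overlap_score` for a model-based score changes the quality of the ordering but not the shape of the step: retrieve a generous candidate set first (the example above uses `num_results=20`), then let the more expensive scorer decide the final order.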

### 4. Computing and using an optimal query adapter
@@ -129,7 +178,7 @@ RAGLite can compute and apply an [optimal closed-form query adapter](src/raglite
from raglite import insert_evals, update_query_adapter

insert_evals(num_evals=100, config=my_config)
update_query_adapter(config=my_config)
update_query_adapter(config=my_config) # From here, simply call vector_search to use the query adapter.
```

### 5. Evaluation of retrieval and generation