We offer a suite of rerankers: pointwise models like monoT5, and listwise models with a focus on open-source LLMs compatible with FastChat (e.g., Vicuna, Zephyr), vLLM, SGLang, or TensorRT-LLM. We also support RankGPT variants, which are proprietary listwise rerankers. Additionally, we support reranking with the first-token logits only to improve inference efficiency. Some of the code in this repository is borrowed from RankGPT, PyGaggle, and LiT5!
current_version = 0.20.3
Note for Mac Users: RankLLM is not compatible with Apple Silicon (M1/M2) chips. However, you can still run it by using the Intel-based version of Anaconda and launching your terminal through Rosetta 2.
conda create -n rankllm python=3.10
conda activate rankllm
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121 # PyTorch built against CUDA 12.1
pip3 install torch torchvision torchaudio # or the default PyTorch wheels (e.g., on a Mac)
conda install -c conda-forge openjdk=21 maven -y
pip install -r requirements.txt
pip install -e .[vllm] # local installation for development
pip install rank-llm[vllm] # or pip installation
pip install -e .[sglang] # local installation for development
pip install rank-llm[sglang] # or pip installation
Remember to install flashinfer to use the SGLang backend.
pip install flashinfer -i https://flashinfer.ai/whl/cu121/torch2.4/
pip install -e .[tensorrt-llm] # local installation for development
pip install rank-llm[tensorrt-llm] # or pip installation
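Since each backend is an optional extra, it can help to check which ones are importable in the active environment before choosing a batched mode. The following is a minimal sketch, not part of RankLLM: the import names (`vllm`, `sglang`, `tensorrt_llm`, `flashinfer`) are assumed to match the usual module names of those packages, and `installed_backends` is a hypothetical helper.

```python
import importlib.util

# Assumed import names for the optional backends; these are the usual
# module names for the packages, not something this README specifies.
OPTIONAL_BACKENDS = {
    "vllm": "vllm",
    "sglang": "sglang",
    "tensorrt-llm": "tensorrt_llm",
    "flashinfer": "flashinfer",
}


def installed_backends(modules=OPTIONAL_BACKENDS):
    """Map each backend label to True if its module can be located."""
    return {
        label: importlib.util.find_spec(module) is not None
        for label, module in modules.items()
    }


if __name__ == "__main__":
    for label, found in installed_backends().items():
        print(f"{label}: {'installed' if found else 'missing'}")
```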
We can run the RankZephyr model with the following command:
python src/rank_llm/scripts/run_rank_llm.py --model_path=castorini/rank_zephyr_7b_v1_full --top_k_candidates=100 --dataset=dl20 \
--retrieval_method=SPLADE++_EnsembleDistil_ONNX --prompt_mode=rank_GPT --context_size=4096 --variable_passages
Including the --vllm_batched flag will allow you to run the model in batched mode using the vLLM library.
Including the --sglang_batched flag will allow you to run the model in batched mode using the SGLang library.
Including the --tensorrt_batched flag will allow you to run the model in batched mode using the TensorRT-LLM library.
If you want to run multiple passes of the model, you can use the --num_passes flag.
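When scripting several runs, the command line above can be assembled programmatically. The sketch below is illustrative only: `build_command` is a hypothetical helper, it uses only flags that appear in this README, and it assumes that --num_passes takes an integer value in the same `--key=value` form as the other options.

```python
import shlex

SCRIPT = "src/rank_llm/scripts/run_rank_llm.py"


def build_command(options, flags=()):
    """Build an argv list: --key=value for options, bare --flag for switches."""
    argv = ["python", SCRIPT]
    argv += [f"--{key}={value}" for key, value in options.items()]
    argv += [f"--{flag}" for flag in flags]
    return argv


rank_zephyr = build_command(
    {
        "model_path": "castorini/rank_zephyr_7b_v1_full",
        "top_k_candidates": 100,
        "dataset": "dl20",
        "retrieval_method": "SPLADE++_EnsembleDistil_ONNX",
        "prompt_mode": "rank_GPT",
        "context_size": 4096,
        "num_passes": 2,  # assumption: the flag takes an integer value
    },
    flags=["variable_passages", "vllm_batched"],
)

# Print a shell-safe version of the command for inspection.
print(shlex.join(rank_zephyr))
```

The resulting list can be passed to `subprocess.run` as-is, which avoids shell quoting issues with values such as the retrieval method name.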
We can run the RankGPT4-o model with the following command:
python src/rank_llm/scripts/run_rank_llm.py --model_path=gpt-4o --top_k_candidates=100 --dataset=dl20 \
--retrieval_method=bm25 --prompt_mode=rank_GPT_APEER --context_size=4096 --use_azure_openai
Note that --prompt_mode is set to rank_GPT_APEER to use the LLM-refined prompt from APEER. This can be changed to rank_GPT to use the original prompt.
We can run the LiT5-Distill V2 model (which can rerank 100 documents in a single pass) with the following command:
python src/rank_llm/scripts/run_rank_llm.py --model_path=castorini/LiT5-Distill-large-v2 --top_k_candidates=100 --dataset=dl19 \
--retrieval_method=bm25 --prompt_mode=LiT5 --context_size=150 --vllm_batched --batch_size=4 \
--variable_passages --window_size=100
We can run the LiT5-Distill original model (which works with a window size of 20) with the following command:
python src/rank_llm/scripts/run_rank_llm.py --model_path=castorini/LiT5-Distill-large --top_k_candidates=100 --dataset=dl19 \
--retrieval_method=bm25 --prompt_mode=LiT5 --context_size=150 --vllm_batched --batch_size=32 \
--variable_passages
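To illustrate why the window size matters, here is a rough sketch of sliding-window listwise reranking over a candidate list. It is not the repository's implementation: the stride of 10 is an assumption (a common choice in RankGPT-style setups), and `rank_window` is a hypothetical stand-in for the model call that reorders one window.

```python
def sliding_windows(num_candidates, window_size=20, stride=10):
    """Return (start, end) index pairs, moving from the bottom of the list to the top."""
    if window_size <= 0 or stride <= 0:
        raise ValueError("window_size and stride must be positive")
    windows = []
    end = num_candidates
    while end > 0:
        start = max(0, end - window_size)
        windows.append((start, end))
        if start == 0:
            break
        end -= stride
    return windows


def rerank(candidates, rank_window, window_size=20, stride=10):
    """Reorder each window in place, so strong candidates move toward the top."""
    ranked = list(candidates)
    for start, end in sliding_windows(len(ranked), window_size, stride):
        ranked[start:end] = rank_window(ranked[start:end])
    return ranked


# Toy example: candidates are integer "relevance" values, worst first, and
# the stand-in ranker simply sorts each window from best to worst.
toy = rerank(range(100), lambda window: sorted(window, reverse=True))
print(len(sliding_windows(100, window_size=20)))   # windows needed with size 20
print(len(sliding_windows(100, window_size=100)))  # windows needed with size 100
```

With 100 candidates, a window of 20 and a stride of 10 require nine model calls per pass, while a window of 100 covers the whole list in one call.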
We can run the LiT5-Score model with the following command:
python src/rank_llm/scripts/run_rank_llm.py --model_path=castorini/LiT5-Score-large --top_k_candidates=100 --dataset=dl19 \
--retrieval_method=bm25 --prompt_mode=LiT5 --context_size=150 --vllm_batched --batch_size=8 \
--window_size=100 --variable_passages
The following runs the 3B variant of monoT5 trained for 10K steps:
python src/rank_llm/scripts/run_rank_llm.py --model_path=castorini/monot5-3b-msmarco-10k --top_k_candidates=1000 --dataset=dl19 \
--retrieval_method=bm25 --prompt_mode=monot5 --context_size=512
Note that we usually rerank 1K candidates with monoT5.
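As a rough illustration of pointwise reranking in the monoT5 style, each query-passage pair is scored independently and the candidates are then sorted by that score. The sketch below assumes the score is the softmax probability of the "true" target token against "false", as described in the monoT5 papers; the logits are made up, and `relevance_score` and `pointwise_rerank` are hypothetical helpers rather than RankLLM functions.

```python
import math


def relevance_score(true_logit, false_logit):
    """Softmax over the two target tokens; returns P("true")."""
    peak = max(true_logit, false_logit)  # subtract the max for numerical stability
    exp_true = math.exp(true_logit - peak)
    exp_false = math.exp(false_logit - peak)
    return exp_true / (exp_true + exp_false)


def pointwise_rerank(doc_logits):
    """Sort doc ids by P("true"), highest first.

    doc_logits maps a doc id to a (true_logit, false_logit) pair that a
    real system would read from the model's first decoding step.
    """
    scored = {doc: relevance_score(*logits) for doc, logits in doc_logits.items()}
    return sorted(scored, key=scored.get, reverse=True)


# Hypothetical logits for three candidates.
example = {"d1": (0.2, 1.5), "d2": (2.3, -0.4), "d3": (1.0, 1.0)}
print(pointwise_rerank(example))
```

Because every candidate is scored on its own, the cost grows linearly with the number of candidates.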
We can run the FirstMistral model, which reranks using only the first-token logits, with the following command:
python src/rank_llm/scripts/run_rank_llm.py --model_path=castorini/first_mistral --top_k_candidates=100 --dataset=dl20 \
--retrieval_method=SPLADE++_EnsembleDistil_ONNX --prompt_mode=rank_GPT --context_size=4096 --variable_passages \
--use_logits --use_alpha --vllm_batched --num_gpus 1
Omit --use_logits if you wish to perform traditional listwise reranking.
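The idea behind first-token reranking can be sketched as follows: instead of generating the full ranked sequence, the model's logits for each passage identifier at the first decoding step are read once and sorted. This is a simplified illustration, not RankLLM's code; the letter identifiers and logit values are made up, and `rank_from_first_token_logits` is a hypothetical helper.

```python
def rank_from_first_token_logits(identifier_logits):
    """Order passage identifiers by their logit at the first decoding step."""
    return sorted(identifier_logits, key=identifier_logits.get, reverse=True)


# Hypothetical first-step logits over four passage identifiers.
logits = {"A": 1.2, "B": 3.4, "C": -0.5, "D": 2.1}
ranking = rank_from_first_token_logits(logits)
print(" > ".join(ranking))
```

A single forward step replaces the generation of one identifier per passage, which is where the efficiency gain comes from.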
If you would like to contribute to the project, please refer to the contribution guidelines.
The following is a table of the listwise models our repository was primarily built to handle (with the models hosted on HuggingFace):
vLLM, SGLang, and TensorRT-LLM backends are only supported for RankZephyr and RankVicuna models.
Model Name | Hugging Face Identifier/Link |
---|---|
RankZephyr 7B V1 - Full - BF16 | castorini/rank_zephyr_7b_v1_full |
RankVicuna 7B - V1 | castorini/rank_vicuna_7b_v1 |
RankVicuna 7B - V1 - No Data Augmentation | castorini/rank_vicuna_7b_v1_noda |
RankVicuna 7B - V1 - FP16 | castorini/rank_vicuna_7b_v1_fp16 |
RankVicuna 7B - V1 - No Data Augmentation - FP16 | castorini/rank_vicuna_7b_v1_noda_fp16 |
We also officially support the following rerankers built by our group:
The following is a table specifically for our LiT5 suite of models hosted on HuggingFace:
Model Name | Hugging Face Identifier/Link |
---|---|
LiT5 Distill base | castorini/LiT5-Distill-base |
LiT5 Distill large | castorini/LiT5-Distill-large |
LiT5 Distill xl | castorini/LiT5-Distill-xl |
LiT5 Distill base v2 | castorini/LiT5-Distill-base-v2 |
LiT5 Distill large v2 | castorini/LiT5-Distill-large-v2 |
LiT5 Distill xl v2 | castorini/LiT5-Distill-xl-v2 |
LiT5 Score base | castorini/LiT5-Score-base |
LiT5 Score large | castorini/LiT5-Score-large |
LiT5 Score xl | castorini/LiT5-Score-xl |
Now you can run top-100 reranking with the v2 model in a single pass while maintaining efficiency!
The following is a table specifically for our monoT5 suite of models hosted on HuggingFace:
Model Name | Hugging Face Identifier/Link |
---|---|
monoT5 Small MSMARCO 10K | castorini/monot5-small-msmarco-10k |
monoT5 Small MSMARCO 100K | castorini/monot5-small-msmarco-100k |
monoT5 Base MSMARCO | castorini/monot5-base-msmarco |
monoT5 Base MSMARCO 10K | castorini/monot5-base-msmarco-10k |
monoT5 Large MSMARCO 10K | castorini/monot5-large-msmarco-10k |
monoT5 Large MSMARCO | castorini/monot5-large-msmarco |
monoT5 3B MSMARCO 10K | castorini/monot5-3b-msmarco-10k |
monoT5 3B MSMARCO | castorini/monot5-3b-msmarco |
monoT5 Base Med MSMARCO | castorini/monot5-base-med-msmarco |
monoT5 3B Med MSMARCO | castorini/monot5-3b-med-msmarco |
We recommend the Med models for biomedical retrieval. We also provide both 10K (generally better OOD effectiveness) and 100K checkpoints (better in-domain).
If you use RankLLM, please cite the following relevant papers:
@ARTICLE{pradeep2023rankvicuna,
title = {{RankVicuna}: Zero-Shot Listwise Document Reranking with Open-Source Large Language Models},
author = {Ronak Pradeep and Sahel Sharifymoghaddam and Jimmy Lin},
year = {2023},
journal = {arXiv:2309.15088}
}
[2312.02724] RankZephyr: Effective and Robust Zero-Shot Listwise Reranking is a Breeze!
@ARTICLE{pradeep2023rankzephyr,
title = {{RankZephyr}: Effective and Robust Zero-Shot Listwise Reranking is a Breeze!},
author = {Ronak Pradeep and Sahel Sharifymoghaddam and Jimmy Lin},
year = {2023},
journal = {arXiv:2312.02724}
}
If you use one of the LiT5 models, please cite the following relevant paper:
@ARTICLE{tamber2023scaling,
title = {Scaling Down, LiTting Up: Efficient Zero-Shot Listwise Reranking with Seq2seq Encoder-Decoder Models},
author = {Manveer Singh Tamber and Ronak Pradeep and Jimmy Lin},
year = {2023},
journal = {arXiv:2312.16098}
}
If you use one of the monoT5 models, please cite the following relevant paper:
@ARTICLE{pradeep2021emd,
title = {The Expando-Mono-Duo Design Pattern for Text Ranking with Pretrained Sequence-to-Sequence Models},
author = {Ronak Pradeep and Rodrigo Nogueira and Jimmy Lin},
year = {2021},
journal = {arXiv:2101.05667},
}
If you use the FirstMistral model, please consider citing:
@ARTICLE{chen2024firstrepro,
title = {An Early FIRST Reproduction and Improvements to Single-Token Decoding for Fast Listwise Reranking},
author = {Zijian Chen and Ronak Pradeep and Jimmy Lin},
year = {2024},
journal = {arXiv:2411.05508}
}
If you would like to cite the FIRST methodology, please consider citing:
[2406.15657] FIRST: Faster Improved Listwise Reranking with Single Token Decoding
@ARTICLE{reddy2024first,
title = {FIRST: Faster Improved Listwise Reranking with Single Token Decoding},
author = {Reddy, Revanth Gangi and Doo, JaeHyeok and Xu, Yifei and Sultan, Md Arafat and Swain, Deevya and Sil, Avirup and Ji, Heng},
year = {2024},
journal = {arXiv:2406.15657},
}
This research is supported in part by the Natural Sciences and Engineering Research Council (NSERC) of Canada.