This guide provides instructions to reproduce the SBERT dense retrieval models for MS MARCO passage ranking (v3) described here.
Note that we often observe minor differences in scores between different computing environments (e.g., Linux vs. macOS). However, the differences usually appear in the fifth digit after the decimal point, and do not appear to be a cause for concern from a reproducibility perspective. Thus, while the scoring script provides results to much higher precision, we have intentionally rounded to four digits after the decimal point.
Dense retrieval, brute-force index:
python -m pyserini.search.faiss \
--index msmarco-v1-passage.sbert \
--topics msmarco-passage-dev-subset \
--encoded-queries sbert-msmarco-passage-dev-subset \
--output runs/run.msmarco-passage.sbert.bf.tsv \
--output-format msmarco \
--batch-size 36 --threads 12
Replace --encoded-queries with --encoder sentence-transformers/msmarco-distilbert-base-v3 for on-the-fly query encoding.
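A brute-force (flat) index compares the query embedding against every passage embedding, so results are exact rather than approximate. The sketch below illustrates that idea with NumPy. It assumes inner-product similarity and precomputed embeddings. The helper brute_force_search and the toy vectors are hypothetical and are not part of Pyserini:

```python
import numpy as np


def brute_force_search(query_emb, passage_embs, passage_ids, k=10):
    """Exhaustive top-k search by inner product (hypothetical helper).

    query_emb: shape (dim,); passage_embs: shape (num_passages, dim).
    Returns (passage_id, score) pairs sorted by descending score.
    """
    scores = passage_embs @ query_emb  # one score per passage, no approximation
    k = min(k, len(scores))
    # argpartition selects the k best in linear time; only those k get sorted.
    top = np.argpartition(-scores, k - 1)[:k]
    top = top[np.argsort(-scores[top])]
    return [(passage_ids[i], float(scores[i])) for i in top]


# Toy data (made up): 4 passages with 3-dimensional embeddings.
passages = np.array([
    [0.1, 0.0, 0.9],
    [0.8, 0.1, 0.1],
    [0.4, 0.4, 0.2],
    [0.0, 1.0, 0.0],
], dtype=np.float32)
ids = ["p0", "p1", "p2", "p3"]
query = np.array([1.0, 0.0, 0.5], dtype=np.float32)

results = brute_force_search(query, passages, ids, k=2)
print(results)  # p1 ranks first (about 0.85), then p0 (about 0.55)
```

A real MS MARCO index holds millions of passages, which is why the commands above batch queries and use multiple threads.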
To evaluate:
$ python -m pyserini.eval.msmarco_passage_eval msmarco-passage-dev-subset \
runs/run.msmarco-passage.sbert.bf.tsv
#####################
MRR @10: 0.3314
QueriesRanked: 6980
#####################
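MRR@10 averages, over queries, the reciprocal rank of the first relevant passage in the top 10, with a query contributing 0 if no relevant passage appears there. The following is a minimal sketch, not the official evaluation script. It assumes the run is a dict from query ID to a best-first list of passage IDs, and the qrels are a dict from query ID to a set of relevant IDs. The helper mrr_at_k and the sample data are hypothetical:

```python
def mrr_at_k(run, qrels, k=10):
    """Mean reciprocal rank truncated at depth k (hypothetical helper).

    run:   {qid: [docid, ...]} ranked best-first
    qrels: {qid: {relevant docid, ...}}
    Averages over all judged queries; queries missing from the run count as 0.
    """
    total = 0.0
    for qid, relevant in qrels.items():
        for rank, docid in enumerate(run.get(qid, [])[:k], start=1):
            if docid in relevant:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(qrels)


run = {
    "q1": ["d3", "d1", "d7"],  # first relevant at rank 2 -> 1/2
    "q2": ["d5", "d2"],        # first relevant at rank 1 -> 1
    "q3": ["d9", "d8"],        # no relevant passage retrieved -> 0
}
qrels = {"q1": {"d1"}, "q2": {"d5"}, "q3": {"d4"}}

print(f"MRR@10: {mrr_at_k(run, qrels):.4f}")  # (0.5 + 1 + 0) / 3 -> MRR@10: 0.5000
```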
We can also use the official TREC evaluation tool, trec_eval, to compute metrics other than MRR@10.
For that, we first need to convert the run file to the TREC format:
$ python -m pyserini.eval.convert_msmarco_run_to_trec_run \
--input runs/run.msmarco-passage.sbert.bf.tsv \
--output runs/run.msmarco-passage.sbert.bf.trec
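Each line of an MS MARCO run holds a tab-separated query ID, passage ID, and rank. By contrast, trec_eval expects six whitespace-separated columns: query ID, the literal Q0, document ID, rank, score, and a run tag. The sketch below shows one way to perform the conversion. Because the MS MARCO format carries no scores and trec_eval orders documents by score, it synthesizes a score that decreases with rank. The helper msmarco_to_trec, the 1/rank score, and the "sbert" run tag are all hypothetical choices, and Pyserini's own converter may differ in these details:

```python
def msmarco_to_trec(lines, run_tag="sbert"):
    """Convert MS MARCO run lines (tab-separated qid, docid, rank) to TREC lines.

    Hypothetical helper. TREC format: "qid Q0 docid rank score run_tag".
    """
    trec_lines = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        qid, docid, rank = line.split("\t")
        rank = int(rank)
        score = 1.0 / rank  # assumption: any strictly decreasing function of rank works
        trec_lines.append(f"{qid} Q0 {docid} {rank} {score} {run_tag}")
    return trec_lines


msmarco_run = ["q1\tp7\t1", "q1\tp3\t2"]  # made-up IDs
for out in msmarco_to_trec(msmarco_run):
    print(out)
# q1 Q0 p7 1 1.0 sbert
# q1 Q0 p3 2 0.5 sbert
```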
$ python -m pyserini.eval.trec_eval -c -mrecall.1000 -mmap msmarco-passage-dev-subset \
runs/run.msmarco-passage.sbert.bf.trec
map all 0.3373
recall_1000 all 0.9558
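For reference, the two measures requested above (-mmap and -mrecall.1000) can be sketched as follows. Average precision sums precision@i at each rank i that holds a relevant document and divides by the total number of relevant documents. Recall@1000 is the fraction of relevant documents found in the top 1000. This is a simplified sketch assuming binary relevance, best-first ranked lists, and made-up data. Following trec_eval's -c flag, it averages over every query in the qrels, so missing queries contribute 0. The helper names are hypothetical, and trec_eval handles details (such as tie-breaking by score) that are omitted here:

```python
def average_precision(ranking, relevant):
    """AP for one query; relevant docs that are never retrieved contribute 0."""
    hits, precision_sum = 0, 0.0
    for i, docid in enumerate(ranking, start=1):
        if docid in relevant:
            hits += 1
            precision_sum += hits / i  # precision at this relevant rank
    return precision_sum / len(relevant) if relevant else 0.0


def recall_at_k(ranking, relevant, k=1000):
    """Fraction of the relevant docs that appear in the top k."""
    if not relevant:
        return 0.0
    return len(set(ranking[:k]) & relevant) / len(relevant)


def evaluate(run, qrels, k=1000):
    """Mean AP and mean recall@k over all queries in qrels (like trec_eval -c)."""
    n = len(qrels)
    mean_ap = sum(average_precision(run.get(q, []), rel) for q, rel in qrels.items()) / n
    mean_recall = sum(recall_at_k(run.get(q, []), rel, k) for q, rel in qrels.items()) / n
    return mean_ap, mean_recall


run = {"q1": ["d2", "d1", "d3"], "q2": ["d4", "d5"]}
qrels = {"q1": {"d1", "d3"}, "q2": {"d6"}}
ap, rec = evaluate(run, qrels)
print(f"map {ap:.4f}  recall_1000 {rec:.4f}")  # map 0.2917  recall_1000 0.5000
```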
Hybrid retrieval with dense-sparse representations (without document expansion):
- dense retrieval with SBERT, brute-force index;
- sparse retrieval with BM25 on the msmarco-v1-passage (i.e., default bag-of-words) index.
python -m pyserini.search.hybrid \
dense --index msmarco-v1-passage.sbert \
--encoded-queries sbert-msmarco-passage-dev-subset \
sparse --index msmarco-v1-passage \
fusion --alpha 0.015 \
run --topics msmarco-passage-dev-subset \
--output runs/run.msmarco-passage.sbert.bf.bm25.tsv \
--output-format msmarco \
--batch-size 36 --threads 12
Replace --encoded-queries with --encoder sentence-transformers/msmarco-distilbert-base-v3 for on-the-fly query encoding.
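The --alpha 0.015 flag controls how the dense and sparse scores are combined. Our understanding, which is an assumption you should verify against the Pyserini source, is that the default fusion is a linear combination, score = alpha × sparse + dense. It is computed over the union of both candidate lists, and a document missing from one list receives that list's minimum score. A minimal sketch under that assumption follows. The helper fuse and the scores are hypothetical:

```python
def fuse(dense_hits, sparse_hits, alpha=0.015, k=1000):
    """Linear dense-sparse fusion (sketch; assumes alpha weights the sparse score).

    dense_hits, sparse_hits: {docid: score}. A doc absent from one list is
    assigned that list's minimum score. Returns top-k (docid, score) pairs.
    """
    min_dense = min(dense_hits.values()) if dense_hits else 0.0
    min_sparse = min(sparse_hits.values()) if sparse_hits else 0.0
    fused = []
    for docid in set(dense_hits) | set(sparse_hits):
        d_score = dense_hits.get(docid, min_dense)
        s_score = sparse_hits.get(docid, min_sparse)
        fused.append((docid, alpha * s_score + d_score))
    fused.sort(key=lambda pair: pair[1], reverse=True)
    return fused[:k]


# Made-up scores: dense similarities are small, BM25 scores are much larger.
dense = {"p1": 0.62, "p2": 0.60, "p3": 0.55}
sparse = {"p2": 18.0, "p4": 15.0, "p3": 9.0}
for docid, score in fuse(dense, sparse):
    print(docid, round(score, 4))  # prints p2, p4, p1, p3 in that order
```

Because BM25 scores are usually on a much larger scale than dense similarity scores, a small alpha plausibly keeps the sparse signal from swamping the dense one. Even so, in this toy example p4, which only BM25 retrieved, overtakes p1.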
To evaluate:
$ python -m pyserini.eval.msmarco_passage_eval \
msmarco-passage-dev-subset runs/run.msmarco-passage.sbert.bf.bm25.tsv
#####################
MRR @10: 0.3380
QueriesRanked: 6980
#####################
$ python -m pyserini.eval.convert_msmarco_run_to_trec_run \
--input runs/run.msmarco-passage.sbert.bf.bm25.tsv \
--output runs/run.msmarco-passage.sbert.bf.bm25.trec
$ python -m pyserini.eval.trec_eval -c -mrecall.1000 -mmap msmarco-passage-dev-subset \
runs/run.msmarco-passage.sbert.bf.bm25.trec
map all 0.3446
recall_1000 all 0.9659