A conversational search system built in Python.
Clone the repository from GitHub and install it as a Python package:
pip install -e .
If installed locally, the py_css command is available from then on.
Otherwise, call the following entry point directly:
python py_css/main.py
# OR, if installed locally:
py_css
To display a detailed help page, run:
py_css --help
If installed as a Python package, the following commands are available:
py_css cli
py_css run_file --log=INFO --queries=data/queries_train.csv --output=output/train.txt
py_css eval --log=INFO --queries=data/queries_train.csv --qrels=data/qrels_train.txt
py_css kaggle --log=INFO --queries=data/queries_test.csv --output=output/kaggle-prf.csv
As outlined in the paper, four retrieval pipelines were implemented: baseline, baseline-prf, doc2query, and doc2query-prf.
The baseline pipeline can be selected by specifying the following parameters:
--method=baseline
--baseline-params=1000,1000,50
For indexing, the document collection has to be placed into the data/ folder.
Further Instructions
Position | ID | Description | Constraints |
---|---|---|---|
0 | bm25_docs | The number of documents to be retrieved using BM25. | |
1 | mono_t5_docs | The number of documents to be reranked by monoT5 after retrieval. | bm25_docs >= mono_t5_docs |
2 | duo_t5_docs | The number of documents to be reranked by duoT5 after monoT5 reranking. | mono_t5_docs >= duo_t5_docs |
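
The positional format can be easier to follow with a small parser. The following is a minimal sketch, not the project's own implementation: `BaselineParams` and `parse_baseline_params` are hypothetical names, and the checks assume that each stage reranks a subset of the previous stage's output, so the counts never increase from one position to the next (as in the example value 1000,1000,50).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BaselineParams:
    """Named view of the positional --baseline-params values (hypothetical helper)."""
    bm25_docs: int
    mono_t5_docs: int
    duo_t5_docs: int


def parse_baseline_params(raw: str) -> BaselineParams:
    """Parse 'bm25_docs,mono_t5_docs,duo_t5_docs' and check the ordering constraints."""
    parts = [p.strip() for p in raw.split(",")]
    if len(parts) != 3:
        raise ValueError(f"expected 3 comma-separated integers, got {len(parts)}")
    params = BaselineParams(*(int(p) for p in parts))
    # Assumption: a later stage cannot rerank more documents than the earlier one produced.
    if params.bm25_docs < params.mono_t5_docs:
        raise ValueError("bm25_docs must be >= mono_t5_docs")
    if params.mono_t5_docs < params.duo_t5_docs:
        raise ValueError("mono_t5_docs must be >= duo_t5_docs")
    return params


print(parse_baseline_params("1000,1000,50"))
```

Validating the string before starting a run avoids discovering an inconsistent configuration only after the BM25 retrieval stage has already finished.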
The baseline-prf pipeline can be selected by specifying the following parameters:
--method=baseline-prf
--baseline-prf-params=1000,17,26,1000,50
For indexing, the document collection has to be placed into the data/ folder.
Further Instructions
Position | ID | Description | Constraints |
---|---|---|---|
0 | bm25_docs | The number of documents to be retrieved using BM25. | |
1 | rm3_fb_docs | The number of documents to be used for RM3 query expansion. | |
2 | rm3_fb_terms | The number of terms to expand the query with using RM3. | |
3 | mono_t5_docs | The number of documents to be reranked by monoT5 after retrieval. | bm25_docs >= mono_t5_docs |
4 | duo_t5_docs | The number of documents to be reranked by duoT5 after monoT5 reranking. | mono_t5_docs >= duo_t5_docs |
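
Because the two RM3 values sit between the BM25 and monoT5 counts, it is easy to misplace a number when writing the string by hand. As a rough sketch of the opposite direction (named values to flag string), the hypothetical `PrfParams` tuple below keeps the fields in positional order and serializes them; it is an illustration and not part of py_css.

```python
from typing import NamedTuple


class PrfParams(NamedTuple):
    """Field order mirrors positions 0-4 of --baseline-prf-params (hypothetical helper)."""
    bm25_docs: int
    rm3_fb_docs: int
    rm3_fb_terms: int
    mono_t5_docs: int
    duo_t5_docs: int

    def to_flag_value(self) -> str:
        """Serialize the fields in positional order as a comma-separated string."""
        return ",".join(str(v) for v in self)


params = PrfParams(
    bm25_docs=1000,
    rm3_fb_docs=17,
    rm3_fb_terms=26,
    mono_t5_docs=1000,
    duo_t5_docs=50,
)
print(f"--baseline-prf-params={params.to_flag_value()}")
# prints: --baseline-prf-params=1000,17,26,1000,50
```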
The doc2query pipeline can be selected by specifying the following parameters:
--method=doc2query
--doc2query-params=1000,1000,50
For indexing, the document collection has to be placed into the data/ folder.
Additionally, descriptive queries for each document have to be generated using this script.
Further Instructions
Position | ID | Description | Constraints |
---|---|---|---|
0 | bm25_docs | The number of documents to be retrieved using BM25. | |
1 | mono_t5_docs | The number of documents to be reranked by monoT5 after retrieval. | bm25_docs >= mono_t5_docs |
2 | duo_t5_docs | The number of documents to be reranked by duoT5 after monoT5 reranking. | mono_t5_docs >= duo_t5_docs |
The doc2query-prf pipeline can be selected by specifying the following parameters:
--method=doc2query-prf
--doc2query-prf-params=1000,17,26,1000,50
For indexing, the document collection has to be placed into the data/ folder.
Additionally, descriptive queries for each document have to be generated using this script.
Further Instructions
Position | ID | Description | Constraints |
---|---|---|---|
0 | bm25_docs | The number of documents to be retrieved using BM25. | |
1 | rm3_fb_docs | The number of documents to be used for RM3 query expansion. | |
2 | rm3_fb_terms | The number of terms to expand the query with using RM3. | |
3 | mono_t5_docs | The number of documents to be reranked by monoT5 after retrieval. | bm25_docs >= mono_t5_docs |
4 | duo_t5_docs | The number of documents to be reranked by duoT5 after monoT5 reranking. | mono_t5_docs >= duo_t5_docs |
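
All four pipelines follow the same flag pattern: --method=NAME together with --NAME-params=VALUES, where the non-PRF methods take three values and the PRF methods take five. The sketch below builds that flag pair for any of the four methods. `PARAM_COUNTS` and `method_flags` are hypothetical names introduced for illustration, and the value counts are read off the example flag values in this README rather than taken from the py_css source.

```python
from typing import List, Sequence

# Number of positional values each documented method expects (from the tables above).
PARAM_COUNTS = {
    "baseline": 3,
    "baseline-prf": 5,
    "doc2query": 3,
    "doc2query-prf": 5,
}


def method_flags(method: str, values: Sequence[int]) -> List[str]:
    """Return the two command-line flags that select a pipeline and set its parameters."""
    if method not in PARAM_COUNTS:
        raise ValueError(f"unknown method: {method!r}")
    expected = PARAM_COUNTS[method]
    if len(values) != expected:
        raise ValueError(f"{method} expects {expected} values, got {len(values)}")
    joined = ",".join(str(v) for v in values)
    return [f"--method={method}", f"--{method}-params={joined}"]


print(" ".join(method_flags("doc2query-prf", [1000, 17, 26, 1000, 50])))
# prints: --method=doc2query-prf --doc2query-prf-params=1000,17,26,1000,50
```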