adding a simple implementation of ColBERT #144

jjmachan · 2023-06-03T19:18:34Z

Firstly, thank you for putting together this awesome repo 🙌🏽. I think I speak for every user here: you have made benchmarking IR so much easier that even folks new to the field can get started fast.

I was playing with a bunch of benchmarks, wanted to run a ColBERT benchmark, and found https://github.com/thakur-nandan/beir-ColBERT extremely useful. But it is a bit harder to set up and get running than the other models available via beir (I'm spoiled at this point...)

I was wondering whether a simpler implementation, like the one I found here https://github.com/sebastian-hofstaetter/neural-ranking-kd/blob/main/minimal_colbert_usage_example.ipynb, would be more beginner-friendly and would let people run experiments faster.

I'd love to contribute this myself since I'm playing with both implementations, but before that I wanted to know whether it would be useful.
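For readers new to ColBERT, the part a minimal implementation mainly has to cover is the late-interaction (MaxSim) score: each query token embedding is matched to its most similar document token embedding, and those maxima are summed. Below is a rough sketch of that scoring step only. It is not code from either linked repo; the function names are hypothetical, and the toy 2-d vectors stand in for the per-token BERT embeddings a real model would produce.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def maxsim_score(query_vecs, doc_vecs):
    """Sum, over query tokens, of the best cosine similarity to any doc token."""
    q = [l2_normalize(v) for v in query_vecs]
    d = [l2_normalize(v) for v in doc_vecs]
    return sum(
        max(sum(a * b for a, b in zip(qv, dv)) for dv in d)
        for qv in q
    )


def rank(query_vecs, docs):
    """Return (doc_id, score) pairs sorted from best to worst score."""
    scores = {doc_id: maxsim_score(query_vecs, vecs) for doc_id, vecs in docs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data (made up for illustration): one vector per token.
query = [[1.0, 0.0], [0.0, 1.0]]
docs = {
    "doc_a": [[1.0, 0.0], [0.0, 1.0]],  # matches both query tokens
    "doc_b": [[1.0, 0.0]],              # matches only the first query token
}
ranking = rank(query, docs)
```

A real setup would replace the toy vectors with encoder outputs and, for large corpora, add an index for candidate generation; this sketch only reranks a handful of documents by brute force.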

thanks again 🍻

zt991211 · 2023-06-11T05:06:38Z

Traceback (most recent call last):
  File "/home/zhangtong/anaconda3/envs/colbert-v0.2/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/zhangtong/anaconda3/envs/colbert-v0.2/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/zhangtong/beir-ColBERT/colbert/index.py", line 58, in <module>
    main()
  File "/home/zhangtong/beir-ColBERT/colbert/index.py", line 25, in main
    args = parser.parse()
  File "/home/zhangtong/beir-ColBERT/colbert/utils/parser.py", line 110, in parse
    Run.init(args.rank, args.root, args.experiment, args.run)
  File "/home/zhangtong/beir-ColBERT/colbert/utils/runs.py", line 51, in init
    distributed.barrier(rank)
  File "/home/zhangtong/beir-ColBERT/colbert/utils/distributed.py", line 25, in barrier
    torch.distributed.barrier()
  File "/home/zhangtong/anaconda3/envs/colbert-v0.2/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 1710, in barrier
    work = _default_pg.barrier()
RuntimeError: NCCL error in: /opt/conda/conda-bld/pytorch_1595629403081/work/torch/lib/c10d/ProcessGroupNCCL.cpp:32, unhandled cuda error, NCCL version 2.4.8
Traceback (most recent call last):
  File "/home/zhangtong/anaconda3/envs/colbert-v0.2/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/zhangtong/anaconda3/envs/colbert-v0.2/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/zhangtong/beir-ColBERT/colbert/index.py", line 58, in <module>
    main()
  File "/home/zhangtong/beir-ColBERT/colbert/index.py", line 25, in main
    args = parser.parse()
  File "/home/zhangtong/beir-ColBERT/colbert/utils/parser.py", line 110, in parse
    Run.init(args.rank, args.root, args.experiment, args.run)
  File "/home/zhangtong/beir-ColBERT/colbert/utils/runs.py", line 51, in init
    distributed.barrier(rank)
  File "/home/zhangtong/beir-ColBERT/colbert/utils/distributed.py", line 25, in barrier
    torch.distributed.barrier()
  File "/home/zhangtong/anaconda3/envs/colbert-v0.2/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 1710, in barrier
    work = _default_pg.barrier()
RuntimeError: NCCL error in: /opt/conda/conda-bld/pytorch_1595629403081/work/torch/lib/c10d/ProcessGroupNCCL.cpp:32, unhandled cuda error, NCCL version 2.4.8

Did you encounter problems like this when reproducing the project https://github.com/thakur-nandan/beir-ColBERT?
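An "unhandled cuda error" raised at a `torch.distributed.barrier()` call can have several causes, so a first step is often to collect basic facts about the environment. The snippet below is only a diagnostic sketch, not a fix: the helper name `cuda_nccl_report` is hypothetical, and it assumes nothing about the machine beyond an optional PyTorch install.

```python
import importlib.util


def cuda_nccl_report():
    """Collect basic CUDA/NCCL facts; returns a plain dict for easy printing."""
    report = {"torch_installed": importlib.util.find_spec("torch") is not None}
    if not report["torch_installed"]:
        return report

    import torch
    import torch.distributed as dist

    report["torch_version"] = torch.__version__
    report["cuda_available"] = torch.cuda.is_available()
    report["gpu_count"] = torch.cuda.device_count()
    # is_nccl_available() is only meaningful when the distributed package is built in.
    report["nccl_available"] = dist.is_available() and dist.is_nccl_available()
    return report


report = cuda_nccl_report()
for key, value in report.items():
    print(f"{key}: {value}")
```

Setting the environment variable `NCCL_DEBUG=INFO` before launching the indexing job makes NCCL print more detail about where it fails, which can be worth attaching to a report like this one.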

jjmachan · 2023-06-12T07:35:00Z

Hey @zt991211, yeah, I couldn't get it working either because of some issues.

zhiyuanpeng · 2023-06-16T05:03:25Z

Hi @jjmachan,

I also find that the original ColBERT takes work to run. I would appreciate it if you could contribute an easy-to-run version of ColBERT. Thanks.

thakur-nandan · 2023-06-20T12:33:10Z

My patch of ColBERT here (https://github.com/thakur-nandan/beir-ColBERT) was an unofficial copy that I used to reproduce my experiments with the ColBERT v1 model. Running ColBERT v1 requires a GPU installation of faiss, which is different from the CPU installation. Make sure you use the conda faiss-gpu build (https://anaconda.org/conda-forge/faiss-gpu) and not the PyPI build of faiss.
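To tell which faiss build is active in an environment, a quick heuristic check along these lines may help. It is a sketch, assuming that GPU builds expose `StandardGpuResources` on the `faiss` module while CPU-only builds do not; the helper name `faiss_build_status` is hypothetical.

```python
import importlib.util


def faiss_build_status():
    """Return 'missing', 'cpu', or 'gpu' for the faiss install in this environment."""
    if importlib.util.find_spec("faiss") is None:
        return "missing"
    try:
        import faiss
    except ImportError:
        # The package is present but cannot be loaded (e.g. broken native libs).
        return "missing"
    # Heuristic (assumption): GPU classes are only present in GPU builds.
    return "gpu" if hasattr(faiss, "StandardGpuResources") else "cpu"


status = faiss_build_status()
print(f"faiss build: {status}")
```

If this prints `cpu` or `missing` inside the ColBERT environment, reinstalling faiss from the conda faiss-gpu package mentioned above is the likely next step.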

I would be happy if anyone above could take the initiative to provide an easy-to-run ColBERT example. This would help others to quickly play with ColBERT.

The original ColBERT authors have switched to the V2 version and provide some Jupyter notebooks as a quickstart. Maybe you can look into working with the V2 version if ColBERT V1 looks hard to debug and play around with.

Thanks,
Nandan

Hannibal046 · 2024-01-03T06:50:35Z

Hi, I wrote a simple version of ColBERT: https://github.com/Hannibal046/nanoColBERT, including training, indexing, and end-to-end retrieval.

thakur-nandan · 2024-02-23T14:32:48Z

@Hannibal046 the nanoColBERT repo looks amazing, and I'm sure it will be very useful for others who want to evaluate ColBERT easily via BEIR. Could we add it through a PR?

Thanks,
Nandan
