adding a simple implementation of ColBERT #144

Open
jjmachan opened this issue Jun 3, 2023 · 6 comments

Comments

@jjmachan

jjmachan commented Jun 3, 2023

Firstly, thank you for putting together this awesome repo 🙌🏽. I think I speak for every user here: you guys have made IR benchmarking so much easier that even folks new to the field can get started fast.

I was playing with a bunch of benchmarks, wanted to run a ColBERT benchmark, and found https://github.com/thakur-nandan/beir-ColBERT extremely useful. But it is a bit harder to set up and get running than the other models available via beir (I'm spoiled at this point...).

I was wondering whether a simpler implementation, like the one I found here https://github.com/sebastian-hofstaetter/neural-ranking-kd/blob/main/minimal_colbert_usage_example.ipynb, would be much more beginner-friendly and would let people run experiments faster.

I'd love to contribute this myself, since I'm playing with both implementations, but before that I wanted to know whether it is something that would be useful.

thanks again 🍻

@zt991211

Traceback (most recent call last):
  File "/home/zhangtong/anaconda3/envs/colbert-v0.2/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/zhangtong/anaconda3/envs/colbert-v0.2/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/zhangtong/beir-ColBERT/colbert/index.py", line 58, in <module>
    main()
  File "/home/zhangtong/beir-ColBERT/colbert/index.py", line 25, in main
    args = parser.parse()
  File "/home/zhangtong/beir-ColBERT/colbert/utils/parser.py", line 110, in parse
    Run.init(args.rank, args.root, args.experiment, args.run)
  File "/home/zhangtong/beir-ColBERT/colbert/utils/runs.py", line 51, in init
    distributed.barrier(rank)
  File "/home/zhangtong/beir-ColBERT/colbert/utils/distributed.py", line 25, in barrier
    torch.distributed.barrier()
  File "/home/zhangtong/anaconda3/envs/colbert-v0.2/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 1710, in barrier
    work = _default_pg.barrier()
RuntimeError: NCCL error in: /opt/conda/conda-bld/pytorch_1595629403081/work/torch/lib/c10d/ProcessGroupNCCL.cpp:32, unhandled cuda error, NCCL version 2.4.8

Did you run into errors like this when reproducing the project https://github.com/thakur-nandan/beir-ColBERT?
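The traceback above fails inside `torch.distributed.barrier()` with an "unhandled cuda error" from NCCL. A reasonable first step when reporting this kind of failure is to record what PyTorch sees on the machine. The snippet below is only a diagnostic sketch, not a fix and not part of beir-ColBERT; the helper name `describe_gpu_environment` is made up for illustration.

```python
def describe_gpu_environment():
    """Collect basic facts about the local PyTorch/CUDA setup (diagnostic only)."""
    try:
        import torch
        import torch.distributed as dist
    except ImportError:
        # PyTorch is not installed in this environment.
        return {"torch_installed": False}

    distributed_ok = dist.is_available()
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "cuda_build": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
        "visible_gpus": torch.cuda.device_count(),
        "distributed_available": distributed_ok,
        # Only query NCCL when the distributed package was compiled in.
        "nccl_available": distributed_ok and dist.is_nccl_available(),
    }


if __name__ == "__main__":
    for key, value in describe_gpu_environment().items():
        print(f"{key}: {value}")
```

Posting this output together with the traceback makes it easier for others to tell whether the problem lies in the CUDA/driver setup or in the ColBERT code itself.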

@jjmachan
Author

Hey @zt991211, yeah, I couldn't get it working either because of some issues.

@zhiyuanpeng

Hi @jjmachan

I also find that the original ColBERT takes work to run. I would appreciate it if you could contribute an easy-to-run version of ColBERT. Thanks.

@thakur-nandan
Member

My patch of ColBERT here (https://github.com/thakur-nandan/beir-ColBERT) is an unofficial copy that I used to reproduce my experiments with the ColBERT v1 model. Running ColBERT v1 requires a GPU installation of faiss, which is a different package from the CPU installation. Make sure you use the conda faiss-gpu build (https://anaconda.org/conda-forge/faiss-gpu) and not the PyPI build of faiss.
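To check which faiss build an environment actually has, something like the following can help. It is a rough sketch: the function name `faiss_build_status` and its return strings are invented for illustration, and it assumes that GPU builds expose `faiss.StandardGpuResources` while CPU-only builds do not.

```python
def faiss_build_status():
    """Return 'missing', 'cpu-only', 'gpu-build-no-device', or 'gpu-ready'."""
    try:
        import faiss
    except ImportError:
        return "missing"

    # Assumption: GPU classes such as StandardGpuResources exist only in GPU builds.
    if not hasattr(faiss, "StandardGpuResources"):
        return "cpu-only"

    # Guarded lookup in case a given build does not expose get_num_gpus.
    get_num_gpus = getattr(faiss, "get_num_gpus", None)
    num_gpus = get_num_gpus() if callable(get_num_gpus) else 0
    return "gpu-ready" if num_gpus > 0 else "gpu-build-no-device"


if __name__ == "__main__":
    print("faiss status:", faiss_build_status())
```

A result of `cpu-only` would suggest the environment picked up a CPU build of faiss rather than the conda faiss-gpu build.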

I would be happy if anyone above can take the initiative to provide an easy-to-run ColBERT example. This will be useful for others to quickly play with ColBERT.

The original ColBERT authors have since moved to v2 and provide some Jupyter notebooks as a quickstart. If ColBERT v1 proves hard to debug and play around with, you could look into working with v2 instead.

Thanks,
Nandan

@Hannibal046

Hi, I wrote a simple version of ColBERT: https://github.com/Hannibal046/nanoColBERT. It includes training, indexing, and end-to-end retrieval.
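For readers new to ColBERT, the late-interaction (MaxSim) scoring at its core fits in a few lines: each query token embedding is matched to its most similar document token embedding, and those maxima are summed. The sketch below is an illustrative toy in plain Python, not code taken from nanoColBERT or beir-ColBERT. The function name `maxsim_score` and the 2-dimensional vectors are made up, and a real model would produce the embeddings with BERT.

```python
def maxsim_score(query_vecs, doc_vecs):
    """Sum, over query tokens, of the best dot product with any document token."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Toy "token embeddings" (made-up numbers, not model output).
query = [[1.0, 0.0], [0.0, 1.0]]
docs = {
    "doc_a": [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
    "doc_b": [[0.5, 0.5], [0.5, 0.5]],
}

# Rank documents by descending MaxSim score.
ranked = sorted(docs, key=lambda name: maxsim_score(query, docs[name]), reverse=True)

if __name__ == "__main__":
    for name in ranked:
        print(name, maxsim_score(query, docs[name]))
```

Here `doc_a` ranks first because each query token finds an exact match among its token vectors.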

@thakur-nandan
Member

@Hannibal046 the nanoColBERT repo looks amazing, and I'm sure it will be very useful for others who want to evaluate ColBERT easily via BEIR. Could we open a PR to add it?

Thanks,
Nandan
