GitHub - donggyukimc/Inverse-cloze-task: Test code of Inverse cloze task for information retrieval

Questions and passages from the SQuAD dataset are used to measure passage retrieval performance.

Retrieval accuracy

Rank	TF-IDF	ICT	TF-IDF + ICT
1	49.24%	25.91%	57.77%
2	60.24%	37.14%	69.75%
3	66.41%	43.36%	75.21%
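
The table can be read as top-k retrieval accuracy: the share of questions whose gold passage appears among the first k retrieved passages. That reading is an assumption, since the evaluation code is not shown here. A minimal sketch of such a metric, with the hypothetical helper `top_k_accuracy` and toy rankings that are not taken from the repository:

```python
from typing import Sequence


def top_k_accuracy(
    ranked_ids: Sequence[Sequence[int]],
    gold_ids: Sequence[int],
    k: int,
) -> float:
    """Fraction of questions whose gold passage is in the first k results.

    ranked_ids[i] lists passage ids for question i, best match first.
    gold_ids[i] is the id of the passage that answers question i.
    """
    if len(ranked_ids) != len(gold_ids):
        raise ValueError("ranked_ids and gold_ids must have the same length")
    if not gold_ids:
        return 0.0
    hits = sum(
        1 for ranking, gold in zip(ranked_ids, gold_ids) if gold in ranking[:k]
    )
    return hits / len(gold_ids)


# Toy data (hypothetical): three questions, three candidate passages each.
rankings = [[3, 1, 2], [0, 2, 1], [2, 0, 1]]
gold = [3, 2, 1]

for k in (1, 2, 3):
    print(f"top-{k} accuracy: {top_k_accuracy(rankings, gold, k):.2%}")
```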

As mentioned in many previous works, token-matching methods such as TF-IDF and BM25 remain strong baselines for retrieval systems.
The ICT result is promising, given that the ICT model used in this test leaves considerable room for improvement.
A simple ensemble of TF-IDF and ICT improves on both individual methods at every rank (57.77% versus 49.24% for TF-IDF alone at rank 1), thanks to the semantic alignment that ICT adds.
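
How the two scores are combined is not described here. One common way to build such an ensemble is to rescale each method's scores to a shared range and take a weighted sum. The sketch below assumes that approach; the helper names `min_max` and `ensemble_rank`, the `weight` parameter, and the toy scores are all hypothetical and may differ from what the repository does.

```python
from typing import List, Sequence


def min_max(scores: Sequence[float]) -> List[float]:
    """Rescale scores to [0, 1] so the two methods are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All candidates score the same: this method expresses no preference.
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def ensemble_rank(
    tfidf_scores: Sequence[float],
    ict_scores: Sequence[float],
    weight: float = 0.5,
) -> List[int]:
    """Return passage indices, best first, by a weighted sum of both scores.

    weight is the share given to TF-IDF; (1 - weight) goes to ICT.
    """
    if len(tfidf_scores) != len(ict_scores):
        raise ValueError("both score lists must cover the same passages")
    sparse = min_max(tfidf_scores)
    dense = min_max(ict_scores)
    combined = [weight * s + (1.0 - weight) * d for s, d in zip(sparse, dense)]
    return sorted(range(len(combined)), key=lambda i: combined[i], reverse=True)


# Toy scores (hypothetical) for four candidate passages and one question.
tfidf = [0.1, 0.9, 0.5, 0.3]   # TF-IDF alone prefers passage 1
ict = [8.0, 2.0, 9.0, 1.0]     # ICT alone prefers passage 2
print(ensemble_rank(tfidf, ict))
```

Rescaling matters because TF-IDF similarities and dense inner products usually live on different scales, so a raw sum would let one method dominate.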

In this test, unsupervised ICT training was performed only on the passages in the SQuAD dataset. Additional training on a large unlabeled corpus should substantially improve the model's performance.
As mentioned in the original paper, the model can also be fine-tuned with annotated question-passage pairs.
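
The unsupervised training relies on pseudo question-passage pairs built from raw text: one sentence is sampled from a passage and treated as the query, and the remaining sentences serve as its context. The sketch below illustrates that idea only. The function name `make_ict_pair`, the `remove_prob` parameter, and its default of 0.9 are assumptions, not values confirmed by this repository.

```python
import random
from typing import List, Tuple


def make_ict_pair(
    sentences: List[str],
    rng: random.Random,
    remove_prob: float = 0.9,
) -> Tuple[str, str]:
    """Build one (pseudo-query, context) pair from a passage.

    A random sentence becomes the pseudo-query. With probability
    remove_prob it is removed from the context, so the model cannot rely
    on exact token overlap alone; otherwise the full passage is kept.
    """
    if len(sentences) < 2:
        raise ValueError("need at least two sentences to form a pair")
    idx = rng.randrange(len(sentences))
    query = sentences[idx]
    if rng.random() < remove_prob:
        context = sentences[:idx] + sentences[idx + 1:]
    else:
        context = list(sentences)
    return query, " ".join(context)


# Toy passage (hypothetical), already split into sentences.
passage = [
    "Zebras have black and white stripes.",
    "They live in herds on the savanna.",
    "Each animal's pattern is unique.",
]
query, context = make_ict_pair(passage, random.Random(0))
print("query:  ", query)
print("context:", context)
```

Because the pairs come from unlabeled text, the same routine could be run over a much larger corpus than SQuAD without any annotation.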

