G1: qrels.train file length after data preprocessing does not match the qrel.train file length in the downloaded dataset #332

Open
Huangsz2021 opened this issue Dec 27, 2024 · 0 comments

Comments

@Huangsz2021

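While training the G1 retriever, I found that the number of training steps differs depending on whether I train on the downloaded dataset directly or on the dataset produced by running the preprocessing locally. I then found that the qrel.train file has 198013 lines after preprocessing, whereas the downloaded qrel.train file has 432394 lines. As a result, the retriever I train cannot reproduce the scores reported in the paper. How should I adjust the preprocess_retriever_data code to obtain the correct training data?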
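To help narrow down where the rows go missing, a comparison along the following lines might be useful. It is only a minimal sketch: it assumes both qrels files are tab-separated with the query ID in the first column, and the function names and paths are hypothetical, not part of the repository. If query IDs are assigned differently on each preprocessing run, the ID overlap counts will not be meaningful and only the row and duplicate counts should be trusted.

```python
def summarize_qrels(path):
    """Return (row_count, unique_rows, query_ids) for a qrels file.

    Assumption: one record per line, tab-separated, query ID in column 0.
    """
    rows = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if line:  # skip blank lines
                rows.append(tuple(line.split("\t")))
    query_ids = {row[0] for row in rows}
    return len(rows), set(rows), query_ids


def compare_qrels(downloaded_path, preprocessed_path):
    """Compare row counts, exact duplicate rows, and query-ID coverage."""
    n_dl, rows_dl, q_dl = summarize_qrels(downloaded_path)
    n_pp, rows_pp, q_pp = summarize_qrels(preprocessed_path)
    return {
        "downloaded_rows": n_dl,
        "preprocessed_rows": n_pp,
        "downloaded_duplicates": n_dl - len(rows_dl),
        "preprocessed_duplicates": n_pp - len(rows_pp),
        "queries_only_in_downloaded": len(q_dl - q_pp),
        "queries_only_in_preprocessed": len(q_pp - q_dl),
    }


# Hypothetical usage; replace the paths with the actual file locations:
# print(compare_qrels("downloaded/qrels.train.tsv", "local/qrels.train.tsv"))
```

If the downloaded file turns out to contain many duplicate rows, or queries that the local run never produces, that would point to either deduplication or a filtering step in the preprocessing as the source of the difference.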
@Kunlun-Zhu @yhyu13 @tangxiangru @thuqinyj16 @lilbillybiscuit
