Source of nq_labeled_output.tsv #70

Open
WJ44 opened this issue Aug 22, 2024 · 0 comments

Comments

@WJ44
Contributor

WJ44 commented Aug 22, 2024

Hello,

The nq_labeled_output.tsv file contains 6189 rows from NQ. As I understand it, this file was used to generate the synthetic data for training the LLM judges. Where was this file sourced? Specifically, which NQ split is it from, and how were the rows selected?

I am trying to understand how one would apply ARES to a new dataset and evaluate it, i.e. creating synthetic data, training the judges, and evaluating on mock datasets. I assume the documents used in the human reference set and the mock splits should be kept separate from the ones used to generate the synthetic data.

I was also wondering how the provided synthetic data was generated. I see the synthetic file for NQ has exactly 3000 rows. Were all 6189 documents used, as in the examples, with the set reduced afterwards?
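To make the separation I am assuming concrete, here is a minimal sketch of how documents might be partitioned so that none used for synthetic generation also appears in the evaluation data. This is not taken from the ARES code: the column name `Document`, the helper names `read_documents` and `split_disjoint`, and the tiny inline TSV are all hypothetical and only illustrate the idea.

```python
import csv
import io
import random


def read_documents(tsv_text, column="Document"):
    """Return the values of one column from TSV text (column name is an assumption)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row[column] for row in reader]


def split_disjoint(documents, n_synthetic, seed=0):
    """Split the unique documents into two non-overlapping groups.

    The first group would feed synthetic data generation; the second would be
    reserved for the human reference set and the mock evaluation splits.
    """
    unique_docs = sorted(set(documents))  # deduplicate, fix order before shuffling
    rng = random.Random(seed)             # seeded so the split is reproducible
    rng.shuffle(unique_docs)
    return unique_docs[:n_synthetic], unique_docs[n_synthetic:]


# Hypothetical stand-in for a labeled TSV file; "doc A" appears twice on purpose.
sample_tsv = "Query\tDocument\nq1\tdoc A\nq2\tdoc B\nq3\tdoc A\nq4\tdoc C\n"

docs = read_documents(sample_tsv)
synthetic_docs, eval_docs = split_disjoint(docs, n_synthetic=2)

# No document should land in both groups.
overlap = set(synthetic_docs) & set(eval_docs)
print(len(synthetic_docs), len(eval_docs), len(overlap))
```

Splitting on unique documents rather than rows is deliberate here: the same passage can back several queries, and a row-level split would let it leak into both groups. Whether the provided files were built this way is exactly what I am asking.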
