The nq_labeled_output.tsv file contains 6189 rows from NQ. As I understand it, this file was used to generate the synthetic data for training the LLM judges. Where was this file sourced? Specifically, which NQ split does it come from, and how were the rows selected? I am trying to understand how to apply ARES to a new dataset end to end, i.e. creating synthetic data, training the judges, and evaluating on mock datasets. I assume the documents used in the human reference set and the mock splits should be kept separate from the ones used to generate the synthetic data (a sketch of what I mean is below). I was also wondering how the provided synthetic data was generated: the synthetic file for NQ has exactly 3000 rows. Were all 6189 documents used, as in the examples, and the set shrunk afterwards?
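To make the separation question concrete, here is roughly how I was planning to keep the document pools disjoint when adapting ARES to my own dataset. This is only a sketch of my assumption, not anything taken from the ARES scripts: the column name `Document`, the output file names, and the 80/20 split ratio are all placeholders I chose for illustration.

```python
"""
Minimal sketch: split a labeled TSV into two disjoint document pools,
one feeding synthetic query/answer generation for judge training and
one reserved for the human reference set and mock evaluation splits.
Column names and the split ratio are assumptions, not ARES defaults.
"""
import pandas as pd

# Load the labeled rows (query / document / answer triples).
rows = pd.read_csv("nq_labeled_output.tsv", sep="\t")

# Shuffle the unique documents and carve them into two disjoint pools.
docs = rows["Document"].drop_duplicates().sample(frac=1.0, random_state=0)
cutoff = int(0.8 * len(docs))
synthetic_docs = set(docs.iloc[:cutoff])
held_out_docs = set(docs.iloc[cutoff:])

# Rows whose documents may be used to generate synthetic training data.
synthetic_pool = rows[rows["Document"].isin(synthetic_docs)]
# Rows whose documents are held out for human-labeled reference / mock sets.
reference_pool = rows[rows["Document"].isin(held_out_docs)]

synthetic_pool.to_csv("synthetic_generation_pool.tsv", sep="\t", index=False)
reference_pool.to_csv("human_reference_and_mock_pool.tsv", sep="\t", index=False)
```

Is this the intended separation, or were the same 6189 documents reused across the synthetic generation and the evaluation splits?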
Hello,