Added aviary-paper-data as LitQA v2 test split #800

Merged on Jan 9, 2025 (3 commits)

Conversation

jamesbraza
Collaborator

See PR title

@jamesbraza jamesbraza added the enhancement New feature or request label Jan 9, 2025
@jamesbraza jamesbraza requested review from sidnarayanan and a team January 9, 2025 21:36
@jamesbraza jamesbraza self-assigned this Jan 9, 2025
@dosubot dosubot bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Jan 9, 2025
litqa_v2 = load_dataset(labbench_dataset, "LitQA2")["train"].to_pandas()
litqa_v2["distractors"] = litqa_v2["distractors"].apply(list)
train_eval = load_dataset(train_eval_dataset, "LitQA2")["train"].to_pandas()
test = load_dataset(test_dataset, "LitQA2")["test"].to_pandas()
Collaborator

more of a question for me -- do we validate columns here? (i.e. is distractors checked for explicitly?)

Collaborator Author

Thanks to https://github.com/Future-House/paper-qa/blob/v5.9.2/paperqa/agents/task.py#L450-L455, we get:

  • Confirmation that the column attributes are present
  • Type checking of the values by typeguard during unit testing, for our defaults

To be clear, I added this comment:

Let downstream usage in the TaskDataset's environment factories check for the
presence of other DataFrame columns
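
To illustrate the idea in that comment, here is a minimal sketch of checking for columns at the point of use rather than at load time. It is not the paper-qa implementation: the helper names (missing_columns, make_question) and the "question" and "ideal" column names are assumptions made for illustration; only "distractors" appears in this PR.

```python
from collections.abc import Iterable, Mapping
from typing import Any

# Hypothetical set of required columns, chosen for illustration only;
# "distractors" is the only name taken from the diff above.
REQUIRED_COLUMNS = ("question", "ideal", "distractors")


def missing_columns(row: Mapping[str, Any], required: Iterable[str]) -> list[str]:
    """Return the required column names absent from a row, in the order given."""
    return [name for name in required if name not in row]


def make_question(row: Mapping[str, Any]) -> dict[str, Any]:
    """Build a question record from one row, failing loudly if a column is missing."""
    missing = missing_columns(row, REQUIRED_COLUMNS)
    if missing:
        raise KeyError(f"Row is missing required columns: {missing}")
    return {
        "question": row["question"],
        "ideal": row["ideal"],
        # Coerce to a list, mirroring the .apply(list) call in the diff above.
        "distractors": list(row["distractors"]),
    }
```

With this shape, the loading code stays free of schema checks, and a malformed split surfaces as a KeyError the first time a row is turned into a question.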

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Jan 9, 2025
@jamesbraza jamesbraza merged commit a5d85b0 into main Jan 9, 2025
5 checks passed
@jamesbraza jamesbraza deleted the test-split branch January 9, 2025 22:12