Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SUMO Web Claims Summarization scenario #3112

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

yifanmai
Copy link
Collaborator

This is a summarization benchmark based on the climate subset of the SUMO dataset (Mishra et al., 2020).

@yifanmai
Copy link
Collaborator Author

Hi @ryokawajp, please review this pull request.

By approving, you acknowledge that:

  • You will be listed as a co-author of the merge commit
  • You agree to license your work under Apache 2.0

I have made the following changes have been made on top of your original commit:

  • Fixed a bug where truncation happened before filtering, which would cause long examples to not get filtered if the truncation length is less than the filter length.
  • Reorganized arguments to the scenarios to make their purposes more clear.
  • Removed the round trip to CSV.
  • Removed the use of Hugging Face datasets
  • Moved column names to class constants.
  • Moved configuration constants to args in get_sumosum_spec().
  • Fixed _clean_and_truncate() to replace all sequences of whitespace.
  • Added type annotations.
  • Removed unnecessary extra call to _clean_and_truncate().
  • Added scenario description and citation.

@ryokawajp
Copy link
Collaborator

ryokawajp commented Nov 1, 2024

Thank you, @yifanmai .
I suppose one needs pip install openpyxl because it depends on loading an excel file in SUMOSumScenario._load_dataset(). Otherwise, it cause an import error. The module is indirectly called from Pandas pd.read_excel(). Is it OK to add this to requirements.txt?

@ryokawajp
Copy link
Collaborator

Let me add @mtake to this thread, as he is the first implementer of this scenario.
Do you have any comments, especially on the changes @yifanmai made as above?

@yifanmai
Copy link
Collaborator Author

yifanmai commented Nov 6, 2024

OK, I have added openpyxl as an optional dependency.

@ryokawajp
Copy link
Collaborator

Thanks! I confirmed that it is working.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants