Update helm_prompt_settings.jsonl to allow for evaluation of all tasks #41

JoelNiklaus · 2024-09-24T12:41:58Z

HELM currently only evaluates on 5 LegalBench tasks. Ideally, we would like to be able to run evaluation on all tasks.

I briefly analyzed the structure of the tasks and their prompts. Every task contains a base_prompt.txt file, a train.tsv file, and a README.md file, which together could be used to construct a complete helm_prompt_settings.jsonl file automatically.
The prompts in helm_prompt_settings.jsonl appear to be modified versions of those in base_prompt.txt, so writing the jsonl file for all tasks by hand could be a lot of work. I would therefore suggest the following:

Extract the first line of base_prompt.txt as the general instructions.
Use the train.tsv file to get the possible answer options and append them to the instructions as a second line. One question here: does the train.tsv file contain every possible answer option for a task at least once?
Use the Data column names field from README.md to build field_ordering, label_keys, and output_nouns.
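
To make the proposal concrete, here is a minimal sketch of how one entry could be assembled. Several things are assumptions on my side and not confirmed against the repository: that train.tsv has a header row with a label column called `answer`, that the jsonl keys `subset` and `instructions` exist under those names, that `label_keys` and `output_nouns` are lists, and that the column names have already been extracted from the README (passed in as the hypothetical `column_names` argument). The helper names are made up for illustration.

```python
import csv
import io
import json


def first_nonempty_line(text):
    """Return the first non-empty line of a prompt file, stripped."""
    for line in text.splitlines():
        if line.strip():
            return line.strip()
    return ""


def answer_options(train_tsv_text, label_column):
    """Distinct labels in first-seen order.

    Only labels that actually occur in train.tsv are found, so a label
    missing from the training split would be missing here as well.
    """
    reader = csv.DictReader(
        io.StringIO(train_tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    options = []
    for row in reader:
        label = row[label_column]
        if label not in options:
            options.append(label)
    return options


def build_entry(task_name, base_prompt_text, train_tsv_text, column_names,
                label_column="answer"):
    """Assemble one settings entry (key names and value shapes are assumed)."""
    instructions = first_nonempty_line(base_prompt_text)
    options = answer_options(train_tsv_text, label_column)
    if options:
        # Second line of the instructions lists the observed answer options.
        instructions += "\nAnswer options: " + ", ".join(options)
    return {
        "subset": task_name,            # assumed key name
        "instructions": instructions,   # assumed key name
        "field_ordering": [c for c in column_names if c != label_column],
        "label_keys": [label_column],
        "output_nouns": [label_column.capitalize()],  # assumed convention
    }


if __name__ == "__main__":
    # Tiny in-memory example; real input would be read from the task folder.
    prompt = "Classify the clause.\n\nClause: {{text}}\n"
    tsv = "index\ttext\tanswer\n0\tfoo\tYes\n1\tbar\tNo\n"
    print(json.dumps(build_entry("demo_task", prompt, tsv, ["text", "answer"])))
```

Each dictionary would be written as one line of the jsonl file. Whether the generated instructions are good enough without manual editing is something I would check on the 5 tasks that already have hand-written settings.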

What do you think? Happy to create a PR for that.
