Should be similar to original MMLU: see mmlu_scenario.py for the original MMLU and air_bench_scenario.py for how to use load_dataset() with Hugging Face datasets.
Edit: Also look at simple_scenarios.py and test_simple_scenarios.py for an example of MCQA.
https://huggingface.co/datasets/TIGER-Lab/MMLU-Pro
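A rough sketch of the loading side, not the actual HELM scenario code. It assumes the dataset rows have `question`, `options`, and `answer_index` fields and that a `test` split exists (please check the dataset card). `row_to_mcqa` and `load_mmlu_pro_rows` are hypothetical helper names; in the real scenario the tuples would become HELM instances and references, following mmlu_scenario.py.

```python
from typing import Any, Dict, List, Tuple


def row_to_mcqa(row: Dict[str, Any]) -> Tuple[str, List[Tuple[str, bool]]]:
    """Turn one dataset row into (question, [(option_text, is_correct), ...]).

    Assumes the row has `question`, `options` (list of strings), and
    `answer_index` (int) fields; these names are assumptions.
    """
    options: List[str] = row["options"]
    correct_index: int = row["answer_index"]
    return row["question"], [
        (text, index == correct_index) for index, text in enumerate(options)
    ]


def load_mmlu_pro_rows(split: str = "test") -> List[Dict[str, Any]]:
    """Download one split of MMLU-Pro from the Hugging Face Hub (needs network)."""
    # Imported lazily so row_to_mcqa() works without the `datasets` package.
    from datasets import load_dataset

    dataset = load_dataset("TIGER-Lab/MMLU-Pro", split=split)
    return [dict(row) for row in dataset]


if __name__ == "__main__":
    # Offline example with a hand-written row in the assumed format.
    example = {"question": "2 + 2 = ?", "options": ["3", "4", "5"], "answer_index": 1}
    print(row_to_mcqa(example))
```

Keeping the row conversion separate from the download means it can be unit-tested without network access, similar in spirit to test_simple_scenarios.py.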
Edit 2: Also see this doc.
Edit 3: To create the run spec function, take this function in lite_run_specs.py and modify it so that mmlu becomes mmlu-pro; you should then be able to run helm-run.