I saw that this file introduces a TransformerEvalWrapper, which seems specific to the _models/llama model. Which one would you prefer to keep? Will it also support models from the Hugging Face Transformers library?
Summary
Currently we have two "eval" scripts for measuring performance of LLMs post quantization:
- https://github.com/pytorch/ao/blob/main/torchao/_models/llama/eval.py
- https://github.com/pytorch/ao/blob/main/scripts/hf_eval.py
The default task we have is wikitext. We should create a "large" eval option that evaluates on more tasks. A good "medium" task list would be to run the MMLU tasks, excluding gsm8k, which is quite slow.
@mobicham provided a good list of tasks we should add:
We could also add "MMLU"
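One way the small/medium/large groupings discussed above could be wired into the eval scripts is a named-preset lookup. This is only a sketch: the preset names, the `EVAL_PRESETS` mapping, and the `resolve_tasks` helper are hypothetical, and the exact task lists for "medium" and "large" are still being decided in this issue.

```python
# Hypothetical preset table for a task-selection flag on the eval scripts.
# Task names follow lm-evaluation-harness conventions; the groupings below
# mirror the sizes discussed in this issue and are placeholders, not final.
EVAL_PRESETS = {
    "small": ["wikitext"],           # current default
    "medium": ["wikitext", "mmlu"],  # MMLU tasks, skipping gsm8k (slow)
    "large": ["wikitext", "mmlu", "gsm8k"],  # everything, including slow tasks
}


def resolve_tasks(preset: str) -> list[str]:
    """Return the task list for a named preset, failing loudly on typos."""
    try:
        return EVAL_PRESETS[preset]
    except KeyError:
        raise ValueError(
            f"unknown preset {preset!r}; choose from {sorted(EVAL_PRESETS)}"
        )
```

The resolved list could then be passed straight to whatever task runner the scripts already use, so adding a new preset is a one-line change to the table.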