System Info
Using Together AI, as in the notebook.
Not using GPUs.
Information
🐛 Describe the bug
I'm running the evals benchmark notebook provided.
https://github.com/meta-llama/llama-stack/blob/main/docs/notebooks/Llama_Stack_Benchmark_Evals.ipynb
It throws an error: NameError: name 'UserMessage' is not defined.
In the code below, eval_rows is the eval dataset, which contains three fields: ['chat_completion_input', 'input_query', 'expected_answer']
It's clear where/how to define 'UserMessage' in an inference call, but not for evals.
response = client.eval.evaluate_rows(
    task_id="meta-reference::mmmu",
    input_rows=eval_rows,
    scoring_functions=["basic::regex_parser_multiple_choice_answer"],
    task_config={
        "type": "benchmark",
        "eval_candidate": {
            "type": "model",
            "model": "meta-llama/Llama-3.2-90B-Vision-Instruct",
            "sampling_params": {
                "temperature": 0.0,
                "max_tokens": 4096,
                "top_p": 0.9,
                "repeat_penalty": 1.0,
            },
            "system_message": system_message,
        },
    },
)
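For context, here is a sketch of what I understand each row in eval_rows to look like, given the three fields above (the concrete values here are hypothetical, not from the actual dataset). The chat_completion_input field holds a string encoding a list of message dicts, which the eval provider parses server-side and wraps in UserMessage objects:

```python
import json

# Hypothetical example row with the three fields mentioned above.
# chat_completion_input is a *string* containing a list of message dicts;
# a JSON-style list literal is also a valid Python literal, so it parses
# either way.
row = {
    "input_query": "What is the capital of France?",
    "expected_answer": "Paris",
    "chat_completion_input": json.dumps(
        [{"role": "user", "content": "What is the capital of France?"}]
    ),
}

# Sanity-check that the field parses into the expected list-of-dicts shape
parsed = json.loads(row["chat_completion_input"])
assert isinstance(parsed, list) and parsed[0]["role"] == "user"
```

So the rows themselves carry plain dicts, and the UserMessage construction happens inside the provider, not in user code, which is why it is unclear where one would define it.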
Error logs
File ~/anaconda3/envs/py99/lib/python3.10/site-packages/llama_stack/providers/inline/eval/meta_reference/eval.py:200, in &lt;listcomp&gt;(.0)
196 chat_completion_input_str = str(
197 x[ColumnName.chat_completion_input.value]
198 )
199 input_messages = eval(chat_completion_input_str)
--> 200 input_messages = [UserMessage(**x) for x in input_messages]
201 messages = []
202 if candidate.system_message:
NameError: name 'UserMessage' is not defined
Expected behavior
I'm running the code as-is from the notebook and documentation; it should produce the evals output.
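For what it's worth, the failure mode can be reproduced standalone (this is my reading of the traceback, not the actual provider code): the stored chat_completion_input string is parsed into a list of dicts, and each dict is then wrapped in UserMessage. If that name was never imported into eval.py's module scope, the list comprehension raises NameError:

```python
# Minimal standalone sketch of the failure, assuming UserMessage is not
# defined anywhere in scope (as in the provider module).
chat_completion_input_str = '[{"role": "user", "content": "What is 2+2?"}]'
input_messages = eval(chat_completion_input_str)  # list of message dicts

try:
    messages = [UserMessage(**m) for m in input_messages]
except NameError as err:
    print(err)  # name 'UserMessage' is not defined
```

This suggests the fix belongs in the provider (a missing import in eval.py) rather than in the notebook code.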