Run evals on experiment spans #4914
-
I think it would be very useful to be able to run span and document evals on spans generated during experiment runs, so that it's easier to analyze the reasons for failure in specific experiment examples. Thanks, Omri
Replies: 2 comments 3 replies
-
Yes, you can run span and document evaluations on spans generated during experiment runs. You can use the `SpanEvaluations` and `DocumentEvaluations` classes. Here's a brief example:

```python
evals = SpanEvaluations(
    eval_name="eval-name",
    dataframe=dataframe,
)
eval_id = evals.save(tmp_path)
read_evals = Evaluations.load(eval_id, tmp_path)
```

Similarly, for document evaluations:

```python
evals = DocumentEvaluations(
    eval_name="eval-name",
    dataframe=dataframe,
)
eval_id = evals.save(tmp_path)
read_evals = Evaluations.load(eval_id, tmp_path)
```

These classes allow you to evaluate spans programmatically, making it easier to analyze specific experiment examples [1][2].
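For context, the `dataframe` in the snippets above is expected to be indexed by span ID, with evaluation columns such as `label` and `score`. A minimal sketch of building one and logging it back to a running Phoenix instance might look like the following; the project name, eval name, and label/score values are placeholders, not taken from this thread:

```python
import pandas as pd
import phoenix as px
from phoenix.trace import SpanEvaluations

# Pull the spans that the experiment wrote. The project name is a placeholder;
# use whichever project your experiment's traces went to.
spans_df = px.Client().get_spans_dataframe(project_name="my-experiment-project")

# Evaluations are keyed by span ID: the dataframe index should be the
# context.span_id of each evaluated span. The label/score values here are
# placeholders standing in for real evaluator output.
evals_df = pd.DataFrame(
    {"label": ["correct"] * len(spans_df), "score": [1.0] * len(spans_df)},
    index=spans_df.index,
)

# Log the evaluations back so they show up alongside the spans in Phoenix.
px.Client().log_evaluations(SpanEvaluations(eval_name="my-eval", dataframe=evals_df))
```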
-
Hey @omrihar, this is definitely possible! After you've run your experiment, you can pull your traces and spans from the experiments project using the Phoenix client. Methods like `get_qa_with_reference` have a `project_name` parameter you can use. After you've run the experiment, the process of running evals should be similar to this notebook.
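A rough sketch of that flow, assuming the standard Phoenix evals pattern (the project name, model choice, and evaluator below are illustrative assumptions rather than details from the notebook):

```python
import phoenix as px
from phoenix.evals import OpenAIModel, QAEvaluator, run_evals
from phoenix.session.evaluation import get_qa_with_reference
from phoenix.trace import SpanEvaluations

# Pull query/response/reference rows from the experiment's project.
# "my-experiment-project" is a placeholder for your experiment's project name.
qa_df = get_qa_with_reference(px.Client(), project_name="my-experiment-project")

# Run an LLM-based Q&A correctness eval over those rows.
qa_evaluator = QAEvaluator(OpenAIModel(model="gpt-4o"))  # model choice is illustrative
qa_correctness_df = run_evals(
    dataframe=qa_df,
    evaluators=[qa_evaluator],
    provide_explanation=True,
)[0]

# Attach the results to the spans so they're visible in the Phoenix UI.
px.Client().log_evaluations(
    SpanEvaluations(eval_name="Q&A Correctness", dataframe=qa_correctness_df)
)
```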