Run evals on experiment spans #4914
-
I think it would be very useful to be able to run span and document evals on spans generated during experiment runs, so that it's easier to analyze the reasons for failure in specific experiment examples. Thanks, Omri
Replies: 2 comments 3 replies
-
Yes, you can run span and document evaluations on spans generated during experiment runs. You can use the `SpanEvaluations` and `DocumentEvaluations` classes. Here's a brief example:

```python
evals = SpanEvaluations(
    eval_name="eval-name",
    dataframe=dataframe,
)
eval_id = evals.save(tmp_path)
read_evals = Evaluations.load(eval_id, tmp_path)
```

Similarly, for document evaluations:

```python
evals = DocumentEvaluations(
    eval_name="eval-name",
    dataframe=dataframe,
)
eval_id = evals.save(tmp_path)
read_evals = Evaluations.load(eval_id, tmp_path)
```

These classes allow you to evaluate spans programmatically, making it easier to analyze specific experiment examples [1][2].
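For context, the `dataframe` in the snippets above is expected to be indexed by span ID, with evaluation columns such as `label` and `score`. A minimal sketch of building one and logging it back to a running Phoenix instance might look like the following; the project name, eval name, and label/score values are placeholders, not taken from this thread:

```python
import pandas as pd
import phoenix as px
from phoenix.trace import SpanEvaluations

# Pull the spans that the experiment wrote. The project name is a placeholder;
# use whichever project your experiment's traces went to.
spans_df = px.Client().get_spans_dataframe(project_name="my-experiment-project")

# Evaluations are keyed by span ID: the dataframe index should be the
# context.span_id of each evaluated span. The label/score values here are
# placeholders standing in for real evaluator output.
evals_df = pd.DataFrame(
    {"label": ["correct"] * len(spans_df), "score": [1.0] * len(spans_df)},
    index=spans_df.index,
)

# Log the evaluations back so they show up alongside the spans in Phoenix.
px.Client().log_evaluations(SpanEvaluations(eval_name="my-eval", dataframe=evals_df))
```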
-
Hey @omrihar, this is definitely possible! After you've run your experiment, you can pull your traces and spans from the experiments project using the Phoenix client. Methods like `get_qa_with_reference` have a `project_name` parameter you can use. After you've run the experiment, the process of running evals should be similar to this notebook.
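A rough sketch of that flow, assuming the standard Phoenix evals pattern (the project name, model choice, and evaluator below are illustrative assumptions rather than details from the notebook):

```python
import phoenix as px
from phoenix.evals import OpenAIModel, QAEvaluator, run_evals
from phoenix.session.evaluation import get_qa_with_reference
from phoenix.trace import SpanEvaluations

# Pull query/response/reference rows from the experiment's project.
# "my-experiment-project" is a placeholder for your experiment's project name.
qa_df = get_qa_with_reference(px.Client(), project_name="my-experiment-project")

# Run an LLM-based Q&A correctness eval over those rows.
qa_evaluator = QAEvaluator(OpenAIModel(model="gpt-4o"))  # model choice is illustrative
qa_correctness_df = run_evals(
    dataframe=qa_df,
    evaluators=[qa_evaluator],
    provide_explanation=True,
)[0]

# Attach the results to the spans so they're visible in the Phoenix UI.
px.Client().log_evaluations(
    SpanEvaluations(eval_name="Q&A Correctness", dataframe=qa_correctness_df)
)
```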