Replies: 3 comments
-
There isn't a specific playbook or documentation within the Arize Phoenix repository that outlines how to evaluate entire chat histories for task completion. However, the codebase does include logging and tracing capabilities through OpenInference, which can be used to track the evaluation process of chat conversations. This setup allows for comprehensive tracing and logging functionalities that can be utilized for tracking chat conversation evaluations [1][2]. For processing and evaluating chat conversations, the codebase uses several dependencies, including You might consider leveraging these tools and libraries to build your evaluator, focusing on how OpenInference can help you trace and log the necessary data to determine task completion. To continue talking to Dosu, mention @dosu. Help Dosu learn! Give it feedback: Great Response | Irrelevant Answer | Incorrect Sources | Too Verbose | Hallucination | Other |
Beta Was this translation helpful? Give feedback.
-
Hey @Mosquid, are you using our datasets and experiments APIs? |
Beta Was this translation helpful? Give feedback.
-
Hi @Mosquid Expanding on Xander's comment above, we don't have a prebuilt template or evaluator that reviews full conversations. The Bring Your Own Evaluator docs would be a good place to start. Those will walk you through how to set up your own custom evaluator in Phoenix. After you get the initial pipes working there, our Datasets & Experiments features help give a good approach to iterate on and improve your evaluator prompt. I'd recommend:
That approach should help you create an eval prompt that you can then use at scale with confidence that it won't diverge far from your ground truth. |
Beta Was this translation helpful? Give feedback.
-
Hey there,
I'm trying to create an evaluator that allows me to evaluate the entire chat history and determine if the task has been completed. Is there a playbook on how to do this?
Thanks
Beta Was this translation helpful? Give feedback.
All reactions