chore(llmobs): implement skeleton code for ragas faithfulness evaluator #10662
Datadog Report
Branch report: ✅ 0 Failed, 100 Passed, 850 Skipped, 1m 26.57s Total duration (13m 4.6s time saved)
ddtrace/llmobs/_trace_processor.py
Outdated
self._span_writer = llmobs_span_writer
self._evaluators = evaluators
If evaluators are going to be part of the trace processor, we need to do the same thing in _child_after_fork() as we do for the span writer, i.e. something like:
self._evaluator = self._evaluator.recreate()
self._trace_processor._evaluator = self._evaluator
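For context, a minimal self-contained sketch of the fork-handling pattern this comment suggests. Every class name here (PeriodicWorkerSketch, TraceProcessorSketch, LLMObsServiceSketch) is hypothetical and stands in for the real ddtrace classes, and the assumption that recreate() returns a fresh instance with the same configuration and an empty buffer is mine rather than taken from the PR.

```python
from typing import Any, List


class PeriodicWorkerSketch:
    """Hypothetical stand-in for a periodic worker (span writer or evaluator runner)."""

    def __init__(self, interval: float) -> None:
        self.interval = interval
        self.buffer: List[Any] = []

    def recreate(self) -> "PeriodicWorkerSketch":
        # Assumption: a recreated worker keeps its configuration but starts
        # with an empty buffer, so the child never re-sends the parent's data.
        return self.__class__(self.interval)


class TraceProcessorSketch:
    """Hypothetical trace processor holding references to both workers."""

    def __init__(self, span_writer: PeriodicWorkerSketch, evaluator: PeriodicWorkerSketch) -> None:
        self._span_writer = span_writer
        self._evaluator = evaluator


class LLMObsServiceSketch:
    """Hypothetical service that owns the workers and the trace processor."""

    def __init__(self) -> None:
        self._span_writer = PeriodicWorkerSketch(interval=1.0)
        self._evaluator = PeriodicWorkerSketch(interval=5.0)
        self._trace_processor = TraceProcessorSketch(self._span_writer, self._evaluator)

    def _child_after_fork(self) -> None:
        # Recreate each worker in the child, then re-point the trace
        # processor so it does not keep the stale parent-process instance.
        self._span_writer = self._span_writer.recreate()
        self._trace_processor._span_writer = self._span_writer
        self._evaluator = self._evaluator.recreate()
        self._trace_processor._evaluator = self._evaluator
```

The important step is the second assignment in each pair: without it, the trace processor would keep enqueuing to the worker instance inherited from the parent process.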
This PR implements EvaluatorRunner, a periodic service for LLM Obs that runs evaluations on batches of finished spans.
We add _DD_LLMOBS_EVALUATORS to detect which evaluators should be enabled. Right now, the only supported evaluation is ragas_faithfulness.
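As an illustration only, here is one way evaluator names could be read from that variable. The PR description does not state the value format, so the comma-separated list below is an assumption, and enabled_evaluator_names and SUPPORTED_EVALUATORS are hypothetical names, not part of ddtrace.

```python
import os
from typing import List, Mapping

# The only evaluator the PR description says is supported right now.
SUPPORTED_EVALUATORS = ("ragas_faithfulness",)


def enabled_evaluator_names(environ: Mapping[str, str] = os.environ) -> List[str]:
    """Return the supported evaluator names listed in _DD_LLMOBS_EVALUATORS.

    Assumption: the value is a comma-separated list of names. Unknown names
    are ignored here; a real implementation might log a warning instead.
    """
    raw = environ.get("_DD_LLMOBS_EVALUATORS", "")
    names = [name.strip() for name in raw.split(",") if name.strip()]
    return [name for name in names if name in SUPPORTED_EVALUATORS]
```

Passing the environment as a mapping keeps the helper easy to test without mutating os.environ.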
Within the trace processor, span events are enqueued to the evaluator runner after being enqueued to the span writer.
On each call of periodic(), we run a list of evaluators on the batch of finished spans. An evaluator is defined as a function that takes a span as an argument and returns an evaluation metric.
Right now, the faithfulness function is a dummy function that just returns an evaluation metric with a score of 1. In a future PR we will implement the actual faithfulness evaluation.
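The flow described above can be sketched roughly as follows. This is not the code from the PR: EvaluatorRunnerSketch, dummy_faithfulness, and the dictionary shapes used for spans and metrics are hypothetical, and the real runner is a periodic service rather than a class whose periodic() is called by hand.

```python
import threading
from typing import Any, Callable, Dict, List

Span = Dict[str, Any]
Metric = Dict[str, Any]
Evaluator = Callable[[Span], Metric]


def dummy_faithfulness(span: Span) -> Metric:
    """Placeholder evaluator: always reports a score of 1 for the span."""
    return {"label": "ragas_faithfulness", "span_id": span.get("span_id"), "score_value": 1}


class EvaluatorRunnerSketch:
    """Buffers finished spans and evaluates them in batches."""

    def __init__(self, evaluators: List[Evaluator]) -> None:
        self._lock = threading.Lock()
        self._buffer: List[Span] = []
        self._evaluators = evaluators

    def enqueue(self, span: Span) -> None:
        # Called by the trace processor once a span event is finished.
        with self._lock:
            self._buffer.append(span)

    def periodic(self) -> List[Metric]:
        # Swap the buffer out under the lock so spans enqueued while the
        # evaluators run are kept for the next batch.
        with self._lock:
            batch, self._buffer = self._buffer, []
        return [evaluator(span) for span in batch for evaluator in self._evaluators]
```

A short usage example: after enqueuing two spans, one call to periodic() yields one metric per span per evaluator, and a second call yields nothing because the buffer was drained.

```python
runner = EvaluatorRunnerSketch([dummy_faithfulness])
runner.enqueue({"span_id": "a"})
runner.enqueue({"span_id": "b"})
metrics = runner.periodic()
```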
Intended Usage
! No user facing changes for this pr !
No changelog, since this PR only implements the internal skeleton code necessary for the RAGAS evaluation integration. The environment variable to enable the ragas evaluator service is hidden (_DD_LLMOBS_RAGAS_FAITHFULNESS_ENABLED) and will be made public when we implement an actual faithfulness function.
(Full e2e poc, which contains some differences)
See #10431 for an idea of what the full e2e implementation of the ragas integration looks like.