chore(llmobs): add sampling for ragas skeleton code #10719

Draft: wants to merge 8 commits into evan.li/ragas-skeleton from evan.li/ragas-skeleton-with-sampling

lievan (Contributor) commented Sep 19, 2024

Minimal sampling implementation for evaluation runners

Usage:

_DD_LLMOBS_EVALUATOR_SAMPLING_RULES='[{"sample_rate": 0.5, "evaluator": "ragas_faithfulness", "name": "augmented_generation"}]' python3 app.py
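The rule list is plain JSON, so it needs straight double quotes; curly quotes make it unparseable. Below is a minimal sketch of how such a value could be parsed, assuming each rule has a numeric sample_rate plus optional evaluator and name keys. EvaluatorSamplingRuleConfig, parse_sampling_rules, and load_rules_from_env are hypothetical names, not part of ddtrace, and silently skipping malformed entries is an assumed policy.

```python
import json
import os
from dataclasses import dataclass
from typing import List, Optional

# Env var name from the usage line; everything else here is a hypothetical sketch.
ENV_VAR = "_DD_LLMOBS_EVALUATOR_SAMPLING_RULES"


@dataclass(frozen=True)
class EvaluatorSamplingRuleConfig:
    """Hypothetical holder for one parsed rule (not a ddtrace class)."""

    sample_rate: float
    evaluator: Optional[str] = None  # e.g. "ragas_faithfulness"
    name: Optional[str] = None  # span name, e.g. "augmented_generation"


def parse_sampling_rules(raw: Optional[str]) -> List[EvaluatorSamplingRuleConfig]:
    """Parse a JSON list of rules, skipping malformed entries instead of raising."""
    if not raw:
        return []
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        # e.g. curly quotes instead of straight double quotes
        return []
    if not isinstance(data, list):
        return []
    rules = []
    for entry in data:
        if not isinstance(entry, dict):
            continue
        rate = entry.get("sample_rate")
        # bool is a subclass of int, so reject it explicitly
        if isinstance(rate, bool) or not isinstance(rate, (int, float)):
            continue
        if not 0.0 <= rate <= 1.0:
            continue
        evaluator, name = entry.get("evaluator"), entry.get("name")
        if not all(v is None or isinstance(v, str) for v in (evaluator, name)):
            continue
        rules.append(EvaluatorSamplingRuleConfig(float(rate), evaluator, name))
    return rules


def load_rules_from_env() -> List[EvaluatorSamplingRuleConfig]:
    # Read the rules from the process environment at startup.
    return parse_sampling_rules(os.environ.get(ENV_VAR))
```

Dropping bad entries rather than failing keeps a typo in the env var from crashing the application; a real implementation might prefer to log a warning as well.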

Changes

  1. The evaluation runner buffer now holds both the span event dict and the span object, because the sampler reads span._trace_id_64bits to make its sampling decision.
  2. We add a new EvaluatorSampler helper class that the EvaluationRunner uses for sampling. The EvaluatorSamplingRule class inherits from SamplingRule so we can reuse its existing utilities, e.g. the sample method.
  3. Rule matching is plain string equality for now; regex matching can be added in a follow-up PR.
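The three changes above can be illustrated together in a standalone sketch. This is not the code in this PR: FakeSpan, filter_buffer, and the "first matching rule decides, no match keeps the span" policy are assumptions, and the Knuth multiplicative hash over the 64-bit trace ID is re-implemented here in the style of ddtrace's trace-level SamplingRule rather than imported from it.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Knuth multiplicative hash over the 64-bit trace ID, in the style of ddtrace's
# trace-level SamplingRule; re-implemented here, not imported.
KNUTH_FACTOR = 1111111111111111111
MAX_TRACE_ID = 2**64


@dataclass
class FakeSpan:
    """Hypothetical stand-in for a ddtrace Span with only the fields used below."""

    name: str
    _trace_id_64bits: int


@dataclass
class EvaluatorSamplingRule:
    sample_rate: float
    evaluator: Optional[str] = None
    name: Optional[str] = None

    def matches(self, evaluator: str, span_name: str) -> bool:
        # Plain string equality (no regex yet); a field left as None matches anything.
        evaluator_ok = self.evaluator is None or self.evaluator == evaluator
        name_ok = self.name is None or self.name == span_name
        return evaluator_ok and name_ok

    def sample(self, span: FakeSpan) -> bool:
        # Deterministic per trace: spans sharing a trace ID get the same decision.
        hashed = (span._trace_id_64bits * KNUTH_FACTOR) % MAX_TRACE_ID
        return hashed < self.sample_rate * MAX_TRACE_ID


class EvaluatorSampler:
    def __init__(self, rules: List[EvaluatorSamplingRule]):
        self.rules = rules

    def sample(self, evaluator: str, span: FakeSpan) -> bool:
        # Assumed policy: the first matching rule decides; no match means keep the span.
        for rule in self.rules:
            if rule.matches(evaluator, span.name):
                return rule.sample(span)
        return True


def filter_buffer(
    buffer: List[Tuple[dict, FakeSpan]], evaluator: str, sampler: EvaluatorSampler
) -> List[dict]:
    # Each buffer entry pairs the span event dict (what gets evaluated) with the
    # span object (which supplies _trace_id_64bits for the sampling decision).
    return [event for event, span in buffer if sampler.sample(evaluator, span)]
```

With a rule such as EvaluatorSamplingRule(0.5, "ragas_faithfulness", "augmented_generation"), roughly half of the augmented_generation traces would reach the faithfulness evaluator, and a given trace ID always gets the same decision.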

Checklist

  • PR author has checked that all the criteria below are met
  • The PR description includes an overview of the change
  • The PR description articulates the motivation for the change
  • The change includes tests OR the PR description describes a testing strategy
  • The PR description notes risks associated with the change, if any
  • Newly-added code is easy to change
  • The change follows the library release note guidelines
  • The change includes or references documentation updates if necessary
  • Backport labels are set (if applicable)

Reviewer Checklist

  • Reviewer has checked that all the criteria below are met
  • Title is accurate
  • All changes are related to the pull request's stated goal
  • Avoids breaking API changes
  • Testing strategy adequately addresses listed risks
  • Newly-added code is easy to change
  • Release note makes sense to a user of the library
  • If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
  • Backport labels are set in a manner that is consistent with the release branch maintenance policy


github-actions bot commented Sep 19, 2024

CODEOWNERS have been resolved as:

ddtrace/llmobs/_evaluators/sampler.py                                   @DataDog/ml-observability
tests/llmobs/test_llmobs_evaluator_runner_sampler.py                    @DataDog/ml-observability
ddtrace/llmobs/_evaluators/ragas/faithfulness.py                        @DataDog/ml-observability
ddtrace/llmobs/_evaluators/runner.py                                    @DataDog/ml-observability
ddtrace/llmobs/_trace_processor.py                                      @DataDog/ml-observability
tests/llmobs/test_llmobs_evaluator_runner.py                            @DataDog/ml-observability


datadog-dd-trace-py-rkomorn bot commented Sep 19, 2024

Datadog Report

Branch report: evan.li/ragas-skeleton-with-sampling
Commit report: 22cb049
Test service: dd-trace-py

✅ 0 Failed, 170 Passed, 780 Skipped, 1m 18.06s Total duration (13m 12.97s time saved)


pr-commenter bot commented Sep 19, 2024

Benchmarks

Benchmark execution time: 2024-09-20 15:43:37

Comparing candidate commit 22cb049 in PR branch evan.li/ragas-skeleton-with-sampling with baseline commit b6fa4e0 in branch evan.li/ragas-skeleton.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 309 metrics, 47 unstable metrics.
