chore(llmobs): add sampling for ragas skeleton code #10719

lievan · 2024-09-19T14:36:55Z

Minimal sampling implementation for evaluation runners

Usage:

_DD_LLMOBS_EVALUATOR_SAMPLING_RULES='[{"sample_rate": 0.5, "evaluator": "ragas_faithfulness", "name": "augmented_generation"}]' python3 app.py
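
For illustration, here is a minimal sketch of how a value like the one above could be parsed. It is not the code in this PR: the function name `parse_sampling_rules` and the validation choices (optional `evaluator` and `name` keys, a rate restricted to the range 0 to 1) are assumptions. Only the environment variable name and the three keys come from the usage line. Note that the value has to be valid JSON, which means straight double quotes around keys and strings.

```python
import json
import os

ENV_VAR = "_DD_LLMOBS_EVALUATOR_SAMPLING_RULES"


def parse_sampling_rules(raw):
    """Hypothetical helper: turn the env var string into a list of rule dicts.

    Returns [] when the variable is unset or empty and raises ValueError
    on malformed input. The real parser in the PR may behave differently.
    """
    if not raw:
        return []
    try:
        rules = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"{ENV_VAR} is not valid JSON: {exc}") from exc
    if not isinstance(rules, list):
        raise ValueError(f"{ENV_VAR} must be a JSON list of rule objects")

    parsed = []
    for rule in rules:
        if not isinstance(rule, dict) or "sample_rate" not in rule:
            raise ValueError(f"each rule needs a 'sample_rate' key: {rule!r}")
        try:
            rate = float(rule["sample_rate"])
        except (TypeError, ValueError) as exc:
            raise ValueError(f"invalid sample_rate in rule: {rule!r}") from exc
        if not 0.0 <= rate <= 1.0:
            raise ValueError(f"sample_rate must be between 0 and 1: {rule!r}")
        parsed.append(
            {
                "sample_rate": rate,
                # Assumption: a missing key is stored as None.
                "evaluator": rule.get("evaluator"),
                "name": rule.get("name"),
            }
        )
    return parsed


if __name__ == "__main__":
    print(parse_sampling_rules(os.environ.get(ENV_VAR)))
```

Failing loudly on malformed input is a design choice of this sketch; a tracer library might prefer to log a warning and fall back to no rules.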

Changes

The evaluation runner buffer now holds both the span event dict and the span object, because the sampler uses the span._trace_id_64bits field to make its sampling decision.
We implement a new EvaluatorSampler helper class that the EvaluationRunner uses for sampling. The EvaluatorSamplingRule inherits from SamplingRule so we can reuse helpful utilities such as the sample method.
Rule matching is basic string equality for now; regex matching can be implemented in a follow-up PR.
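
The changes above can be sketched roughly as follows. This is a self-contained illustration, not the code in sampler.py: `RuleSketch` and `should_evaluate` are hypothetical names, the hash constant and the "keep if the hashed trace ID falls below rate times 2^64" check are assumptions about how a deterministic trace-ID sampler may work, and treating a missing `evaluator` or `name` as a wildcard and evaluating spans that match no rule are also assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_UINT_64 = 2**64
# Assumption: a large odd multiplier used to spread trace IDs over the
# 64-bit range (multiplicative hashing). The real constant may differ.
KNUTH_FACTOR = 1111111111111111111


@dataclass
class RuleSketch:
    sample_rate: float
    evaluator: Optional[str] = None  # None matches any evaluator (assumption)
    name: Optional[str] = None  # None matches any span name (assumption)

    def matches(self, evaluator: str, span_name: str) -> bool:
        # Basic string equality, as described in the PR; no regex support.
        return (self.evaluator is None or self.evaluator == evaluator) and (
            self.name is None or self.name == span_name
        )

    def sample(self, trace_id_64bits: int) -> bool:
        # Deterministic: the same trace ID always gets the same decision.
        if self.sample_rate >= 1.0:
            return True
        if self.sample_rate <= 0.0:
            return False
        hashed = (trace_id_64bits * KNUTH_FACTOR) % MAX_UINT_64
        return hashed <= self.sample_rate * MAX_UINT_64


def should_evaluate(
    rules: List[RuleSketch], evaluator: str, span_name: str, trace_id_64bits: int
) -> bool:
    """The first matching rule decides; with no match, evaluate (assumption)."""
    for rule in rules:
        if rule.matches(evaluator, span_name):
            return rule.sample(trace_id_64bits)
    return True


if __name__ == "__main__":
    rules = [RuleSketch(0.5, "ragas_faithfulness", "augmented_generation")]
    kept = sum(
        should_evaluate(rules, "ragas_faithfulness", "augmented_generation", tid)
        for tid in range(10_000)
    )
    print(f"kept {kept} of 10000 spans")
```

Keying the decision on the trace ID rather than on a random draw means every span in a trace is kept or dropped consistently, which is presumably why the runner buffer needs access to the span object.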

Checklist

PR author has checked that all the criteria below are met
The PR description includes an overview of the change
The PR description articulates the motivation for the change
The change includes tests OR the PR description describes a testing strategy
The PR description notes risks associated with the change, if any
Newly-added code is easy to change
The change follows the library release note guidelines
The change includes or references documentation updates if necessary
Backport labels are set (if applicable)

Reviewer Checklist

Reviewer has checked that all the criteria below are met
Title is accurate
All changes are related to the pull request's stated goal
Avoids breaking API changes
Testing strategy adequately addresses listed risks
Newly-added code is easy to change
Release note makes sense to a user of the library
If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
Backport labels are set in a manner that is consistent with the release branch maintenance policy

github-actions · 2024-09-19T14:37:30Z

CODEOWNERS have been resolved as:

ddtrace/llmobs/_evaluators/sampler.py                                   @DataDog/ml-observability
tests/llmobs/test_llmobs_evaluator_runner_sampler.py                    @DataDog/ml-observability
ddtrace/llmobs/_evaluators/ragas/faithfulness.py                        @DataDog/ml-observability
ddtrace/llmobs/_evaluators/runner.py                                    @DataDog/ml-observability
ddtrace/llmobs/_trace_processor.py                                      @DataDog/ml-observability
tests/llmobs/test_llmobs_evaluator_runner.py                            @DataDog/ml-observability

datadog-dd-trace-py-rkomorn · 2024-09-19T14:50:23Z

Datadog Report

Branch report: evan.li/ragas-skeleton-with-sampling
Commit report: 22cb049
Test service: dd-trace-py

✅ 0 Failed, 170 Passed, 780 Skipped, 1m 18.06s Total duration (13m 12.97s time saved)

pr-commenter · 2024-09-19T15:15:33Z

Benchmarks

Benchmark execution time: 2024-09-20 15:43:37

Comparing candidate commit 22cb049 in PR branch evan.li/ragas-skeleton-with-sampling with baseline commit b6fa4e0 in branch evan.li/ragas-skeleton.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 309 metrics, 47 unstable metrics.

lievan added 3 commits September 19, 2024 09:41

sampler

60d4e5c

sampler wip

064b770

fix tests'

cd478bf

datadog-datadog-prod-us1 bot reviewed Sep 19, 2024

refactor into seperate file

3910c40

datadog-datadog-prod-us1 bot reviewed Sep 19, 2024

ddtrace/llmobs/_evaluators/sampler.py: 3 outdated review comments (resolved)

lievan added 4 commits September 19, 2024 11:28

change sampling rules parsing

7044405

dont use self.choose_matcher

64df97b

think about rate limiting

f5b6518

wip tests

22cb049
