[processor/tailsampling] Fixed sampling decision metrics #37212
base: main
Conversation
# Conflicts: # processor/tailsamplingprocessor/processor_telemetry_test.go
@yvrhdn perhaps an ottl policy w/ an invalid condition would work to test sampling_policy_evaluation_error?
Cool, I've added a test for sampling_policy_evaluation_error as well 🙂
Codecov Report: All modified and coverable lines are covered by tests ✅
Additional details and impacted files:

@@           Coverage Diff            @@
##             main   #37212     +/-  ##
========================================
  Coverage   79.58%   79.58%
========================================
  Files        2274     2274
  Lines      212996   212997      +1
========================================
+ Hits       169509   169513      +4
  Misses      37795    37795
+ Partials     5692     5689      -3

☔ View full report in Codecov by Sentry.
I'm not sure why codecov is failing 😅 I added tests to validate the metrics are updated correctly, but since I didn't add any new code paths, the % covered will not change.
This can potentially catch people by surprise: they'll likely see different metric patterns for the same workload. I think this deserves a subtext explaining what can happen.
Done!
Description
Fixes some of the metrics emitted from sampling decisions. I believe `otelcol_processor_tail_sampling_sampling_trace_dropped_too_early` and `otelcol_processor_tail_sampling_sampling_policy_evaluation_error_total` are sometimes overcounted.

The bug: `samplingPolicyOnTick` creates a struct `policyMetrics` to hold on to some counters. This struct is shared for all the traces that are evaluated during that tick:

opentelemetry-collector-contrib/processor/tailsamplingprocessor/processor.go, line 324 in 22c647a

On each loop iteration, the values of the counters are added to the metrics:

opentelemetry-collector-contrib/processor/tailsamplingprocessor/processor.go, lines 340 to 344 in 22c647a

But the counters are not reset between iterations, so if the first evaluated trace could not be found, this sets `idNotFoundOnMapCount` to `1`. Every iteration after that adds another `1` to the `otelcol_processor_tail_sampling_sampling_trace_dropped_too_early` metric, even though those traces were found.

I've moved the metric updates outside of the for loop so the counters are only added once per tick.
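To make the failure mode concrete, here is a minimal, self-contained Go sketch of the pattern described above. The `policyMetrics` and `idNotFoundOnMapCount` names mirror the ones mentioned in this description, but `metricCounter` and the rest are simplified stand-ins, not the processor's actual telemetry types:

```go
package main

import "fmt"

// policyMetrics is a simplified stand-in for the struct samplingPolicyOnTick
// uses to accumulate per-tick counters (not the processor's real type).
type policyMetrics struct {
	idNotFoundOnMapCount int64
}

// metricCounter mimics a monotonic counter: Add accumulates into a running total.
type metricCounter struct{ total int64 }

func (c *metricCounter) Add(v int64) { c.total += v }

func main() {
	traceFound := []bool{false, true, true, true} // only the first trace is missing

	droppedTooEarly := &metricCounter{}

	// Buggy pattern: the shared counter is added to the metric on every
	// iteration without being reset, so one missing trace at the start is
	// counted on every iteration (4 here instead of 1).
	buggy := policyMetrics{}
	for _, found := range traceFound {
		if !found {
			buggy.idNotFoundOnMapCount++
		}
		droppedTooEarly.Add(buggy.idNotFoundOnMapCount)
	}
	fmt.Println("buggy total:", droppedTooEarly.total) // buggy total: 4

	// Fixed pattern: accumulate inside the loop, record once after it.
	droppedTooEarly = &metricCounter{}
	fixed := policyMetrics{}
	for _, found := range traceFound {
		if !found {
			fixed.idNotFoundOnMapCount++
		}
	}
	droppedTooEarly.Add(fixed.idNotFoundOnMapCount)
	fmt.Println("fixed total:", droppedTooEarly.total) // fixed total: 1
}
```

Running the sketch prints `buggy total: 4` versus `fixed total: 1` for a single missing trace among four, which matches the overcounting described above.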
Testing
I have added a dedicated test for each metric, processing multiple traces in one tick.

I've added a test for `otelcol_processor_tail_sampling_sampling_trace_dropped_too_early`. I can add one for `sampling_policy_evaluation_error` too, I'm just not sure how to deliberately fail a policy.
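As one possible way to deliberately fail a policy in a test, here is a hedged Go sketch: plug in a policy whose Evaluate always returns an error and check that the error count matches the number of failed evaluations. The `evaluator` interface, `decision` type, and `alwaysErrPolicy` below are hypothetical stand-ins, not the processor's real policy evaluator API:

```go
package main

import (
	"errors"
	"fmt"
)

// decision and evaluator are hypothetical stand-ins for the processor's
// sampling decision type and policy evaluator interface.
type decision int

const (
	notSampled decision = iota
	sampled
)

type evaluator interface {
	Evaluate(traceID string) (decision, error)
}

// alwaysErrPolicy deliberately fails every evaluation; wiring a policy like
// this into a test is one way to exercise the
// sampling_policy_evaluation_error counter.
type alwaysErrPolicy struct{}

func (alwaysErrPolicy) Evaluate(string) (decision, error) {
	return notSampled, errors.New("forced evaluation failure")
}

func main() {
	var policy evaluator = alwaysErrPolicy{}
	traceIDs := []string{"trace-a", "trace-b", "trace-c"}

	var evaluateErrorCount int64
	for _, id := range traceIDs {
		if _, err := policy.Evaluate(id); err != nil {
			evaluateErrorCount++
		}
	}

	// With the fix, the metric should be bumped once per failed evaluation,
	// so three failing traces in one tick yield exactly 3.
	fmt.Println("evaluation errors:", evaluateErrorCount) // evaluation errors: 3
}
```

An OTTL policy with an invalid condition, as suggested in the conversation above, forces the same failure through the processor's actual configuration rather than a stub.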