Add sampling constant #468

dyladan · 2021-09-28T15:02:22Z

This draft proposal is an early proposal at solving #463

It does not directly compete with, but is overlapping with, #467

yurishkuro · 2021-09-28T16:14:01Z

spec/20-http_request_header_format.md

+trace-id               = 32HEXDIGLC  ; 16 bytes array identifier. All zeroes forbidden
+parent-id              = 16HEXDIGLC  ; 8 bytes array identifier. All zeroes forbidden
+trace-flags            = 2HEXDIGLC   ; 8 bit flags. Currently, only one bit is used. See below for details
+sampling-constant      = sampling-random-value parent-sampling-constant


Suggested change

sampling-constant = sampling-random-value parent-sampling-constant

sampling-constant = sampling-random-value parent-sampling-rate

It is not really the rate. It is a constant used to calculate the probability. For example, a value of 2 works out to a probability of p = 2^-x = 2^-2 = 0.25

the term "parent sampling constant" has no meaning to me. It is still a sampling rate, it's just encoded in a particular manner.

The way we have it specified in open-telemetry/oteps#168, this field contains the "adjusted count" specifically encoded by the logarithm for non-zero values and as 63 for the zero case.

yurishkuro · 2021-09-28T16:18:09Z

spec/20-http_request_header_format.md

+trace-flags            = 2HEXDIGLC   ; 8 bit flags. Currently, only one bit is used. See below for details
+sampling-constant      = sampling-random-value parent-sampling-constant
+sampling-random-value  = 2HEXDIGLC   ; geometrically distributed 8-bit random value used for consistent sampling
+parent-sampling-constant = 2HEXDIGLC   ; 8-bit value used to represent the head sampling probability.


I think the intention of limiting values to 6-bits only was that on the wire the combined constant could be represented with just 3 hexes, not 4. Was this discussed and decided against?

I understood the original 6-bit requirement was because of the desire to use a 64-bit int, but due to some restrictions of some languages (e.g. java) it is easier to ignore the sign bit hence 63 bits of data. Then a state for 0 was needed because there is no x such that 2^-x = 0 leaving 62 bits. From there, only 6 bits is needed to represent the number of leading 0s in a 62-bit binary number and 62 was the highest required exponent because 2^-62 was the minimum sampling rate.

AFAIK the 6-bit + 6-bit 3-hex-character representation was just a happy coincidence of this earlier decision.

In any case, the decision to use 4 or 3 characters doesn't matter that much to me. 4 is slightly easier to parse (parse 2 hex bytes vs some bit shifting needed) and much easier for a human to read.

I agree on both counts.

The use of 4 bytes is simpler to explain and implement, and 3 is definitely acceptable.
One reason to go with 4 instead of 3 is that it makes #467 orthogonal. The random value is only needed when the randomness flag is not set, according to that proposal (@bogdandrutu). Combining this proposal with #467, the p value is only needed when sampled and the r value is built in--we arrive at a nice outcome:

With randomness flag set: 2 bytes extra when sampled, 0 bytes extra when not sampled

Without randomness flag set: 4 bytes extra when sampled, 2 bytes extra when not sampled

jmacd · 2021-09-29T21:19:28Z

Having now specified how OpenTelemetry could use tracestate to convey r-values and p-values for itself, speaking as a member of that community, it would lead to a significantly better outcome for users and vendors if we are able to quickly and by consensus adopt a version-1 trace context format like this one combined with the proposal in #467.

Keeping in mind the original purpose of this work--which is to count spans and logs that happen in traced contexts and translate them into estimated metrics about how many associated spans and logs are happening in all contexts--we're better off with on-by-default implementations of probability sampling. The combination of these proposals leads to easier integration for span-to-metrics pipelines at a cost of 2 bytes per sampled context and 0 bytes per unsampled context.

Add sampling constant

e7c7b30

yurishkuro reviewed Sep 28, 2021

View reviewed changes

dyladan mentioned this pull request Nov 16, 2021

Add the random trace id flag #474

Merged

jmacd mentioned this pull request Apr 26, 2022

encoding of context flags in otlp messages open-telemetry/opentelemetry-proto#382

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add sampling constant #468

Add sampling constant #468

dyladan commented Sep 28, 2021 •

edited by pr-preview bot

Loading

yurishkuro Sep 28, 2021

dyladan Sep 28, 2021

yurishkuro Sep 29, 2021

jmacd Sep 29, 2021

yurishkuro Sep 28, 2021

dyladan Sep 28, 2021

jmacd Sep 29, 2021

jmacd commented Sep 29, 2021

	sampling-constant = sampling-random-value parent-sampling-constant
	sampling-constant = sampling-random-value parent-sampling-rate

Add sampling constant #468

Are you sure you want to change the base?

Add sampling constant #468

Conversation

dyladan commented Sep 28, 2021 • edited by pr-preview bot Loading

yurishkuro Sep 28, 2021

Choose a reason for hiding this comment

dyladan Sep 28, 2021

Choose a reason for hiding this comment

yurishkuro Sep 29, 2021

Choose a reason for hiding this comment

jmacd Sep 29, 2021

Choose a reason for hiding this comment

yurishkuro Sep 28, 2021

Choose a reason for hiding this comment

dyladan Sep 28, 2021

Choose a reason for hiding this comment

jmacd Sep 29, 2021

Choose a reason for hiding this comment

jmacd commented Sep 29, 2021

dyladan commented Sep 28, 2021 •

edited by pr-preview bot

Loading