Generated service-graph metrics for messaging_system connection type #3210

adirmatzkin · 2023-12-10T20:08:05Z

I'm wondering whether the Metrics Generator calculates the server-side service_graph metrics correctly.

For the "classic" client-server interaction I have no doubts, but for producer-consumer interactions (connection_type=messaging_system) I believe the current behavior isn't right (or should at least be extended to cover another use case).

The case I'm describing is the one where the latency that needs to be measured is the total latency, from the producer's start time to the consumer's end time.

Imagine a message is published to some queue A (the publish span takes 10ms), then sits in the queue for 20s (for example, because no consumer was available to process another message), and is finally pulled and processed by the consumer in 15ms.

The service-graph metrics will only "count" the 10ms on the producer side and the 15ms on the consumer side (the consumer's latency is calculated from the consumer span duration), leaving the 20s delay unaccounted for on either side and in any metric...
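
To make the arithmetic concrete, here is a minimal sketch of the example above. The variable names are made up for illustration, and the timeline assumes the consumer span starts the moment the message leaves the queue.

```python
# Timeline of the example, in milliseconds (hypothetical variable names).
publish_ms = 10        # producer span duration
queue_wait_ms = 20_000 # time the message sits in the queue
process_ms = 15        # consumer span duration

# What the two sides report today: only the span durations.
recorded_ms = publish_ms + process_ms

# What the use case needs: producer start to consumer end.
end_to_end_ms = publish_ms + queue_wait_ms + process_ms

# The part that no metric currently reflects.
unreported_ms = end_to_end_ms - recorded_ms

print(f"recorded={recorded_ms}ms end_to_end={end_to_end_ms}ms unreported={unreported_ms}ms")
```

In this example the two span durations account for 25ms of a 20,025ms end-to-end latency.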

I believe this 20s latency should be considered somehow, what do you guys think? 🙃

joe-elliott · 2023-12-12T14:54:20Z

Maintainer

Yes, I agree this is definitely a use case we need to improve on. With synchronous calls, network overhead can be estimated as parent duration minus child duration, but we don't have a good way to show the gap between parent span end and child span start, which is what async/queuing systems need.
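
The two quantities can be sketched as follows. This is only an illustration of the formulas mentioned above, not Tempo code; the `Span` type and both function names are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical minimal span: start and end timestamps in seconds."""
    start_s: float
    end_s: float

    @property
    def duration_s(self) -> float:
        return self.end_s - self.start_s


def sync_overhead_s(parent: Span, child: Span) -> float:
    """Synchronous call: estimate network overhead as parent duration minus child duration."""
    return parent.duration_s - child.duration_s


def async_gap_s(parent: Span, child: Span) -> float:
    """Async/queue: time between the parent span ending and the child span starting."""
    return child.start_s - parent.end_s


# Synchronous client/server pair: the child runs inside the parent.
client = Span(start_s=0.000, end_s=0.120)
server = Span(start_s=0.020, end_s=0.100)

# Producer/consumer pair: the child starts long after the parent ended.
producer = Span(start_s=0.000, end_s=0.010)
consumer = Span(start_s=20.010, end_s=20.025)

print(sync_overhead_s(client, server))   # overhead estimate for the sync pair
print(async_gap_s(producer, consumer))   # queue delay for the async pair
print(sync_overhead_s(producer, consumer))  # negative here, so not a useful estimate
```

For the producer/consumer pair the subtraction of durations goes negative, which is why a separate gap measurement is needed for queues.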

The fundamental challenge is that pairing up these spans would require long waits in the metrics generator. The cost isn't that high since we store very little data when calculating service graphs, but it's a non-obvious operational issue.
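
A rough sketch of why the wait matters, assuming a simple in-memory store keyed by some message identifier. `PendingEdges` and its methods are hypothetical and are not how the metrics generator is implemented; the point is only that a producer entry dropped before the consumer span arrives can no longer be paired.

```python
from typing import Dict, Optional, Tuple


class PendingEdges:
    """Hypothetical store that holds producer span end times until a consumer shows up."""

    def __init__(self, wait_s: float, max_items: int) -> None:
        self.wait_s = wait_s
        self.max_items = max_items
        # key -> (producer_end_s, stored_at_s)
        self._pending: Dict[str, Tuple[float, float]] = {}

    def add_producer(self, key: str, producer_end_s: float, now_s: float) -> bool:
        """Remember a producer span; returns False if the store is full."""
        if len(self._pending) >= self.max_items:
            return False
        self._pending[key] = (producer_end_s, now_s)
        return True

    def match_consumer(self, key: str, consumer_start_s: float) -> Optional[float]:
        """Return the queue delay in seconds if the producer is still held, else None."""
        entry = self._pending.pop(key, None)
        if entry is None:
            return None
        return consumer_start_s - entry[0]

    def expire(self, now_s: float) -> int:
        """Drop entries held longer than wait_s; returns how many were dropped."""
        stale = [k for k, (_, stored) in self._pending.items() if now_s - stored > self.wait_s]
        for k in stale:
            del self._pending[k]
        return len(stale)


# With a 10s wait, a message that sits in the queue for 20s is never paired.
short = PendingEdges(wait_s=10, max_items=100)
short.add_producer("msg-1", producer_end_s=0.01, now_s=0.01)
short.expire(now_s=15.0)
print(short.match_consumer("msg-1", consumer_start_s=20.01))  # None: entry already expired

# With a 30s wait, the same message is paired and the queue delay is available.
longer = PendingEdges(wait_s=30, max_items=100)
longer.add_producer("msg-1", producer_end_s=0.01, now_s=0.01)
longer.expire(now_s=15.0)
delay = longer.match_consumer("msg-1", consumer_start_s=20.01)
print(delay)
```

Each held entry is only a couple of numbers, which matches the point that the memory cost is small, but the wait has to be at least as long as the longest queue delay you want to observe.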

I would support adjusting the way we do service graph metrics for producer/consumer for the reason you highlighted above. Honestly the metrics generator as a whole needs some improvements. We could reduce series by dropping the histograms not needed for the svc graph and there have been a number of small feature requests for it. It's an area that needs some attention.

1 reply

adirmatzkin Dec 13, 2023
Author

Makes sense... Holding these spans in memory will certainly come at a cost, just as it does today.
Configuring the wait and max_items knobs for this processor is without doubt a consideration when deploying the MG: it's about finding the right balance for your needs.
I believe this is a fairly simple and valuable feature. I'll open an issue so we don't forget about it 👌
Thanks!
