Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Metrics-Generators: root server span prefers Peer.Node to the fixed user as client service #4453

Open
mikasun7 opened this issue Dec 16, 2024 · 6 comments

Comments

@mikasun7
Copy link

mikasun7 commented Dec 16, 2024

Hi, i'm using metrics-generator to generate a service graph, and many series of the metric traces_service_graph_request_total have the label client with value user, which isn't what I expected.
I checked the raw trace data and these root server spans all contain the attribute peer.service.
I checked the code and there is the following note: We check if the span we have is the root span, and if so, we set the client service to “user”.
if we could get closer to the logic of the client span here, i.e. prioritize e.PeerNode as the client service for the root server span.

--- a/modules/generator/processor/servicegraphs/servicegraphs.go
+++ b/modules/generator/processor/servicegraphs/servicegraphs.go
@@ -397,9 +397,13 @@ func (p *Processor) onExpire(e *store.Edge) {
        if len(e.ClientService) == 0 {
                // If the client service is not set, it means that the span could have been initiated by an external system,
                // like a frontend application or an engineer via `curl`.
-               // We check if the span we have is the root span, and if so, we set the client service to "user".
+               // We check if the span we have is the root span. If it is, we set the client service to the peer attribute if present; otherwise, we set it to "user".
                if _, parentSpan := parseKey(e.Key()); len(parentSpan) == 0 {
-                       e.ClientService = "user"
+                       if len(e.PeerNode) > 0 {
+                               e.ClientService = e.PeerNode
+                       } else {
+                               e.ClientService = "user"
+                       }
@joe-elliott
Copy link
Member

The three "peer attributes" we have by default are here:

var defaultPeerAttributes = []attribute.Key{
semconv.PeerServiceKey, semconv.DBNameKey, semconv.DBSystemKey,
}

All three of these are used to indicate services that are being connected to (a database or peer service). Are there any otel attributes whose semantic meaning is "the service that is connecting to me"? That would make sense to substitute in place of "user".

@mikasun7
Copy link
Author

mikasun7 commented Dec 18, 2024

For root server span(len(parentSpan) == 0 && len(e.ClientService) == 0), attribute peer.service is definitely the client service.

// PeerAttributes are attributes that will be used to create a peer edge
// Attributes are searched in the order they are provided
PeerAttributes []string `yaml:"peer_attributes"`

we could just use PeerAttributes, it should not just indicate services that are being connected to, peer.service can also represent client service

@joe-elliott
Copy link
Member

Ok, that's a fair point. I misread the details on that one. Do you have instrumentation that does this? Can you share a trace? I've never seen one with peer.service on the root.

My only concern now is we really wouldn't want to use db.name or db.system as the calling service if they happened to be on the root span.

@mikasun7
Copy link
Author

mikasun7 commented Dec 20, 2024

trace.json
client service is bfa-public

Yes, db.system cannot be a client service, although this is unlikely to be the case with a root server span.
Perhaps we can upsertPeerNode from the specific span kind.

case v1_trace.Span_SPAN_KIND_CLIENT:
key := buildKey(hex.EncodeToString(span.TraceId), hex.EncodeToString(span.SpanId))
isNew, err = p.store.UpsertEdge(key, func(e *store.Edge) {
e.TraceID = tempo_util.TraceIDToHexString(span.TraceId)
e.ConnectionType = connectionType
e.ClientService = svcName
e.ClientLatencySec = spanDurationSec(span)
e.ClientEndTimeUnixNano = span.EndTimeUnixNano
e.Failed = e.Failed || p.spanFailed(span)
p.upsertDimensions("client_", e.Dimensions, rs.Resource.Attributes, span.Attributes)
e.SpanMultiplier = spanMultiplier
p.upsertPeerNode(e, span.Attributes)
p.upsertDatabaseRequest(e, rs.Resource.Attributes, span)
})
case v1_trace.Span_SPAN_KIND_CONSUMER:
// override connection type and continue processing as span kind server
connectionType = store.MessagingSystem
fallthrough
case v1_trace.Span_SPAN_KIND_SERVER:
key := buildKey(hex.EncodeToString(span.TraceId), hex.EncodeToString(span.ParentSpanId))
isNew, err = p.store.UpsertEdge(key, func(e *store.Edge) {
e.TraceID = tempo_util.TraceIDToHexString(span.TraceId)
e.ConnectionType = connectionType
e.ServerService = svcName
e.ServerLatencySec = spanDurationSec(span)
e.ServerStartTimeUnixNano = span.StartTimeUnixNano
e.Failed = e.Failed || p.spanFailed(span)
p.upsertDimensions("server_", e.Dimensions, rs.Resource.Attributes, span.Attributes)
e.SpanMultiplier = spanMultiplier
p.upsertPeerNode(e, span.Attributes)

@joe-elliott
Copy link
Member

I'm thinking maybe we split peer client from peer server services? That way we can discover, store and metric them independently. wdyt?

@mikasun7
Copy link
Author

I think it's good, thanks for your support.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants