Opencensus proto Spans should optionally contain Node/Process information. #169

sjkaris · 2019-01-15T23:07:19Z

Problem

Opencensus-proto Spans are the building block for transmitting oc span data. With the current data models, there is no way to transmit Node information without using the agent's ExportTraceServiceRequest which disallows batching of spans from different nodes. This batching strategy is suboptimal for collector or agent processes that are accepting spans from many nodes. This additionally hampers potential benefits from sending compressed batches of spans, since it limits the batches to that of spans from a single node.

Proposed Solution

In order to allow for arbitrary batching of Spans, I propose adding optional Node/Process information as a field on the span proto structure. To do this the following changes would be needed:

Node is promoted up out of the agent proto definition, and into the common trace proto definition.
Span adds Node as an optional field.

This would allow for collector or agent services to batch the sending of spans from an arbitrary number of services.

Impact on current agent and client libraries:

Since this field is optional, it would not prevent current optimizations (such as the pre-sending of Node data) from working as-is.

yurishkuro · 2019-01-16T03:23:23Z

+1, it's the approach we took in Jaeger: each Span in a Batch can have its own Process entry, otherwise we use the shared Batch.Process.

SergeyKanzhelev · 2019-01-16T19:09:37Z

Should there be a containing object like Context that can be extended in future with additional metadata about individual spans? As we are talking about logs with strongly typed metadata - every log message will need to carry metadata like this. And we may want to use some of it for spans as well.

CC: @mtwo

bogdandrutu · 2019-01-18T20:49:00Z

From what I am seeing here Node is miss-used here. Node was added as an internal message to be used by library->agent->collector to know who generated the Span (I know that some data in the Node can be determined from the RPC connection, but that is not possible for example in the collector because the library talks to the agent not to the collector directly).

What you probably need is the "Resource" which main scope is to "logically" (pod, container, task number, service name, etc.) identify the task in the system. I think having the Resource in the Span is fine.

Maybe I don't understand what you try to achieve here but at least this is my mental model.

@yurishkuro What do you have in "Process" entry?

maynk · 2019-01-18T22:02:16Z

@bogdandrutu if I tried to restate what you meant -> Node represents the "sender" of the last batch of spans (more for book-keeping of internal collection infrastructure, could be agent or collector, keeps changing for the same span), as opposed to the "originator/creator task" of the spans (always stays the same for the span) - which you are calling resource here.

In that case, there is definitely overloading happening of the Node concept in exporters and in multiple places in the project, causing many bugs. We should clearly define the resource concept, and include the resource in the span proto.

As I understand, in Jaeger, there is no separate concept of Node only resource (known as process in jaeger). The Process identifies the "resource" concept above. Further, there is an on-the-wire optimization that when the same client task sends a batch of several spans to an agent/collector, then a single process instance is send and associated with all the spans in the batch.

sjkaris · 2019-01-18T23:13:26Z

@bogdandrutu I did indeed mean Node more as Resource as @maynk restated. Currently though, Node is being used in multiple places to indicate the creator of the span, especially in tracetranslator with its to-and-from zipkin and jaeger conversions (jaeger, zipkinv1, zipkinv2). If Node isn't the right construct there, we should additionally change those mappings to reduce confusion around the purpose of Node. Also I believe the documentation around Node needs to be updated here.

mtwo · 2019-01-19T01:01:39Z

@bogdandrutu how does this mesh with some of the resource definition work that you've been involved in?

sjkaris · 2019-01-23T20:53:54Z

Discussed offline.

Path forward is to add Resource to Span, since Resource is what I had thought Node was originally.

SergeyKanzhelev · 2019-01-24T16:33:48Z

@mtwo @bogdandrutu should there be wrapper for other types of contextual properties that may be introduced in future? This may help to align with whatever we will do with logs. Perhaps Context may be the right name for the object that will carry well-known contextual properties for spans and logs. Like user, session, etc.

sjkaris mentioned this issue Jan 23, 2019

Add Resource to Span #174

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Opencensus proto Spans should optionally contain Node/Process information. #169

Opencensus proto Spans should optionally contain Node/Process information. #169

sjkaris commented Jan 15, 2019

yurishkuro commented Jan 16, 2019

SergeyKanzhelev commented Jan 16, 2019

bogdandrutu commented Jan 18, 2019

maynk commented Jan 18, 2019

sjkaris commented Jan 18, 2019

mtwo commented Jan 19, 2019

sjkaris commented Jan 23, 2019

SergeyKanzhelev commented Jan 24, 2019

Opencensus proto Spans should optionally contain Node/Process information. #169

Opencensus proto Spans should optionally contain Node/Process information. #169

Comments

sjkaris commented Jan 15, 2019

Problem

Proposed Solution

Impact on current agent and client libraries:

yurishkuro commented Jan 16, 2019

SergeyKanzhelev commented Jan 16, 2019

bogdandrutu commented Jan 18, 2019

maynk commented Jan 18, 2019

sjkaris commented Jan 18, 2019

mtwo commented Jan 19, 2019

sjkaris commented Jan 23, 2019

SergeyKanzhelev commented Jan 24, 2019