Skip to content
This repository has been archived by the owner on Oct 3, 2023. It is now read-only.

Opencensus proto Spans should optionally contain Node/Process information. #169

Open
sjkaris opened this issue Jan 15, 2019 · 8 comments
Open

Comments

@sjkaris
Copy link

sjkaris commented Jan 15, 2019

Problem

Opencensus-proto Spans are the building block for transmitting oc span data. With the current data models, there is no way to transmit Node information without using the agent's ExportTraceServiceRequest which disallows batching of spans from different nodes. This batching strategy is suboptimal for collector or agent processes that are accepting spans from many nodes. This additionally hampers potential benefits from sending compressed batches of spans, since it limits the batches to that of spans from a single node.

Proposed Solution

In order to allow for arbitrary batching of Spans, I propose adding optional Node/Process information as a field on the span proto structure. To do this the following changes would be needed:

  • Node is promoted up out of the agent proto definition, and into the common trace proto definition.
  • Span adds Node as an optional field.

This would allow for collector or agent services to batch the sending of spans from an arbitrary number of services.

Impact on current agent and client libraries:

Since this field is optional, it would not prevent current optimizations (such as the pre-sending of Node data) from working as-is.

@yurishkuro
Copy link

+1, it's the approach we took in Jaeger: each Span in a Batch can have its own Process entry, otherwise we use the shared Batch.Process.

@SergeyKanzhelev
Copy link
Member

Should there be a containing object like Context that can be extended in future with additional metadata about individual spans? As we are talking about logs with strongly typed metadata - every log message will need to carry metadata like this. And we may want to use some of it for spans as well.

CC: @mtwo

@bogdandrutu
Copy link
Contributor

From what I am seeing here Node is miss-used here. Node was added as an internal message to be used by library->agent->collector to know who generated the Span (I know that some data in the Node can be determined from the RPC connection, but that is not possible for example in the collector because the library talks to the agent not to the collector directly).

What you probably need is the "Resource" which main scope is to "logically" (pod, container, task number, service name, etc.) identify the task in the system. I think having the Resource in the Span is fine.

Maybe I don't understand what you try to achieve here but at least this is my mental model.

@yurishkuro What do you have in "Process" entry?

@maynk
Copy link

maynk commented Jan 18, 2019

@bogdandrutu if I tried to restate what you meant -> Node represents the "sender" of the last batch of spans (more for book-keeping of internal collection infrastructure, could be agent or collector, keeps changing for the same span), as opposed to the "originator/creator task" of the spans (always stays the same for the span) - which you are calling resource here.

In that case, there is definitely overloading happening of the Node concept in exporters and in multiple places in the project, causing many bugs. We should clearly define the resource concept, and include the resource in the span proto.

As I understand, in Jaeger, there is no separate concept of Node only resource (known as process in jaeger). The Process identifies the "resource" concept above. Further, there is an on-the-wire optimization that when the same client task sends a batch of several spans to an agent/collector, then a single process instance is send and associated with all the spans in the batch.

@sjkaris
Copy link
Author

sjkaris commented Jan 18, 2019

@bogdandrutu I did indeed mean Node more as Resource as @maynk restated. Currently though, Node is being used in multiple places to indicate the creator of the span, especially in tracetranslator with its to-and-from zipkin and jaeger conversions (jaeger, zipkinv1, zipkinv2). If Node isn't the right construct there, we should additionally change those mappings to reduce confusion around the purpose of Node. Also I believe the documentation around Node needs to be updated here.

@mtwo
Copy link

mtwo commented Jan 19, 2019

@bogdandrutu how does this mesh with some of the resource definition work that you've been involved in?

@sjkaris
Copy link
Author

sjkaris commented Jan 23, 2019

Discussed offline.

Path forward is to add Resource to Span, since Resource is what I had thought Node was originally.

@SergeyKanzhelev
Copy link
Member

@mtwo @bogdandrutu should there be wrapper for other types of contextual properties that may be introduced in future? This may help to align with whatever we will do with logs. Perhaps Context may be the right name for the object that will carry well-known contextual properties for spans and logs. Like user, session, etc.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants