Add response headers #365

Merged
merged 13 commits into from
Mar 25, 2020
index.html: 5 changes (3 additions & 2 deletions)
@@ -79,12 +79,13 @@
<body>
<section id='abstract' data-include="spec/01-abstract.md" data-include-format='markdown'></section>
<section id='sotd' data-include="spec/02-sotd.md" data-include-format='markdown'></section>

<section id='conformance'></section>

<section data-include="spec/10-overview.md" data-include-format='markdown'></section>

-<section data-include="spec/20-http_header_format.md" data-include-format='markdown'></section>
+<section data-include="spec/20-http_request_header_format.md" data-include-format='markdown'></section>
+<section data-include="spec/21-http_response_header_format.md" data-include-format='markdown'></section>

<section class="informative" data-include="spec/30-processing-model.md" data-include-format='markdown'></section>

@@ -1,16 +1,16 @@
-# Trace Context HTTP Headers Format
+# Trace Context HTTP Request Headers Format

This section describes the binding of the distributed trace context to `traceparent` and `tracestate` HTTP headers.

## Relationship Between the Headers

-The `traceparent` header represents the incoming request in a tracing system in a common format, understood by all vendors. Here’s an example of a `traceparent` header.
+The `traceparent` request header represents the incoming request in a tracing system in a common format, understood by all vendors. Here’s an example of a `traceparent` header.

``` http
traceparent: 00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01
```

-The `tracestate` header includes the parent in a potentially vendor-specific format:
+The `tracestate` request header includes the parent in a potentially vendor-specific format:

``` http
tracestate: congo=t61rcWkgMzE
```
@@ -334,7 +334,6 @@ tracestate: rojo=00f067aa0ba902b7,congo=t61rcWkgMzE

The version of `tracestate` is defined by the version prefix of the `traceparent` header. Vendors need to attempt to parse `tracestate` if a higher version is detected, to the best of their ability. It is the vendor’s decision whether to use partially-parsed `tracestate` key/value pairs or not.


## Mutating the traceparent Field

A vendor receiving a `traceparent` request header MUST send it to outgoing requests. It MAY mutate the value of this header before passing it to outgoing requests.
spec/21-http_response_header_format.md: 184 changes (184 additions & 0 deletions)
@@ -0,0 +1,184 @@
# Trace Context HTTP Response Headers Format

This section describes the binding of the distributed trace context to the `tracemeta` HTTP header.

## Tracemeta Header

The `tracemeta` HTTP response header field identifies a completed request in a tracing system. It has four fields:

* `version`
* `trace-id`
* `parent-id`


I think using "parent" in a response is rather confusing. Is it referring to the span that initiated the request (and is likely receiving the response), or the span that is sending the response?

Depending on the answer, it should probably be renamed to something else.

Member Author

It is actually neither. The motivating use case for this was the web browser initial page load. The page load happens before the instrumentation js is loaded, so the initial request is sent without a traceparent header. The first instrumented service can then create a "fake" span which it assigns to be its parent and returns the id of the fake span to the client. The client then uses that id when it creates the span for the page load. So it is the id that is actually assigned to the parent span, making parent-id a decent name IMO.

I just wanted to chime in that we use response headers at SolarWinds (historically based on X-Trace headers used in Tracelytics, TraceView, and now AppOptics), and our response header includes a trace ID and event ID from the span that wrote the response in the header. So for us "parent" may not strictly be true (it's actually the child), but we can work with any name.

Basically, for us, the response header communicates a happened-before relationship between the end of the responding span and the end of the caller span, and also confirms the successful communication of the callee's response, as well as the caller's trace ID, between caller and callee.

Member

@cce I think you're describing a substantially different use case.

- "parent" makes sense to communicate back to the browser the ID it should use for its span (because for some reason it could not have created that ID before sending the request to the server)
- @cce's case is to communicate back the server-side event ID

Parent doesn't make sense. The parent passed to a server is the parent of the server span; the span id generated is more often the child of the caller. Returning something named parent in the response is confusing and not intuitive. Focusing on what you know, e.g. the span id that serviced the request, is more sensible, if this is included at all.

Also bear in mind that response header processing can't be mandatory at all, and this isn't mentioned explicitly. If you do decide to make it mandatory, it won't be implementable without significant rewrites of code.

Member Author

@dyladan, Feb 11, 2020

@adriancole @cce I think the issue may be that the name of the field does not fully convey its intended semantics. Given the situation where service foo calls service bar, if bar returns a parent-id to foo, it is not saying "this is my span id", but rather "this is the span ID that you, my parent, should use."

The motivating use case is a situation like a browser where the initial request may not contain a span id, but the called service needs to use some id as its "parent," so it generates its own id but also generates an id that it uses as its parent. This parent id is then sent back up the wire to the calling service so that it can either use that id as its own span id, or it can at least link that id to its span.

Would changing the field name to something like suggested-parent-id, assumed-parent-id, requested-span-id or similar be more useful in conveying this semantic?

Member Author

Here is a more visual explanation of the use case:


Member

Let's rename this to proposed-parent-id (it's not like this string is sent over the wire, a longer name does not hurt).

I would also prefer that the spec clearly state that, if the caller already provides a well-formed traceparent header, the proposed-parent-id should be omitted.

* `trace-flags`

### Header Name

Header name: `tracemeta`

In order to increase interoperability across multiple protocols and encourage successful integration, by default vendors SHOULD keep the header name lowercase. The header name is a single word without any delimiters, for example, a hyphen (`-`).

Vendors MUST expect the header name in any case (upper, lower, mixed), and SHOULD send the header name in lowercase.
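A minimal Java sketch of this case rule follows. It assumes the receiver sees request or response headers as a plain `Map<String, String>` (many HTTP stacks already normalize header names for you), and the `TracemetaHeaderName` class is hypothetical, not part of this specification or any library.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical helper for illustration only.
final class TracemetaHeaderName {
    // Vendors SHOULD send the header name in lowercase.
    static final String NAME = "tracemeta";

    // Vendors MUST accept the header name in any case, so compare case-insensitively.
    static Optional<String> find(Map<String, String> headers) {
        for (Map.Entry<String, String> entry : headers.entrySet()) {
            if (NAME.equalsIgnoreCase(entry.getKey())) {
                return Optional.ofNullable(entry.getValue());
            }
        }
        return Optional.empty();
    }
}
```

A sender would always emit `TracemetaHeaderName.NAME`, while `find` matches `tracemeta`, `TraceMeta`, or `TRACEMETA` alike.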

### tracemeta Header Field Values


This section uses the Augmented Backus-Naur Form (ABNF) notation of [[!RFC5234]], including the DIGIT rule from that document. The `DIGIT` rule defines a single number character `0`-`9`.

``` abnf
HEXDIGLC = DIGIT / "a" / "b" / "c" / "d" / "e" / "f" ; lowercase hex character
value = version "-" version-format
```

The dash (`-`) character is used as a delimiter between fields.

#### version

``` abnf
version = 2HEXDIGLC ; this document assumes version 00. Version 255 is forbidden
```

The value is US-ASCII encoded (which is UTF-8 compliant).

Version (`version`) is 1 byte representing an 8-bit unsigned integer. Version `255` is invalid. The current specification assumes the `version` is set to `00`.

#### version-format

The following `version-format` definition is used for version `00`.

``` abnf
version-format = [trace-id] "-" [parent-id] "-" [trace-flags]
```

Member

Making fields optional while keeping the same format as traceparent carries potential issues. Some may decide to use the same optionality for traceparent (even though it's not allowed by the spec) and break interoperability, mostly because the same code may be used to parse and generate both headers.

That said, it would be great to confirm that all the scenarios listed below in this proposal:

1. Would actually significantly benefit from fields optionality.
2. Define clear logic on how to decide which parts to return.
3. Would not need more properties than those listed in traceresponse, and will ACTUALLY need the arbitrary tracestate on the response instead.

``` abnf
trace-id = 32HEXDIGLC ; 16 bytes array identifier. All zeroes forbidden
parent-id = 16HEXDIGLC ; 8 bytes array identifier. All zeroes forbidden
trace-flags = 2HEXDIGLC ; 8 bit flags. Currently, only one bit is used. See below for details
```
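The version-`00` grammar can be sketched as a small Java parser. The hypothetical `TracemetaFields` class below splits a value on `-` and treats an empty field as absent. It assumes that any version other than `00` is rejected (the text above does not say how a receiver should treat higher versions), and it checks only the version and the field count, leaving field contents to the validation rules in the following sections.

```java
import java.util.Optional;

// Hypothetical parser sketch for a version-00 tracemeta value.
// Empty fields become Optional.empty(); field contents are NOT validated here.
final class TracemetaFields {
    final String version;
    final Optional<String> traceId;
    final Optional<String> parentId;
    final Optional<String> traceFlags;

    private TracemetaFields(String version, Optional<String> traceId,
                            Optional<String> parentId, Optional<String> traceFlags) {
        this.version = version;
        this.traceId = traceId;
        this.parentId = parentId;
        this.traceFlags = traceFlags;
    }

    static Optional<TracemetaFields> parse(String value) {
        // A limit of -1 keeps trailing empty strings, so "00---01" yields 4 parts.
        String[] parts = value.split("-", -1);
        if (parts.length != 4) {
            return Optional.empty(); // version 00 has exactly three '-' delimiters
        }
        // Assumption: only version 00 is understood (this also excludes "ff").
        if (!parts[0].equals("00")) {
            return Optional.empty();
        }
        return Optional.of(new TracemetaFields(parts[0],
                nonEmpty(parts[1]), nonEmpty(parts[2]), nonEmpty(parts[3])));
    }

    private static Optional<String> nonEmpty(String s) {
        return s.isEmpty() ? Optional.empty() : Optional.of(s);
    }
}
```

For example, `00---01` parses to an absent `trace-id`, an absent `parent-id`, and `trace-flags` equal to `01`.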

#### trace-id

This is the ID of the whole trace forest and is used to uniquely identify a <a href="#dfn-distributed-traces">distributed trace</a> through a system. It is represented as a 16-byte array, for example, `4bf92f3577b34da6a3ce929d0e0e4736`. All bytes as zero (`00000000000000000000000000000000`) is considered an invalid value.

If the `trace-id` value is invalid (for example, if it contains non-allowed characters or all zeros), vendors MUST ignore the `tracemeta` header.

See [considerations for trace-id field
generation](#considerations-for-trace-id-field-generation) for recommendations
on how to operate with `trace-id`.

#### parent-id

This is the ID of the calling request as known by the callee (in some tracing systems, this is known as the `span-id`, where a `span` is the execution of a client request). It is represented as an 8-byte array, for example, `00f067aa0ba902b7`. All bytes as zero (`0000000000000000`) is considered an invalid value.
Member

@yurishkuro, Mar 25, 2020

Suggested change
-This is the ID of the calling request as known by the callee (in some tracing systems, this is known as the `span-id`, where a `span` is the execution of a client request). It is represented as an 8-byte array, for example, `00f067aa0ba902b7`. All bytes as zero (`0000000000000000`) is considered an invalid value.
+This is the ID of the calling request proposed by the callee back to the caller in cases when the caller did not send the `traceparent` header (in some tracing systems, this ID is known as the `span-id`, where a `span` is the execution of a client request). It is represented as an 8-byte array, for example, `00f067aa0ba902b7`. All bytes as zero (`0000000000000000`) is considered an invalid value.


Vendors MUST ignore the `tracemeta` header when the `parent-id` is invalid (for example, if it contains non-lowercase hex characters).
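The `trace-id` and `parent-id` validity rules can be sketched as follows. Because both fields are optional in `tracemeta`, this sketch assumes that an absent field (passed as `null`) is acceptable, while a present field that is the wrong length, contains anything other than lowercase hex, or is all zeros causes the whole header to be ignored. The class `TracemetaIds` is hypothetical.

```java
// Hypothetical validation helpers for the trace-id and parent-id fields.
final class TracemetaIds {
    // trace-id = 32HEXDIGLC, all zeroes forbidden
    static boolean isValidTraceId(String s) {
        return isLowerHexNonZero(s, 32);
    }

    // parent-id = 16HEXDIGLC, all zeroes forbidden
    static boolean isValidParentId(String s) {
        return isLowerHexNonZero(s, 16);
    }

    // A present-but-invalid ID means the whole tracemeta header MUST be ignored.
    // Assumption: an absent (null) field is fine, because both fields are optional.
    static boolean shouldIgnore(String traceIdOrNull, String parentIdOrNull) {
        return (traceIdOrNull != null && !isValidTraceId(traceIdOrNull))
                || (parentIdOrNull != null && !isValidParentId(parentIdOrNull));
    }

    private static boolean isLowerHexNonZero(String s, int length) {
        if (s.length() != length) {
            return false;
        }
        boolean nonZero = false;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean digit = c >= '0' && c <= '9';
            boolean lowerHex = c >= 'a' && c <= 'f';
            if (!digit && !lowerHex) {
                return false; // uppercase or non-hex characters are invalid
            }
            if (c != '0') {
                nonZero = true;
            }
        }
        return nonZero;
    }
}
```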

#### trace-flags

An <a data-cite='!BIT-FIELD#firstHeading'>8-bit field</a> that controls tracing flags such as sampling, trace level, etc. These flags are recommendations from the callee rather than strict rules, for three reasons:

1. Trust and abuse
2. Bug in the callee
3. Different load between calling and called services might force the caller to downsample.

You can find more in the section [Security considerations](#security-considerations) of this specification.

Like other fields, `trace-flags` is hex-encoded. For example, all `8` flags set would be `ff` and no flags set would be `00`.

As this is a bit field, you cannot interpret flags by decoding the hex value and looking at the resulting number. For example, a flag `00000001` could be encoded as `01` in hex, or `09` in hex if the flag `00001000` was also present (`00001001` is `09`). A common mistake in bit fields is forgetting to mask when interpreting flags.

Here is an example of properly handling trace flags:

``` java
static final byte FLAG_SAMPLED = 1; // 00000001
...
boolean sampled = (traceFlags & FLAG_SAMPLED) == FLAG_SAMPLED;
```
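To make the `01` versus `09` example concrete, here is a small self-contained sketch that starts from the two-character hex field rather than an already-decoded byte. The class name `TraceFlagsReader` is hypothetical, and the input is assumed to be already validated as two lowercase hex characters.

```java
// Hypothetical helper: decode the two-character trace-flags field.
final class TraceFlagsReader {
    static final int FLAG_SAMPLED = 0x01; // 00000001

    // Assumes traceFlagsHex was already validated as two lowercase hex characters.
    static boolean isSampled(String traceFlagsHex) {
        int flags = Integer.parseInt(traceFlagsHex, 16);
        // Mask before comparing: "09" (00001001) is sampled even though its value is not 1.
        return (flags & FLAG_SAMPLED) == FLAG_SAMPLED;
    }
}
```

Both `isSampled("01")` and `isSampled("09")` report the sampled bit, while `isSampled("08")` does not.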

##### Sampled flag

The current version of this specification (`00`) only supports a single flag called `sampled`.

When set, the least significant bit (right-most) denotes that the callee may have recorded trace data. When unset, the callee did not record trace data out-of-band.

The `tracestate` field is designed to handle the variety of vendor-specific techniques for making recording decisions (or other vendor-specific information). The `sampled` flag provides better interoperability between vendors. It allows vendors to communicate recording decisions and enable a better experience for the customer.

For example, when a SaaS load balancer service participates in a <a>distributed trace</a>, this service has no knowledge of the tracing vendor used by its callee. This service may produce records of incoming requests for monitoring or troubleshooting purposes. The `sampled` flag can be used to ensure that information about requests that were marked for recording by the callee will also be recorded by the SaaS load balancer service upstream so that the callee can troubleshoot the behavior of every recorded request.

The `sampled` flag has no restrictions.

The following are a set of suggestions that vendors SHOULD use to increase vendor interoperability.

- If a component made a definitive recording decision, this decision SHOULD be reflected in the `sampled` flag.
- If a component needs to make a recording decision, it SHOULD respect the `sampled` flag value.
[Security considerations](#security-considerations) SHOULD be applied to protect from abusive or malicious use of this flag.
- If a component deferred or delayed the decision and only a subset of telemetry will be recorded, the `sampled` flag should be propagated unchanged. It should be set to `0` as the default option when the trace is initiated by this component.
- If a component receives a `0` for the `sampled` flag on an incoming request, it may still decide to record a trace. In this case it SHOULD return a `sampled` flag `1` on the response so that the caller can update its sampling decision if required.

There are two additional options that vendors MAY follow:

- A component that makes a deferred or delayed recording decision may communicate the priority of a recording by setting the `sampled` flag to `1` for a subset of requests.
- A component may also fall back to probability sampling and set the `sampled` flag to `1` for a subset of requests.

##### Other Flags

The behavior of other flags, such as `00000100`, is not defined and is reserved for future use. Vendors MUST set those to zero.
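A sender can satisfy the zeroing rule by masking before encoding. The sketch below assumes the vendor keeps flags internally as an `int` that may carry private bits; `TraceFlagsWriter` is a hypothetical name.

```java
// Hypothetical helper: encode outgoing trace-flags with reserved bits cleared.
final class TraceFlagsWriter {
    static final int FLAG_SAMPLED = 0x01; // the only flag defined for version 00

    // Keeps only the sampled bit, so undefined flags such as 00000100 are always 0.
    static String encode(int internalFlags) {
        int flags = internalFlags & FLAG_SAMPLED;
        return String.format("%02x", flags); // two lowercase hex digits
    }
}
```

With this approach an internal value of `0x09` is sent as `01`, and `0x04` is sent as `00`.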


## Returning the tracemeta Field

Vendors MAY choose to include a `tracemeta` header on any response, regardless of whether or not a `traceparent` header was included on the request.

The following are suggested use cases:

- **Restarted trace**. When a request crosses a trust boundary, the called service may decide to restart the trace. In this case, the called service MAY return a `tracemeta` field indicating its internal `trace-id` and sampling decision.

Example request and response:

Request
```http
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-d75597dee50b0cac-01
```
Response
```http
tracemeta: 00-1baad25c36c11c1e7fbd6d122bd85db6--01
```

In this example, a participant in a trace with ID `4bf92f3577b34da6a3ce929d0e0e4736` calls a third-party system that collects its own internal telemetry using a new trace ID `1baad25c36c11c1e7fbd6d122bd85db6`. When the third party completes its request, it returns the new trace ID and internal sampling decision to the caller. If there is an error with the request, the caller can include the third party's internal trace ID in a support request.

**Note**: In this case, the `parent-id` was omitted from the response because, being a part of a different trace, it was not necessary for the caller.
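A sketch of producing such values, where an omitted field keeps its surrounding `-` delimiters. The class `TracemetaWriter` is hypothetical, and callers are assumed to pass already-valid field values or `null`.

```java
// Hypothetical formatter for a version-00 tracemeta value.
// A null argument omits that field but keeps its '-' delimiters.
final class TracemetaWriter {
    static String format(String traceId, String parentId, String traceFlags) {
        return "00-" + orEmpty(traceId) + "-" + orEmpty(parentId) + "-" + orEmpty(traceFlags);
    }

    private static String orEmpty(String s) {
        return s == null ? "" : s;
    }
}
```

For the restarted trace above, `TracemetaWriter.format("1baad25c36c11c1e7fbd6d122bd85db6", null, "01")` yields `00-1baad25c36c11c1e7fbd6d122bd85db6--01`.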

- **Load balancer**. When a request passes through a load balancer, the load balancer may wish to defer a sampling decision to its called service. In this instance, the called service MAY return a `tracemeta` field indicating its sampling decision.

Example request and response:

Request
```http
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-d75597dee50b0cac-00
```
Response
```http
tracemeta: 00---01
```

In this example, a caller (the load balancer) in a trace with ID `4bf92f3577b34da6a3ce929d0e0e4736` wishes to defer a sampling decision to its callee. When the callee completes the request, it returns the internal sampling decision to the caller.

**Note**: In this case, both the `parent-id` and `trace-id` were omitted from the response. Because the trace was not restarted and only a sampling decision was requested by the caller, the `parent-id` and `trace-id` were not changed.
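On the caller side, the load-balancer case might look like the following sketch. It assumes the caller simply adopts the callee's `sampled` bit whenever a well-formed version-`00` value with a `trace-flags` field is returned, and otherwise keeps its own decision. The class `DeferredSampling` is hypothetical, and `trace-id`/`parent-id` validation is omitted for brevity.

```java
// Hypothetical sketch: a caller (for example, a load balancer) that deferred its
// sampling decision adopts the sampled flag returned in the tracemeta header.
final class DeferredSampling {
    static final int FLAG_SAMPLED = 0x01;

    // tracemetaOrNull is the raw response header value, or null if it was absent.
    static boolean shouldRecord(boolean localDecision, String tracemetaOrNull) {
        if (tracemetaOrNull == null) {
            return localDecision; // no header: keep the local decision
        }
        String[] parts = tracemetaOrNull.split("-", -1);
        // Only version 00 with a present, well-formed trace-flags field is used;
        // anything else leaves the local decision unchanged. (ID checks omitted.)
        if (parts.length != 4 || !parts[0].equals("00") || !parts[3].matches("[0-9a-f]{2}")) {
            return localDecision;
        }
        int flags = Integer.parseInt(parts[3], 16);
        return (flags & FLAG_SAMPLED) == FLAG_SAMPLED;
    }
}
```

In the example above, the load balancer starts with a local decision of `false` and, on receiving `00---01`, records the request after all.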

- **Web browser**. When a web browser that does not natively support trace context loads a web page, the initial page load will not contain any trace context headers. In this instance, the server MAY return a `tracemeta` field for use by a tracing tool that runs as a script in the browser.

Example response:

```http
tracemeta: 00-4bf92f3577b34da6a3ce929d0e0e4736-d75597dee50b0cac-01
```

In this example, the server is telling the browser that it should adopt trace id `4bf92f3577b34da6a3ce929d0e0e4736` and parent id `d75597dee50b0cac` for the current operation.
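A server-side sketch of this case follows, assuming the server uses `java.security.SecureRandom` to generate fresh IDs when the request carried no `traceparent`. The class `BrowserTracemeta` and its methods are hypothetical. Generation is retried in the unlikely event of an all-zero ID, which the field rules forbid.

```java
import java.security.SecureRandom;

// Hypothetical sketch for the initial page load, where no traceparent was received.
final class BrowserTracemeta {
    private static final SecureRandom RANDOM = new SecureRandom();

    // Returns a random ID of the given byte length as lowercase hex,
    // retrying on the (very unlikely) all-zero value, which is invalid.
    static String randomId(int byteLength) {
        byte[] buf = new byte[byteLength];
        do {
            RANDOM.nextBytes(buf);
        } while (isAllZero(buf));
        StringBuilder hex = new StringBuilder(byteLength * 2);
        for (byte b : buf) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    private static boolean isAllZero(byte[] buf) {
        for (byte b : buf) {
            if (b != 0) {
                return false;
            }
        }
        return true;
    }

    // Builds a tracemeta value proposing a trace-id and parent-id to the browser.
    // A real server would also use these IDs for its own span context, with
    // parentId as its parent; this sketch only returns the header value.
    static String responseValue(boolean sampled) {
        String traceId = randomId(16);  // 16 bytes -> 32 hex characters
        String parentId = randomId(8);  // 8 bytes -> 16 hex characters
        return "00-" + traceId + "-" + parentId + "-" + (sampled ? "01" : "00");
    }
}
```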

- **Tail sampling**. When a service that made a negative sampling decision makes a call to another service, there may be some event during the processing of that request that causes the called service to decide to sample the request. In this case, it may return its updated sampling decision to the caller, the caller may also return the updated sampling decision to its caller, and so on. In this way, as much of a trace as possible may be recovered for debugging purposes even if the original sampling decision was negative.

Example request and response:

Request
```http
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-d75597dee50b0cac-00
```
Response
```http
tracemeta: 00---01
```
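The upward propagation can be sketched for one intermediate service as below. This assumes the service only reports an upgrade from unsampled to sampled and otherwise sends no `tracemeta`; the class `TailSampling` is hypothetical, and a real service would also start recording its own telemetry at that point.

```java
// Hypothetical sketch of an intermediate service in a tail-sampling chain.
final class TailSampling {
    static final int FLAG_SAMPLED = 0x01;

    // localSampled: this service's current decision (false means it was negative).
    // downstreamTracemeta: tracemeta value returned by the callee, or null.
    // Returns the tracemeta value to send back to this service's own caller,
    // or null when there is no updated decision to report.
    static String relay(boolean localSampled, String downstreamTracemeta) {
        boolean calleeSampled = false;
        if (downstreamTracemeta != null) {
            String[] parts = downstreamTracemeta.split("-", -1);
            calleeSampled = parts.length == 4
                    && parts[0].equals("00")
                    && parts[3].matches("[0-9a-f]{2}")
                    && (Integer.parseInt(parts[3], 16) & FLAG_SAMPLED) == FLAG_SAMPLED;
        }
        if (!localSampled && calleeSampled) {
            // The trace was not restarted, so trace-id and parent-id are omitted.
            return "00---01";
        }
        return null;
    }
}
```

Applied at each hop, a `00---01` returned by the deepest service is relayed unchanged to every unsampled caller above it.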